Designing types for R, empirically

Abstract

The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work is two questions: what expressive power is needed to accurately type R code, and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insight into the degree of polymorphism exhibited by idiomatic R code and explore potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

Document Details

Document Type: Pub Defense Publication
Publication Date: Nov 13, 2020
Source ID: 10.1145/3428249

Entities

People

Alexi Turcotte
Aviral Goel
Filip Křikava
Jan Vitek

Organizations

Czech Technical University in Prague
National Science Foundation
Natural Sciences and Engineering Research Council
Northeastern University
Office of Naval Research
