RLint: Reformatting R Code to Follow the Google Style Guide
Andy Chen (andych@google.com), Alex Blocker, Andy Chu, Tim Hesterberg, Jeffrey D.
Summary
RLint checks and reformats R code to follow the R style guide.
RLint is used within Google.
● Eases checking correctness.
● Improves programmer productivity.
Suggestion: run an experiment adopting a consistent style guide + RLint.
Style guides improve correctness and productivity
Q: How do we produce correct R code when
● correctness is hard to check,
● R programmer time is expensive?
Q: How do we maintain correct R code when
● modified by different programmers?
Google Confidential and Proprietary
Many R files modified by multiple users
~40% of files are modified by >1 Googler.
~50% of directories contain code written by >1 Googler.

# Googlers modifying R file    % R files
1                              60.9%
2-3                            33.7%
4-5                            3.8%
6+                             1.5%

# Googlers modifying R code in directory    % R directories
1                                           52.0%
2-3                                         36.6%
4-5                                         7.0%
6+                                          4.
Style guides improve correctness and productivity
A: An R style guide specifies uniform coding.
Style guides specify program structure
The Google R style guide specifies
● identifier naming: variable.name, FunctionName, kConstantName
● layout: indentation, spacing, ...
● comments
● function commenting
● ...
Success criterion: Any programmer should be able to
● instantly understand the structure of any code.
Consistent style is more important than "perfect" style.
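The three naming forms listed above lend themselves to a mechanical check. The following is a minimal sketch, not RLint's actual rules: the patterns and the names NAMING_PATTERNS and classify_identifier are assumptions made for illustration, inferred only from the three examples on the slide.

```python
import re

# Hypothetical patterns inferred from the slide's three examples:
# variable.name, FunctionName, kConstantName.
NAMING_PATTERNS = {
    "variable": re.compile(r"[a-z][a-z0-9]*(\.[a-z0-9]+)*"),
    "function": re.compile(r"([A-Z][a-z0-9]*)+"),
    "constant": re.compile(r"k([A-Z][a-z0-9]*)+"),
}


def classify_identifier(name):
    """Return the naming conventions that `name` fully matches."""
    return [
        kind
        for kind, pattern in NAMING_PATTERNS.items()
        if pattern.fullmatch(name)
    ]


for name in ["variable.name", "FunctionName", "kConstantName", "my_var"]:
    print(name, classify_identifier(name))
```

An identifier that matches none of the patterns, such as my_var, comes back with an empty list, which a checker could turn into a warning.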
RLint: Automate style checking and correction
Goal: Minimize the overhead of following the style guide.
RLint: A program that warns about style violations.
● Optionally produces style-conforming code.
● Key idea: Computers are cheap.
Use within Google:
● All code violations are flagged by the code review tool.
● Violations must be corrected before code submission.
Ex: Spacing
Code:
    foo <-function(x){
      return (list (
        a = sum(x[,1]),
        b = 1/3+1e-7*(x[1,1]))
    …
Warnings:
● Place spaces around all binary operators (=, +, -, <-, etc.).
● Place a space before left parenthesis, except in a function call.
Corrected:
    foo <- function(x) {
      return(list(
        a = sum(x[, 1]),
        b = 1/3 + 1e-7 * (x[1, 1])
    ...
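The two warnings above can be approximated with regular expressions. This is a minimal sketch under stated assumptions, not RLint's implementation: check_spacing and KEYWORDS are hypothetical names, only the `<-` operator is checked (other binary operators need care with unary minus and literals such as 1e-7), and the input line is assumed to contain no comments or strings.

```python
import re

# R keywords that do take a space before "(" (an assumed list).
KEYWORDS = {"if", "for", "while", "in", "else", "repeat"}

OPERATOR_WARNING = "Place spaces around all binary operators (=, +, -, <-, etc.)."
PAREN_WARNING = "Place a space before left parenthesis, except in a function call."


def check_spacing(line):
    """Return the spacing warnings that apply to one line of R code."""
    warnings = []
    # "<-" needs whitespace on both sides.
    if re.search(r"\S<-|<-\S", line):
        warnings.append(OPERATOR_WARNING)
    # A space between a name and "(" is only allowed after a keyword.
    for match in re.finditer(r"([A-Za-z.][\w.]*)\s+\(", line):
        if match.group(1) not in KEYWORDS:
            warnings.append(PAREN_WARNING)
            break
    return warnings


print(check_spacing("foo <-function(x){"))
print(check_spacing("  return (list ("))
print(check_spacing("foo <- function(x) {"))
```

The first two calls each report one warning and the corrected line reports none.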
Ex: Indentation
Code
    if (x == 5)
      while (x > 1)
        x <- x - 1
        print(x)
Is anything wrong?
Ex: Indentation
Code
    if (x == 5)
      while (x > 1)
        x <- x - 1
        print(x)  # R-bleed bug? ;)
Corrected code
    if (x == 5)
      while (x > 1)
        x <- x - 1
    print(x)
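Re-indenting brace-less if/while/for statements, as in the correction above, can be sketched as follows. The names is_braceless_header and reindent are hypothetical, and the sketch assumes one statement per line, no braces, no else branches, and conditions that fit on a single line; it is not RLint's actual algorithm.

```python
import re


def is_braceless_header(line):
    """True if `line` is exactly `if (...)`, `while (...)` or `for (...)`."""
    match = re.match(r"(if|while|for)\s*\(", line)
    if not match:
        return False
    depth = 0
    # Walk from the opening parenthesis to its matching close.
    for i in range(match.end() - 1, len(line)):
        if line[i] == "(":
            depth += 1
        elif line[i] == ")":
            depth -= 1
            if depth == 0:
                # A header has nothing after the closing parenthesis.
                return i == len(line) - 1
    return False


def reindent(lines, width=2):
    """Indent each statement by how many brace-less headers govern it."""
    result = []
    depth = 0
    for raw in lines:
        line = raw.strip()
        result.append(" " * (width * depth) + line)
        # A header owns the next statement; any other statement ends the chain.
        depth = depth + 1 if is_braceless_header(line) else 0
    return result


messy = ["if (x == 5)", "  while (x > 1)", "    x <- x - 1", "    print(x)"]
print("\n".join(reindent(messy)))
```

Because print(x) follows an ordinary statement rather than a header, it returns to column zero, which makes it visible that it runs regardless of the if.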
Ex: Ease checking program correctness
Code
    x <- -5:-1
    x[x <-2]
Is anything wrong?
Ex: Ease checking program correctness
Code
    x <- -5:-1
    x[x <-2]  # Hmm ...
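The catch in this example is that R reads `<-` as a single assignment token, so `x[x <-2]` overwrites x instead of comparing it with -2. Consistent spacing makes the two readings look different on the page. The toy tokenizer below is a sketch of that longest-match behavior for a tiny subset of R; TOKEN and tokenize are hypothetical names, and this is not R's real lexer.

```python
import re

# Alternatives are tried left to right, so "<-" is listed before "<"
# to mimic longest-match tokenization.
TOKEN = re.compile(r"<-|<=|<|-|:|\[|\]|\d+|[A-Za-z.][\w.]*")


def tokenize(code):
    """Split a small subset of R into tokens, ignoring whitespace."""
    return TOKEN.findall(code)


print(tokenize("x[x <-2]"))   # ['x', '[', 'x', '<-', '2', ']']
print(tokenize("x[x < -2]"))  # ['x', '[', 'x', '<', '-', '2', ']']
```

With the style guide's spacing, the first form would be written `x[x <- 2]` and the second `x[x < -2]`, so a reviewer can tell which one the author wrote.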
Ex: Ease checking program correctness
Code
    if (format(Sys.
RLint implementation uses Python
Use Python string functions and regular expressions.
Algorithm:
Stub out comments, strings, user-defined operators.
● Ex: Comment may contain code!
● Ex: Multi-line string
Check spacing.
Align & indent lines within {}, () and [].
● Align lines by opening bracket.
● Align lines by ‘=’ if they are in the same bracket.
Align if/while/for (...) not followed by {}.
Unstub comments, strings, user-defined operators.
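The first and last steps of the algorithm above, stubbing and unstubbing, might look like the following. This is a sketch under assumptions, not RLint's code: STUB_PATTERN, stub, and unstub are hypothetical names, the placeholder format is invented, and the source is assumed to contain no NUL characters.

```python
import re

# Strings (possibly multi-line), comments, and %op% operators.
# Scanning is left to right, so a "#" inside a string stays part of the
# string, and a quote inside a comment stays part of the comment.
STUB_PATTERN = re.compile(
    r'"(?:\\.|[^"\\])*"'    # double-quoted string
    r"|'(?:\\.|[^'\\])*'"   # single-quoted string
    r"|#[^\n]*"             # comment to end of line
    r"|%[^%\s]*%",          # user-defined operator such as %in%
    re.DOTALL,
)


def stub(source):
    """Replace each protected span with a numbered placeholder."""
    saved = []

    def replace(match):
        saved.append(match.group(0))
        return "\x00%d\x00" % (len(saved) - 1)

    return STUB_PATTERN.sub(replace, source), saved


def unstub(stubbed, saved):
    """Put the original spans back in place of the placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], stubbed)


source = 'x <- "a,b # not a comment"  # y<-1 is only a comment\nz <- x %in% c("a,b")'
stubbed, saved = stub(source)
print(saved)
print(unstub(stubbed, saved) == source)
```

Between stub and unstub, spacing and alignment rules can run on the stubbed text without touching the code-like `y<-1` inside the comment or the commas inside the strings.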
Application: Improve R community's style consistency
Proposal: Adopt R style guide + RLint.
● Run experiments to determine net benefit.
Small scale: Individual teams (pkgs) adopt style guide + checker.
● Are these programmers more productive?
● More bug fixes and fewer (un-fixed) bug reports?
Medium scale: CRAN packages opt into style guide + checker.
● Specify style guide + checker program.
● Enforced by CRAN server farm.
Coding conventions and checkers
Coding conventions have existed for decades.
● 1918: The Elements of Style by Strunk, later Strunk & White (writing English)
● 1974: The Elements of Programming Style (writing code)
● 1997: Java code conventions
● 2001: Python style guide
● 2014: Google style guides for 12 languages available
Style checkers have existed for decades.