R

Shiny

Tidyverse

ggplot2

See the ggplot2 book online for more information

List of extensions to GGplot

Legend

theme(legend.position = c(.8, .2)) # x/y position inside the plot (0 to 1); or use "top", "bottom", "left", "right", "none"
geom_point(aes(x,y), show.legend = FALSE) 

Scales/Palettes

Using RColorBrewer

RColorBrewer::display.brewer.all() # see all palettes 

scale_color_brewer(palette = "RdPu") # for discrete data
scale_color_distiller(palette = "RdPu") # for continuous data

Using manual scales

scale_linetype_manual(name = "Linetypes", labels=c(...), values = c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash"))
# w/ grey
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# w/ black
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
scale_color_manual(name = "Colors", values=cbPalette)
scale_fill_manual(name = "Colors", values=cbPalette)

Fishualize color palettes

scale_color_fish_d()
scale_color_fish_c()

Full list of color palettes: Overview fishualize palettes

Labels

labs(x="x",
     y="y",
     title="Title",
     subtitle="Subtitle")

Statistical Summaries

ggplot(diamonds, aes(color, price)) + 
  geom_bar(stat = "summary_bin", fun = mean) # fun.y was renamed to fun in ggplot2 3.3.0

Themes

Ch. 15 of the ggplot2 book has a comprehensive guide to themes.

Multiplots

library(gridExtra)
plot1 <- qplot(1)
plot2 <- qplot(1)
grid.arrange(plot1, plot2, ncol=2)

Ggplotly

Cool example of detailed visualizations in ggplot

Saving plots

Programming with dplyr

  • quo() - quotes the expression you type, capturing it literally (a bit like wrapping it in quotes ")
  • enquo() - looks at what the caller passed for a function argument and quotes that expression instead of evaluating it
my_summarise <- function(df, group_var) {
  df %>%
    group_by(!! group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, quo(g1))

In order to make the call my_summarise(df, g1), we need to make the following change:

my_summarise <- function(df, group_var) {
  group_var <- enquo(group_var)
  df %>%
    group_by(!! group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)
  • If the column name is held in a string, convert it to a symbol with rlang::sym() before unquoting it, as suggested here.
  • To use a variable as the name of a new column, use := instead of =, e.g. !!colname := f(x); an unquoted name on the left-hand side of = is not valid R code.
  • You can filter with an expression built from a string by parsing it with rlang::parse_expr():
expr <- paste0("s",1:2, " == ", c(0,1), collapse = " & ") # s1 == 0 & s2 == 1
filter(df, !!parse_expr(expr)) 
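The two bullets on rlang::sym() and := can be combined in one small sketch. The helper name mean_by and its arguments are hypothetical, chosen only for illustration; it assumes dplyr and rlang are installed and uses the built-in mtcars data.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: group by a column given as a string, average another
# column given as a string, and store the result under a caller-chosen name.
mean_by <- function(df, group_col, value_col, out_name) {
  group_sym <- sym(group_col)   # string -> symbol
  value_sym <- sym(value_col)
  df %>%
    group_by(!!group_sym) %>%
    summarise(!!out_name := mean(!!value_sym), .groups = "drop")  # := allows a dynamic name
}

res <- mean_by(mtcars, "cyl", "mpg", "avg_mpg")
names(res)  # "cyl" "avg_mpg"
```

With plain = on the left-hand side, !!out_name would be a syntax error, which is why := is needed.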

Recipes

  1. Getting relative frequencies with dplyr. Note that summarise() drops the last grouping level (check with groups()), so the mutate() below still runs within each am group. Useful for bar plots of frequencies.
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) # relative frequency of each gear within its am group
  2. Reordering x for bar graphs (most frequent level first)
ggplot(theTable,
       aes(x=reorder(Position,Position,
                     function(x)-length(x)))) +
       geom_bar()

Lubridate

Tibble

  • as_tibble(...,)
  • For forcing matrices, as_tibble(mat, .name_repair = ~c("a", "b", "c")) allows you to name the columns for matrices without column names in one line
  • bind_rows() or bind_cols() is more efficient than repeated rbind()/cbind() for combining lots of data frames

Packages

Creating them

Official R reference - a fairly complete guide to creating packages

Spatial

Source list for spatial packages, crowdsourced on Twitter

List of packages
  • sf
  • raster
  • sp
  • leaflet
  • tmap
  • mapview
  • rgdal
  • cartography
  • gstat
  • tigris

Useful Functions to remember

  • gl() - Factor level generation
  • xtabs() - Cross table generation
  • table() - see table of distribution
  • search() - see all attached packages and environments on the search path

Cookbook

Working with formula object in R

f <- formula(y ~ x1 + x2) # create the formula object
frame <- model.frame(f, data, drop.unused.levels = TRUE) # get the columns of data used in the formula
X <- model.matrix(f, frame) # create the `X` (design) matrix from the frame above, expanding factor levels

attr(X, "assign") - this attribute gives, for each column of the model matrix, the index of the formula term it comes from (0 is the intercept). A factor with 3 levels contributes 2 columns in addition to the intercept.
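A minimal runnable sketch of the formula workflow above, in base R only. The data frame dat and the names f and X are made up for illustration.

```r
# Small made-up data set: one numeric predictor and one 3-level factor
dat <- data.frame(
  y  = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  x1 = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  x2 = factor(c("a", "b", "c", "a", "b", "c"))
)

f     <- y ~ x1 + x2
frame <- model.frame(f, dat, drop.unused.levels = TRUE)
X     <- model.matrix(f, frame)

colnames(X)        # "(Intercept)" "x1" "x2b" "x2c"
attr(X, "assign")  # 0 1 2 2: both dummy columns belong to the second term, x2
```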

Row-wise workflows

Examples in this workshop

Working with factors

as.numeric(levels(temp))[temp] # unfactor temp
factor(cheese$additive, levels(cheese$additive)[c(4,1:3)]) # Reorder the factors
reorder(cheese$additive, cheese$scores, FUN = mean) # Relevel additives factor, by their average score
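A short sketch of why the unfactor idiom is needed, and of reorder(), in base R. The cheese data frame here is a hypothetical stand-in with made-up values, not the data set used in the notes.

```r
temp <- factor(c("10", "2.5", "10", "7"))

codes  <- as.numeric(temp)                 # integer level codes, NOT the original values
values <- as.numeric(levels(temp))[temp]   # 10, 2.5, 10, 7

# Hypothetical stand-in for the cheese data
cheese <- data.frame(
  additive = factor(c("A", "B", "C", "A", "B", "C")),
  scores   = c(5, 1, 3, 7, 2, 4)
)

# Levels sorted by mean score: B (1.5), C (3.5), A (6)
by_mean <- reorder(cheese$additive, cheese$scores, FUN = mean)
levels(by_mean)
```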

Data Cleaning

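Use library(tidyr), the successor to reshape2 for reshaping data.

  • gather() - wide to long (superseded by pivot_longer() since tidyr 1.0.0)
  • spread() - long to wide (superseded by pivot_wider() since tidyr 1.0.0)
gather(data, key, value, columns)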
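A small round trip with gather() and spread(), assuming tidyr is installed. The data frame wide and its column names are made up for illustration.

```r
library(tidyr)

wide <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))

# Wide to long: columns a and b become (variable, value) pairs
long <- gather(wide, key = "variable", value = "value", a, b)

# Long to wide: back to one column per variable
back <- spread(long, key = variable, value = value)

nrow(long)   # 4: two ids times two variables
names(back)  # "id" "a" "b"
```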

GGplot force legend with constant aes

Shortcuts

Option + Shift + K - show keyboard shortcuts

Kable

kable(TABLE, booktabs = TRUE) %>% kable_styling(position = "center") %>% row_spec(2, hline_after = TRUE)

Advanced R

Data Structures

  • 6 Atomic types: logical, integer, double, character, complex, raw
  • 3 properties of vectors: typeof(), length(), attributes()
  • Coercion goes from the least to the most flexible atomic type: logical, integer, double, character
  • Most important attributes: Names, Dimensions, Class
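The coercion order and the three vector properties can be checked directly in base R. A small sketch with made-up values:

```r
# Combining types coerces to the most flexible type present
t1 <- typeof(c(TRUE, 1L))   # logical + integer  -> "integer"
t2 <- typeof(c(1L, 2.5))    # integer + double   -> "double"
t3 <- typeof(c(2.5, "a"))   # double + character -> "character"

# The three properties of a vector
v <- c(a = 1, b = 2, c = 3)
typeof(v)       # "double"
length(v)       # 3
attributes(v)   # a list with one element, $names
```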

Objects

Need library(pryr) to use the main functions.

  • typeof() - base type
  • otype() - object type
  • ftype() - tells you if "generic" function or "method"
  • is.object(x) - Checks if has "class" attribute

Generic Functions

In the S3 Object system, methods don't belong to a class, but belong to a generic function.

Generics are responsible for method dispatch: finding the right method to use for a given call. This generic-function style of polymorphism is also used by Common Lisp's object system, but it is uncommon in most other languages.

ftype()
summary # the generic function
summary.lm # methods associated with generic function
summary.glm 

> ftype(summary)
[1] "s3"      "generic"
> ftype(summary.lm)
[1] "s3"     "method"
> methods(summary)
 [1] summary.aov                    summary.aovlist*              
 [3] summary.aspell*                summary.check_packages_in_dir*
 [5] summary.connection             summary.corAR1*               
 [7] summary.corARMA*               summary.corCAR1*   

Classes are very casual in S3: a class is essentially just an attribute, and an object can even have multiple classes, e.g. glm. UseMethod() looks for the appropriate method based on the class of the object you pass to the generic. Do not use . in the names of ordinary functions; it makes them look like S3 methods.

foo <- c(1, 2, 3, 4)
class(foo) <- "something"
t(foo) # will look for the method t.something to run
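To make method dispatch concrete, here is a minimal sketch of a user-defined generic with two methods and a fallback. The generic area and the classes circle and square are hypothetical names chosen for illustration.

```r
# The generic only calls UseMethod(), which dispatches on class(shape)
area <- function(shape, ...) UseMethod("area")

# Methods are ordinary functions named generic.class
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

# Fallback used when no class-specific method is found
area.default <- function(shape, ...) stop("no area method for this class")

circ <- structure(list(r = 1), class = "circle")
sq   <- structure(list(side = 2), class = "square")

a_circ <- area(circ)  # dispatches to area.circle
a_sq   <- area(sq)    # dispatches to area.square
```

Calling area() on an object with neither class falls through to area.default, which is the same path t(foo) takes when no t.something method exists.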

Environments

Environments are much like lists, with four special characteristics:

  1. Every environment has a parent (except the empty environment).
  2. The elements are not ordered, so they cannot be indexed by number.
  3. Every name in an environment is unique.
  4. To remove an element you must use rm(); setting it to NULL only binds the name to NULL.

  • Special Environments
  • .GlobalEnv (globalenv()) - the global environment
  • baseenv() environment of base, parent is emptyenv
  • emptyenv() # ancestor of all. Only environment without a parent.
  • Packages have two associated environments,

  • ls() - examine the elements of an environment
  • rm() - remove something from an environment
  • ls.str() - look at the environment with the structure of each element as well
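A brief base R sketch of these environment properties. The environment e and the names a and b are illustrative.

```r
e <- new.env()
e$a <- 1
assign("b", 2, envir = e)
ls(e)                     # "a" "b"

e$a <- NULL               # binds a to NULL; the name is still there
still_there <- exists("a", envir = e, inherits = FALSE)

rm("a", envir = e)        # actually removes the binding
gone <- !exists("a", envir = e, inherits = FALSE)

# The base environment's parent is the empty environment
identical(parent.env(baseenv()), emptyenv())
```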

pryr::where("name") - What environment is the name found in, with regular scoping rules

do.call(what, args, envir) - construct and execute a function call from a function (or its name) and a list of arguments
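A few typical uses of do.call() in base R, as a sketch with made-up inputs:

```r
# Arguments collected in a list, including named ones
args <- list(c(1, 5, NA, 10), na.rm = TRUE)
m <- do.call(mean, args)   # same as mean(c(1, 5, NA, 10), na.rm = TRUE)

# `what` can also be the function name as a string
s <- do.call("paste", list("a", "b", sep = "-"))

# Common use: row-bind a list of data frames in one call
dfs <- list(data.frame(x = 1:2), data.frame(x = 3:4))
combined <- do.call(rbind, dfs)
```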

Meta-programming

Convert a string to an R variable name on the fly (and vice versa)

x <- 42
eval(parse(text = "x")) # returns 42
deparse(substitute(x)) # returns "x"

Environment things

version
sessionInfo()
.libPaths()

Error checking

When debugging a detailed series of function calls, can use:

options(error = recover)

Matrix Package

Dense matrices:

  • dgeMatrix - Real matrices in general storage mode
  • dsyMatrix - Symmetric real matrices in non-packed storage
  • dspMatrix - Symmetric real matrices in packed storage (one triangle only)
  • dtrMatrix - Triangular real matrices in non-packed storage
  • dtpMatrix - Triangular real matrices in packed storage (triangle only)
  • dpoMatrix - Positive semi-definite symmetric real matrices in non-packed storage
  • dppMatrix - ditto in packed storage

Sparse matrices:

  • dgTMatrix - general, numeric, sparse matrices in (a possibly redundant) triplet form. This can be a convenient form in which to construct sparse matrices.
  • dgCMatrix - general, numeric, sparse matrices in the (sorted) compressed sparse column format.
  • dsCMatrix - symmetric, real, sparse matrices in the (sorted) compressed sparse column format. Only the upper or the lower triangle is stored. Although there is provision for both forms, the lower triangle form works best with TAUCS.
  • dtCMatrix - triangular, real, sparse matrices in the (sorted) compressed sparse column format.

as(X, "dgCMatrix")               # Force into sparse Matrix
forceSymmetric(x, uplo = "U")    # take uplo = Upper (U) or Lower (L)
symmpart(x)                      # Average both upper and lower triangle
skewpart(x)                      # The difference between x and symmpart
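A small sketch of the symmetric and skew-symmetric parts, assuming the Matrix package is installed (it ships with standard R distributions). The 2 x 2 matrix is made up for illustration.

```r
library(Matrix)

x <- Matrix(c(1, 2, 3, 4), nrow = 2)  # dense 2 x 2 matrix, filled by column
S <- symmpart(x)                       # (x + t(x)) / 2
K <- skewpart(x)                       # (x - t(x)) / 2

isSymmetric(S)
all.equal(as.matrix(S + K), as.matrix(x))  # the two parts add back up to x

U <- forceSymmetric(x, uplo = "U")     # keep the upper triangle and mirror it
```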

Matrix Decompositions

qrX <- qr(X) # Gives QR object
qr.Q(qrX)    # Gives the Q part
qr.R(qrX)    # Gives the R part
qrX$rank     # Gives the rank
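A quick base R check that the pieces of the QR object fit together. The matrix X is made up; the column pivot stored in the object is applied before comparing, in case qr() reordered any columns.

```r
X <- cbind(1, c(1, 2, 3, 4), c(2, 1, 0, 3))  # 4 x 3, full column rank

qrX <- qr(X)
Q <- qr.Q(qrX)   # 4 x 3 matrix with orthonormal columns
R <- qr.R(qrX)   # 3 x 3 upper triangular matrix

# Q %*% R reproduces X (columns in pivot order)
all.equal(Q %*% R, X[, qrX$pivot])

# Columns of Q are orthonormal
all.equal(crossprod(Q), diag(3))

qrX$rank
```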

Analysis

ANOVA/LM

Nonlinear

  • nls() - fits nonlinear models by nonlinear least squares
mod <- nls(..., start = list(a = 0, b =2)) # initialize parameters yourself to avoid fit issues
profile(mod, which = "parameter")
confint(profile(mod)) # seems to be the safer way of doing things.. as to not run into optimization issues
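A minimal sketch of the nls workflow on simulated data. The exponential model, the true parameter values, and the noise level are all assumptions made for illustration; a little noise is added on purpose because nls can fail on artificial zero-residual data.

```r
set.seed(1)
x <- 1:20
y <- 2 * exp(0.15 * x) + rnorm(20, sd = 0.2)  # simulated: a = 2, b = 0.15

# Supply starting values yourself to avoid fit issues
mod <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.1))

est <- coef(mod)
ci  <- confint(profile(mod))  # profile-likelihood confidence intervals
```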

Mixed Models

  • ranef(mod) - get random effects from model
    • lattice::dotplot(ranef(mod)) - dotplot of random effects (dispatches to dotplot.ranef.mer; needs lattice)
    • attr(ranef(mod)$Subject, "postVar") - the conditional variance-covariance of each subject's random effects, stored as a "hidden" attribute named "postVar"
      • as.data.frame(ranef(mod)) returns a long data frame that includes the conditional standard deviations (condsd)
  • getME(mod, "X") and getME(mod, "Z") - get the fixed and random effect model matrices

Further Reading

  • Getting subject level standard errors
  • Ben Bolker FAQ
  • Estimated zero variance of estimate, Bates Email thread

GLMs

  • model.matrix(mod) - gets the design matrix
  • mod$linear.predictors - X\beta

2x2 contingencies

  • table(dat$tx, dat$survival) - create contingency table
  • addmargins() - add row and column sums to 2x2 table
  • cut(dat$w6minBl, quantile(dat$w6minBl, 0:5/5, na.rm = TRUE), include.lowest = TRUE) - create strata (here quintiles)
X <- table(dat$tx,dat$survival, dnn = c("TX", "Survival"))
X
d <- factor(c("treat","notreat"), levels=c("treat","notreat"))
modci <- glm(X~d, family=binomial)
summary(modci)
exp(confint(modci)[2,]) # Conf int

Residuals

plot(mod) # Pearson Residuals diagnostics
resid(mod, type = "deviance") # type must be lowercase: "deviance", "pearson", "working", "response", "partial"
mod$residuals # Gives Working residuals
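A short check of the residual types on a logistic regression fitted to the built-in mtcars data. The model am ~ wt is chosen only for illustration.

```r
mod <- glm(am ~ wt, data = mtcars, family = binomial)

r_dev  <- resid(mod, type = "deviance")  # the default type for glm objects
r_pear <- resid(mod, type = "pearson")
r_work <- resid(mod, type = "working")

all.equal(r_work, mod$residuals)  # mod$residuals holds the working residuals
all.equal(resid(mod), r_dev)      # resid() without a type gives deviance residuals
```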

Contrast Coding