R
To sort/new links
Shiny
Resources and links
- Theme selector app
- Validation and errors
- Dropdown menu
- Shiny apps as R packages
Tidyverse
ggplot2
See the ggplot2 book online for more information
List of extensions to ggplot2
- Website maintaining extensions - makes it easier to find ggplot extensions for what you need
- Flipbook for modifying the major parts of a graph
- Blog version of modifying everything in ggplot2
- Another step by step ggplot violin plot
Legend
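theme(legend.position = c(.8, .2)) # inside the plot at relative (x, y); or "top", "bottom", "left", "right", "none"
# ggplot2 >= 3.5.0: theme(legend.position = "inside", legend.position.inside = c(.8, .2))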
geom_point(aes(x,y), show.legend = FALSE)
Scales/Palettes
Using RColorBrewer
RColorBrewer::display.brewer.all() # see all palettes
scale_color_brewer(palette = "RdPu") # distinct colors
scale_color_distiller(palette = "RdPu") # for continuous data
Using manual scales
scale_linetype_manual(name = "Linetypes", labels=c(...), values = c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash"))
# w/ grey
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# w/ black
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
scale_color_manual(name = "Colors", values=cbPalette)
scale_fill_manual(name = "Colors", values=cbPalette)
Fishualize color palettes
scale_color_fish_d()
scale_color_fish_c()
Full list of color palettes: Overview fishualize palettes
Labels
labs(x="x",
y="y",
title="Title",
subtitle="Subtitle")
Statistical Summaries
ggplot(diamonds, aes(color, price)) +
geom_bar(stat = "summary_bin", fun = mean) # fun.y was deprecated in ggplot2 3.3.0, use fun
Themes
Ch 15 of ggplot2 book has comprehensive guide on themes.
Multiplots
library(ggplot2)
library(gridExtra)
plot1 <- qplot(1) # qplot() is deprecated since ggplot2 3.4.0 but still works
plot2 <- qplot(1)
grid.arrange(plot1, plot2, ncol=2)
Ggplotly
Cool example of detailed visualizations in ggplot
Saving plots
Programming with dplyr
- quo() - marks the variable name as literal, kind of like quotes "
- enquo() - looks INSIDE the function argument and quotes whatever the caller passed in. It uses "dark magic"
my_summarise <- function(df, group_var) {
df %>%
group_by(!! group_var) %>%
summarise(a = mean(a))
}
my_summarise(df, quo(g1))
In order to make the call my_summarise(df, g1), we need to make the following change:
my_summarise <- function(df, group_var) {
group_var <- enquo(group_var)
df %>%
group_by(!! group_var) %>%
summarise(a = mean(a))
}
my_summarise(df, g1)
- If you have a string that you want to use as a column name, you should use rlang::sym as suggested here.
- When you want a variable for the column name, you need to make use of the syntax !! colname := function(x) ..., otherwise it's not valid R code.
- You can filter with expressions using the parse_expr function in the rlang package as follows
expr <- paste0("s",1:2, " == ", c(0,1), collapse = " & ") # s1 == 0 & s2 == 1
filter(df, !!parse_expr(expr))
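The quoting notes above can be tried end to end on a toy data frame. This is a minimal sketch, assuming dplyr >= 1.0 and rlang are installed. The data frame and the helper name my_summarise_chr are made up for illustration. The {{ }} operator is not in the notes above: it is rlang's shorthand for the enquo() plus !! pair.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(g1 = c("a", "a", "b"), a = c(1, 3, 5))

# {{ group_var }} does enquo(group_var) and !! in one step
my_summarise <- function(df, group_var) {
  df %>%
    group_by({{ group_var }}) %>%
    summarise(a = mean(a))
}
res <- my_summarise(df, g1)

# Column names supplied as strings: rlang::sym() for the input column,
# !! name := ... for the output column
my_summarise_chr <- function(df, group_name, out_name) {
  df %>%
    group_by(!!rlang::sym(group_name)) %>%
    summarise(!!out_name := mean(a))
}
res2 <- my_summarise_chr(df, "g1", "mean_a")
```

Both calls group by g1. The second returns its result in a column named mean_a.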
Recipes
- Getting relative frequencies from dplyr - note that summarise drops the last grouping level (here gear), so the result is still grouped by am; can check with groups(). Useful for bar plots of the frequencies
mtcars %>%
group_by(am, gear) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) # Gets Rel Freq from WITHIN all counts of am
# Order the bars by decreasing count of each Position level
ggplot(theTable,
aes(x=reorder(Position,Position,
function(x)-length(x)))) +
geom_bar()
Lubridate
Tibble
- as_tibble(...) - For forcing matrices, as_tibble(mat, .name_repair = ~c("a", "b", "c")) allows you to name the columns for matrices without column names in one line
- bind_rows or bind_cols is more efficient for combining lots of dataframes
Packages
Creating them
R official reference - pretty complete version of creating the packages
Spatial
Source list for spatial with crowd sourced twitter
List of packages
- sf
- raster
- sp
- leaflet
- tmap
- mapview
- rgdal (retired from CRAN in 2023; sf and terra cover its functionality)
- cartography
- gstat
- tigris
Useful Functions to remember
- gl() - Factor level generation
- xtabs() - Cross table generation
- table() - see table of distribution
- search() - see all packages that have been attached (the search path)
Cookbook
Working with formula object in R
f <- formula(y ~ x1 + x2) # create the formula object
frame <- model.frame(f, data, drop.unused.levels = TRUE) # Get the relevant columns in the data frame used in the formula
rhs <- model.matrix(f, frame) # create the `X` matrix for working with the levels in the frame above based on formula
attr(rhs, "assign") - attribute contains the index of the variable that each column of the model matrix comes from. A factor with 3 levels needs the intercept plus 2 columns of the model matrix.
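A small worked example of the formula workflow, using only base R. The data frame dat and its columns are invented for illustration. x2 is a 3-level factor, so it contributes two columns to the model matrix.

```r
# Hypothetical toy data: one numeric predictor and one 3-level factor
dat <- data.frame(
  y  = c(1.2, 2.3, 3.1, 4.8, 5.2, 6.9),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = factor(c("a", "b", "c", "a", "b", "c"))
)

f     <- y ~ x1 + x2
frame <- model.frame(f, dat, drop.unused.levels = TRUE)
X     <- model.matrix(f, frame)

colnames(X)        # "(Intercept)" "x1" "x2b" "x2c"
attr(X, "assign")  # 0 1 2 2: 0 = intercept, 1 = x1, 2 = x2
```

The repeated 2 in the assign attribute shows that both dummy columns come from the second term of the formula.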
Row-wise workflows
Examples in this workshop
Working with factors
as.numeric(levels(temp))[temp] # unfactor temp
factor(cheese$additive, levels(cheese$additive)[c(4,1:3)]) # Reorder the factors
reorder(cheese$additive, cheese$scores, FUN = mean) # Relevel additives factor, by their average score
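A quick check of why the unfactor idiom is needed, plus the two reordering styles, on made-up vectors (the names temp, additive and scores here are illustrative stand-ins, not the cheese data):

```r
# Numbers stored as a factor: levels are sorted as strings
temp <- factor(c("10", "2.5", "10", "7"))
codes  <- as.numeric(temp)                # 1 2 1 3, the internal codes
values <- as.numeric(levels(temp))[temp]  # 10 2.5 10 7, the actual values

# Reorder levels by position, then by a summary of another variable
additive <- factor(c("A", "B", "C", "A", "B", "C"))
scores   <- c(5, 1, 3, 7, 2, 4)
by_index <- factor(additive, levels(additive)[c(3, 1:2)])  # levels C, A, B
by_mean  <- reorder(additive, scores, FUN = mean)          # levels B, C, A
```

Calling as.numeric() directly on a factor returns the level codes, which is almost never what you want for numeric data.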
Data Cleaning
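Use library(tidyr), this is the new interface into reshape. Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider().
- gather() - fat to narrow
- spread() - narrow to fat
gather(data, key, value, columns)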
GGplot force legend with constant aes
Shortcuts
Option + Shift + K - show keyboard shortcuts
Kable
kable(TABLE, booktabs = TRUE) %>% kable_styling(position = "center") %>% row_spec(2, hline_after = TRUE)
Advanced R
Data Structures
- 6 Atomic types: logical, integer, double, character, complex, raw
- 3 properties of vectors: typeof(), length(), attributes()
- Coercion goes from least to most flexible atomic type: logical, integer, double, character
- Most important attributes: Names, Dimensions, Class
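The coercion order and the three vector properties can be checked directly in base R. A short sketch with throwaway values:

```r
# Mixing types coerces to the most flexible type present
t1 <- typeof(c(TRUE, 1L))    # "integer"
t2 <- typeof(c(1L, 2.5))     # "double"
t3 <- typeof(c(2.5, "a"))    # "character"

# Logical to numeric coercion makes counting easy
n_true <- sum(c(TRUE, FALSE, TRUE))   # 2

# Names and dimensions are stored as attributes
v <- c(a = 1, b = 2, c = 3)
m <- 1:6
dim(m) <- c(2, 3)
names(attributes(v))   # "names"
names(attributes(m))   # "dim"
```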
Objects
Need library(pryr) to use the main functions.
- typeof() - base type
- otype() - object type
- ftype() - tells you if "generic" function or "method"
- is.object(x) - Checks if has "class" attribute
Generic Functions
In the S3 Object system, methods don't belong to a class, but belong to a generic function.
Generics are responsible for method dispatch: they find the right method to use for the class of the object in the call. This style of polymorphism is used in Lisp (e.g. CLOS), but is not very common otherwise.
ftype()
summary # the generic function
summary.lm # methods associated with generic function
summary.glm
> ftype(summary)
[1] "s3" "generic"
> ftype(summary.lm)
[1] "s3" "method"
> methods(summary)
[1] summary.aov summary.aovlist*
[3] summary.aspell* summary.check_packages_in_dir*
[5] summary.connection summary.corAR1*
[7] summary.corARMA* summary.corCAR1*
Classes are very casual in S3: the class is essentially treated as an attribute, and an object can even have multiple classes, e.g. glm. UseMethod will look for the appropriate method based on the class of the object that you pass to the generic. DO NOT USE . when naming ordinary functions! It will make them look like S3 methods
foo <- c(1, 2, 3, 4)
class(foo) <- "something"
t(foo) # will look for the method t.something to run
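To see dispatch in action without relying on t(), here is a minimal sketch with a made-up generic called describe (the generic and the class names are hypothetical, chosen only for illustration):

```r
# The generic only calls UseMethod(); methods are named generic.class
describe <- function(x, ...) UseMethod("describe")
describe.default   <- function(x, ...) "no specific method"
describe.something <- function(x, ...) paste("something of length", length(x))

foo <- structure(c(1, 2, 3, 4), class = "something")
out_foo <- describe(foo)   # dispatches to describe.something
out_int <- describe(1:3)   # falls back to describe.default

# With several classes, dispatch tries them left to right
bar <- structure(1:2, class = c("special", "something"))
out_bar <- describe(bar)   # no describe.special, so describe.something runs
```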
Environments
Environments are basically just like lists, with 4 special characteristics.
1. They have parents.
2. Elements are not ordered, so they cannot be indexed by number.
3. Each name in an environment is unique.
4. You must use rm() to remove an element; setting it to NULL only binds the name to NULL.
- Special Environments
  - .GlobalEnv (globalenv()) - the global environment
  - baseenv() - environment of base, parent is emptyenv
  - emptyenv() - ancestor of all. Only environment without a parent.
- Packages have two associated environments,
ls() - examine the elements of an environment
rm() - to remove something from environment
ls.str() - looks at the environment with structure as well
pryr::where("name") - What environment is the name found in, with regular scoping rules
do.call(what, args, envir) - construct and execute a function call from a function (or its name) and a list of arguments
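The environment rules can be verified in a few lines of base R. The environment e and its contents are throwaway examples:

```r
e <- new.env()
assign("a", 1, envir = e)
e$b <- 2
before <- sort(ls(e))                  # "a" "b"

# Setting to NULL keeps the binding; rm() actually removes it
e$a <- NULL
still_there <- exists("a", envir = e, inherits = FALSE)   # TRUE
rm("a", envir = e)
gone <- !exists("a", envir = e, inherits = FALSE)         # TRUE

# The base environment's parent is the empty environment
base_parent_is_empty <- identical(parent.env(baseenv()), emptyenv())

# do.call() builds and runs the call sum(1, 2, 3)
total <- do.call(sum, list(1, 2, 3))   # 6
```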
Meta-programming
Convert a string to an R variable name on the fly (and vice versa)
x <- 42
eval(parse(text = "x")) # returns 42
deparse(substitute(x)) # returns "x"
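The deparse(substitute()) idiom is most useful inside a function, where it recovers the expression the caller typed. A small sketch with hypothetical names (label_of, my_var, other_var), which also shows get() and assign() as alternatives to eval(parse()) for plain variable names:

```r
# Recover the caller's expression as a string
label_of <- function(arg) deparse(substitute(arg))

my_var <- 42
lab <- label_of(my_var)     # "my_var"

# String to variable: get() and as.name() avoid parsing arbitrary text
nm   <- "my_var"
val1 <- get(nm)             # 42
val2 <- eval(as.name(nm))   # 42

# Create a variable from a string
assign("other_var", 7)
```

get() and assign() only handle names, so they are a safer choice than eval(parse(text = ...)) when the string comes from user input.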
Environment things
version
sessionInfo()
.libPaths()
Error checking
When debugging a detailed series of function calls, can use:
options(error = recover)
Matrix Related
Matrix Package
These are the dense matrices
* dgeMatrix - Real matrices in general storage mode
* dsyMatrix - Symmetric real matrices in non-packed storage
* dspMatrix - Symmetric real matrices in packed storage (one triangle only)
* dtrMatrix - Triangular real matrices in non-packed storage
* dtpMatrix - Triangular real matrices in packed storage (triangle only)
* dpoMatrix - Positive semi-definite symmetric real matrices in non-packed storage
* dppMatrix - ditto in packed storage
sparse matrices
* dgTMatrix - general, numeric, sparse matrices in (a possibly redundant) triplet form. This can be a convenient form in which to construct sparse matrices.
* dgCMatrix - general, numeric, sparse matrices in the (sorted) compressed sparse column format.
* dsCMatrix - symmetric, real, sparse matrices in the (sorted) compressed sparse column format. Only the upper or the lower triangle is stored. Although there is provision for both forms, the lower triangle form works best with TAUCS.
* dtCMatrix - triangular, real, sparse matrices in the (sorted) compressed sparse column format.
as(X, "dgCMatrix") # Force into sparse Matrix
forceSymmetric(x, uplo = "U") # take uplo = Upper (U) or Lower (L)
symmpart(x) # Average both upper and lower triangle
skewpart(x) # The difference between x and symmpart
Matrix Decompositions
qrX <- qr(X) # Gives QR object
qr.Q(qrX) # Gives the Q part
qr.R(qrX) # Gives the R part
qrX$rank # Gives the rank
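A quick sanity check of the QR pieces on a made-up full-rank 4 x 3 matrix. This assumes no column pivoting takes place, which holds here because the columns are far from collinear. With pivoting, Q %*% R reproduces X[, qrX$pivot] instead.

```r
# Arbitrary full-rank example matrix
X <- matrix(c(1, 2, 3, 4,
              5, 7, 2, 1,
              0, 1, 3, 8), nrow = 4)

qrX <- qr(X)
Q <- qr.Q(qrX)   # 4 x 3, orthonormal columns
R <- qr.R(qrX)   # 3 x 3, upper triangular

reconstructs <- isTRUE(all.equal(Q %*% R, X))
orthonormal  <- isTRUE(all.equal(crossprod(Q), diag(3)))
rank_X <- qrX$rank   # 3
```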
Analysis
ANOVA/LM
Nonlinear
nls() - the base R (stats) function for nonlinear least squares
mod <- nls(..., start = list(a = 0, b =2)) # initialize parameters yourself to avoid fit issues
profile(mod, which = "parameter")
confint(profile(mod)) # seems to be the safer way of doing things.. as to not run into optimization issues
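One way to pick starting values yourself is to fit a linearised version of the model first. The sketch below assumes an exponential model y = a * exp(b * x) and simulated data. The model, the data and the log-linear trick are illustrative choices, not part of the notes above.

```r
set.seed(1)
x <- 1:10
y <- 2 * exp(0.3 * x) + rnorm(10, sd = 0.1)   # simulated, true a = 2, b = 0.3

# log(y) = log(a) + b * x gives rough starting values
init <- coef(lm(log(y) ~ x))

mod <- nls(y ~ a * exp(b * x),
           start = list(a = exp(init[[1]]), b = init[[2]]))
est <- coef(mod)
```

Starting close to the solution makes "singular gradient" and step-size failures much less likely than arbitrary starting values.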
Mixed Models
- ranef(mod) - get random effects from model
- dotplot.ranef.mer(ranef(mod)) - dotplot of random effects, needs lattice
- attr(ranef(mod)$Subject, "postVar") - There's also a "hidden" conditional variance associated with each of the random effects, accessed through the attribute "postVar". This is the vcov of each of the random effects I think?
- as.data.frame(ranef(mod)) gives a secret representation of conditional variances
- getME(mod, c("X", "Z")) - get the fixed and random effect model matrices
Further Reading
* Getting subject level standard errors
* Ben Bolker FAQ
* Estimated zero variance of estimate, Bates Email thread
GLMs
- model.matrix(mod) - gets the design matrix
- mod$linear.predictors - X\beta
2x2 contingencies
- table(dat$tx, dat$survival) - create contingency table
- addmargins() - add row and column sums to 2x2 table
- cut(dat$w6minBl, quantile(dat$w6minBl, 0:5/5, na.rm=TRUE), include.lowest = TRUE) - Create strata
X <- table(dat$tx,dat$survival, dnn = c("TX", "Survival"))
X
d <- factor(c("treat","notreat"), levels=c("treat","notreat"))
modci <- glm(X~d, family=binomial)
summary(modci)
exp(confint(modci)[2,]) # Conf int
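The same fit can be reproduced on a self-contained table. The counts below are invented, and the interval uses confint.default() (a Wald interval from stats) rather than the profile interval that confint() gives for a glm, so the numbers will differ slightly from the recipe above.

```r
# Hypothetical 2x2 table: rows = treatment, columns = survival (yes, no)
counts <- matrix(c(30, 20, 10, 20), nrow = 2,
                 dimnames = list(TX = c("treat", "notreat"),
                                 Survival = c("yes", "no")))

d <- factor(c("treat", "notreat"), levels = c("treat", "notreat"))
modci <- glm(counts ~ d, family = binomial)

# Odds ratio of survival, notreat vs treat: (20/20) / (30/10) = 1/3
or_hat  <- exp(coef(modci)[[2]])
or_wald <- exp(confint.default(modci)[2, ])   # Wald 95% interval
```

The model is saturated, so the fitted odds ratio matches the one computed directly from the table.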
Residuals
plot(mod) # Pearson Residuals diagnostics
resid(mod, type="deviance") # type is lower case: "deviance" (default), "pearson", "working", "response", "partial"
mod$residuals # Gives Working residuals
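A short check of how the residual types relate, using a logistic regression on the built-in mtcars data (the model itself is only an illustration):

```r
mod <- glm(am ~ wt, family = binomial, data = mtcars)

r_dev  <- resid(mod, type = "deviance")   # the default type
r_work <- resid(mod, type = "working")
r_resp <- resid(mod, type = "response")

default_is_deviance <- isTRUE(all.equal(resid(mod), r_dev))
slot_is_working     <- isTRUE(all.equal(mod$residuals, r_work))
response_is_y_minus_fit <- isTRUE(all.equal(r_resp, mtcars$am - fitted(mod)))

# Squared deviance residuals sum to the residual deviance
dev_matches <- isTRUE(all.equal(sum(r_dev^2), deviance(mod)))
```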
Contrast Coding
- Overview by UCLA
- Crawley, 2002, is supposed to be a "transparent introduction" to contrast coding, according to the adonis documentation