        Sept. 2019 Microbial Community Analysis Workshop

What is R?

R is a language and environment for statistical computing and graphics developed in 1993. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, …), and is highly extensible, meaning that the user community can write new R tools. It is a GNU project (Free and Open Source).

The R language has its roots in the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and now, R is developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of S. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

Some of R’s strengths:

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
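As a small illustration of adding functionality by defining a new function, here is a minimal sketch. The function name `std_error` and the sample values are made up for this example and are not part of R or the workshop material; only `sd()`, `sqrt()` and `length()` are base R.

```r
# Hypothetical helper (not part of R): standard error of the mean
std_error <- function(values) {
  # sd() gives the sample standard deviation, length() the number of values
  sd(values) / sqrt(length(values))
}

# Call the new function exactly like any built-in one
std_error(c(2, 4, 4, 4, 5, 5, 7, 9))
```

Once defined, the function lives in your workspace and can be reused for the rest of the session.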

Many users think of R as a statistics system. The R group prefers to think of it as an environment within which statistical techniques are implemented.

The R Homepage

The R homepage has a wealth of information on it,

R-project.org

On the homepage you can:

RStudio

RStudio started in 2010 to offer R a more full-featured integrated development environment (IDE), modeled after MATLAB's IDE.

RStudio has many features:

RStudio and its team have contributed to many R packages.[13] These include:

1. Getting started

Let’s start RStudio

RStudio_open

2. Open a new RScript File

File -> New File -> R Script

RStudio_newfile

Then save the new empty file as Intro2R.R

File -> Save as -> Intro2R.R

3. Basics of your environment

The R prompt is ‘>’. When R is expecting more input (the command is not complete), you see a ‘+’ instead.

Prompt
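One way to see the ‘+’ prompt for yourself is to type a command that is deliberately split across two lines in the console. This is only a sketch, and the variable name `total` is made up for the example.

```r
# The first line ends with an operator, so the command is not complete yet.
# Typed at the console, R shows '+' instead of '>' until the expression is finished.
total <- 1 +
  2
total   # 3
```

If you ever see ‘+’ unexpectedly (for example after a missing closing parenthesis), pressing Esc in the RStudio console cancels the incomplete command.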

4. Writing and running R commands

In the source editor (top left by default) type

getwd()

Then, with the cursor on that line, press Control + Enter (Linux/Windows) or Command + Enter (Mac) to execute the line.
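A couple of further lines you could type and run the same way, as a minimal sketch; `list.files()` is a base R function, and what it shows depends on your own working directory.

```r
getwd()        # print the current working directory
list.files()   # list the files and folders in that directory
```

Each line is sent to the console and its result printed there, just as if you had typed it at the prompt.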

5. The assignment operator ( <- ) vs equals ( = )

The assignment operator is used to assign data to a variable

x <- 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10

In this case, the equal sign works as well

x = 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10

But you should NEVER EVER DO THIS

1:10 -> x
x
[1] 1 2 3 4 5 6 7 8 9 10

The two act the same in most cases. The difference between the assignment operators is clearer when you use them to set an argument value in a function call. For example, after first removing the x created above:

rm(x)
median(x = 1:10)
x
Error: object 'x' not found

In this case, x is declared within the scope of the function, so it does not exist in the user workspace.

median(x <- 1:10)
x
[1] 1 2 3 4 5 6 7 8 9 10

In this case, x is declared in the user workspace, so you can use it after the function call has been completed. There is a general preference among the R community for using <- for assignment (other than for arguments in function calls and signatures).
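The two cases can be checked side by side with `exists()`, which reports whether a variable is defined. This is a minimal sketch: it removes any leftover x first, and uses `inherits = FALSE` so that only the current workspace is searched.

```r
# Start from a clean slate so a leftover x does not hide the effect
if (exists("x", inherits = FALSE)) rm(x)

median(x = 1:10)               # '=' only names the argument of median()
exists("x", inherits = FALSE)  # FALSE: nothing was created in the workspace

median(x <- 1:10)              # '<-' assigns x first, then passes its value to median()
exists("x", inherits = FALSE)  # TRUE: x now holds 1:10
```

Both calls return the same median; only the second leaves a variable behind.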

6. The RStudio Cheat Sheets

rstudio-ide.pdf

Spend 15 minutes getting to know RStudio a little.