      RNA-Seq Analysis

R and RStudio Prerequisites

The analyses in this course are done in R, using the RStudio environment. Some familiarity with R will help you make the most of your time. While we will provide R code that can be run even if you have never used R before, we find that participants who have spent some time exploring R before the course get the maximum value out of our workshops. There will not be enough time to provide full instruction in introductory R and still get through all of the material we need to cover.

Beginners without any previous knowledge will be able to complete this course, and achieve a more thorough understanding of the techniques and analyses covered, but will probably not be able to conduct an experiment on their own.

If you don’t have any experience with R (or it has been a while), working through the first part of the Introduction to R for Bioinformatics course should give you a good foundation for this workshop. Those who are interested in running their own analyses may want to keep going through parts two and three, or even register for the introduction to R course.

What is R?

R is a language and environment for statistical computing and graphics developed in 1993. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, …), and is highly extensible, meaning that the user community can write new R tools. It is a GNU project (Free and Open Source).

The R language has its roots in the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is now developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of its first two authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of S. R can be considered a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

Some of R’s strengths:

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
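
As a small illustration of adding functionality by defining a new function, here is a minimal sketch. The function name coef_var and the sample values are hypothetical, chosen only for this example; sd() and mean() are standard R functions.

```r
# Hypothetical helper: coefficient of variation (standard deviation / mean)
coef_var <- function(values) {
  sd(values) / mean(values)
}

# Made-up example values, for illustration only
counts <- c(10, 12, 9, 11, 13)

# Call the new function exactly like a built-in one
cv <- coef_var(counts)
cv
```

Once defined, coef_var behaves like any other R function for the rest of the session.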

Many users think of R as a statistics system. The R group prefers to think of it as an environment within which statistical techniques are implemented.

The R Homepage

The R homepage has a wealth of information on it:

R-project.org

On the homepage you can:

RStudio

RStudio was started in 2010 to offer R a more full-featured integrated development environment (IDE), modeled after MATLAB’s IDE.

RStudio has many features:

RStudio and its team have contributed to many R packages.[13] These include:

1. Getting started

Let’s start RStudio

2. Open a new R Script file

File -> New File -> R Script

Then save the new empty file as Intro2R.R

File -> Save as -> Intro2R.R

3. Basics of your environment

The R prompt is ‘>’. When R is expecting more input (the command is not complete), you see a ‘+’ instead.
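
To see the continuation prompt for yourself, here is a minimal sketch (the variable name total is just an illustration). If you type the first line at the console and press Enter, R shows ‘+’ because the parenthesis is still open; the command is evaluated once the second line closes it.

```r
# The call is split across two lines: after the first line the
# parenthesis is still open, so the console shows '+' and waits
total <- sum(1, 2,
             3)
total
```

If you end up at a ‘+’ by accident in RStudio, pressing Esc in the console cancels the incomplete command and returns you to ‘>’.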

4. Writing and running R commands

In the source editor (top left by default) type

getwd()

Then, with the cursor on that line, press Control + Enter (Linux/Windows) or Command + Enter (Mac) to execute the line.

5. The assignment operator ( <- ) vs equals ( = )

The assignment operator is used to assign data to a variable

x <- 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10

In this case, the equal sign works as well

x = 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10

You can also assign from left to right, but should NEVER EVER do so:

1:10 -> x
x
[1] 1 2 3 4 5 6 7 8 9 10

The <- and = operators act the same in most cases. The difference between them is clearer when you use them to set an argument value in a function call. For example:

rm(x)  # remove the x created above so the example starts clean
median(x = 1:10)
[1] 5.5
x
Error: object 'x' not found

In this case, x is declared within the scope of the function, so it does not exist in the user workspace.

median(x <- 1:10)
[1] 5.5
x
[1] 1 2 3 4 5 6 7 8 9 10

In this case, x is declared in the user workspace, so you can use it after the function call has been completed. There is a general preference among the R community for using <- for assignment (other than in function signatures).
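
The difference described above can also be checked programmatically. The following is a minimal sketch using exists(), which reports whether a variable is defined; the names m1, m2, created_by_equals and created_by_arrow are hypothetical and only used for this illustration.

```r
# Start clean: remove any 'x' left over from earlier commands
if (exists("x", inherits = FALSE)) rm(x)

# '=' inside the call only names median()'s argument 'x';
# no variable is created in the workspace
m1 <- median(x = 1:10)
created_by_equals <- exists("x", inherits = FALSE)

# '<-' inside the call first assigns 'x' in the workspace,
# then passes its value on to median()
m2 <- median(x <- 1:10)
created_by_arrow <- exists("x", inherits = FALSE)

created_by_equals  # FALSE
created_by_arrow   # TRUE
```

Both calls return the same median; only the second leaves x behind in the workspace, which is why assigning inside a function call is best avoided unless you really intend it.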

Spend 15 minutes getting to know RStudio a little.