Home
Introduction and Lectures
Intro to the Workshop and Core
Schedule
What is Bioinformatics/Genomics?
Experimental Design and Cost Estimation
RNA Sequencing Technologies - Dr. Lutz Froenicke
Support
Using Zoom
Using Slack
Cheat Sheets
Software and Links
Scripts
Prerequisites
CLI - Logging in and Transferring Files
CLI - Intro to Command-Line
CLI - Advanced Command-Line (extra)
CLI - Running jobs on the Cluster and using modules
R - Getting Started
R - Intro to R
R - Prepare Data in R (extra)
R - Data in R (extra)
More Materials (extra)
Data Reduction
Files and Filetypes
Prepare dataset
Preprocessing raw data
Indexing a Genome
Alignment with Star
Generating counts tables
Alignment/Counts with Salmon (Extra)
Data analysis
Prepare R for data analysis
Annotation from BioMart
Differential Expression Analysis
Pathway Analysis
Comparison between STAR and Salmon
ETC
Closing thoughts
Workshop Photos
Github page
Report Errors
Biocore website

Topics covered in this introduction to R

Basic concepts
Basic data types in R
Import and export data in R
Functions in R
Basic statistics in R
Simple data visulization in R
Install packages in R
Save data in R session

Topic 1. Basic concepts

There are three concepts that we should be familiar with before working in R:

Operators
Functions
Variables

Operators

Assignment Operators in R
Operator	Description
<-, =	Assignment

Arithmetic Operators in R
Operator	Description
+	Addition
-	Subtraction
*	Multiplication
/	Division
^	Exponent
%%	Modulus
%/%	Integer Division

Relational Operators in R
Operator	Description
<	Less than
>	Greater than
<=	Less than or equal to
>=	Greater than or equal to
==	Equal to
!=	Not equal to

Logical Operators in R
Operator	Description
!	Logical NOT
&	Element-wise logical AND
&&	Logical AND
\|	Element-wise logical OR
\|\|	Logical OR

Functions

Functions are essential in all programming languages. A function takes zero or more parameters and return a result. The way to use a function in R is:

function.name(parameter1=value1, …)

read.delim("rnaseq_workshop_counts_mm.txt", row.names = 1)

	mouse_110_WT_C	mouse_110_WT_NC	mouse_148_WT_C	mouse_148_WT_NC	mouse_158_WT_C	mouse_158_WT_NC	mouse_183_KOMIR150_C	mouse_183_KOMIR150_NC	mouse_198_KOMIR150_C	mouse_198_KOMIR150_NC	mouse_206_KOMIR150_C	mouse_206_KOMIR150_NC	mouse_2670_KOTet3_C	mouse_2670_KOTet3_NC	mouse_7530_KOTet3_C	mouse_7530_KOTet3_NC	mouse_7531_KOTet3_C	mouse_7532_WT_NC	mouse_H510_WT_C	mouse_H510_WT_NC	mouse_H514_WT_C	mouse_H514_WT_NC
ENSMUSG00000102693	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ENSMUSG00000064842	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ENSMUSG00000051951	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ENSMUSG00000102851	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ENSMUSG00000103377	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
ENSMUSG00000104017	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Variables

A variable is a named storage. The name of a variable can have letters, numbers, dot and underscore. However, a valid variable name cannot start with a underscore or a number, or start with a dot that is followed by a number.

Using an assignment operator, we can store the results of a function inside a variable.

counts <- read.delim("rnaseq_workshop_counts_mm.txt", row.names = 1)

CHALLENGE

Which of the following assignments will work? Do you notice anything different about the behavior of any of these variables?

a <- 1
b = 2
.c <- 3
1a <- 4
.1a <- 5
a.1 <- 6

Topic 2. Basic data types in R

Basic Data Types
Type	Example
Numeric	3, 3e-2
Character	"ENSMUSG00000102693"
Logical	TRUE, FALSE
Factor	any categorical variable

Simple variables: variables that have a numeric value, a character value (such as a string), or a logical value (True or False)

We can check the type of data stored within a variable using the class function.

a <- 1
class(a)

## [1] "numeric"

a <- "ENSMUSG00000102693"
class(a)

## [1] "character"

CHALLENGE

Try the following assignments. What is the class of each of the variables produced? Why? What is happening when converting between types using the as.numeric, as.logical, and as.character functions?

a <- 150
b <- 3e-2
c <- “ENSMUSG00000102693”
d <- TRUE
e <- a < 1
f <- b < 1
g <- is.numeric(a)
h <- is.character(a)
i <- as.numeric(d)
j <- as.character(a)
l <- as.logical(b)
m <- as.numeric(f)
n <- as.logical(0)

Vectors: a vector is a combination of multiple values(numeric, character or logical) in the same object. A vector is created using the function c() (for concatenate).

The values in the vector must all be of the same type.

gene_ids <- c("ENSMUSG00000102693", "ENSMUSG00000064842", "ENSMUSG00000051951")
gene_ids

## [1] "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951"

of_interest <- c(TRUE, FALSE, FALSE)
of_interest

## [1]  TRUE FALSE FALSE

length(of_interest)

## [1] 3

Elements of a vector can be named by providing names to an existing vector, or assigning names when creating a vector.

names(of_interest) <- gene_ids
of_interest

## ENSMUSG00000102693 ENSMUSG00000064842 ENSMUSG00000051951
##               TRUE              FALSE              FALSE

of_interest <- c(ENSMUSG00000102693 = TRUE, ENSMUSG00000064842 = FALSE, ENSMUSG00000051951 = FALSE)
of_interest

## ENSMUSG00000102693 ENSMUSG00000064842 ENSMUSG00000051951
##               TRUE              FALSE              FALSE

Elements of a vector can be accessed by index, or by name if it is a named vector. The element or elements to be accessed are specified using square brackets.

of_interest[2]

## ENSMUSG00000064842
##              FALSE

of_interest["ENSMUSG00000064842"]

## ENSMUSG00000064842
##              FALSE

of_interest[c(1,3)]

## ENSMUSG00000102693 ENSMUSG00000051951
##               TRUE              FALSE

of_interest[c("ENSMUSG00000102693", "ENSMUSG00000051951")]

## ENSMUSG00000102693 ENSMUSG00000051951
##               TRUE              FALSE

Additionally, a subset of a vector can be selected using a logical vector.

gene_ids[of_interest]

## [1] "ENSMUSG00000102693"

NOTE: a vector can only hold elements of the same type.

Factors: a factor represents categorical or groups in data. The function factor() can be used to create a factor variable.

treatment <- factor(c(1,2,1,2))
treatment

## [1] 1 2 1 2
## Levels: 1 2

class(treatment)

## [1] "factor"

In R, categories are called factor levels. The function levels() can be used to access the factor levels.

levels(treatment)

## [1] "1" "2"

Factor levels can also be changed by assigning new levels to a factor.

levels(treatment) <- c("A", "C")
treatment

## [1] A C A C
## Levels: A C

Change the order of levels.

levels(treatment) <- c("C", "A")
treatment

## [1] C A C A
## Levels: C A

By default, the order of factor levels is taken in the order of numeric or alphabetic.

treatment <- factor(c("C", "A", "C", "A"))
treatment

## [1] C A C A
## Levels: A C

The factor levels can be specified when creating the factor, if the order does not follow the default rule.

treatment <- factor(c("C", "A", "C", "A"), levels=c("C", "A"))
treatment

## [1] C A C A
## Levels: C A

If you want to know the number of individuals at each levels, there are two functions.

summary(treatment)

## C A
## 2 2

table(treatment)

## treatment
## C A
## 2 2

Matrices: A matrix is like a speadsheet containing multiple rows and columns. It is used to combine vectors of the same type.

col1 <- c(1,0,0,0)
col2 <- c(0,1,0,0)
col3 <- c(0,0,1,0)

my_matrix <- cbind(col1, col2, col3)
my_matrix

##      col1 col2 col3
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
## [4,]    0    0    0

rownames(my_matrix) <- c("row1", "row2", "row3", "row4")
my_matrix

##      col1 col2 col3
## row1    1    0    0
## row2    0    1    0
## row3    0    0    1
## row4    0    0    0

dim(my_matrix) # matrix dimensions

## [1] 4 3

nrow(my_matrix)

## [1] 4

ncol(my_matrix)

## [1] 3

t(my_matrix) # transpose

##      row1 row2 row3 row4
## col1    1    0    0    0
## col2    0    1    0    0
## col3    0    0    1    0

Accessing elements of a matrix is done in similar ways to accessing elements of a vector.

my_matrix[1,3]

## [1] 0

my_matrix["row1", "col3"]

## [1] 0

my_matrix[1,]

## col1 col2 col3
##    1    0    0

my_matrix[,3]

## row1 row2 row3 row4
##    0    0    1    0

my_matrix[col3 > 0,]

## col1 col2 col3
##    0    0    1

Matrices may be used in calculations. We will not be focusing on calculations or operations on matrices today, but you may want to explore the results of the code below later.

my_matrix * 3

##      col1 col2 col3
## row1    3    0    0
## row2    0    3    0
## row3    0    0    3
## row4    0    0    0

log10(my_matrix)

##      col1 col2 col3
## row1    0 -Inf -Inf
## row2 -Inf    0 -Inf
## row3 -Inf -Inf    0
## row4 -Inf -Inf -Inf

Total of each row.

rowSums(my_matrix)

## row1 row2 row3 row4
##    1    1    1    0

Total of each column.

colSums(my_matrix)

## col1 col2 col3
##    1    1    1

It is also possible to use the function apply() to apply any statistical functions to rows/columns of matrices. The advantage of using apply() is that it can take a function created by user.

The simplified format of apply() is as following:

apply(X, MARGIN, FUN)

X: data matrix MARGIN: possible values are 1 (for rows) and 2 (for columns) FUN: the function to apply on rows/columns

We can calculate the mean or the median of each row with the following:

apply(my_matrix, 1, mean)

##      row1      row2      row3      row4
## 0.3333333 0.3333333 0.3333333 0.0000000

apply(my_matrix, 1, median)

## row1 row2 row3 row4
##    0    0    0    0

CHALLENGE

How would you calculate the mean and median of each column, instead?

Data frames: a data frame is like a matrix but can have columns with different types (numeric, character, logical).

A data frame can be created using the function data.frame(), from new or previously defined vectors.

genes <- data.frame(ensembl = gene_ids, interest = of_interest)
genes

##                               ensembl interest
## ENSMUSG00000102693 ENSMUSG00000102693     TRUE
## ENSMUSG00000064842 ENSMUSG00000064842    FALSE
## ENSMUSG00000051951 ENSMUSG00000051951    FALSE

genes <- data.frame(ensembl = c("ENSMUSG00000102693", "ENSMUSG00000064842", "ENSMUSG00000051951"), interest = c(TRUE, FALSE, FALSE))
genes

##              ensembl interest
## 1 ENSMUSG00000102693     TRUE
## 2 ENSMUSG00000064842    FALSE
## 3 ENSMUSG00000051951    FALSE

is.data.frame(genes)

## [1] TRUE

To obtain a subset of a data frame can be done in similar ways to vectors and matrices: by index, by row or column names, or by logical values.

genes[1,]

##              ensembl interest
## 1 ENSMUSG00000102693     TRUE

genes[, "ensembl"]

## [1] "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951"

genes[of_interest,]

##              ensembl interest
## 1 ENSMUSG00000102693     TRUE

A column of a data frame can be specified using the “$”.

genes$ensembl

## [1] "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951"

genes[genes$interest,]

##              ensembl interest
## 1 ENSMUSG00000102693     TRUE

Data frames are easily extended using the “$” as well.

genes$name <- c("4933401J01Rik", "Gm26206", "Xkr4")
genes

##              ensembl interest          name
## 1 ENSMUSG00000102693     TRUE 4933401J01Rik
## 2 ENSMUSG00000064842    FALSE       Gm26206
## 3 ENSMUSG00000051951    FALSE          Xkr4

When we used the function read.delim() at the very beginning, we created a data frame.

class(counts)

## [1] "data.frame"

CHALLENGE

What type of data is stored in counts? Can you access a subset of that data? Can you add a column to counts? Explore counts using the following functions:

rownames()
colnames()
class()

Topic 3. Import and export data in R

R base function read.table() is a general function that can be used to read a file in table format. There are also several variants of read.table that can be used to make reading a file even easier, like read.delim, which we used above to create counts. The data will be imported as a data frame.

# To read a local file, provide read.table with the path for to file location (same as taught in the command line session). Here we assume rnaseq_workshop_counts_mm.txt is in our current working directory
counts2 <- read.table(file="rnaseq_workshop_counts_mm.txt", sep="\t", header=T, stringsAsFactors=F)

# To read a file from the internet, provide read.table with the URL.
counts3 <- read.table(file="https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2020-mRNA_Seq_Workshop/master/datasets/rnaseq_workshop_counts_mm.txt", sep="\t", header=T, stringsAsFactors=F)

Take a look at the beginning part of the data frame.

head(counts2)

##                    mouse_110_WT_C mouse_110_WT_NC mouse_148_WT_C
## ENSMUSG00000102693              0               0              0
## ENSMUSG00000064842              0               0              0
## ENSMUSG00000051951              0               0              0
## ENSMUSG00000102851              0               0              0
## ENSMUSG00000103377              0               0              0
## ENSMUSG00000104017              0               0              0
##                    mouse_148_WT_NC mouse_158_WT_C mouse_158_WT_NC
## ENSMUSG00000102693               0              0               0
## ENSMUSG00000064842               0              0               0
## ENSMUSG00000051951               0              0               0
## ENSMUSG00000102851               0              0               0
## ENSMUSG00000103377               0              0               0
## ENSMUSG00000104017               0              0               0
##                    mouse_183_KOMIR150_C mouse_183_KOMIR150_NC
## ENSMUSG00000102693                    0                     0
## ENSMUSG00000064842                    0                     0
## ENSMUSG00000051951                    0                     0
## ENSMUSG00000102851                    0                     0
## ENSMUSG00000103377                    0                     0
## ENSMUSG00000104017                    0                     0
##                    mouse_198_KOMIR150_C mouse_198_KOMIR150_NC
## ENSMUSG00000102693                    0                     0
## ENSMUSG00000064842                    0                     0
## ENSMUSG00000051951                    0                     0
## ENSMUSG00000102851                    0                     0
## ENSMUSG00000103377                    0                     0
## ENSMUSG00000104017                    0                     0
##                    mouse_206_KOMIR150_C mouse_206_KOMIR150_NC
## ENSMUSG00000102693                    0                     0
## ENSMUSG00000064842                    0                     0
## ENSMUSG00000051951                    0                     0
## ENSMUSG00000102851                    0                     0
## ENSMUSG00000103377                    0                     0
## ENSMUSG00000104017                    0                     0
##                    mouse_2670_KOTet3_C mouse_2670_KOTet3_NC mouse_7530_KOTet3_C
## ENSMUSG00000102693                   0                    0                   0
## ENSMUSG00000064842                   0                    0                   0
## ENSMUSG00000051951                   0                    0                   0
## ENSMUSG00000102851                   0                    0                   0
## ENSMUSG00000103377                   0                    0                   0
## ENSMUSG00000104017                   0                    0                   0
##                    mouse_7530_KOTet3_NC mouse_7531_KOTet3_C mouse_7532_WT_NC
## ENSMUSG00000102693                    0                   0                0
## ENSMUSG00000064842                    0                   0                0
## ENSMUSG00000051951                    0                   0                0
## ENSMUSG00000102851                    0                   0                0
## ENSMUSG00000103377                    0                   0                0
## ENSMUSG00000104017                    0                   0                0
##                    mouse_H510_WT_C mouse_H510_WT_NC mouse_H514_WT_C
## ENSMUSG00000102693               0                0               0
## ENSMUSG00000064842               0                0               0
## ENSMUSG00000051951               0                0               0
## ENSMUSG00000102851               0                0               0
## ENSMUSG00000103377               0                0               0
## ENSMUSG00000104017               0                0               0
##                    mouse_H514_WT_NC
## ENSMUSG00000102693                0
## ENSMUSG00000064842                0
## ENSMUSG00000051951                0
## ENSMUSG00000102851                0
## ENSMUSG00000103377                0
## ENSMUSG00000104017                0

Depending on the format of the file, several variants of read.table() are available to make reading a file easier.

read.csv(): for reading “comma separated value” files (.csv).

read.csv2(): variant used in countries that use a comma “,” as decimal point and a semicolon “;” as field separators.

read.delim(): for reading “tab separated value” files (“.txt”). By default, point(“.”) is used as decimal point.

read.delim2(): for reading “tab separated value” files (“.txt”). By default, comma (“,”) is used as decimal point.

Choosing the correct function (or parameters) is important! If we use read.csv() to read our tab-delimited file, it becomes a mess.

counts2 <- read.csv(file="rnaseq_workshop_counts_mm.txt", stringsAsFactors=F)

head(counts2)

##   mouse_110_WT_C.mouse_110_WT_NC.mouse_148_WT_C.mouse_148_WT_NC.mouse_158_WT_C.mouse_158_WT_NC.mouse_183_KOMIR150_C.mouse_183_KOMIR150_NC.mouse_198_KOMIR150_C.mouse_198_KOMIR150_NC.mouse_206_KOMIR150_C.mouse_206_KOMIR150_NC.mouse_2670_KOTet3_C.mouse_26 ...
## 1                                                                                                                                                                           ENSMUSG00000102693\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
## 2                                                                                                                                                                           ENSMUSG00000064842\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
## 3                                                                                                                                                                           ENSMUSG00000051951\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
## 4                                                                                                                                                                           ENSMUSG00000102851\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
## 5                                                                                                                                                                           ENSMUSG00000103377\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
## 6                                                                                                                                                                           ENSMUSG00000104017\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0

Since the data contained in these files is the same, we don’t need to keep three copies.

rm(counts2, counts3)

R base function write.table() can be used to export data to a file.

# To write to a file called "genes.txt" in your current working directory.
write.table(genes, file="genes.txt", sep="\t", quote=F, row.names=T, col.names=T)

It is also possible to export data to a csv file.

write.csv()

write.csv2()

Topic 4. Functions in R

We have already used a number of functions, including read.delim, class, mean, and many others. Functions can be invoked with 0 or more arguments.

# No arguments: use getwd() to find out the current working directory
getwd()

## [1] "/Users/hannah/Desktop/workshop_de_analysis_2020_07_30"

# No arguments: use ls() to list all variables in the environment
ls()

##  [1] "a"           "col1"        "col2"        "col3"        "colFmt"     
##  [6] "counts"      "gene_ids"    "genes"       "my_matrix"   "of_interest"
## [11] "treatment"

# One argument: use str() to find out information on a variable
str(genes)

## 'data.frame':	3 obs. of  3 variables:
##  $ ensembl : chr  "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951"
##  $ interest: logi  TRUE FALSE FALSE
##  $ name    : chr  "4933401J01Rik" "Gm26206" "Xkr4"

# More arguments: use seq() to generate a vector of sequential numbers
?seq # how does seq() work?
seq(from = 2, to = 3, by = 0.1)

##  [1] 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0

seq(2, 3, 0.1) # less explicit, same result

##  [1] 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0

One useful function to find out information on a variable: str().

Topic 5. Basic statistics in R

Description	R_function
Mean	mean()
Standard deviation	sd()
Variance	var()
Minimum	min()
Maximum	max()
Median	median()
Range of values: minimum and maximum	range()
Sample quantiles	quantile()
Generic function	summary()
Interquartile range	IQR()

Calculate the mean expression for each sample.

apply(counts, 2, mean)

##        mouse_110_WT_C       mouse_110_WT_NC        mouse_148_WT_C
##              42.06295              51.04228              50.62418
##       mouse_148_WT_NC        mouse_158_WT_C       mouse_158_WT_NC
##              47.02723              53.47496              47.81311
##  mouse_183_KOMIR150_C mouse_183_KOMIR150_NC  mouse_198_KOMIR150_C
##              45.60650              33.48397              51.08072
## mouse_198_KOMIR150_NC  mouse_206_KOMIR150_C mouse_206_KOMIR150_NC
##              52.55656              24.99558              17.16092
##   mouse_2670_KOTet3_C  mouse_2670_KOTet3_NC   mouse_7530_KOTet3_C
##              52.23667              52.85384              46.98483
##  mouse_7530_KOTet3_NC   mouse_7531_KOTet3_C      mouse_7532_WT_NC
##              51.88361              47.93512              48.82374
##       mouse_H510_WT_C      mouse_H510_WT_NC       mouse_H514_WT_C
##              46.50284              51.01271              41.38043
##      mouse_H514_WT_NC
##              47.66608

CHALLENGE

What is the highest number of counts for a gene in sample mouse_110_WT_C?
What is the range of counts for ENSMUSG00000026739?

Topic 6. Simple data visulization in R

Scatter plot and line plot can be produced using the function plot().

x <- c(1:50)
y <- 1 + sqrt(x)/2
plot(x,y)

plot(x,y, type="l")

# plot both the points and lines
## first plot points
plot(x,y)
lines(x,y, type="l")

## lines() can only be used to add information to a graph, while it cannot produce a graph on its own.

boxplot() can be used to summarize data.

boxplot(counts, xlab="Sample ID", ylab="Counts")

hist() can be used to create histograms of data.

hist(counts$mouse_110_WT_C)

# use user defined break points
hist(counts$mouse_110_WT_C, breaks=seq(0, 25000, by=50))

# lower counts only
hist(counts$mouse_110_WT_C, breaks=seq(0, 25000, by=5), xlim = c(0, 250))

Topic 7. Install packages in R

Starting from Bioconductor version 3.8, the installation of packages is recommended to use BiocManager.

if (!requireNamespace("BiocManager"))
	install.packages("BiocManager")
# install core packages
BiocManager::install()
# install specific packages
BiocManager::install(c("ggplot2", "ShortRead"))

Bioconductor has a repository and release schedule that differ from R (Bioconductor has a ‘devel’ branch to which new packages and updates are introduced, and a stable ‘release’ branch emitted once every 6 months to which bug fixes but not new features are introduced). This mismatch causes that the version detected by install.packages() is sometimes not the most recent ‘release’.
A consequence of the ‘devel’ branch is that install.packages() sometimes points only to the ‘release’ repository, while users might want to have access to the leading-edge features in the ‘devel’ version.
An indirect consequence of Bioconductor’s structured release is that packages generally have more extensive dependences with one another.

It is always recommended to update to the most current version of R and Bioconductor. If it is not possible and R < 3.5.0, please use the legacy approach to install Bioconductor packages

source("http://bioconductor.org/biocLite.R")
# install core packages
biocLite()
# install specific packages
biocLite("RCircos")
biocLite(c("IdeoViz", "devtools"))

The R function install.packages() can be used to install packages that are not part of Bioconductor.

install.packages("ggplot2", repos="http://cran.us.r-project.org")

Topic 8. Save data in R session

To save objects in R session

save(list=c("genes", "counts"), file="intro_20200603.RData")

☰ Menu

mRNA-Seq Workshop

Topics covered in this introduction to R

Topic 1. Basic concepts

Operators

Functions

Variables

CHALLENGE

Topic 2. Basic data types in R

Simple variables: variables that have a numeric value, a character value (such as a string), or a logical value (True or False)

CHALLENGE

Vectors: a vector is a combination of multiple values(numeric, character or logical) in the same object. A vector is created using the function c() (for concatenate).

NOTE: a vector can only hold elements of the same type.

Factors: a factor represents categorical or groups in data. The function factor() can be used to create a factor variable.

Matrices: A matrix is like a speadsheet containing multiple rows and columns. It is used to combine vectors of the same type.

CHALLENGE

Data frames: a data frame is like a matrix but can have columns with different types (numeric, character, logical).

CHALLENGE

Topic 3. Import and export data in R

Topic 4. Functions in R

One useful function to find out information on a variable: str().

Topic 5. Basic statistics in R

CHALLENGE

Topic 6. Simple data visulization in R

Topic 7. Install packages in R

It is always recommended to update to the most current version of R and Bioconductor. If it is not possible and R < 3.5.0, please use the legacy approach to install Bioconductor packages

Topic 8. Save data in R session

To save objects in R session