☰ Menu

      Genome Assembly Workshop 2020

Home
Introduction and Lectures
Intro to the Workshop and Core
Schedule
Genome Assembly
Introduction to the DNA Tech Core
A Brief Overview of Genome Annotation, with a Focus on the Use of Isoseq
Support
Cheat Sheets
Software and Links
Scripts
Prerequisites
CLI - Logging in and Transferring Files
CLI - Intro to Command-Line
CLI - Advanced Command-Line (extra)
CLI - Running jobs on the Cluster and using modules
Conda
R - Getting Started
R - Intro to R
R - Prepare Data in R (extra)
R - Data in R (extra)
More Materials (extra)
Snakemake
Introduction
Challenge Answers
K-mers
K-mers tutorial
PacBio
Introduction to PacBio HiFi Data and Applications
Genome Assembly with PacBio HiFi Data
Improved Phased Assembly (IPA) Using HiFi Data
Assembling the drosophila genome with IPA and HiFi data
ONT Assembly
Introduction to ONT
Assembly using ONT - Hands-on
Bionano
Optical mapping for accurate genome assembly, comparative genomics, and haplotype segregation
Phase Genomics
Using Proximity to Fix Assembly
Genome Assessment
BUSCO
Additional QA/QC and metrics
ETC
Closing thoughts
Workshop Photos
Github page
Report Errors
Biocore website

What is R?

R is a language and environment for statistical computing and graphics developed in 1993. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, …), and is highly extensible, meaning that the user community can write new R tools. It is a GNU project (Free and Open Source).

The R language has its roots in the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and now, R is developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of S. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

Some of R’s strengths:

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Many users think of R as a statistics system. The R group prefers to think of it of an environment within which statistical techniques are implemented.

The R Homepage

The R homepage has a wealth of information on it,

R-project.org

On the homepage you can:

RStudio

RStudio started in 2010, to offer R a more full featured integrated development environment (IDE) and modeled after matlabs IDE.

RStudio has many features:

RStudio and its team have contributed to many R packages.[13] These include:

1. Getting started

Let’s start RStudio

RStudio_open

2. Open a new RScript File

File -> New File -> R Script

RStudio_newfile

Then save the new empty file as Intro2R.R

File -> Save as -> Intro2R.R

3. Basics of your environment

The R prompt is the ‘>’ , when R is expecting more (command is not complete) you see a ‘+’

Prompt

4. Writing and running R commands

In the source editor (top left by default) type

getwd()

Then on the line Control + Enter (Linux/Windows), Command + Enter (Mac) to execute the line.

5. The assignment operator ( <- ) vs equals ( = )

The assignment operator is used assign data to a variable

x <- 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10

In this case, the equal sign works as well

x = 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10

But you should NEVER EVER DO THIS

1:10 -> x
x
[1] 1 2 3 4 5 6 7 8 9 10

The two act the same in most cases. The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:

median(x = 1:10)
x 
Error: object 'x' not found

In this case, x is declared within the scope of the function, so it does not exist in the user workspace.

median(x <- 1:10)
x
[1] 1 2 3 4 5 6 7 8 9 10

In this case, x is declared in the user workspace, so you can use it after the function call has been completed. There is a general preference among the R community for using <- for assignment (other than in function signatures)

6. The RStudio Cheat Sheets

rstudio-ide.pdf

spend 15m getting to know RStudio a little