☰ Menu

      Intermediate Visualization for Bioinformatics

Home
Introduction
Introduction to the Workshop and the Core
Course Materials
Prepare R and RStudio
Base R graphics
Introduction to ggplot2 and box plots
Scatter plots
Bar plots
Advanced plotting
ETC
Github page
Biocore website

BASE R GRAPHICS

Download and open the Rmd file for this section:

download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2025-August-Intermediate-Visualization-for-Bioinformatics/refs/heads/master/R/baseRgraphics.Rmd", "baseRgrapics.Rmd")

First let’s download some data to work with and take a look at it:

counts = read.delim("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2025-August-Intermediate-Visualization-for-Bioinformatics/refs/heads/master/R/raw_counts.txt", header=T)
head(counts)
##            C61  C62  C63  C64  C91  C92  C93 C94 I561 I562 I563 I564 I591 I592
## AT1G01010  322  346  256  396  372  506  361 342  638  488  440  479  770  430
## AT1G01020  149   87  162  144  189  169  147 108  163  141  119  147  182  156
## AT1G01030   15   32   35   22   24   33   21  35   18    8   54   35   23    8
## AT1G01040  687  469  568  651  885  978  794 862  799  769  725  715  811  567
## AT1G01046    1    1    5    4    5    3    0   2    4    3    1    0    2    8
## AT1G01050 1447 1032 1083 1204 1413 1484 1138 938 1247 1516  984 1044 1374 1355
##           I593 I594 I861 I862 I863 I864 I891 I892 I893 I894
## AT1G01010  656  467  143  453  429  206  567  458  520  474
## AT1G01020  153  177   43  144  114   50  161  195  157  144
## AT1G01030   16   24   42   17   22   39   26   28   39   30
## AT1G01040  831  694  345  575  605  404  735  651  725  591
## AT1G01046    8    1    0    4    0    3    5    7    0    5
## AT1G01050 1437 1577  412 1338 1051  621 1434 1552 1248 1186

Let’s start with a simple plot with axis labels:

plot(counts[,"C61"], counts[,"C62"], xlab="C61", ylab="C62")

Let’s add a x=y line:

plot(counts[,"C61"], counts[,"C62"], xlab="C61", ylab="C62")
abline(0,1)

Change the plotting character:

plot(counts[,"C61"], counts[,"C62"], xlab="C61", ylab="C62", pch=23, col="red")
abline(0,1)

Create your own axes:

plot(counts[,"C61"], counts[,"C62"], xlab="C61", ylab="C62", pch=23, axes=F)
axis(1, at=seq(0,80000,by=10000))
axis(2, at=seq(0,100000,by=10000))

Add color to various things and change axis line widths:

plot(counts[,"C61"], counts[,"C62"], xlab="C61", ylab="C62", pch=23, axes=F, col="red")
axis(1, at=seq(0,80000,by=10000), col="blue", col.ticks="orange", col.axis="green", lwd=5)
axis(2, at=seq(0,100000,by=10000), col="blue", col.ticks="orange", col.axis="green", lwd=5)

Specify the ticks in a different way:

# xaxp=c(start,end,number of intervals)
plot(counts[,"C61"], counts[,"C62"], xlab="C61", ylab="C62", 
     xaxp=c(0,80000,4), 
     yaxp=c(0,100000,4))

Now, a simple boxplot:

boxplot(counts)

Let’s create a smaller dataset to work with, add some factor data, and then use the melt function to change the shape of our data:

library(reshape2)
tc = as.data.frame(t(counts))
regions = c(rep("region1",6), rep("region2",6), rep("region3",6), rep("region4",6))
tc2 = tc[,1:6]
tc2$regions = regions
tcm = melt(tc2)
## Using regions as id variables
head(tcm)
##   regions  variable value
## 1 region1 AT1G01010   322
## 2 region1 AT1G01010   346
## 3 region1 AT1G01010   256
## 4 region1 AT1G01010   396
## 5 region1 AT1G01010   372
## 6 region1 AT1G01010   506

Now create a boxplot with colors and with the data separated by region:

cols <- c("red", "blue", "orange", "green")
boxplot(value ~ regions, data=tcm, col = cols, xlab="GeneIDs", ylab="Counts")

We can also add the points to the boxplot:

cols <- c("red", "blue", "orange", "green")
boxplot(value ~ regions, data=tcm, col = cols, xlab="GeneIDs", ylab="Counts")
stripchart(value ~ regions, data=tcm,
           vertical=TRUE, 
           method="jitter", 
           pch=19,
           add=TRUE)

Something a little more complicated. Boxplots of regions vs. values, but separated by variable (GeneIDs):

boxplot(value ~ regions + variable, data=tcm, col = cols, xaxt="n", yaxt="n", xlab="GeneIDs", ylab="Counts")
# use the par function to get the dimensions of the plot
xlen = par("usr")[2]

axis(side=1, labels=colnames(tc2)[1:6], at=seq(2,xlen-2,length.out=6), cex.axis=0.7)
axis(side=2, las=2)
# add a legend
legend("topleft", fill=cols, legend=unique(regions))

Boxplot for a single gene, separated by regions:

tc3 = data.frame(AT1G01010=tc$AT1G01010, regions=regions)
rownames(tc3) = rownames(tc)
boxplot(AT1G01010 ~ regions, data=tc3)

xyplot (bivariate scatterplot) of count data, separated by GeneID:

library(lattice)
xyplot(value ~ variable, data=tcm, scales=list(rot=90), xlab="GeneIDs", ylab="Counts")

Same xyplot, but also separated by regions:

xyplot(value ~ variable | regions, data=tcm, scales=list(rot=90), xlab="GeneIDs", ylab="Counts")

Same xyplot but with both GeneIDs and regions:

xyplot(value ~ variable + regions, data=tcm, scales=list(rot=90), xlab="GeneIDs", ylab="Counts")

Boxplot for each GeneID in one plot:

boxplot(value ~ variable, data = tcm, las=2, ylab="Counts", xlab="")

Now, the same boxplot but separating by GeneIDs and regions together:

# need to change the margins of the plot to fit the labels
par(mar = c(10, 5, 2, 0))
boxplot(value ~ variable * regions, data = tcm, las=2, ylab="Counts", xlab="")
# use mtext to create an x-axis label above the graph
mtext("GeneID:Region", side = 3, line = 1)

Simple histogram:

hist(counts[,"C61"], xlab="Counts")

Refine the histogram:

hist(counts[,"C61"], xlab="Counts", breaks=1000, xlim=c(0,3000))

VISUALIZATION CHALLENGE

Download the gapminder dataset to your Rstudio session using the following URL:

https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2025-August-Intermediate-Visualization-for-Bioinformatics/refs/heads/master/R/gapminder.csv

Take a look at the dataset. Subset the data to only look at rows from 1982 . Then make a scatterplot of gdp vs life exp (using the plot function) of the subset, adding x and y labels. Find out how to log scale the x axis from the “plot.default” documentation.

Next, make a named vector of colors with the names being the continents. Use the vector to add colors to each point based on the continent for that point. You may need to look at the “col” section in the “plot.default” documentation. You can also use the “colors()” function to get a list of all of the available colors. Add a legend to the plot showing which colors correspond to which continent.

Finally, create a function that takes in a numeric vector and a minimum and maximum circle size. Within the function, take the square root of the vector, then use the min-max normalization method to normalize each element of the vector. Once you have normalized data, use those values to scale between the minimum and maximum circle size. Use this function to change the size of the circles to correspond to the population of the country using the “cex” parameter for the scatter plot.