      Bioinformatics: Command Line/R Prerequisites 2020


Part 2: Basics of Python and Basic Data Types

Outline

A few notes before we get started:

Setup

cd /share/workshop/prereq_workshop/$USER/python
python3.8

Basic Data Types: Integers, Floating-point numbers, booleans, strings.

Integers

Whole numbers: negative, zero, or positive.

a = 1
type(a)
a
>>> a = 1
>>> type(a)
<class 'int'>
>>> a
1

Floats

Numbers with a fractional part, written with a decimal point (similar to a decimal field).

b = 1.2
type(b)
b
>>> b = 1.2
>>> type(b)
<class 'float'>
>>> b
1.2

Booleans

“In computer science, the Boolean data type is a data type that has one of two possible values (usually denoted true and false) which is intended to represent the two truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century.” -wikipedia

c = True
d = False
type(c)
c
int(d)
# TODO: float(d) (long() exists only in Python 2, not Python 3)
>>> c = True
>>> d = False
>>> type(c)
<class 'bool'>
>>> c
True
>>> int(d)
0

Strings

my_string = "Hello world!"
type(my_string)
my_string
print(my_string)
my_string + my_string
my_string[1] # this will make more sense when we learn lists later
>>> my_string = "Hello world!"
>>> type(my_string)
<class 'str'>
>>> my_string
'Hello world!'
>>> print(my_string)
Hello world!
>>> my_string + my_string
'Hello world!Hello world!'
>>> my_string[1] # this will make more sense when we learn lists later
'e'

Challenge Questions:

  1. What are an empty string and a non-empty string in terms of a boolean value? (Hint: bool(some_string))

Arithmetic: Adding, subtracting, multiplication, assignment arithmetic (assignment operators).

# This is a comment
type(a)
type(b)

# Addition
a + b
type(a+b)

# Subtraction
b - a
type(b-a)

# Division
a/b
type(a/b)

# Exponents
4**b
#or
pow(4,b)
type(a**b)

# Remainder 
4 % 3

# Absolute value
abs(22-32)

# Round, Floor, Ceiling
round(3.2)
int(3.2)
import math
math.ceil(3.2)
math.floor(3.7)
int(3.7)

>>> # This is a comment
>>> type(a)
<class 'int'>
>>> type(b)
<class 'float'>
>>>
>>> # Addition
>>> a + b
2.2
>>> type(a+b)
<class 'float'>
>>>
>>> # Subtraction
>>> b - a
0.19999999999999996
>>> type(b-a)
<class 'float'>
>>>
>>> # Division
>>> a/b
0.8333333333333334
>>> type(a/b)
<class 'float'>
>>>
>>> # Exponents
>>> 4**b
5.278031643091577
>>> type(a**b)
<class 'float'>

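Notice the floating-point rounding error in the subtraction: it produces 0.19999999999999996 rather than the expected 0.2. This happens because most decimal fractions, such as 1.2, cannot be stored exactly in binary floating point.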
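
This kind of rounding error is normal and usually harmless, but it does mean floats should not be compared with == directly. Here is a minimal sketch of two common ways to deal with it, using math.isclose and round from the standard library. The names diff, exact_match, close_match, and rounded are made up for illustration.

```python
import math

a = 1
b = 1.2

diff = b - a                            # 0.19999999999999996, not exactly 0.2

# Direct equality fails because of the tiny representation error
exact_match = (diff == 0.2)             # False

# Compare with a tolerance instead
close_match = math.isclose(diff, 0.2)   # True

# Or round to a sensible number of decimal places before displaying
rounded = round(diff, 2)                # 0.2

print(exact_match, close_match, rounded)
```
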

Assignment operators

a += 1
a

a -= 1
a
>>> a += 1
>>> a
2
>>>
>>> a -= 1
>>> a
1
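
The same shorthand exists for the other arithmetic operators. A small sketch, assuming a throwaway variable named count (not part of the workshop code):

```python
count = 10

count *= 3     # multiply in place: 30
count //= 4    # floor-divide in place: 7
count **= 2    # raise to a power in place: 49
count %= 5     # remainder in place: 4
count /= 2     # true division always gives a float: 2.0

print(count, type(count))
```

Note that /= turns an integer into a float even when the division is exact.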

Challenge Questions:

  1. A certain sample has 200 gene counts in total. One gene has a count of 7.
    • What is the relative proportion of reads for this gene?
    • We then learn that all values have already had sqrt() (equivalently x**0.5) applied to them. Now what is the relative proportion?

Comparisons: <, >, <= , >= , ==, !=

1<1
1<2
2>1
1<=1
2>=1
1==1
0==1
>>> 1<1
False
>>> 1<2
True
>>> 2>1
True
>>> 1<=1
True
>>> 2>=1
True
>>> 1==1
True
>>> 0==1
False

Basic Data Structures: Lists, Sets, Tuples, Dictionaries.

Lists

my_list = [1,2,3,4,5,6]
type(my_list)

# getting the first element in a list (0 index)
my_list[0]

# getting the last element in a list
my_list[-1]
# OR
my_list[5]

# getting a range of the list
my_list[-3:]
my_list[1:3]
my_list[:3]

# some other features of integer lists
sum(my_list)
len(my_list)

my_list2 = [7,8,9]

# adding and subtracting lists
my_list+my_list2
my_list-my_list2


# string lists, double indexing
my_string_list = ['the', 'dog', 'says', 'woof']
my_string_list[0][1:]

# every x number of indexes
my_string_list[::2]
my_string_list[::3]

# join, splits, replace, 
" ".join(my_string_list)
list(my_string_list[0])
" ".join(my_string_list).split()
" ".join(my_string_list).split('s')
" ".join(my_string_list).replace(" ", "-")
>>> my_list = [1,2,3,4,5,6]
>>> type(my_list)
<class 'list'>
>>>
>>> # getting the first element in a list (0 index)
>>> my_list[0]
1
>>>
>>> # getting the last element in a list
>>> my_list[-1]
6
>>> # OR
>>> my_list[5]
6
>>>
>>> # getting a range of the list
>>> my_list[-3:]
[4, 5, 6]
>>> my_list[1:3]
[2, 3]
>>> my_list[:3]
[1, 2, 3]
>>>
>>> # some other features of integer lists
>>> sum(my_list)
21
>>> len(my_list)
6
>>>
>>> my_list2 = [7,8,9]
>>>
>>> # adding and subtracting lists
>>> my_list+my_list2
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> my_list-my_list2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'list' and 'list'
>>>
>>>
>>> # string lists, double indexing
>>> my_string_list = ['the', 'dog', 'says', 'woof']
>>> my_string_list[0][1:]
'he'
>>>
>>> # every x number of indexes
>>> my_string_list[::2]
['the', 'says']
>>> my_string_list[::3]
['the', 'woof']
>>>
>>> # join, splits, replace,
>>> " ".join(my_string_list)
'the dog says woof'
>>> list(my_string_list[0])
['t', 'h', 'e']
>>> " ".join(my_string_list).split()
['the', 'dog', 'says', 'woof']
>>> " ".join(my_string_list).split('s')
['the dog ', 'ay', ' woof']
>>> " ".join(my_string_list).replace(" ", "-")
'the-dog-says-woof'

Some more list features: count, pop, append, reassignment

my_string_list = ['the', 'dog', 'says', 'woof']
my_string_list.append('the')
my_string_list.count('the')
my_string_list.pop()
my_string_list.count('the')
my_string_list.reverse()
my_string_list
>>> my_string_list.append('the')
>>> my_string_list.count('the')
2
>>> my_string_list.pop()
'the'
>>> my_string_list.count('the')
1

Challenge Questions:

  1. We have the following DNA sequence. We are curious about the adapters of the sequence (the first and last 15 bp).
     some_dna = "ATCAATGCGCGCATACGATCAATGCGCGCATACGATCAATGCGCGCATACGGGTCCATACGCAATCAATGCGCGCATA"
    

    Can you tell me if they match? What is the GC content of each adapter?


Tuples

my_tuple = (1,2,3,4,5,6)
type(my_tuple)

# getting the first element in a tuple (0 index)
my_tuple[0]

# getting the last element in a tuple
my_tuple[-1]
# OR
my_tuple[5]

# getting a range of the tuple
my_tuple[-3:]
my_tuple[1:3]
my_tuple[:3]

my_tuple2 = (7,8,9)

# adding and subtracting tuples
my_tuple+my_tuple2
my_tuple-my_tuple2
>>> my_tuple = (1,2,3,4,5,6)
>>> type(my_tuple)
<class 'tuple'>
>>>
>>> # getting the first element in a tuple (0 index)
>>> my_tuple[0]
1
>>>
>>> # getting the last element in a tuple
>>> my_tuple[-1]
6
>>> # OR
>>> my_tuple[5]
6
>>>
>>> # getting a range of the tuple
>>> my_tuple[-3:]
(4, 5, 6)
>>> my_tuple[1:3]
(2, 3)
>>> my_tuple[:3]
(1, 2, 3)
>>>
>>> my_tuple2 = (7,8,9)
>>>
>>> # adding and subtracting tuples
>>> my_tuple+my_tuple2
(1, 2, 3, 4, 5, 6, 7, 8, 9)
>>> my_tuple-my_tuple2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'tuple' and 'tuple'

Tuples vs Lists

my_list[0] = 135
my_list
my_tuple[0] = 135
my_tuple

>>> my_list = [1,2,3,4,5,6]
>>> my_list[0] = 135
>>> my_list
[135, 2, 3, 4, 5, 6]
>>> my_tuple[0] = 135
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple
(1, 2, 3, 4, 5, 6)

The key difference is that lists are mutable (they can be changed in place) while tuples are immutable. Some of the list features we covered earlier also work with tuples, and others do not. I don’t use tuples a ton, but feel free to experiment and see what you can do with one but not the other. Check out the docs, or see what features a class has using list.__dict__.keys() and tuple.__dict__.keys(). (These commands will make more sense later when we talk about attributes and dictionaries.)
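
One way to explore those differences is to compare the method names of the two classes directly. This is only a sketch: it uses dir() instead of __dict__.keys() (a swapped-in alternative that returns plain name lists), and the names list_only, shared, and coords are made up for illustration. It also shows one practical consequence of immutability: a tuple can be used as a dictionary key, while a list cannot.

```python
# Public methods that lists have but tuples do not
list_only = sorted(
    name for name in set(dir(list)) - set(dir(tuple))
    if not name.startswith("_")
)
print(list_only)

# Public methods shared by both classes
shared = sorted(
    name for name in set(dir(list)) & set(dir(tuple))
    if not name.startswith("_")
)
print(shared)   # ['count', 'index']

# Tuples are immutable and therefore hashable, so they can be dict keys
coords = {(1, 2): "gene_a"}        # hypothetical data
try:
    bad = {[1, 2]: "gene_a"}       # lists are mutable, so this fails
except TypeError as err:
    print("TypeError:", err)
```
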

Sets

Let's treat my_set and my_set2 as the events A and B commonly used in set theory. In the diagrams below, GREEN marks the elements that end up in the result of each operation:

[Figures: set operation diagrams, with the result of each operation shaded green]

my_set = {1,2,3,4,5,5}
my_set
#OR 
my_set = set([1,2,3,4,5,5]) 
my_set
type(my_set)


# Can we index sets like we do lists? No
my_set[0]


my_set2 = {5,7,8,9}

# intersection and union of sets etc.

# union
my_set|my_set2
# intersection
my_set&my_set2
# subtracting sets
my_set-my_set2
my_set2-my_set

>>> my_set = {1,2,3,4,5,5}
>>> my_set
{1, 2, 3, 4, 5}
>>> #OR
>>> my_set = set([1,2,3,4,5,5])
>>> my_set
{1, 2, 3, 4, 5}
>>> type(my_set)
<class 'set'>
>>>
>>>
>>> # Can we index sets like we do lists? No
>>> my_set[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable
>>>
>>>
>>> my_set2 = {5,7,8,9}
>>>
>>> # intersection and union of sets etc.
>>>
>>> # union
>>> my_set|my_set2
{1, 2, 3, 4, 5, 7, 8, 9}
>>> # intersection
>>> my_set&my_set2
{5}
>>> # subtracting sets
>>> my_set-my_set2
{1, 2, 3, 4}
>>> my_set2-my_set
{8, 9, 7}
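
The operators above also have method equivalents, which some people find easier to read. A short sketch using the same two sets; the result names (union, common, and so on) are illustrative only. sorted() is used when printing because sets have no guaranteed display order, as the {8, 9, 7} output above shows.

```python
my_set = {1, 2, 3, 4, 5}
my_set2 = {5, 7, 8, 9}

# Method forms of the operators shown above
union = my_set.union(my_set2)              # same as my_set | my_set2
common = my_set.intersection(my_set2)      # same as my_set & my_set2
only_first = my_set.difference(my_set2)    # same as my_set - my_set2

# Elements that are in exactly one of the two sets
either_not_both = my_set ^ my_set2         # symmetric difference

# Membership tests and subset checks
has_five = 5 in my_set                     # True
is_subset = {1, 2} <= my_set               # True

print(sorted(union), sorted(common), sorted(only_first), sorted(either_not_both))
```
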

Challenge Questions:

  1. What are the union, intersection, and difference of the following two sets of amino acids in proteins A and B?
    A = {'G', 'F', 'P', 'S', 'T', 'Y', 'C', 'M', 'M', 'Q'}
    B = { 'Y', 'C', 'Q', 'G', 'F', 'P', 'T', 'W', 'E', 'N'}
    

Dictionaries

my_dict = {'a':1, 'b':2, 'c':3}
type(my_dict)

my_dict.keys()
my_dict.values()
my_dict['a']

# adding a new key-value pair to the dictionary
my_dict['d'] = 4
my_dict
>>> my_dict = {'a':1, 'b':2, 'c':3}
>>> type(my_dict)
<class 'dict'>
>>>
>>> my_dict.keys()
dict_keys(['a', 'b', 'c'])
>>> my_dict.values()
dict_values([1, 2, 3])
>>> my_dict['a']
1
>>> my_dict['d'] = 4
>>> my_dict
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
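
Looking up a key that does not exist with square brackets raises a KeyError. A brief sketch of some ways to check for keys and look them up safely, using only built-in dict features; the variable names on the left are made up for illustration.

```python
my_dict = {'a': 1, 'b': 2, 'c': 3}

has_a = 'a' in my_dict            # True: membership checks look at keys
has_z = 'z' in my_dict            # False

missing = my_dict.get('z')        # None instead of a KeyError
fallback = my_dict.get('z', 0)    # 0: a default of our choosing

pairs = list(my_dict.items())     # [('a', 1), ('b', 2), ('c', 3)]

del my_dict['a']                  # remove a key and its value
remaining = list(my_dict.keys())  # ['b', 'c']

print(has_a, has_z, missing, fallback, pairs, remaining)
```
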

Boolean Operations: and, or, not

1 and 1
0 and 1
0 or 1 
not 1
not 0
not True
not False
True and False
True or False
None and None
None or None


bool(None) # we will see why this is important later when we talk about if statements in part3
bool(0)
bool(1)
>>> 1 and 1
1
>>> 0 and 1
0
>>> 0 or 1
1
>>> not 1
False
>>> not 0
True
>>> not True
False
>>> not False
True
>>> True and False
False
>>> True or False
True
>>> None and None
>>> None or None
>>>
>>>
>>> bool(None) # we will see why this is important later when we talk about if statements in part3
False
>>> bool(0)
False
>>> bool(1)
True
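
The output above shows that `and` and `or` do not always give back True or False: they return one of their operands (which is why `1 and 1` prints 1, and `None and None` prints nothing). A small sketch of that behaviour, with illustrative variable names:

```python
# `and` returns the first falsy operand, or the last operand if all are truthy
r1 = 1 and 2              # 2
r2 = 0 and 2              # 0
r3 = "" and "x"           # ''

# `or` returns the first truthy operand, or the last operand if none are truthy
r4 = 0 or 3               # 3
r5 = None or "default"    # 'default'
r6 = 0 or None            # None

# Wrap the expression in bool() when a real True/False is needed
as_bool = bool(1 and 2)   # True

print(r1, r2, repr(r3), r4, r5, r6, as_bool)
```
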

Range

The range() function returns a sequence of numbers. By default it starts at 0 and increments by 1, and it always stops before the specified end number.

Much more info is available on this here.

type(range(0,10))

# first arg is start, 2nd is finish, 3rd is gap size (default is 1)
list(range(0,10))
list(range(0,10,5))
>>> type(range(0,10))
<class 'range'>
>>> # first arg is start, 2nd is finish, 3rd is gap size (default is 1)
>>> list(range(0,10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(0,10,5))
[0, 5]
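
A few more range() behaviours that are handy to know, shown as a sketch with made-up variable names: a single argument is treated as the stop value, the step can be negative, and a range supports len(), indexing, and membership tests without being converted to a list.

```python
first_five = list(range(5))            # [0, 1, 2, 3, 4]; a single arg is the stop
counting_down = list(range(10, 0, -3)) # [10, 7, 4, 1]; negative step counts down

r = range(0, 10, 5)
n = len(r)          # 2
last = r[-1]        # 5
has_5 = 5 in r      # True
has_10 = 10 in r    # False: the stop value is never included

print(first_five, counting_down, n, last, has_5, has_10)
```
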

String Formatting

This is just a brief intro. There are lots of features to this topic, but the approaches shown here cover about 95% of tasks/cases for me. If you are interested in more info regarding the other features, check out Pyformat.

extra = "Goodbye."
print("It was nice to meet you. %s" %extra)
print("It was nice to meet you. %s %s" %(extra,extra))
print(f"It was nice to meet you. {extra}")
# or
print("It was nice to meet you. {}".format(extra))
>>> extra = "Goodbye."
>>> print("It was nice to meet you. %s" %extra)
It was nice to meet you. Goodbye.
>>> print("It was nice to meet you. %s %s" %(extra,extra))
It was nice to meet you. Goodbye. Goodbye.
>>> print(f"It was nice to meet you. {extra}")
It was nice to meet you. Goodbye.
>>> print("It was nice to meet you. {}".format(extra))
It was nice to meet you. Goodbye.
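
Both f-strings and .format() also accept a format specification after a colon, which is useful for controlling how numbers are displayed. A minimal sketch, assuming a hypothetical sample name and a made-up fraction value:

```python
sample = "sample_1"      # hypothetical sample name
fraction = 0.4567        # made-up value for illustration

two_decimals = f"{sample}: {fraction:.2f}"           # 'sample_1: 0.46'
as_percent = f"{sample}: {fraction:.1%}"             # 'sample_1: 45.7%'
scientific = "{}: {:.2e}".format(sample, fraction)   # 'sample_1: 4.57e-01'
right_aligned = f"{sample:>10}|"                     # '  sample_1|'

print(two_decimals)
print(as_percent)
print(scientific)
print(right_aligned)
```
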

Group Exercises (~30 mins)

  1. Create a list of all even values from 0 to 100.
    • What is the length of the list?
    • What is the average of the list?
    • What is the sum of the 13th to the 17th elements in the list?
    • What is the 16th element from the end?
    • Replace the 23rd element with the value 200. Now what is the average?
  2. When learning Python, it is good to have an experimental mentality and explore edge cases.
    • What is an empty list in terms of a boolean value?
    • What is an empty dictionary in terms of a boolean value?
    • What about a float value of 0?
  3. Create a string in Python with some information about yourself (name, background, etc.).
    • Take the string you created and make a list out of it where each word is an element.
    • Now recreate the string with two spaces where there was originally one space. There are two ways, one using the list and one using the original string. Can you think of both?