Part 2: Basics of Python and Basic Data Types
Outline
- Basic data types
- Basic arithmetic
- Comparisons
- Basic data structures
- Boolean Operations
- Range
- String Formatting
- Group Exercise
A few notes before we get started:
- Learning all the nuances of Python takes a long time! Our goal here is to introduce you to as many concepts as possible, but if you are serious about mastering Python you will need to apply yourself beyond this introduction.
- We bring up a lot of concepts to expose you to them, and we encourage you to keep a “scientific” mentality: keep experimenting and testing the waters beyond these materials.
Setup
cd /share/workshop/prereq_workshop/$USER/python
python3.8
Basic Data Types: Integers, Floating-point numbers, booleans, strings.
Integers
whole numbers, negative or positive
a = 1
type(a)
a
Floats
numbers with a decimal point
b = 1.2
type(b)
b
Booleans
“In computer science, the Boolean data type is a data type that has one of two possible values (usually denoted true and false) which is intended to represent the two truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century.” -wikipedia
c = True
d = False
type(c)
c
int(d)
float(d) # long() only existed in Python 2; in Python 3 every integer is an int
Strings
my_string = "Hello world!"
type(my_string)
my_string
print(my_string)
my_string + my_string
my_string[1] # this will make more sense when we learn lists later
Challenge Questions:
- What is an empty string and a non-empty string in terms of a boolean value? (Hint: bool(some_string))
Arithmetic: Adding, subtracting, multiplication, assignment arithmetic (assignment operators).
# This is a comment
type(a)
type(b)
# Addition
a + b
type(a+b)
# Subtraction
b - a
type(b-a)
# Division
a/b
type(a/b)
# Exponents
4**b
#or
pow(4,b)
type(a**b)
# Remainder
4 % 3
# Absolute value
abs(22-32)
# Round, Floor, Ceiling
round(3.2)
int(3.2)
import math
math.ceil(3.2)
math.floor(3.7)
int(3.7)
Notice that the subtraction b - a produces 0.19999999999999996 when it should be 0.2. This is not a Python bug: floats are stored in binary, and 1.2 cannot be represented exactly.
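The rounding issue above is usually handled by comparing floats with a tolerance rather than with ==. The following is a minimal sketch using only the standard library; the name diff is just an illustrative variable, not something from the workshop materials.

```python
import math

a = 1
b = 1.2

diff = b - a
print(diff)                      # slightly less than 0.2

# Exact equality fails because of the tiny representation error
print(diff == 0.2)               # False

# Compare with a tolerance instead
print(math.isclose(diff, 0.2))   # True

# Rounding is fine when you only need a clean value for display
print(round(diff, 2))            # 0.2
```

math.isclose uses a relative tolerance by default, which is usually what you want when the values are not close to zero.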
Assignment operators
a += 1
a
a -= 1
a
Challenge Questions:
- We have a value of 200 gene counts for a certain sample. One gene has a count of 7.
- What is the relative proportion of reads for this gene?
- We find some novel information saying that all values have had sqrt() or x**0.5 performed. Now what is the relative proportion?
Comparisons: <, >, <= , >= , ==, !=
1<1
1<2
2>1
1<=1
2>=1
1==1
0==1
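The heading lists != but the lines above do not use it, so here is a short sketch that rounds out the comparison operators. The variable x is only an illustration; chained comparisons and string comparisons are extras you may find handy.

```python
x = 5  # illustrative value

print(1 != 2)       # True: the values differ
print(1 != 1)       # False
print(1 == 1.0)     # True: an int and a float with the same value compare equal
print(0 < x < 10)   # True: chained comparison, same as (0 < x) and (x < 10)
print("a" < "b")    # True: strings are compared character by character
```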
Basic Data Structures: Lists, Sets, Tuples, Dictionaries.
Lists
my_list = [1,2,3,4,5,6]
type(my_list)
# getting the first element in a list (0 index)
my_list[0]
# getting the last element in a list
my_list[-1]
# OR
my_list[5]
# getting a range of the list
my_list[-3:]
my_list[1:3]
my_list[:3]
# some other features of integer lists
sum(my_list)
len(my_list)
my_list2 = [7,8,9]
# adding lists concatenates them
my_list+my_list2
# subtracting lists is not supported and raises a TypeError
my_list-my_list2
# string lists, double indexing
my_string_list = ['the', 'dog', 'says', 'woof']
my_string_list[0][1:]
# every x number of indexes
my_string_list[::2]
my_string_list[::3]
# join, splits, replace,
" ".join(my_string_list)
list(my_string_list[0])
" ".join(my_string_list).split()
" ".join(my_string_list).split('s')
" ".join(my_string_list).replace(" ", "-")
Some more list features: count, pop, append, reassignment
my_string_list = ['the', 'dog', 'says', 'woof']
my_string_list.append('the')
my_string_list.count('the')
my_string_list.pop()
my_string_list.count('the')
my_string_list.reverse()
my_string_list
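One detail that is easy to miss in the lines above: append and reverse change the list in place, while pop both removes and returns the last element. A small sketch, using a hypothetical list called words with the same contents as my_string_list:

```python
words = ['the', 'dog', 'says', 'woof']  # hypothetical name, same contents as my_string_list

words.append('the')         # modifies the list in place
removed = words.pop()       # removes AND returns the last element
print(removed)              # the

words[1] = 'cat'            # reassignment by index
print(words)                # ['the', 'cat', 'says', 'woof']

backwards = words[::-1]     # slicing builds a reversed copy
words.reverse()             # reverse() changes the original list
print(words == backwards)   # True
```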
Challenge Questions:
- We have the following DNA sequence. We are curious about the adapters of the sequence (the first and last 15 bp of the sequence).
some_dna = "ATCAATGCGCGCATACGATCAATGCGCGCATACGATCAATGCGCGCATACGGGTCCATACGCAATCAATGCGCGCATA"
Can you tell me if they match? What is the GC content of each adapter?
Tuples
my_tuple = (1,2,3,4,5,6)
type(my_tuple)
# getting the first element in a tuple (0 index)
my_tuple[0]
# getting the last element in a tuple
my_tuple[-1]
# OR
my_tuple[5]
# getting a range of the tuple
my_tuple[-3:]
my_tuple[1:3]
my_tuple[:3]
my_tuple2 = (7,8,9)
# adding tuples concatenates them
my_tuple+my_tuple2
# subtracting tuples is not supported and raises a TypeError
my_tuple-my_tuple2
Tuples vs Lists
my_list[0] = 135
my_list
my_tuple[0] = 135 # raises a TypeError because tuples cannot be changed
my_tuple
There are quite a few differences between the two, and some of the other features we talked about earlier for lists also work with tuples.
I don’t use tuples a ton, but feel free to experiment and see what you can do with one but not the other. Check out the docs, or see what features a class has using list.__dict__.keys() and tuple.__dict__.keys(). (These commands will make more sense later when we talk about attributes and dictionaries.)
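To make the list/tuple difference concrete, here is a minimal sketch. The names a_list, a_tuple, and positions are made up for illustration, and the example borrows dictionaries and sets, which are covered later in this part.

```python
a_list = [1, 2, 3]
a_tuple = (1, 2, 3)

a_list[0] = 135              # lists are mutable, so this works

try:
    a_tuple[0] = 135         # tuples are immutable, so this raises TypeError
except TypeError as err:
    print("TypeError:", err)

# Because tuples cannot change, they can be used as dictionary keys
positions = {("chr1", 100): "A"}   # hypothetical example data
print(positions[("chr1", 100)])

# Public methods that lists have but tuples do not
list_only = set(dir(list)) - set(dir(tuple))
print(sorted(name for name in list_only if not name.startswith("_")))
```

The last two lines are one way to answer "what can a list do that a tuple cannot": every method that changes the object in place shows up only on the list side.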
Sets
Let's consider the elements of my_set and my_set2 as events A and B, the notation commonly used in set theory. Consider the following operations:
- The relative complement is called the set difference in statistics. That is, events in one group (B or my_set2) that are not in the other group (A or my_set).
- The relative complement in the other direction: events in one group (A or my_set) that are not in the other group (B or my_set2).
- In stats, the intersection of two events is the events in both A and B. Similarly, we consider the elements in my_set and my_set2.
- In stats, the union of two events is the events in A or B. Similarly, we consider the elements in my_set or my_set2.
my_set = {1,2,3,4,5,5}
my_set
#OR
my_set = set([1,2,3,4,5,5])
my_set
type(my_set)
# Can we index sets like we do lists? No
my_set[0]
my_set2 = {5,7,8,9}
# intersection and union of sets etc.
# union
my_set|my_set2
# intersection
my_set&my_set2
# subtracting sets
my_set-my_set2
my_set2-my_set
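Each operator above also has an equivalent method, which some people find easier to read. A short sketch using the same two sets; the symmetric difference (^) is an extra operation not shown above.

```python
my_set = {1, 2, 3, 4, 5, 5}   # the duplicate 5 is dropped
my_set2 = {5, 7, 8, 9}

print(my_set.union(my_set2))                  # same as my_set | my_set2
print(my_set.intersection(my_set2))           # same as my_set & my_set2
print(my_set.difference(my_set2))             # same as my_set - my_set2
print(my_set.symmetric_difference(my_set2))   # same as my_set ^ my_set2

print(5 in my_set)    # True: membership test
print(len(my_set))    # 5
```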
Challenge Questions:
- What is the union, intersection, and difference between the following two sets of amino acids in protein A and B?
A = {'G', 'F', 'P', 'S', 'T', 'Y', 'C', 'M', 'M', 'Q'}
B = {'Y', 'C', 'Q', 'G', 'F', 'P', 'T', 'W', 'E', 'N'}
Dictionaries
- A list of codes, terms, keys, etc., and their meanings, used by a computer program or system.
- Dictionary values are pointed to by the keys. Values can be anything from int, float, and bool to lists, tuples, and dictionaries.
my_dict = {'a':1, 'b':2, 'c':3}
type(my_dict)
my_dict.keys()
my_dict.values()
my_dict['a']
# adding a new key-value pair to the dictionary
my_dict['d'] = 4
my_dict
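A few more everyday dictionary operations, sketched with a hypothetical dictionary named example_dict so that my_dict above is left untouched:

```python
example_dict = {'a': 1, 'b': 2, 'c': 3}   # hypothetical, mirrors my_dict

print('a' in example_dict)        # True: "in" checks the keys
print(example_dict.get('z'))      # None: get() avoids a KeyError for missing keys
print(example_dict.get('z', 0))   # 0: optional default value

example_dict['a'] = 100           # assigning to an existing key overwrites its value
del example_dict['b']             # remove a key-value pair
print(example_dict)               # {'a': 100, 'c': 3}

print(list(example_dict.items())) # [('a', 100), ('c', 3)]
```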
Boolean Operations: and, or, not
1 and 1
0 and 1
0 or 1
not 1
not 0
not True
not False
True and False
True or False
None and None
None or None
bool(None) # we will see why this is important later when we talk about if statements in Part 3
bool(0)
bool(1)
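You may have noticed that 0 or 1 printed 1 rather than True. That is because and and or return one of their operands, not necessarily a boolean. A brief sketch of that behavior:

```python
print(0 or 1)            # 1: "or" returns the first truthy operand
print(0 and 1)           # 0: "and" returns the first falsy operand
print(1 and 2)           # 2: if every operand is truthy, "and" returns the last one
print(None or None)      # None
print("" or "default")   # default: a common way to supply a fallback value

# Booleans behave like the integers 0 and 1
print(True + True)             # 2
print(isinstance(True, int))   # True
```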
Range
The range() function returns a sequence of numbers, starting from 0 by default, incrementing by 1 by default, and stopping before a specified number.
Much more info is available on this here.
type(range(0,10))
# first arg is start, 2nd is stop (not included), 3rd is step size (default is 1)
list(range(0,10))
list(range(0,10,5))
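A few more range() behaviors worth trying, sketched below. A range does not build the whole list in memory, so even a very large one is cheap to create; it still supports len(), in, and indexing.

```python
print(list(range(5)))            # [0, 1, 2, 3, 4]: a single argument is the stop value
print(list(range(10, 0, -2)))    # [10, 8, 6, 4, 2]: a negative step counts down

big = range(0, 1000000)          # created instantly, no list is built
print(len(big))                  # 1000000
print(7 in range(0, 10))         # True
print(range(0, 10)[3])           # 3
```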
String Formatting
This is just a brief intro. There are lots of features to this topic, but this seems to solve 95% of tasks/cases for me. If you are interested in more info regarding the other features, check out Pyformat.
extra = "Goodbye."
print("It was nice to meet you. %s" %extra)
print("It was nice to meet you. %s %s" %(extra,extra))
print(f"It was nice to meet you. {extra}")
# or
print("It was nice to meet you. {}".format(extra))
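All three styles also accept format specifications, for example to control the number of decimal places. A minimal sketch follows; name and value are made-up illustration data.

```python
name = "sample_1"   # hypothetical label
value = 3.14159     # hypothetical number

as_fstring = f"{name}: {value:.2f}"                 # f-string, two decimal places
as_format = "{}: {:.2f}".format(name, value)        # str.format()
as_percent = "%s: %.2f" % (name, value)             # % formatting

print(as_fstring)                                   # sample_1: 3.14
print(as_fstring == as_format == as_percent)        # True

print(f"{42:05d}")        # 00042: pad with zeros to width 5
print(f"{1234567:,}")     # 1,234,567: thousands separator
```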
Group Exercises (~30 mins)
- Create a list of all even values from 0 to 100.
- What is the length of the list?
- What is the average of the list?
- What is the sum of the 13th to the 17th elements in the list?
- What is the 16th element from the end?
- Replace the 23rd element with the value 200. Now what is the average?
- When learning Python, it is good to have an experimental mentality so you learn more about edge cases.
- What is the empty list in terms of a boolean value?
- What is an empty dictionary in terms of a boolean value?
- What about a float value of 0?
- Create a string in python with some information about yourself (Name, background, etc.)
- Take the string you created and make a list out of it where each word is an element.
- Now recreate the string with two spaces where there was originally one space. There are two ways, one using the list and one using the original string. Can you think of both?