      Bioinformatics: Command Line/R Prerequisites 2020

  1. Use the DNA sequence below to answer the following questions:
    seq = "CGGTAAGGCACTTGATATATTAATGTTCAGGCGGACATCGGCAGGGTATATTGCGTTAGGATCTAATTAATCTATCAGACTACTTGAGGATTGTCACCTGATGGTAATAGCGTAGTGGGGATCGCTAATCCTACTGTCGATAGACGGCTGCGGTTAAACTAAGCATCTTGCTTCCGGACGGTGGAACCGATTCAAGCGTTAGCAAATGTCAGGTTCACACTAAAGAATCAGCGGGTCTCCCCTACATCTTGAGTTTTATGGCTAACCCTATATCTGTCGATAGCATGGCAGGTTCAATTTAATCACAGTGCTTGCACTGACCTCGCCTACCGGAAGCCCCGCCGCCCAAAGTGACACCAGAGTCTTCGTCACGCAGATAACGCCCCGTGCTATCTGTCCCCCGTCCTTCGAGCAGGAGTTTGGTCTGACGCCTACTTCGGCGAACGTAACCCCTCCTTGTCTGTATTAAACTGCCCCGGATGTCGACTGGTAACAAGACGGACTACAAATAATGTGTACCTTTAGACGTTCTTTAGTGATATATTGTGACCACGTATTACAGAGATACACGACATCTCCTTTATAGGTACACATAACCTAGGATCCGTAGCACCGGGCCGGATTCTGCCCAGCAAGGTGCATCCCATAAGCGTAAACACTGCACGGCGGTCAGAACTCCCCACAATCGATACGGTGTCAACTATGGTACGGATGTCAGTCGGTCCTGTAGATGCACTTGGAAGTGTACCGCGTAGCGTCGCAGGAGCGTAAACATCACCTGGACGGTCCTTTCCAGTAATGAGGAACTCAAACATATAGGGCAGG"
    
    • What is the complement of this strand of DNA?
    • What is the reverse complement of this strand of DNA?
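      # Complement: swap A<->T and G<->C at every position
      comp = ["A" if bp=="T" else "T" if bp=="A" else "G" if bp=="C" else "C" for bp in seq]
      ''.join(comp)
      # Reverse complement: reverse the complemented list in place, then join again
      comp.reverse()
      ''.join(comp)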
      

      OR, without a list comprehension, a dictionary lookup is a bit cleaner:

      comp_dict = {"A":"T", "T":"A", "G":"C", "C":"G"}
      comp = []
      for bp in seq:
          comp.append(comp_dict[bp])
      ''.join(comp)
      comp.reverse()
      ''.join(comp) 
      

      OR, with if statements instead:

      comp = []
      for bp in seq:
          if bp == "T":
              comp.append("A")
          elif bp == "A":
              comp.append("T")
          elif bp == "G":
              comp.append("C")
          else:
              comp.append("G")
      ''.join(comp)
      comp.reverse()
      ''.join(comp) 
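
A further option, not part of the solutions above, is Python's built-in `str.maketrans` and `str.translate`, which apply the base-pair mapping in one step. The sketch below is only an illustration: the helper name `reverse_complement` and the short `example` sequence are made up, and it assumes the input contains only uppercase A, C, G, and T.

```python
# Hypothetical helper for illustration; assumes uppercase A/C/G/T only.
def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    # str.maketrans builds a character-to-character mapping: A<->T, C<->G
    complement_map = str.maketrans("ACGT", "TGCA")
    # translate() applies the mapping; the [::-1] slice reverses the result
    return dna.translate(complement_map)[::-1]

example = "ATGC"  # short made-up sequence
print(example.translate(str.maketrans("ACGT", "TGCA")))  # complement
print(reverse_complement(example))                       # reverse complement
```

Calling `reverse_complement(seq)` on the workshop sequence should give the same string as the second `''.join(comp)` in the solutions above.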
      
  2. Use the codon table and RNA sequence below to answer the following question:

    table = { 
         'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
         'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
         'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
         'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
         'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
         'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
         'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
         'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
         'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
         'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
         'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
         'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
         'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
         'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
         'UAC':'Y', 'UAU':'Y', 'UAA':'Stop', 'UAG':'Stop',
         'UGC':'C', 'UGU':'C', 'UGA':'Stop', 'UGG':'W',
     }
    rna = "AUGAGCGCGAGUCUUGGUCUCGCAUGGCCCGUGGUCGGGACAGCUGCACAUGGGGGACCAGUUAUCCUCAGUAUAACUUUCUGCACCGAGACAAUGACUCAGUUACAGCGUAAUUUCAUUCUCCCUUCGCCAAAUGAUCCCUGGCUAAGUAACCAAAUUAAGCCCAAGCCUUGGACCGGCAGGAGUGAGCUGCUCUUUCAUAGGGGCAAAUCCAGCACGCCGCGGACGUCGAAGCCGUUGAAUCCACCACCGACGGUAUCGGGCUCCGGCAAUACACCAAUAAUUCCCUGCUUAACCCGUCCGUCUACUUCAGCUGUGACCCCAACGCGUGAUGACUGGCUGCGCCACGAGCACCCAGGACUCCGGAACCUUGAGCAUGGAAUGAUAAAAUUGGGCUCAGCCUGCUCAGUUCAAAGGGGCCCUUCAAGCUUACGCCCGAUCAGACAUUACAGCACUUCGAGGGAUCGUAUAAGAGCUUGUCUCCAUAACACGACCCGAUGGGGCCAACAGACACUACUUCGAACUGCCGGGGGUCGAGAAUUAAUUCUACUGACUCCGUCCAGACACUGGGGGUUCCAAACUCAUUGGCUUCUAUGUGCGCCAGAGUCGUGGUGUAGGAUUUUUACUCCAUCUAUGAUGCUCGCUCUAGCACCACGCUUAUCCUACGUUAAAGCGUCAACUUCUGCUCUCACCCGCUUCUAUAGAACCAAUAGGAUGACCUGUUACCUUAAUAUAUUCUGCGCAGGGGGUGAUAAGAGACCGAAGAGAUAUCAGACCCUUUAUCAGACCUUGGCGCCCGCUAGCGCGACAUGUAAUACCCCGAUCGGCCCAAUACAUUCGCCUCCCCACGUGGGCCCUCCCCGUGCCACUCACCCCCACCCCCUGUUAUUACAGCUCCGAUUCUGCUAUUUUACCAUGCUCCAUCCCUUUAGAACAUCCUGGCGAACCAGCUAUUCUGUUCCUCUGGGGACAAGUCACGGAGUUCUCCCCCCGACCCCAGCCGCGUUUGCCCGUAUUACGAUUAGGUAUCGUCGCUUGAUGCUCAUAUGCCACAGGUCCACGAAAAGUCGCCUAUUUCCAAUCAGAGACCCGAAGGCUGUACGUUGGUUUAUGACAAUGAUCUUGAGCAUCGUUUCGAGCGCAGAGACUGGAGAUUUCCCUUGUCUAGACCAGAUGACGGGCGAUACGAUACCCUGGUUCAUGAAUGCAAAACUCGUUUCGACCUUUCUAUGGAGAUCGACUGGAGAAGCGGGCGUAUUCUGUAAAACUCGCAAGACCUACUACUUGACUCAGUUGAGCAUAUAUCGCAUCAGCCGGAUUGGUCCACAGAAAGGUUGUAAACCCACAGCGCUCUUAGAAGGUAAAAGGCAUUACCUUAUCCUAUCAAGACCACAUGUUCUAUUUGCCAACCGAACACUCUGCAAAAGCCGUAAGUCGAGCGCUUUGAGGGGAUCAGGGUGCUCCCUUGUCGACUAUGCGGCGGCGACGUUCAUCGUGGACUGGAGUCUCUAUCGAGUUGUCGAGCAUGAGGAGUUUUUUGUCCCUCUCAAGUUGUACGGCGUAGAUUGGUCGGUGGAUCAACUCCGUGUCACUCGCCCCCCCACAGAAGGGAUUCGGCAAGUGUGUUAUAUGGGAAAUGUACAAGAAGAUAUGUCAGUCGUCGACCUAUCUACAAAGAACCGUAGGCGUCGUGAAUCUUCUAACACGCGGCCAAGAAACGAAGAUCUCGUGGUUGUUCACUCUCCGGGCCUAACUACAGACAGUUCCGGACCUUGUCCUCAUUGUACGGUGAUACCGUCAUCGCACACUCACCGUUUUUCCGCGCCAGGGAGACCGGGCGAUUCGCGCACAGCAUAUGGCAAGGUGAGAUGUCUGUCUCACCGCCCUCGCAUCACUGACCCUAGCGUAUUCUCACCGAAAUCACGUCGUUGCUUAAGGCUAGGUCAUUCAUCCACGCGACGGGUACCAGCAAGUCCUCGCAGCGCUGCC
GUGCCACACCAAACACGUGUGGACGUACCAAGGGGAGGCCAAAGGCAUCAGAGCGGUCGGUCCCUCGUUUACAUUUAUCAGAAUAAUCCAAACGCAAUGCUUCUUGCAGGCCAAUGGCGAUGCCCCGCGUAUGCCGCACUAUUGCAAGCCCCCUUUCCGUGGUGGAUAAGCUUGUCUUGUUGGUUACUCAAUGUAACUUCACAUCCAUUGCUUGCCGUUAUUAAUAGUACUUACCUGGGUGUCUUUUAUAUGACGCCUCGAAACCAGUCAAGUCGGGCAGGCGAGAGACUACGUAGAUAUCCAGUGGUUACUGCUGUGCUUCCCAGGUUAAUCUUGAUUGUUAUCCGAUUUUCAGACUUCUGCGCCGCAAAGUGGAAGAGUAAGUGGCUCUAUUCUGAUGGGUGUCAAUUGGCAGAAGCAAAAGAGGGCCACAGCGCCGAUCGUUCCCCAACAAGGUCGAUGUUCAAAGUGCAAAAGGCGGAUUUGGUCACUCUGGGAGAAAACGGCGCCGUAAGUCUUUGGUCAUUUGGGCCCAGUGGCAACCGGUACAUAACCACUGGCUAUCGUAUCAACGCGACUUGCACACGGGCUAGUAGUGAGAGUGCGCACCUCUCGAGGGUGGGCGGGCUUCUGAAGGGGUGUGAUAACUGUGAGCUUCCCUCAUGCUUGUACGUUCUCUUACCCUCUCUUUCUUUGUGUCAUCAAUUUCCCAGUGAAUGGCAUGAGCGCAUACAAUCCAAGUGGUUACGCACUCCCAAAACCCCGGAUGAACUAGAACCGCCGGAGCCUCAAAUUGGUGUGGGCAGCACGACGCGCCCACUAGAGAUUUGUACUUUUAUCCGGAAGCAUCGUAUAGGAACCGCAUUAUGGAUUGUGUUCCGUGAACCCUACCAGGUAGCACGGCUUAACUCGUGUUCCGCGCCCGAAGCUCUAAACGAGUUGCCACAUCCUCUCUUCAAACUUGAAAGUAUAUUAGAAGCAGCCCUGCAAACACGCUCAAGCACGGAACAGUAUCUUAUACGAAGUAGCGGAUACGGAGAUAUUCGAUCUGGCCACGCGAUUUCCAUCGGACCAGACCUGCAAAAUGUGGAAGUGGUGUCCGUAUGGUAUCUCCAACGACAGUGGCGCCACAAGAGGUCAGGCUCUCCGCUUAGACUGCAUUCUUCAUUUCUAUGGAAGAAUGGGUUGGCUUCUCCAGCGCAGUCACUUAUAACAGCCUCGUGUGAGGAAGCUAAGAGGUGUCCUCAUAAGCGGUCGCACCUAUUGGAUCUUAAGCUAUAUAAGUUGACCGCGAAGGCGGACUUGCCACCGUUACCAGCAUCGCAACCGAUUAUACCGCCUGUCGACGACGGGUACACCCAUGUCCCUUGUCGCUCUUAUCAGGGGCGUGGUCGAGGACGGAACGUAGCGGUUGGUCCCUUCCCCAGUACCGGAAACCCUACUCUGCAUCAGUCCGUUCACUCACACACUCGAUCUCACGUACUCCCAACCCAAGGAUGUUGGACGGGGUCUAAAGGCACUAUCGCUCCGGCUAUGCAGUUCAUAGGGUUGAGGCGUCAUCAUGCGUUCAGGGUGCACCAACACCCUAAAUCUUCUUACAGCCAUUGCGGUUCCUCGCUAGAGCCAGGGUACGUCGUAGGCGUGCCGAUAAUUGGGUCAGUCCUGGCCGCGGUUACUAGAGGGAAUGGAAGAGAGUAUAAAAAAGGUAAUGCCUGGUCGCGUUACCUCACGACCGAACGUUUGGCGGUGCGAAUAGCGAGGGAUUCAGGGGCUAUUCCCGUGGCGAGGGUAAGAGUUACACUUAACGGAGAGGAUCGAAUUCGGAUCCCCUUGGCCUUACGCGUGUAUGCCUUUCCGCCUGUCCCAUCAGCCAAGUUACCGUCCAAAACGGAAUCCACCUAUAGCUUUGCCAUGCGCUGUACUGACUGUUUGUGCGUAGUGAGCUUUACCGGAUCACUGUGGUGGAGGGCCCGACUAUCCAAGAGUGGCGACUAUUUCCU
GCAUUCCCCUCAGAAUGCACGGUCGGGGACCUUGACCGGUCAUUCCCGCAUUCUGUGGGUUCUUACGUCUUUACGAAAUCCCCUCUUCACGGUCGUUCCCGGCCAAUUCAACAGUAUUAAAGGACGACCGACCGAGCACCUAUCAGGGGUAAGGUCUAGACCACUCCUGGAACUCGCACGCAAGACGCCAGGCCAAGGGAUUUUCUCAUUAGGCCCUUACGGCGGGCCCCAAUAUUUGAAUGCACUGAAAUCCUCAUUAGCCAGCAGACAACAUCUCACCGUUAACGCGGGGCAGGUGGGUGAUCGAGCUCUAACCUACGGUAUAGCUGAGUCGCACUCCAAAAGUAGCAGCAACUUUCAGUCGCUCCGGCGGCACGAACGGGAUGGCUCAAAAGACAUUCUUACACCGACGAGCGCAAAUUACAUUCGGAAGCCCACGUGCUGGCUAGUCGCGGGUAAGGUGUGCUAUUUUAGACAAACAAUUCUAGUCCCCGCAGUAAUGCGCACAUCGGCUGUGCGUCCACUAAAACAGAUCUCCUAUGAUCAUUAUUUUUGGGAGUGGAAAUGCUAUUAUAUAUGGCCAACUGGCCCCUGCUCUGUCCCUGACUGGGUACGGGAUGACUCCCCAACUCUGUGGAGAAUCGCUUUUCUAGCUUGCCUACGGAUCUGUUCAAGCGUUUUUCCCGUCAAUAUACGGGGACCCGAUAUAAUACAAAUCUGCAAGGGGUUUGAAAAAGGGCACAUGCAUCAAAAUCACGCGAAGCGCUCACUGAGCGCCAAUAACCGAAACCUAAGGAUACUCUCCGGCGCUGAAAGGCGGGCGAUGAUGGAUUCCCACAGUGGCCUAGCGAGGGUAAGGGCGUUGCGUCGACAUAUGGAAGGUGCGCGAUGCGACGUAAGGGGCGUAACUCAUGGGUCACAAUAUAGCUGCGGUAUACCUCAUCAACUAAUGGAACGGCGUACCUCAGAAUACUGCCUAACGCCGAAGGAGGCCUGCCUUAGUCGGUUCAUUAGAUUUGCCUUACCGGGGAGUCAUAUUCUGCAAGUUACUCCUGAGUUAGGGCCGGUCAUCGUCCCAGAUCAGAACUCUACGAGCAGCACAGGGGCCCUCCGGCCAGCCAACCCAAAUUGGAUCAAACUCGUUGGCGUCGUAGUUUACAAACCUGGAUGUGGAAUGCUCGUUUACUUGUCCGCAAGCGGACAGCAUAUCGGACCUGACAACCGAAACGGAACCAAGAACCGAGCUCAUCGCACACGGGUGUUCUCCUACCUGAUCCUAACUAUAACACGGCACCCGUCUGGACUGUGCUUUCAGGGGGACCCUAGGUCCAUCAAUUUGGCUGCCGACUGGAAGUCGGGUUCUGUGAUCAUCUCAUUGCAGUAUAGGUUCUCGACACGCUCUCCACUGUUAUCAUCAUAUCAGUUUCUUCAAGGCUCAACUCCCGUCAGAGCCGAUGUACUCAGAACCGGCAAGAGGACCUCGGGUGGGUUGUCUCCCAUCGAUCGGCUGGUCACGCCUGGGUCCGGUUGUCCGCCCCCUAUCCGUACGGACAUCCCCGAACGAGACACCGUCGCGUCGACGUCUGGUUUGGUUAUAAACAUUCCAUGUACAGCAUGUGCGACUCGAAGGACUGAUCAUUUGCGUAGUACCAAAAAGUGGGAGUCUGGGUGUUGCCCUGCUGGAAGCCCCCGUUAUCCCCGGUCUGAUAUUUCGUUGCCACGGAAUGUAGAAGUCCUGCAGUUGUGUAUUGCCAUCAAACUCAUUACCCACAAGGCUUGUAUAUGCUAUUUUCUGGCGCCUCUGGUUAGCCUGGAUACAAUUGCUCAAUCAAGGUAUGUUCCAUCGGAGUCGAAGACUCCUGAGGUAAUGUCCCACUCUGACGAUACCCACAGUUUCUCAUCUCUAGUAUCAAACGAAAUAUCCUCUACACUAAAAGCUUCUUACAAAGCAGGUGACUUAUCAUCAGUAGCCUGCUACAUACUGCCCUUAAACCCCAGUC
UGAGUUCUGUUGAAGGCUGCACAACCUUCUGUCUAUCGGGGUUACUGCAAGGUGGGGUACACUCUGCAGGGGCACAUCAAGGUAGUAUACUAGAUCGGAUUAGUUCUUGGAGCAUUCAUUUUGCAAUGUGCAUCAAAUCCCUCUAUUGUCAGAUCCACAAGUUUAUAUUCAUGCACCCCGACGCUAAUAGGAAAAACCAAGGUUCUUUGAUCCCAGGUCAUGUUUGGGCGCGCGCUUGGGCCACAAAAUGCUGCAUCUGCUUCUGCGUCGGAGUGGACGGGACUAAUAGCAGGGAGGUGAGCUUGGGUCAAAUGCCUGCCAUGGAACCUAUUGCGUCGAGAGGAGCUACAUCAAGACGCAUAACUCUGUAUUGCAGCAAUGAAGUGAGUGACAUGACGGCGUUUUACUUCUCCGCGAGGAGUCUACCGGUAAUGAAAAACAGACCGCAACUAUGGAGGGAAUUAUGUAAGUCCCCCUCCGAGUUAUACGUAUGCCGGCUCUACGCCGGGGCUAUUCGUCGAAUUCGAAAUUUGGAAUACAACUCUUGUAAGAGGACCCGGAGCGAUUGCAAAGCGUAUGCCCGAGCGAGCAUAAUACGGUGUGCCCCGCUCCCAAUGGUGUAUUCUCACCUUCAGGCGGGGUGGAAUAAACUCUGUGGCCAUUGGAGGCGAGGAGUUUGUGCGGCGAACUGGUGCAUAUAUAGACCCCCCUUAUGCACGUGCACCAAGCCCUUGUGUACCAGGGCCGCUGAUCCAAAGAACAAGGACGAUAGCGGUCUUUCUUUUCUAGGUUACGUUCGGCCGUCCGAAGAAUCCACCUCGGCAGCAGGCAAAGUGCCUGGUACAAAGGUCCGACAUUGUCACUCAAGGCUAUCCACGGAUGAGCAGAUGACUGGCCUAACUGGCAUGCCGCUGCCCUGGUCUGAAGUACACUCCAGGAGUUUUCGUAGUCAUGUAGGGUGGCCCCAGUCAAGGGCUAUGCCGCUGCGACUCCCCCUCACUAGCGGCGAAGGUGGUCAUGUCUUAUCUACUCGCUCUGCCAGACCACAGAAUAGGCUUGGUAUUUCCCACGUCCCCGCGAUCUUACCACAUGUGUCCAAACAGGAAGUCCGUUCCAAGCAGUCAUUUGCUCGAGGGAUGGCUAAAAAAAUAGUCGGAGUUGGCGGGAAAGCUAACGCCCGGCAUUCACUCGCGUGCAUGCUGGAUUUACACCUUUGUGCGGUCUAUUGCGACAAAGCCGAUUCUUUGCUAAUCGCAAAGAAAGCCGAUGAACUAACGCAGUACAUCACAGGGCAAAGAAAGCCUGACUCCGUGAAGGCUUGUAGCGCAACAUCUAUCCUAGAUGAAAGACCGAUUAAAGCAUUACACAGCUACGGACACAGCGCGGUCCCUCGCAUACCCGCCUCCAACAUGUCCGCCGGCGCUUCUAGACACUUGGGCUCACCGUGCCGCAUUGCGUCUUCAAUUUUGAGUAGUCCAUUCUAUGCGAGAAAUUAUCUGAGGGUUGCCACUAACCCAUUCCAAGUGAGGUCCCGCUCCCUCAAAACACCAAUAUUCUCCAAUCCCCUUGAUACUAUGGGGCGACUUUGCCGGCAGAACCCACCCGUCUGCACUUCCGGGGAAGUAAUUAGUACGGAGGUUCGGAGCUCCUUCCUGGCGAUGCAUACUCUCAUUUCGCUCCAGCUCUCUCAACCGUUUCACUACGGGCAGAACAACGGUCACGAUGUAAAUGUGCCCCCAGCAUCUGAAGCACAUUUUACGCGCAGCGUAGAAUUCAGGACGUGGUGCCGCUUGCCCUGUGUAUUCAAUGAGGUAUUAAAGCAAAACUAUCAUUGUUUCGAUCAUUUACGCUCGCGCCUCUGGACUGAAGUCCGACACUGUGAGGCCAUCAACAUCCGGAUAUACUUUUAUCAUAUUAAGGGCACUCCCGUGCCGCGCUUACAUUAUCAUUCGAUUACCACCCUCUUUGAGACACAGAAAUUUGUGAGACUGCUCCCAUCC
CUUUGCGGUCAUUUUUGUGCGAGCUUCGAGUUGCACAGUUAUUGUUUACGUUGCAGGUCCACCUGGAUUGUAGGUAAGAGUCGAGAGAUGGGGUUGAUGCGAGCCGCCCAUAACACAUCUUGCGGUGGCAUGCCCAUCCGAACGCAGUAUCGAGGGAGCGUUGUACUGCCAGGCCGGUCCGACAUGUGGAUUUUACACCUCUGCACUAUGCGGAAGAAUGAAGGCACCACCGGCUUCUUCUGGGCAUACUUUAUUCUAGUCUGUGGCAUAUGCUUUUCAUCCAUAGAAGCAACCCGCACAAUGAGAAGACGAGACCGCAGAAUACGUUUAAGGGUGUUAGCACUGAAGCCUUCGGAACUGGUAACUGAUAAGCUACACUCUCAAUCACCGACUUCAGAUCUGUCUUGUAAACAGAAUCCGAUAGAUCCUACAGCCCUGGCCUCCUUCAGGACUCGCGUCACCCUCCGUAAGGCCGCUGCAAGCUGCAUUAUGGGCCCCUUGGAGCAGGGAAGUAAGGCCCAGAUUGCCAGGUCCGUGCUUAUAAAAUUUGGUGGGCCUUUCAAGCAAUUAACGAUGUGUAACAUCUCGGCCAGUAGUAGGGACCAGUGCGCCGGACCCACGUACGCCUUAGUUGGGACUCAUGGAGAGGUGCAUGGACAUGUGCUUGUCAGACACGAGCAACCCAGACAUAUGUUCGCUUUUACUUUACUGUGGUCCCAUCGAGUAUACUUAGGAAGUCCUCGUCGUUUUUACAUGCUCUGCCGGCGCCGUGCGCCGCUAUCUUUUGUAGAACGCAGAGUACUGCAAACAGCAGUUGAGUACUCACCAACCGGCGCACUACUCUCAUGGUUGGGCCGGUCACUACUCUGGCUCCCACACACCGCUAUGGGUGUAGGGAAGCGAUUUGCCACAAACAGAGCGGAGACUUUAAUUCAUUCAGGGUAUCCCGCAGACCGAGUCCAGUCUGGCAAUGGUCAUAUUAGUAUUAGCGAUUCCAAUGGGUCCCUUGCGCCUAGUGGUAGGGAUUCUAAUCUAUCAUGCAAACGGUUUUCUACGCCUCCCUCGGUUUGCUUCGCCCUAUCAAGUUAUGGAACUGCAAGUUACAAAUUGACAAUAGCUCGACCCACAGUAAUCGCCGGCACUCCUAGAUCACUUGUAGCUAAAGGUAGUUACUUCAUCCCCAGGGCGUGGGAAGUCGGAACUGCGAUGGGAUGUAGAUCAACCUACUGCGAGGACGGAAGCUUCAAGGUUCGUCAACAUAUGGGUGAGCUGGCGUUCACGCCAAUGGCGUCGAAGCUUGCCCCGCGACUCCAUCAUUUGUAUCUAGGGAUCUCGACGUAG"
    

    Which protein does this RNA encode? (The first three bases are a start codon and the last three are a stop codon.)

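    protein = []
    position = 0
    # Step through the RNA one codon (3 bases) at a time
    while position < len(rna):
        protein.append(table[rna[position:position+3]])
        position += 3
    # Remove the 'Stop' marker before printing the protein
    print(''.join(protein).replace('Stop', ''))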
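
The solution above translates every codon and then deletes the 'Stop' markers, which works here because the only stop codon is expected at the very end. As a hedged variation, the sketch below halts at the first stop codon instead, which is closer to how translation proceeds. The function name `translate`, the four-entry `mini_table`, and the short test sequence are all made up for illustration; with the workshop data you would pass `rna` and `table`.

```python
# Hypothetical helper for illustration; not part of the original solution.
def translate(rna, codon_table):
    """Translate rna codon by codon, stopping at the first stop codon."""
    protein = []
    # range(0, len(rna) - 2, 3) yields 0, 3, 6, ... and skips a trailing partial codon
    for i in range(0, len(rna) - 2, 3):
        amino_acid = codon_table[rna[i:i+3]]
        if amino_acid == 'Stop':
            break  # do not translate anything past the stop codon
        protein.append(amino_acid)
    return ''.join(protein)

# Tiny made-up table and sequence, just enough to run the example
mini_table = {'AUG': 'M', 'GCU': 'A', 'UGG': 'W', 'UAA': 'Stop'}
print(translate("AUGGCUUGGUAA", mini_table))
```

A `for` loop over `range` also removes the need to update a position counter by hand.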
    
  3. Use the two sequences below to answer the following questions:
    str1 = "GCAGTGTCCACGGGATGTAAACCCCGTTTGGGACCTGCCAGGCTTCCGCACTACAGTGTGTTGCGCGTACTAGACCTTATGCCCATATGAACCTAGCTCGGGCTACTTGGATCGAGGAACTCGATACTTCCCCTACTGCAGCCCAACAATATCGTAGAAGGCAATCGAATCGCCTAAAAATTTATCGCCGCATTTTACGTATATGCCGGCTGTGGCGTATAGCAGAAAACCGCTCCTCGCAAAGTCTGGAGTATTGGATGAACAGTACCCTGGATAGAGTTAAAGGGCCAGAAAAAGCCCAAGACCATGGCTTACCAAGTTCCGTCCTATTCAATATAACCTCAGGCAGCTATGCGGAACTTTCAATAAAATACCAAGACATGTCTCTGTCTGCTCCAGAGCGAGTGGATAGGCCCGATTATTTATGTGTCGGAGGACCATGGTTCGCAATTACACTCAAGCGAGCGATTTTTTAACTGTCCGTTCACCTCACGGACCTCGGGAGGATACAGAATCGGGTGGTAATATAACGAACATAACCGTTTGTGATCCTGGAAATACACAAGTTCATAAAAAGTGTGACCCGAAAGTGGTTTCACTATATAGGTATTCCCCCTAGTATGGCAATCTCGGTCAGTCAGCACTGGCCGATGGCTCAAGGCAACATTAGGGGTCGGCGAACGCAGTTGCTCCAAATACAAGTGTGGTTGCAAGTAGCATACGGCACACCGCGGTCTGGGTATGAGCCATACGTGTGATTTTGGTCAATTACCTAAGCAGCTTGTAGTCATAGCTATCTTACCGATTAGCGAACCCAATGAAGAGTTCCAGCATTCGCGAATGGGGGGTACAAGTAGGCTCTCGGCCGCCTACTAGACCGGGCGGTAAGGTGGCGTGTAGACCAAATCCTTA"
    str2 = "AACTAATCAAACTGGCTGGGATTCCTTCGGCTAAATACAGTCCCATCCAAGTATAGTCTGTTATTTGTGCTACGCCTCAAGCCTCTGATATGCGGTACCCTTGGATTTGGAGTGTAGGCCACCAGATCTGTTTTGATCCGGCCGGATCGTATCGAAAGATACTGAAGGAGCTCATAAACATTGAGCGTGTCAGAGTGGCCGAACGCCGGCGGTCGCTACAACAACAGGTCGTCGACTCCCTAACGCGCTGACATAGGTCTAATAGTCACGTGTATGGTTCTATCTGTGGAACAAAAACTAAAGCTCTTGGCTTAAGATGATCGTACCTGACGGATGTAAGATTCGACAACTAAACCGTTTTATCGTTAAAACATGCCGACCTGACTGTTGCTTCTACAGAGCGCGTTCAAAATCGTGCCTATCCTACAGTCGTACGCTGAGTGAACGCTCTTAAAATCTAGGAAGAGTAATCTTGCATATCCGAACCGAGCTTAGCCCTCCAATGAATAGAAGTACGCGCCTTTGTATTCCGAACATTTTCGTCTGCGATACTGGAAACATGTATGATCGTAAACACTGGGACCCGTCATTAGGTCTTTGATAGTATCGGTGCCTATATGGTGTCATTCTTATTCGGGCTAAACGAACGCTTGGCTCTGCGAATTGTAACCTGAGCTGAGGCGCACTCATAACACAACAAACGGAGCCTGGATATATCATAATGATTCTCCTGGGATGGGTACACGAGGCACCTTAGACTGCGCATTGCTCTAAGCACACCTAAGAGCCTGGTCAAGAACAGCTATTTGCGTACGCAAAGATGCGTTGATTCAATGGGGAACGCCGGGATCTTCTAGACTCGCAGCGCCCTCACTGCCGGGTGCGTAGCCTTGCGCCTGTCCATGATCACCA"
    
    • How many matches are there between the two strings?
    • How many mismatches are there between the two strings?
    • What is the GC content of the first string?
    • What is the GC content of the second string?
      # Matches
      sum([1 if x==y else 0 for x,y in zip(str1,str2)])
      # Mismatches
      sum([1 if x!=y else 0 for x,y in zip(str1,str2)])
      len(str1) - sum([1 if x==y else 0 for x,y in zip(str1,str2)]) # just to double check
      # GC content of str1, then str2
      sum([1 if x=="G" or x=="C" else 0 for x in str1])/len(str1)
      sum([1 if x=="G" or x=="C" else 0 for x in str2])/len(str2)
      # OR by using count
      (str1.count("G")+str1.count("C"))/len(str1)
      (str2.count("G")+str2.count("C"))/len(str2)
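
One thing to keep in mind with `zip` is that it silently stops at the end of the shorter string, so comparing sequences of unequal length would undercount positions without any warning. The sketch below wraps the same ideas in two small functions with an explicit length check. The names `count_matches` and `gc_content` and the two short sequences are made up for illustration.

```python
# Hypothetical helpers for illustration; not part of the original solution.
def count_matches(a, b):
    """Count positions where two equal-length sequences agree."""
    if len(a) != len(b):
        # zip() would silently ignore the extra bases, so fail loudly instead
        raise ValueError("sequences must be the same length")
    # True counts as 1 and False as 0 when summed
    return sum(x == y for x, y in zip(a, b))

def gc_content(s):
    """Return the fraction of bases in s that are G or C."""
    return (s.count("G") + s.count("C")) / len(s)

a = "GATTACA"  # short made-up sequences
b = "GACTATA"
matches = count_matches(a, b)
mismatches = len(a) - matches
print(matches, mismatches, gc_content(a))
```

With the workshop data, `count_matches(str1, str2)` and `gc_content(str2)` would be the corresponding calls.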
      
  4. If you have any free time after these tasks, take a moment to check out Rosalind. It is a great platform for continuing to challenge your Python and bioinformatics skills!