Introduction to Bioinformatics for Biologists

I have seen that bioinformatics and computational biology have been embraced by people from various fields. Being a biochemistry graduate, I would like to write for people from a biology background with an interest in bioinformatics.

For everyone else…click here!

One of the main things to know about is algorithms. An algorithm is a set of instructions that one gives a computer to perform a certain task. For instance, if we wanted to locate and identify the origin of replication (Ori) in the lab, we would cut the sequence at different positions, one at a time, until replication no longer occurs; at that point we know we have cut the Ori. This is fun but tedious. It is also expensive in terms of the volume of reagents used.

Bioinformatics tools have made life easier by finding the Ori using computers instead of benchwork. To do this, we need to know how to instruct the computer to find the Ori, and that is what algorithms are for.

When given the whole sequence of my favorite bacterial genome, Vibrio cholerae, it is hard to figure out where the Ori could be without knowing a few things:

To initiate the replication, DNA polymerase needs a protein called DnaA which binds to short (typically 9 nucleotides long) segments within the replication origin known as the DnaA box which is the hidden message. –Bioinformatics algorithms

  • The DnaA box is the sequence that appears most frequently within the Ori, so that in case one copy mutates there is a backup. We also know that the more copies of this attachment sequence the Ori contains, the higher the likelihood of the replication initiation protein, DnaA, attaching to the ‘hidden message’.

In short, we are looking for a k-mer, a string of k nucleotides. Here k = 9, so we want the most frequent 9-mer. Since we are not programmers, yet, it’s easier to communicate using pseudocode: human-readable instructions, like a cooking recipe. These will later be converted to computer-readable instructions using a programming language.
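For a taste of what that conversion looks like, here is a minimal Python sketch of the idea above: slide a window of length k along the sequence, count every k-mer, and keep the ones with the highest count. The function name `most_frequent_kmers` and the toy sequence are my own choices for illustration, not something taken from the course or the book, and the sketch should run on both Python 2.7 and Python 3.

```python
from collections import Counter


def most_frequent_kmers(text, k):
    """Return every k-mer that occurs most often in text (overlaps counted)."""
    # Slide a window of width k along the sequence, one position at a time.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []  # text is shorter than k, so there are no k-mers
    top = max(counts.values())
    # Several k-mers can tie for first place, so return all of them.
    return sorted(kmer for kmer, n in counts.items() if n == top)


# Toy sequence for illustration only; it is not a real Ori.
toy_sequence = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
print(most_frequent_kmers(toy_sequence, 4))
```

In this toy string the 4-mers CATG and GCAT each appear three times, so both are returned. For a real Ori you would pass the Ori sequence and k = 9 instead.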

In bioinformatics, the most commonly used languages are R and Python.

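If using Python, which is the language of choice in the course I will be referring to, I am using version 2.7.18. Note, however, that Python 2 reached end of life in January 2020 and the major bioinformatics libraries, such as Biopython, now support Python 3, so Python 3 is the safer download if you are starting fresh.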

If using R, you might want to download RStudio to make your work easier.

To start us off, follow the algorithm below:

  1. Do this course if you do not have a programming background: Biology Meets Programming: Bioinformatics for Beginners. Once you sign up or sign in, enroll in the course and choose the free audit option without the certificate. However, if you would like some certificates on your CV, go for it!
  2. Use a computer to access the Stepik interactive learning platform.
  3. Sign up or sign in to Codecademy and take the Python course for free, or pay for a certificate. Do the same for R if you wish.
  4. Get the Bioinformatics Algorithms book for free, or buy it on Amazon.
  5. To keep track of the exercises you have done and be part of a community of bioinformaticians, register for Rosalind, where you can do the exercises instead of on Stepik.

The funny thing is all these materials are provided by two guys from the University of California, San Diego.

If you still want to learn with me, subscribe below or in the sidebar to get summaries.

  1. Python Syntax
  2. How are Python and DNA sequences connected?