Wednesday, February 21, 2024

Try your hand at some simple bioinformatics

(image: a 7-transmembrane protein from Wikimedia Commons)

You can try some simple bioinformatics yourself if you like. Here are two things to try:
 
(1) Get a gene sequence and see what predictions can be made from it. I recommend starting with a protein or gene that interests you; for example, find a disease gene mentioned in a news article. Then get the protein sequence. You can find sequences here: type in the name of any protein or gene (here, for example, is a resident ER protein, i.e. a protein that stays in the endoplasmic reticulum). Click on a specific protein's name and the sequence will be at the bottom of the next page. Click "FASTA" near the top of that page to get the sequence in a simpler format, then copy and paste that sequence into any of the programs below to try them out.
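If you would rather handle the FASTA text yourself before pasting it anywhere, here is a minimal Python sketch of reading it. FASTA is just a header line starting with ">" followed by lines of sequence. The function name `parse_fasta` and the toy record are made up for illustration; the toy sequence is not a real protein.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one uppercase string
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up toy record for illustration only (not a real protein)
example = """>toy_protein hypothetical example
MKTLLVAG
ALLSDE
"""

records = parse_fasta(example)
for header, seq in records:
    print(header, len(seq), seq)
```

For real work, an established library such as Biopython is probably a better choice than a hand-written parser.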

To start with, you might try a program that predicts how a transmembrane protein is arranged in the membrane, based on its sequence alone. The program, described in more detail than most people will want to see here, uses rules deduced by biologists and was then fine-tuned with an artificial intelligence algorithm trained on transmembrane proteins with known orientations. There is a similar program that can predict where in a cell a protein will end up.
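To get a feel for how a sequence alone can hint at membrane-spanning regions, here is a rough Python sketch. It is not the program mentioned above. It uses a much older and simpler idea, a Kyte-Doolittle hydropathy plot: average a hydrophobicity score over a sliding window and flag stretches that score high. The window of 19 residues and the cutoff of 1.6 are the commonly quoted values for this method; the function names and the toy sequence are made up for illustration.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of each full window along the sequence."""
    # Unknown letters (e.g. X) are scored as 0.0, an arbitrary choice
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) ranges, 0-based and end-exclusive,
    covered by windows whose average hydropathy meets the threshold."""
    segments = []
    for i, avg in enumerate(hydropathy_profile(seq, window)):
        if avg >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlaps or touches the previous range: extend it
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Made-up toy sequence: charged ends around a hydrophobic core
toy = "DEKRDEKRDE" + "LIVALLIVAFLLIVALIVAL" + "KRDEKRDEKR"
segments = candidate_tm_segments(toy)
print(segments)
```

Because every window that passes the cutoff is reported in full, the flagged range is wider than the hydrophobic core itself. Treat the output as a hint about where a membrane-spanning stretch might be, not as a prediction of its exact boundaries or orientation.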

Then you can try other predictions that can be made based solely on sequence by searching online for other predictor tools.

(2) See how little a well-conserved protein has changed through evolution. Let's look at the current versions of beta-tubulin from yeast and human, and see how similar they are. Try this: copy the text of the yeast beta-tubulin sequence from here (from MREI... to the end of the protein's sequence), then paste it into the Enter Query Sequence box here. Next to Organism, type "human" and select human from the dropdown menu that appears (2nd or 3rd one down, 'human' or 'humans'). Then click BLAST at the bottom. The page will refresh automatically for seconds or minutes, depending on how busy the servers are. When it's done, you'll see the results in a section headed "Sequences producing significant alignments". Click on the top link ("... tubulin.... [Homo sapiens]") in that long list of links. Now you'll see the sequence you queried (the yeast beta-tubulin) and the subject sequence, which is the closest match it found among all known human protein sequences. The line in between shows the amino acids that are identical in both, with + signs marking amino acids that are similar. Tubulin, actin, and histones are remarkably well-conserved proteins. If you try the exercise with other kinds of proteins, you'll see that only parts of them are well conserved across diverse organisms, or that some don't exist in certain organisms.
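If you want to see how that middle line and the identity count come about, here is a small Python sketch that builds them from two sequences that are already aligned (same length, with "-" for gaps). It is a simplification: BLAST decides which pairs earn a + from a scoring matrix (BLOSUM62 by default for protein searches), whereas this sketch uses a few hand-picked groups of chemically similar amino acids, so its + signs will not always match BLAST's. The names `SIMILAR_GROUPS`, `midline`, and `summarize` and the toy sequences are made up for illustration.

```python
# Hand-picked groups of similar amino acids: an assumption for this
# sketch, NOT the scoring matrix that BLAST actually uses
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"),
    set("DE"), set("NQ"), set("ST"),
]


def midline(query, subject):
    """Build the line shown between two aligned sequences:
    the letter where they match, '+' where similar, ' ' otherwise."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            out.append(" ")  # a gap in either sequence
        elif q == s:
            out.append(q)  # identical amino acid
        elif any(q in g and s in g for g in SIMILAR_GROUPS):
            out.append("+")  # similar amino acid
        else:
            out.append(" ")
    return "".join(out)


def summarize(query, subject):
    """Return (identities, positives, alignment_length)."""
    mid = midline(query, subject)
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    positives = identities + mid.count("+")
    return identities, positives, len(mid)


# Made-up toy alignment for illustration only (not real tubulin)
query = "MKVLDEAGT"
subject = "MRILDQAG-"

print(query)
print(midline(query, subject))
print(subject)

identities, positives, length = summarize(query, subject)
print(f"Identities: {identities}/{length}, Positives: {positives}/{length}")
```

In this toy pair, 5 of the 9 aligned positions are identical and 7 of 9 are identical or similar. BLAST reports the same kind of numbers as "Identities" and "Positives" above each alignment.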

More about Bioinformatics at Wikipedia.
