Central dogma: DNA – RNA – Protein
Transcription: DNA: ATGGGAGTTCTG... -> RNA: AUGGGAGUUCUG...
Translation: RNA: AUGGGAGUUCUG... -> PRO: M G V L ...
- Three bases (= one codon) correspond to one amino acid.
- Amino acid sequence = protein
- Proteins are a major constituent of living organisms.
- Many proteins are “enzymes”, which act as catalysts to convert substances within a cell.
- Enzymes are responsible for “metabolism”.
- The amino acid chain of a protein folds into a specific shape that depends on the properties of its amino acids.
- There are 20 standard types of amino acids used in vivo.
1-letter  3-letter  Name
A         Ala       Alanine
C         Cys       Cysteine
D         Asp       Aspartate
E         Glu       Glutamate
F         Phe       Phenylalanine
G         Gly       Glycine
H         His       Histidine
I         Ile       Isoleucine
K         Lys       Lysine
L         Leu       Leucine
M         Met       Methionine
N         Asn       Asparagine
P         Pro       Proline
Q         Gln       Glutamine
R         Arg       Arginine
S         Ser       Serine
T         Thr       Threonine
V         Val       Valine
W         Trp       Tryptophan
Y         Tyr       Tyrosine
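The DNA -> RNA -> protein example above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the DNA shown is the coding strand (so transcription reduces to replacing T with U), it uses the standard genetic code, and the function names `transcribe` and `translate` are chosen here for illustration.

```python
# Standard genetic code, built from the conventional U, C, A, G codon ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGGAGTTCTG")
print(rna)             # AUGGGAGUUCUG
print(translate(rna))  # MGVL
```

The one-letter codes in the output are the same codes listed in the amino acid table.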
Genome annotation
- Structural annotation: annotation describing the structure of the gene
- Functional annotation: annotation describing gene function
- Practice: Open http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=4732164 (F10A2 entry) and see how structural and functional annotations are described under FEATURES.
Similarity search
Conventional method to predict the structure of genes
- Sequence regions similar to known genes are probably genes.
- Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.
- Paralogs are genes related by duplication within a genome. Whereas orthologs typically retain the same function in the course of evolution, paralogs tend to evolve new functions, even if these are related to the original one.
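One way to make "similar to a known gene" concrete is a local alignment score. The sketch below computes a Smith-Waterman score with a simple match/mismatch/gap scheme. It is an illustration only: the scoring values are arbitrary assumptions, and BLAST itself is a faster heuristic that approximates this kind of local alignment rather than running it in full.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                        # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,            # gap in b
                curr[j - 1] + gap,        # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# A shared "ACGT" region scores 4 matches x 2 = 8, whatever surrounds it.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A high score relative to sequence length suggests a shared region; deciding whether that similarity reflects orthology or paralogy needs additional evidence, such as comparison across species.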
Basic Local Alignment Search Tool (BLAST)
- Why is BLAST so popular? It offers a good balance of sensitivity and speed.
- See: [Movie] Webinar: A Practical Guide to NCBI BLAST by NCBI
- Program options for BLAST:

Program   Query        DB           Summary
BLASTN    nucleotide   nucleotide   No conversion is done on the query or database
BLASTP    protein      protein      No conversion is done on the query or database
BLASTX    nucleotide   protein      All six reading frames are translated on the query and used to search the database
TBLASTN   protein      nucleotide   All six frames are translated in the database and searched with the protein sequence
TBLASTX   nucleotide   nucleotide   All six frames are translated on the query and on the database
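The "six reading frames" used by BLASTX, TBLASTN and TBLASTX are the three codon offsets on the forward strand plus the three on the reverse complement. The following is a minimal sketch of that idea using the standard genetic code. The function names and the frame labels (`+1` to `-3`) are conventions chosen for this example, not part of BLAST.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate every complete codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations of all six reading frames, keyed '+1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGGAGTTCTG").items():
    print(name, protein)
```

Stop codons are kept as `*` here so that every frame is shown in full.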
Training
NCBI BLAST
- Open https://blast.ncbi.nlm.nih.gov/Blast.cgi.
- Select “protein BLAST”.
- Copy and paste the following sequence into the window (cmd-C then cmd-V).

>opsin Rh2(Drosophila melanogaster)
MERSHLPETPFDLAHSGPRFQAQSSGNGSVLDNVLPDMAHLVNPYWSRFAPMDPMMSKIL
GLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFYY
ETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKILFI
WMMAVFWTVMPLIGWSAYVPEGNLTACSIDYMTRMWNPRSYLITYSLFVYYTPLFLICYS
YWFIIAAVAAHEKAMREQAKKMNVKSLRSSEDCDKSAEGKLAKVALTTISLWFMAWTPYL
VICYFGLFKIDGLTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVFGNTD
EPKPDAPASDTETTSEADSKA

- Under “Choose Search Set”, select “UniProtKB/Swiss-Prot(swissprot)” as the database.
- Click the “BLAST” button in the lower left to execute.
- First, “Conserved domains” are shown (after the BLAST results are returned, they can still be viewed from “Show Conserved Domains” in “Graphic Summary”).
- Click the “7tmA_photoreceptors_insect” area in the “Conserved domains” image. The following conserved domains (seven-transmembrane receptors) were found:
  - 7tmA_photoreceptors_insect cd15079 insect photoreceptors R1-R6 and similar proteins
  - 7tm_1 pfam00001 7 transmembrane receptor (rhodopsin family)
  - PHA03087 PHA03087 G protein-coupled chemokine receptor-like protein
- When the result is shown, look at “Graphic Summary” & “Descriptions.”
- If a “Related Information Gene-associated gene details” link is present in the “Alignment” panel, it leads to information on that gene in NCBI Gene, the integrated gene database at NCBI.
- From the “Edit and Resubmit” link at the top of the result page, you can narrow down the results by species name, keywords, etc.
- Practice: Enter “cat family (taxid: 9681)” in “Organism” under “Choose Search Set” and search again for similar genes in the cat family.
- See: [Movie] BLAST Results: Expect Values, part 1 by NCBI
- See: [Movie] BLAST Results: Expect Values, part 2 by NCBI
- See: [Movie] New BLAST Databases Give Cleaner Results by NCBI
- See: [Movie] Introducing Magic-BLAST, NCBI’s Next-Gen Sequence Alignment Program by NCBI
- See: [Movie] Webinar: Introducing the Multiple Sequence Alignment Viewer by NCBI
- Advanced: [Movie] Webinar: The Statistics of Local Pairwise Sequence Alignment, Part 1 by NCBI
- Advanced: [Movie] Webinar: The Statistics of Local Pairwise Sequence Alignment, Part 2 by NCBI
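The web-form steps above (protein BLAST against Swiss-Prot, optionally restricted to the cat family) can also be written down as parameters for the NCBI BLAST URL API. The sketch below only builds the parameters and does not contact NCBI. The helper name `build_blastp_request` is hypothetical, the query is truncated to its first 20 residues for brevity, and the parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `ENTREZ_QUERY`) should be checked against the current NCBI documentation before use.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_request(fasta, database="swissprot", taxid=None):
    """Build URL API parameters for submitting a protein BLAST search.

    Hypothetical helper for illustration; it does not send anything.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastp",   # protein query vs. protein database
        "DATABASE": database,  # Swiss-Prot, as chosen in the web form
        "QUERY": fasta,
    }
    if taxid is not None:
        # Restrict hits to one taxon, like the "Organism" field.
        params["ENTREZ_QUERY"] = f"txid{taxid}[ORGN]"
    return params


# First 20 residues of the opsin Rh2 query only (truncated for the example).
query = ">opsin Rh2(Drosophila melanogaster)\nMERSHLPETPFDLAHSGPRF"
params = build_blastp_request(query, taxid=9681)  # 9681 = cat family
print(BLAST_URL)
print(urlencode(params))
```

Submitting these parameters returns a request ID that is later used to fetch the results; for occasional searches the web form is simpler, and scripted access should follow the NCBI usage guidelines.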
DDBJ BLAST
Compared with NCBI’s BLAST, the output of DDBJ’s BLAST is simpler, but the search is usually faster.
- Open http://blast.ddbj.nig.ac.jp/blastp?lang=en (for BLASTP).
- Copy and paste the following sequence into the window (cmd-C then cmd-V).

>opsin Rh2(Drosophila melanogaster)
MERSHLPETPFDLAHSGPRFQAQSSGNGSVLDNVLPDMAHLVNPYWSRFAPMDPMMSKIL
GLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFYY
ETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKILFI
WMMAVFWTVMPLIGWSAYVPEGNLTACSIDYMTRMWNPRSYLITYSLFVYYTPLFLICYS
YWFIIAAVAAHEKAMREQAKKMNVKSLRSSEDCDKSAEGKLAKVALTTISLWFMAWTPYL
VICYFGLFKIDGLTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVFGNTD
EPKPDAPASDTETTSEADSKA

- Choose “UniProt (Swiss-Prot)” from Data Sets.
- Click the “Send to BLAST” button to execute.
* Also see:
- Publication: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403–10.
- Book: The BLAST Sequence Analysis Tool in The NCBI Handbook [Internet]. 2nd edition. (2013)
GGRNA
GGRNA is an ultra-fast, Google-like full-text search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, gene annotations, and even nucleotide/amino acid sequences, through one simple search box, and quickly returns relevant RefSeq transcripts.
- See: [MOVIE] http://togotv.dbcls.jp/20120215.html by DBCLS (DOI: 10.7875/togotv.2012.012)
- Try examples on https://ggrna.dbcls.jp/en/.
- Publication: Naito Y and Bono H. (2012) Nucleic Acids Research, Volume 40, Issue W1, Pages W592–W596