2. Genome analyses


Progress in sequencing technology

Next Generation Sequencing (NGS)

Large sequencing projects enabled by the emergence of NGS

From 1K to 10K genomes, and now more.

Sequence archives in the INSDC

The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative operated jointly by DDBJ, EMBL-EBI and NCBI. INSDC covers the full spectrum of data, from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations. (The INSDC site)

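Data type               DDBJ                    EMBL-EBI                             NCBI
Next generation reads   Sequence Read Archive   European Nucleotide Archive (ENA)    Sequence Read Archive
Capillary reads         Trace Archive           European Nucleotide Archive (ENA)    Trace Archive
Annotated sequences     DDBJ                    European Nucleotide Archive (ENA)    GenBank
Samples                 BioSample               European Nucleotide Archive (ENA)    BioSample
Studies                 BioProject              European Nucleotide Archive (ENA)    BioProject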
      • DDBJ – DNA Data Bank of Japan
      • ENA – European Nucleotide Archive
      • GenBank – the NIH genetic sequence database


NCBI Search

NCBI’s comprehensive search system, which searches across all of its sequence and literature databases at once.

      1. Search for and open “Search NCBI”.
      2. See what kinds of databases it covers.
      3. Find the accession J00264 in the Nucleotide DB for the full-length cDNA of interleukin-2 (hint: “interleukin-2” AND human AND cDNA … etc.)
      4. Check the description of each FEATURE.
      • Practice: Search for papers in PubMed as well.
      • See: [Movie] My NCBI by NCBI and take advantage of it.
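
The same search can also be scripted. The sketch below is an illustration, not part of the exercise: it assumes NCBI's public E-utilities endpoints (ESearch and EFetch) and only builds the request URLs, so it runs without network access. The helper names `esearch_url` and `efetch_genbank_url` are hypothetical, and `nuccore` is the E-utilities name for the Nucleotide database.

```python
from urllib.parse import urlencode

# Base address of NCBI E-utilities (assumed here; not given in the notes).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that asks for matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_genbank_url(accession):
    """Build an EFetch URL for one record in GenBank flat-file format.

    The flat file contains the FEATURES table inspected in step 4.
    """
    params = {"db": "nuccore", "id": accession,
              "rettype": "gb", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Query from the hint in step 3, then the record itself.
    print(esearch_url("nuccore", "interleukin-2 AND human AND cDNA"))
    print(efetch_genbank_url("J00264"))
```

Opening the second URL in a browser, or fetching it with `urllib.request.urlopen`, should return the flat-file record for J00264. For PubMed, the same `esearch_url` helper can be called with `db="pubmed"`.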


A database covering genome projects around the world

See: http://www.genomesonline.org

    • Studies – Metagenomic / Non-Metagenomic
    • Biosamples – Classification / Ecosystems (Host-associated, Engineered, Environmental)
    • Sequencing Projects: Complete Projects / Permanent Drafts / Incomplete Projects / Targeted Projects
    • Analysis Projects: Genome Analysis / Metagenome Analysis / etc.
    • Practice: How many genomes of E. coli O157 have been sequenced? (hint: “Search” function)
    • Practice: How many E. coli genome projects exist?

NCBI Taxonomy

Use NCBI Taxonomy when you want to search biological taxonomy together with the sequence information linked to each taxon.
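
As one way to connect a taxon to its sequences, Entrez queries accept a taxonomy ID in the form `txid<ID>[Organism:exp]`, where `exp` also includes all descendant taxa. The sketch below is illustrative only: the helper names `taxon_term` and `taxon_search_url` are hypothetical, the E-utilities ESearch endpoint is an assumption not mentioned in these notes, and 562 is the NCBI Taxonomy ID of Escherichia coli.

```python
from urllib.parse import urlencode

# Base address of NCBI E-utilities (assumed here; not given in the notes).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def taxon_term(taxid, include_descendants=True):
    """Return an Entrez search term that restricts results to one taxon.

    'exp' expands the query to subspecies and strains below the taxon;
    'noexp' matches only records assigned to the taxon itself.
    """
    field = "Organism:exp" if include_descendants else "Organism:noexp"
    return f"txid{taxid}[{field}]"


def taxon_search_url(db, taxid, include_descendants=True):
    """Build an ESearch URL listing records in `db` for the given taxon."""
    params = {"db": db,
              "term": taxon_term(taxid, include_descendants),
              "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(taxon_term(562))
    print(taxon_search_url("nuccore", 562))
```

The same term can be pasted directly into the NCBI search box, so the script is only needed when the lookup has to be repeated for many taxa.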

Find and use genome sequence data

DDBJ’s NGS archive and analysis pipeline