1. Search and Retrieve


Fundamentals of search

example: www.google.com

    • Google
      • A crawler (“crawling robot”) visits a website and indexes its pages (example: access-log entries recording Googlebot visits to this website):
        crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:14:39:41 +0900] "GET /~yn/jp/?plugin=attach&pcmd=open&file=logo_jp.png&refer=LogoMark HTTP/1.1" 200 86078
        crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:15:21:12 +0900] "GET /~yn/jp/?plugin=attach&pcmd=open&file=LogoAndMark_mono.png&refer=LogoMark HTTP/1.1" 200 73543
        crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:15:27:35 +0900] "GET /~yn/jp/?cmd=backup&page=seikawakate2005 HTTP/1.1" 200 6828
        crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:16:02:38 +0900] "GET /~yn/jp/?plugin=attach&pcmd=open&file=sample_namecard.ppt&refer=LogoMark HTTP/1.1" 200 278528
        crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:16:06:19 +0900] "GET /~yn/jp/?Publications HTTP/1.1" 200 53032
        crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:16:44:21 +0900] "GET /~yn/jp/?plugin=attach&pcmd=open&file=arc.jpg&refer=BusTable HTTP/1.1" 200 70808
      • See: How Search organizes information – Google
  • Basic search options
    • Use double quotation marks for an exact-phrase match


      Mishima station (1,160,000 hits)
      "Mishima station" (36,900 hits)
    • Search using part or all of the title of a paper


      Lotus japonicus transcriptome
      Citrus unshiu metabolome

      You can also find links to such papers through Google Scholar.

    • Search by a gene accession number or a gene name. Sometimes this also turns up supplementary files for articles.
    • Practice: What kind of ID is each of the following symbols, and for which species? What kind of information can be obtained? Search for them, and compare the results with those from another search engine such as Bing.
      • sll1234
      • Sox2
    • Practice: Search by enzyme or gene names related to your research (e.g., flower bud formation). What results do you obtain?
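The Googlebot entries shown earlier are ordinary web-server access-log lines in what appears to be the Common Log Format (host, identity, user, [timestamp], "request", status, bytes). As a minimal sketch, assuming that format, the Python below parses such lines and keeps those whose host name ends in googlebot.com. Two sample lines are copied from the excerpt; the third is a hypothetical non-crawler visit added for contrast. A host name is only a hint, since it can be spoofed.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def crawler_requests(lines, crawler_domain="googlebot.com"):
    """Keep requests whose host name ends with crawler_domain.

    The host name is only a hint: it can be spoofed, so it is not proof
    that the request really came from the crawler.
    """
    records = (parse_line(line) for line in lines)
    return [r for r in records if r and r["host"].endswith(crawler_domain)]


SAMPLE = [
    # Two lines copied from the log excerpt above.
    'crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:15:27:35 +0900] '
    '"GET /~yn/jp/?cmd=backup&page=seikawakate2005 HTTP/1.1" 200 6828',
    'crawl-66-249-65-51.googlebot.com - - [24/Dec/2006:16:06:19 +0900] '
    '"GET /~yn/jp/?Publications HTTP/1.1" 200 53032',
    # Hypothetical visit from an ordinary (non-crawler) host, for contrast.
    'visitor.example.org - - [24/Dec/2006:16:10:00 +0900] '
    '"GET /~yn/jp/ HTTP/1.1" 200 1024',
]

if __name__ == "__main__":
    hits = crawler_requests(SAMPLE)
    for record in hits:
        print(record["time"], record["path"], record["status"])
    print(Counter(record["host"] for record in hits))
```

Running the same filter over a full access log shows which pages the crawler fetched and when, which is the raw material a search engine indexes.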
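Why does quoting "Mishima station" cut the hit count so sharply? Without quotes, a page only needs to contain both words somewhere; with quotes, the words must appear next to each other, in that order. A minimal sketch of the difference, assuming a toy positional index (term -> document -> word positions) over three hypothetical one-line documents; real search engines are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def keyword_search(index, query):
    """Documents containing every query word, anywhere and in any order."""
    postings = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def phrase_search(index, phrase):
    """Documents in which the query words appear consecutively and in order."""
    terms = tokenize(phrase)
    matches = set()
    for doc_id in keyword_search(index, phrase):
        for start in index[terms[0]][doc_id]:
            if all(start + offset in index[term][doc_id]
                   for offset, term in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches


# Hypothetical documents, for illustration only.
DOCS = {
    1: "Take the bus from Mishima station",
    2: "The station is a short walk from Mishima Taisha shrine",
    3: "Numazu station timetable",
}

if __name__ == "__main__":
    index = build_positional_index(DOCS)
    print(keyword_search(index, "Mishima station"))   # both words, anywhere
    print(phrase_search(index, '"Mishima station"'))  # adjacent, in order
```

Document 2 contains both words but not side by side, so it matches the unquoted search and drops out of the quoted one.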

    Advanced search options

    • By default, Google performs an AND search. If pages need to contain only one of several keywords, specify an OR search by typing “OR” (in capital letters) between them.
      • Practice: Compare the meaning and the results of the following searches. How are they different?
        Mus musculus domesticus
        Mus musculus OR domesticus
      • Answer: The first basically searches for pages containing all three keywords. The second searches for pages containing Mus plus either musculus or domesticus.
      • Practice: What kind of search is done for each of the following? What are those results?
        Mus OR musculus OR domesticus
        "Mus musculus" OR "Mus domesticus"
      • Answer: The first searches for pages containing at least one of Mus, musculus, and domesticus. The second searches for pages containing either the phrase “Mus musculus” or the phrase “Mus domesticus”.
    • *: Wildcard (a placeholder for unknown words)
      vitamin * is good for *
    • ..: Specifies a numeric range (e.g., years)
      Arabidopsis 2015..2017
    • -: Excludes pages containing a keyword
      mouse -PC -computer -bluetooth
    • filetype: [filetype:ppt, filetype:pdf, filetype:doc, filetype:xls, filetype:txt, filetype:html]
      Restricts results to a file type (PowerPoint, PDF, Word document, Excel spreadsheet, text, HTML)


      BLAST filetype:ppt
      Arabidopsis filetype:xls
    • site: Restricts the search to a specific site.
      What are the following searches trying to find? Run them and compare the numbers of hits:


      SOKENDAI site:soken.ac.jp
      SOKENDAI -site:soken.ac.jp
      • Practice: Search for PDF documents and PowerPoint presentation files related to cancer research on the MEXT website.
      • Answer: cancer research site:mext.go.jp filetype:pdf OR filetype:ppt
    • link: Finds pages that link to a specific site (Google has since deprecated this operator, so results may be incomplete)
    • Advanced: For more ways to refine a Google search, see: https://support.google.com/websearch/answer/134479?hl=en
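The AND/OR/exclusion answers above can be checked on a toy example. The sketch below assumes a simplified reading of the rules described in this section: spaces mean AND, an uppercase OR joins neighbouring words into one either-or group, and a leading "-" excludes a word. It evaluates queries against a small inverted index (word -> set of documents) built from five hypothetical documents. It does not handle quoted phrases, wildcards, or ranges, and it is not Google's actual implementation.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index: each lower-cased word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def parse_query(query):
    """Split a query into AND-ed clauses (each a set of OR-ed words) and excluded words."""
    clauses, excluded = [], set()
    join_with_previous = False
    for token in query.split():
        if token == "OR":  # only the uppercase keyword is treated as an operator
            join_with_previous = True
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:].lower())
        elif join_with_previous and clauses:
            clauses[-1].add(token.lower())
        else:
            clauses.append({token.lower()})
        join_with_previous = False
    return clauses, excluded


def boolean_search(docs, query):
    """Documents matching every clause and containing none of the excluded words."""
    index = build_index(docs)
    clauses, excluded = parse_query(query)
    result = set(docs)
    for clause in clauses:
        # OR within a clause: union of the words' document sets
        result &= set().union(*(index.get(word, set()) for word in clause))
    for word in excluded:
        result -= index.get(word, set())
    return result


# Hypothetical documents, for illustration only.
DOCS = {
    "a": "Mus musculus domesticus house mouse",
    "b": "Mus musculus laboratory mouse",
    "c": "notes on Mus domesticus",
    "d": "musculus muscle anatomy",
    "e": "wireless computer mouse with bluetooth",
}

if __name__ == "__main__":
    for q in ["Mus musculus domesticus",
              "Mus musculus OR domesticus",
              "Mus OR musculus OR domesticus",
              "mouse -PC -computer -bluetooth"]:
        print(q, "->", sorted(boolean_search(DOCS, q)))
```

Each added OR group widens the result set, while each extra AND-ed word or "-" exclusion narrows it, which matches the pattern in the answers above.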
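The operators above are plain text inside the query, so a query can also be assembled programmatically. A small sketch, assuming the familiar https://www.google.com/search?q=... URL form; the helper names are hypothetical, and the generated strings are simply what a user would type into the search box.

```python
from urllib.parse import urlencode


def build_query(words=(), phrase=None, exclude=(), site=None, filetypes=()):
    """Assemble a query string using the operators described above."""
    parts = list(words)
    if phrase:
        parts.append(f'"{phrase}"')               # exact-phrase match
    parts += [f"-{word}" for word in exclude]     # exclusion
    if site:
        parts.append(f"site:{site}")              # restrict to one site
    if filetypes:
        # Either file type is acceptable, so join them with OR.
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Percent-encode the query into a search URL (the base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    mext = build_query(["cancer", "research"], site="mext.go.jp",
                       filetypes=["pdf", "ppt"])
    print(mext)  # cancer research site:mext.go.jp filetype:pdf OR filetype:ppt
    print(search_url(mext))
    print(build_query(["mouse"], exclude=["PC", "computer", "bluetooth"]))
```

No parentheses are added around the file types: following the Mus musculus answer above, OR groups its neighbouring terms before the implicit AND applies, so the MEXT query reads as cancer AND research AND site AND (pdf OR ppt).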

    Academic keyword search engines

    PubMed: http://pubmed.gov (redirected to https://www.ncbi.nlm.nih.gov/pubmed/)

    An academic literature search site run by the National Center for Biotechnology Information (NCBI).

    • Example: searching for genome-related papers on E. coli O111
      1. Enter E. coli in the search box and click [Search] -> Items: 1 to 20 of 393946 (393946 is the number of papers found)
      2. Add O111 and search again for “E. coli O111” -> Items: 1 to 20 of 1549
      3. Add genome and search “E. coli O111 genome” -> Items: 1 to 20 of 160 (narrowed down and the number decreases)
      4. The “Details” box in the right column shows how the terms were actually searched. In this case:
        ("escherichia coli"[MeSH Terms] OR ("escherichia"[All Fields] AND "coli"[All Fields]) OR "escherichia coli"[All Fields] OR "e coli"[All Fields]) AND O111[All Fields] AND ("genome"[MeSH Terms] OR "genome"[All Fields])
      5. You can edit the fields and terms shown in the Details to run a more precise search.
      6. You can refine the search further via the “Advanced” link at the top of the page, narrowing it down by “Title / Abstract”, “Author”, or “Publication Date”.
    • Practice: Find the paper that determined the draft whole-genome sequence of the E. coli O111 strain, using a suitable combination of keywords.
    • Practice: Search for academic papers describing techniques for washing cats, so that people with cat allergies can keep cats. (Hint: wash allergy)
    • Answer: Search with the keywords “cat wash allergy” and you will find the paper “Evaluation of different techniques for washing cats: quantitation of allergen removed from the cat and the effect on airborne Fel d 1.” (In short: a three-minute wash, repeated about once a week, appears to work.)
    • See: [Movie] Webinar: Pubmed for Scientists by NCBI
    • See: [Movie] Use MeSH to Build a Better PubMed Query by NCBI
    • See: [Movie] PubMed: The Filters Sidebar by NCBI
    • See: [Movie] Save Searches and Set E-mail Alerts by NCBI
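The “Details” translation above is itself a query that can be typed or refined directly, using PubMed field tags such as [MeSH Terms] and [All Fields]. As a sketch, assuming NCBI's E-utilities ESearch endpoint (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi, with its db, term and retmax parameters), the helpers below, whose names are hypothetical, compose a simplified version of that query and turn it into a request URL. The code only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed; see NCBI's E-utilities documentation).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Attach a PubMed field tag, quoting multi-word terms: "e coli"[All Fields]."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]"


def pubmed_query(*clauses):
    """AND together clauses; each clause is a list of tagged terms joined by OR."""
    groups = []
    for clause in clauses:
        joined = " OR ".join(clause)
        groups.append(f"({joined})" if len(clause) > 1 else joined)
    return " AND ".join(groups)


def esearch_url(term, retmax=20):
    """Build an ESearch request URL for the pubmed database."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    query = pubmed_query(
        [tagged("escherichia coli", "MeSH Terms"), tagged("e coli", "All Fields")],
        [tagged("O111", "All Fields")],
        [tagged("genome", "MeSH Terms"), tagged("genome", "All Fields")],
    )
    print(query)
    print(esearch_url(query))
```

Opening the printed URL in a browser returns an XML list of matching PubMed IDs. Other tags, such as [Title/Abstract] or [Author], can be combined the same way; check NCBI's usage guidelines (request-rate limits) before sending many requests.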

    Google Scholar

    Google search specialized in scientific fields: http://scholar.google.com

    • Searches academic material in various fields, such as journals, papers, books, and abstracts, and shows the approximate number of citations. For example, let’s look for published papers and abstracts by Nakamura, Yasukazu:
      author:"yasukazu nakamura"
    • Practice: Among your papers (or your supervisor’s), which is the most cited? What other papers cite it?
    • Practice: How many citations does the basic local alignment search tool (BLAST) paper have? What kinds of papers cite it?
    • Answer: Search Google Scholar for “basic local alignment search tool”, then check the “Cited by” link.

    OReFiL: an Online Resource Finder for Lifesciences

    A search engine for web resources (databases, tools, etc.) described in publications in the field of life science: http://orefil.dbcls.jp/en

    • See: TogoTV (english version): How to use OReFiL. (DOI: 10.7875/togotv.2010.004)
    • Practice: Use OReFiL to explore websites related to “codon usage”.
    • Practice: Search for websites that help with designing PCR primer sets.

    Allie: A Search Service for Abbreviation / Long Form

    A search service for abbreviations and their long forms used in life science: http://allie.dbcls.jp/en