D38–D51 Nucleic Acids Research, 2011, Vol. 39, Database issue doi:10.1093/nar/gkq1172

Published online 20 November 2010

Database resources of the National Center for Biotechnology Information
Eric W. Sayers1,*, Tanya Barrett1, Dennis A. Benson1, Evan Bolton1, Stephen H. Bryant1, Kathi Canese1, Vyacheslav Chetvernin1, Deanna M. Church1, Michael DiCuccio1, Scott Federhen1, Michael Feolo1, Ian M. Fingerman1, Lewis Y. Geer1, Wolfgang Helmberg2, Yuri Kapustin1, David Landsman1, David J. Lipman1, Zhiyong Lu1, Thomas L. Madden1, Tom Madej1, Donna R. Maglott1, Aron Marchler-Bauer1, Vadim Miller1, Ilene Mizrachi1, James Ostell1, Anna Panchenko1, Lon Phan1, Kim D. Pruitt1, Gregory D. Schuler1, Edwin Sequeira1, Stephen T. Sherry1, Martin Shumway1, Karl Sirotkin1, Douglas Slotta1, Alexandre Souvorov1, Grigory Starchenko1, Tatiana A. Tatusova1, Lukas Wagner1, Yanli Wang1, W. John Wilbur1, Eugene Yaschenko1 and Jian Ye1
Downloaded from http://nar.oxfordjournals.org/ by guest on March 20, 2015

1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA and 2University Clinic of Blood Group Serology and Transfusion Medicine, Medical University of Graz, Auenbruggerplatz 3, A-8036 Graz, Austria

Received September 16, 2010; Revised October 29, 2010; Accepted November 1, 2010

ABSTRACT In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA,
Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

INTRODUCTION
The National Center for Biotechnology Information (NCBI) at the National Institutes of Health was created in 1988 to develop information systems for molecular biology. In addition to maintaining the GenBank® (1) nucleic acid sequence database, which receives data through the international collaboration with DDBJ and EMBL as well as from the scientific community, NCBI provides data retrieval systems and computational resources for the analysis of GenBank data and many other kinds of biological data. For the purposes of this article, after a summary of recent developments and an introduction to the Entrez system, the NCBI suite of resources is grouped into 10 broad categories based on those in the new NCBI Guide. All resources discussed are available from the NCBI Guide at www.ncbi.nlm.nih.gov and can also be located using the Entrez ‘Site Search’ database. In most cases, the data underlying these resources and executables for the software described are available for download at ftp.ncbi.nih.gov.

*To whom correspondence should be addressed. Tel: +1 301 496 2475; Fax: +1 301 480 9241; Email: sayers@ncbi.nlm.nih.gov
Published by Oxford University Press 2010. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

RECENT DEVELOPMENTS

NCBI site redesign
In late 2009, NCBI launched a long-term project of redesigning and standardizing the NCBI website. Containing more than 4000 pages, the NCBI website is a complex system of interconnected resources, many of which have unique design aspects that can make navigating the NCBI site challenging. To alleviate this, we have adopted a new set of web design standards and have applied them to several resources so far including PubMed, Nuccore, EST, GSS, Protein, Gene, dbVar and Epigenomics. The new pages have four standard elements: (i) the page header, which contains links to the NCBI home page and MyNCBI as well as two pull-down menus that provide navigation to NCBI resources and how-to guides; (ii) the search bar, which contains a pull-down menu of all Entrez databases along with links to search tools and help documentation; (iii) the page body, containing the page content such as search results or data records; and (iv) the page footer, containing five lists of links to information about NCBI, lists of categorized resources and several popular or featured resources. In the coming months, more resources will be adopting this new design that we expect will make the NCBI site more consistent and easier to navigate.

Common elements in the new Entrez page designs
In addition to the standard header and footer, resources that have been updated to conform to the new Entrez design share several common elements: a home page, search tools, display controls and download controls. The home page of a data resource (e.g. www.ncbi.nlm.nih.gov/protein/) contains links to documentation and other information for new users, to relevant tools and to related resources at NCBI. On pages containing search results and data records, new ‘Display Settings’ and ‘Send to’ controls appear on the left and right sides of the display, respectively. These new and simplified controls replace sets of pull-down menus and allow users to select multiple settings at once.

The NCBI Guide
In conjunction with the new web standards discussed above, we replaced the old NCBI home page with the NCBI Guide, an application that serves as an interactive directory of the NCBI site. On the main page of the NCBI Guide, the categories in the Resource pull-down menu in the standard header are duplicated in a list on the left of the page. Clicking on any category displays a list of relevant resources sorted into four groups: databases, downloads, submissions and tools. Popular resources are listed on the right under a ‘Quick Links’ heading. A list of how-to guides is also available via the ‘How-To’ tab on these pages. A list of the most heavily used resources is provided on the main Guide page in the ‘Popular Resources’ box and also as a list in the standard footer.

Epigenomics

The Epigenomics database (www.ncbi.nlm.nih.gov/epigenomics/) is a new information resource at NCBI specifically aimed at highlighting epigenomics data. Epigenomics is an emerging field of research that studies how, despite sharing a common genomic sequence, different cell types and cell lineages acquire distinct patterns of gene expression. Epigenetic features examined include post-translational modifications of histone proteins, genomic DNA methylation, chromatin organization and the expression of non-coding regulatory RNA. Raw data from these experiments, together with extensive meta-data, are stored in the GEO (Gene Expression Omnibus) and SRA (Sequence Read Archive) databases. The new Epigenomics resource provides a higher-level view, allowing users to search and browse the data based on biological attributes such as cell type, tissue type, differentiation stage and health status, among many others. Data have been pre-mapped to genomic coordinates (to make ‘genome tracks’), so users are not required to be familiar with or manipulate the raw data. Tracks may be visualized in either the NCBI or UCSC genome viewers or may be downloaded to the user’s computer for local analysis. Data from the Roadmap Epigenomics project, which are currently being hosted at GEO (www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/), are being mirrored and are available for viewing and downloading from this new resource.

Database of Genomic Structural Variation
In 2010, NCBI launched the Database of Genomic Structural Variation (dbVar), an archive of large-scale genomic variants such as insertions, deletions, translocations and inversions (www.ncbi.nlm.nih.gov/dbvar/). Currently, dbVar (2) contains over 50 studies from human, rhesus macaque, chimpanzee, mouse, dog, fruit fly and pig, and accepts data derived from several methods including computational sequence analysis and microarray experiments. Each variant is linked to a graphical view showing its genomic context.
Inferred Biomolecular Interactions Server
Recently, NCBI introduced the Inferred Biomolecular Interactions Server (IBIS), a research server that analyzes and predicts interaction partners and binding site locations in proteins (3). IBIS (www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi) integrates the interactions observed in structural complexes from the Molecular Modeling Database (MMDB) for different types of binding partners including proteins, chemical ligands, nucleic acids, peptides and ions. IBIS also infers binding sites and partners from homologous protein complexes. To emphasize biologically relevant binding sites, similar sites are clustered together based on their evolutionary conservation. In the future, NCBI plans to incorporate observed and inferred interactions of this kind throughout the Entrez 3D structure resources.


New outreach resources and services
NCBI recently redesigned its main Education page (www.ncbi.nlm.nih.gov/Education/) and introduced several new outreach initiatives including training webinars and a new series of courses called Discovery Workshops (4). The new page has links to documentation, educational tools, upcoming conference exhibits and news items. Also on the page are links to the new NCBI pages on Facebook and Twitter, plus YouTube pages that contain short video tutorials and videos from special events at NCBI.

BLAST and COBALT updates
The Sequence Read Archive (SRA) BLAST page, accessible from the ‘Specialized BLAST’ section of the main BLAST page (blast.ncbi.nlm.nih.gov), now has an option for searching WGS sequences from 454 Sequencing systems. The WGS sequences are grouped by genus in a pull-down menu, and if multiple species have data within a genus, a separate menu appears allowing individual species to be selected. These data sets are updated daily, so new WGS data are available for searching quickly. The standard BLAST pages now have additional options for filtering searches. If the ‘Align two or more sequences’ checkbox is not checked, users can either include or exclude data from any number of specified organisms or taxa, greatly increasing the range of customized data sets available. In addition, checkboxes are available that allow users to exclude ‘model’ sequences (RefSeq XM and XP accessions) as well as sequences from uncultured or environmental samples. Finally, COBALT (5) users can download the output multiple alignment to a file in several popular formats including gapped FASTA, ClustalW, Phylip and Nexus.

MyNCBI updates
MyNCBI allows users to store personal configuration options such as search filters, LinkOut preferences and document delivery providers.
Several enhancements have been made to MyNCBI in the past year, including an update to allow users to sign in using credentials for an account with a partner organization such as Google, eRA Commons, VeriSign or a local university. My Bibliography was enhanced to allow users to add citations from books, meetings, presentations, patents and articles not found in PubMed, and also to give users the ability to manage their compliance with the NIH Public Access Policy. In addition, the number of PubMed filter selections has been expanded from five to 15, and users may now change their PubMed default settings for display format, items per page, and the method for sorting search results.

Updates to literature resources
In addition to the changes outlined above for PubMed as part of the Entrez redesign, NCBI released several enhancements for both PubMed and PubMed Central (PMC). For the first time, PubMed now includes citations for books and book chapters available on the NCBI Bookshelf. To aid in searching, an autocomplete feature

was added to the PubMed search box, and the PubMed Clinical Queries page (www.ncbi.nlm.nih.gov/pubmed/clinical) was redesigned to show immediate results for clinical studies, systematic reviews and medical genetics side by side. To assist users in finding related literature, PMC full-text views now include a list of related PubMed abstracts on the right. In addition, links to PubMed abstracts cited in the text now appear to the right of the paragraph containing the citation.

New discovery components within the Entrez system
NCBI continued to add new discovery components that assist researchers in finding particular Entrez links and using them to discover interesting relationships within the NCBI databases. Two such components were introduced on protein sequence view pages: an ad that alerts users that the protein being viewed is part of a biological pathway or other system within the Biosystems database, and which provides a link to that pathway; and an ad that describes and links to a cluster of sequences in the Protein Clusters database that includes the protein being viewed. Both of these ads appear in the right column of the sequence view page. For search operations that retrieve 20 or fewer nucleotide or protein sequences, links now appear in the right column that allow users to run BLAST and/or COBALT on all or any checked subset of the sequences.

THE ENTREZ SEARCH AND RETRIEVAL SYSTEM

Entrez databases
Entrez (6) is an integrated database retrieval system that provides access to a diverse set of 38 databases that together contain over 450 million records (Table 1). Entrez supports text searching using simple Boolean queries, downloading of data in various formats and linking of records between databases based on biological relationships.
In their simplest form, these links may be cross-references between a sequence and the abstract of the paper in which it is reported, or between a protein sequence and its coding DNA sequence or its three-dimensional (3D) structure. Computationally derived links between ‘neighboring records’, such as those based on computed similarities among sequences or among PubMed abstracts, allow rapid access to groups of related records. A service called LinkOut expands the range of links to include external services, such as organism-specific genome databases. The records retrieved in Entrez can be displayed in many formats and downloaded singly or in batches.

Entrez programming utilities (E-Utilities)
The Entrez Programming Utilities (E-Utilities) are a suite of eight server-side programs supporting a uniform set of parameters used to search, link and download data from the Entrez databases. EInfo provides basic statistics on a given database, including the last update date and lists of all search fields and available links. ESearch returns the identifiers of records that match an Entrez text query, and


Table 1. The Entrez databases (as of 1 September 2010)

Database              Records        Section within this article
Nucleotide            105 131 187    DNA and RNA
PubChem Substance      72 112 459    Chemicals and Bioassays
SNP                    71 036 396    Genetics and Medicine
EST                    66 693 283    DNA and RNA
GEO Profiles           63 811 486    Genes and Expression
Protein                35 020 254    Proteins
PubChem Compound       28 801 560    Chemicals and Bioassays
GSS                    28 560 647    DNA and RNA
PubMed                 20 139 180    Literature
Probe                  10 243 420    Genes and Expression
Gene                    7 578 739    Genes and Expression
UniGene                 4 304 399    Genes and Expression
PubMed Central          2 041 249    Literature
NLM Catalog             1 417 314    Literature
Taxonomy                  653 718    Taxonomy
UniSTS                    528 865    Genomes
dbVar                     510 291    Recent Developments
Protein Clusters          507 133    Proteins
PubChem Bioassay          462 678    Chemicals and Bioassays
3D Domains                313 714    Domains and Structures
Books                     288 700    Literature
MeSH                      219 574    Literature
Cancer Chromosomes        140 494    Genetics and Medicine
Biosystems                135 309    Genes and Expression
Homologene                123 767    Genes and Expression
PopSet                    118 358    DNA and RNA
dbGaP                      99 307    Genetics and Medicine
GENSAT                     97 980    Genes and Expression
Structure                  67 522    Domains and Structures
CDD                        40 561    Domains and Structures
GEO Datasets               28 853    Genes and Expression
Journals                   25 887    Literature
SRA                        25 432    DNA and RNA
OMIM                       21 140    Genetics and Medicine
Genome                     12 399    Genomes
Genome Projects (a)          5893    Genomes
Site Search                  4902    Introduction
OMIA                         2658    Genetics and Medicine
Epigenomics                   490    Recent Developments
Peptidome                     322    Proteins

(a) Soon to be renamed ‘BioProjects’.

when combined with EFetch or ESummary, provides a mechanism for downloading the corresponding data records. ELink gives users access to the vast array of links within Entrez so that data related to an input set can be retrieved. By assembling URL or Simple Object Access Protocol (SOAP) calls to the E-utilities within simple scripts, users can create powerful applications to automate Entrez functions to accomplish batch tasks that are impractical using web browsers. Instructions for using the E-Utilities are now found on the NCBI Bookshelf at www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils.

LITERATURE

PubMed
The PubMed database now contains more than 20 million citations dating back to the 1860s from more than 22 000 life science journals. Over 11 million of these citations have abstracts, the earliest from the 1880s, and 11 million have links to their full-text articles, with 3 million having both an abstract and a link to full text. PubMed is heavily linked to other core Entrez databases, thereby providing a crucial bridge between the data of molecular biology and the scientific literature. PubMed records are also linked to one another within Entrez as ‘related citations’ on the basis of computationally detected similarities using indexed Medical Subject Heading (MeSH) (7) terms and the text of titles and abstracts. The default Abstract display format shows the abstract of a paper along with succinct descriptions of the top five related articles and numerous Discovery Components (see above), increasing the potential for the discovery of important relationships.

PubMed Central
PMC (8) is a digital archive of peer-reviewed journals in the life sciences and now contains over 2 million full-text articles, growing by 11% over the past year. More than 1000 journals, including Nucleic Acids Research, deposit the full text of their articles in PMC, and more than 400 of these began depositing their data in the last year. Publisher participation in PMC requires a commitment to free access to full text, either immediately after publication or within a 12-month period. As a consequence of the mandatory NIH Public Access Policy that went into effect on 7 April 2008, PMC is also the repository for all final peer-reviewed manuscripts arising from research using NIH funds. All PMC articles are identified in PubMed search results and PMC itself can be searched using Entrez.

The NCBI Bookshelf, the NLM Catalog and the Journals database
The NCBI Bookshelf is an online resource of textbooks, reports and databases in the biomedical sciences. Supported by both the PMC database framework for publishing and archiving and the Entrez system for search and retrieval, the Bookshelf provides users free access to the full text of this content. Bookshelf is now home to over 600 titles, which include NIH-funded reports from the National Academies of Sciences and Clinical Guidelines from UK’s National Institute for Health and Clinical Excellence. Databases such as GeneReviews and MICAD (Molecular Imaging and Contrast Agent Database) are updated regularly. Earlier this year, Bookshelf began submitting records to PubMed for a subset of books and chapters in its database. Book records in PubMed can be identified by the ‘Books and Documents’ label and link back to the respective book or chapter in Bookshelf. The NLM Catalog provides bibliographic data for over 1.4 million NLM holdings including journals, books, manuscripts, computer software, audio recordings and other electronic resources. Each record is linked to the NLM LocatorPlus service as well as related catalog records with similar title words or associated MeSH terms. The Journals database contains all journals referenced in any Entrez database. Currently holding


over 25 000 records, the database indexes for each journal the title abbreviation, the International Organization for Standardization (ISO) abbreviation, publication data and links to the NLM catalog and all Entrez records associated with articles from that journal. TAXONOMY The NCBI taxonomy database serves as a central organizing principle for the Entrez biological databases and provides links to all data for each taxonomic node, from superkingdoms to subspecies. The database is growing at the rate of 3800 new taxa per month and indexes over 380 000 organisms named at the genus level or lower that are represented in Entrez by at least one nucleotide or protein sequence. The Taxonomy Browser (www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) can be used to view the taxonomy tree or retrieve data from any of the Entrez databases for a particular organism or group. DNA AND RNA Reference sequences The NCBI Reference Sequence (RefSeq) database (9) is a non-redundant set of curated and computationally derived sequences for transcripts, proteins and genomic regions. The number of nucleotide records in the RefSeq collection has grown by 10% over the past year so that Release 42 (July 2010) contains 4.4 million sequences representing over 10 700 organisms. RefSeq DNA and RNA sequences can be searched and retrieved from the Entrez Nucleotide database, and the complete RefSeq collection is available in the RefSeq directory on the NCBI FTP site. Sequences from GenBank and other sources Sequences from GenBank (1) can be searched in and retrieved from three Entrez databases: Nucleotide, EST and GSS (specified as nuccore, nucest and nucgss within the E-utilities). Entrez Nucleotide contains all GenBank sequences except those within the Expressed Sequence Tag (EST) or Genome Survey Sequence (GSS) GenBank divisions. The database also contains Whole Genome Shotgun (WGS) sequences, Third Party Annotation (TPA) sequences and sequences imported from the Entrez Structure database. 
In addition, those sequences that have been submitted as part of a population, phylogenetic or environmental study are placed in the PopSet database. The Trace and Assembly archives The Trace Archive contains over 2 billion traces (12% human) from gel and capillary electrophoresis sequencers. More than 10 000 species are represented. The Trace Assembly Archive links reads in the Trace Archive with genetic sequences in GenBank. An Assembly Viewer displays multiple alignments of assembled reads against consensus sequences to provide support for GenBank deposits.
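The E-utility database names given above for the GenBank-derived databases (nuccore, nucest and nucgss) can be used directly in scripted queries. The following is a minimal sketch of how ESearch and EFetch URLs for these databases might be assembled in Python. The base URL, the parameter names (db, term, retmax, id, rettype, retmode) and the example values are assumptions taken from the public E-utilities service rather than from this article, and the sketch only builds URLs without sending any request.

```python
from urllib.parse import urlencode

# Assumed base URL of the public E-utilities service (not stated in the article).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# E-utility names of the three Entrez databases that hold GenBank sequences,
# as listed in the text.
GENBANK_DATABASES = {"Nucleotide": "nuccore", "EST": "nucest", "GSS": "nucgss"}


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that asks for identifiers matching an Entrez query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build an EFetch URL that downloads the records for the given identifiers."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # identifiers are sent comma-separated
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query and identifiers, used only to show the URL shapes.
    print(esearch_url(GENBANK_DATABASES["EST"], "human[orgn]"))
    print(efetch_url(GENBANK_DATABASES["Nucleotide"], [1, 2]))
```

Keeping URL construction separate from the network call makes it easy to inspect or log a query before it is sent, which is useful when batching many requests.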

Sequence Read Archive
The Sequence Read Archive (SRA; 10) is a repository for sequencing data generated from the new generation of sequencers, including the Roche-454 GS and FLX, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and Complete Genomics platforms. The SRA is part of the Entrez system and contains over 56 Terabasepairs (Tbp) of biological sequence data. Within Entrez SRA (www.ncbi.nlm.nih.gov/sra/), the data are organized in four types of interlinked records: studies (SRP), experiments (SRX), samples (SRS) and runs (SRR). A study is a collection of related experiments, and each experiment is a set of laboratory operations performed on one or more samples. The results of these experiments are called runs. Additional information about these SRA concepts, along with documentation on using and submitting data to the resource, is available in a new help manual at www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpsra. Sequence read BLAST searches are now offered for transcript and whole-genome sequence data sets from 454 Sequencing systems, and regular expression pattern matching against short reads of all types is possible. A version of the SRA has been deployed behind dbGaP authorized access in order to provide archive services for human sequencing data under usage or privacy restrictions.

PROTEINS

Databases
Reference sequences. In addition to genomic and transcript sequences, the RefSeq database (9) contains protein sequences that are curated and computationally derived from these DNA and RNA sequences. The number of protein records in the RefSeq collection has grown by 29% over the past year so that Release 42 (July 2010) contains 10.6 million protein sequences. RefSeq protein sequences can be searched and retrieved from the Entrez Protein database, and the complete RefSeq collection is available in the RefSeq directory on the NCBI FTP site.

Sequences from GenBank and other sources.
As part of standard submission procedures, NCBI produces conceptual translations for any sequence in GenBank (1) that contains a coding sequence and places these protein sequences in the Entrez Protein database. In addition to these 23 million ‘GenPept’ sequences, the Protein database also contains sequences from TPA, SWISSPROT (11), the Protein Information Resource (PIR) (12), the Protein Research Foundation (PRF) and the Protein Data Bank (PDB) (13).

Protein Clusters. The Protein Clusters database (www.ncbi.nlm.nih.gov/proteinclusters/) contains over 500 000 sets of almost identical RefSeq proteins encoded by complete genomes from prokaryotes, eukaryotic organelles (mitochondria and chloroplasts), viruses and plasmids as well as from some protozoans and plants. The clusters are organized in a taxonomic hierarchy and


are created based on reciprocal best-hit protein BLAST scores (14). These clusters are used as a basis for genome-wide comparison at NCBI as well as to provide simplified BLAST searches via Concise Microbial Protein BLAST (www.ncbi.nlm.nih.gov/genomes/prokhits.cgi). Protein Clusters provides annotations, publications, domains, structures, external links and analysis tools, including multiple sequence alignments and phylogenetic trees. Peptidome. Peptidome (15) is a data repository for tandem mass spectrometry peptide and protein identification data generated by the scientific community. Data from all stages of a mass spectrometry experiment are captured, including original mass spectra files, experimental metadata and conclusion-level results. The submission process is facilitated through acceptance of data in commonly used open formats, and all submissions undergo syntactic validation and curation in an effort to uphold data integrity and quality. Peptidome accepts data from any tandem mass spectrometry experiment and from any species. In addition to data storage, web-based interfaces are available to help users query, browse and explore individual peptides, proteins, or entire Samples and Studies. Metadata for all public Samples and Studies along with that for the associated proteins in each Sample are loaded into Entrez Peptidome. HIV-1/Human Protein Interaction Database. The Division of Acquired Immunodeficiency Syndrome of the National Institute of Allergy and Infectious Diseases, in collaboration with the Southern Research Institute and NCBI, maintains a comprehensive HIV Protein-Interaction Database of documented interactions between HIV-1 proteins, host cell proteins, other HIV-1 proteins or proteins from disease organisms associated with HIV or AIDS (16). 
Summaries, including protein RefSeq accession numbers, Entrez Gene IDs, lists of interacting amino acids, brief descriptions of interactions, keywords and PubMed IDs for supporting journal articles, are presented at www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/. All protein–protein interactions documented in the HIV Protein-Interaction Database are listed in Entrez Gene reports in the HIV-1 protein interactions section.

Analysis Tools
COBALT. COBALT (5) is a multiple alignment algorithm that finds a collection of pair-wise constraints derived from both the NCBI Conserved Domain database and the sequence similarity programs RPSBLAST, BLASTP and PHI-BLAST. These pair-wise constraints are then incorporated into a progressive multiple alignment. COBALT searches can be launched either from a BLASTP result page or from the main COBALT search page (http://www.ncbi.nlm.nih.gov/tools/cobalt/), where either FASTA sequences or accessions (or a combination thereof) may be entered into the query sequence box. Links at the top of the COBALT report provide access to a phylogenetic tree view of the multiple alignment and allow users either to launch a modified search or download the alignment in several popular formats.
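Once a COBALT alignment has been downloaded, it can be inspected with a few lines of script. The sketch below is illustrative only: it assumes the download was saved in gapped FASTA format (one of the formats named earlier in this article), that gaps are written as '-', and that every aligned row has the same length. The function names and the example sequences are hypothetical and are not part of COBALT itself.

```python
def parse_gapped_fasta(text):
    """Parse gapped FASTA text into a dict mapping sequence id -> aligned row."""
    rows = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence id.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            rows[current] = []
        elif current is not None:
            # Sequence rows may be wrapped over several lines.
            rows[current].append(line)
    return {name: "".join(parts) for name, parts in rows.items()}


def fully_conserved_columns(alignment):
    """Return indices of columns where every row has the same non-gap residue."""
    rows = list(alignment.values())
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    return [
        i
        for i in range(length)
        if rows[0][i] != "-" and all(row[i] == rows[0][i] for row in rows)
    ]


if __name__ == "__main__":
    # Made-up three-sequence alignment used only to exercise the functions.
    example = ">seq1\nMKT-AY\n>seq2\nMKTLAY\n>seq3\nMRT-AY\n"
    alignment = parse_gapped_fasta(example)
    print(fully_conserved_columns(alignment))  # prints [0, 2, 4, 5]
```

This measures only strict identity per column; a real conservation analysis would normally use a substitution matrix or a dedicated alignment library.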

BLink. BLAST Link (BLink) displays pre-computed BLAST alignments of similar sequences for each protein sequence in Entrez Protein. BLink can display alignment subsets limited by either taxonomic criteria or the database of origin, and provides links to a COBALT multiple sequence alignment of the resulting sequences or a BLAST search with the query protein. BLink links are presented on protein records in Entrez as well as within Entrez Gene reports.

The Open Mass Spectrometry Search Algorithm. The Open Mass Spectrometry Search Algorithm (OMSSA) (21) analyzes MS/MS peptide spectra by searching libraries of known protein sequences, assigning significant hits an expectation value computed in the same way as the E-value of BLAST. The web interface to OMSSA allows up to 2000 spectra to be analyzed in a single session using either the BLAST nr, RefSeq or Swiss-Prot sequence libraries for comparison. Standalone versions of OMSSA that accept larger batches of spectra and allow searches of custom sequence libraries can be downloaded at pubchem.ncbi.nlm.nih.gov/omssa/download.htm.

BLAST SEQUENCE ANALYSIS

BLAST
The BLAST programs (17–19) perform sequence-similarity searches against a variety of nucleotide and protein databases, returning a set of gapped alignments with links to full sequence records as well as to related transcript clusters (UniGene), annotated gene loci (Gene), 3D structures (MMDB) or microarray studies (GEO). The NCBI web interface for BLAST allows users to assign titles to searches, to review recent search results and to save parameter sets in MyNCBI for future use. The basic BLAST programs are also available as standalone command line programs, as network clients and as a local Web-server package at ftp.ncbi.nih.gov/blast/executables/LATEST/ (Table 2).
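For readers who use the standalone command line programs, the sketch below shows one way a search command might be assembled from a script. It assumes the BLAST+ suite (programs such as blastn and blastp with the -query, -db, -evalue, -outfmt and -out options) is what is installed; the article does not say which standalone release is meant, and the file and database names shown are hypothetical. The function only builds the argument list and does not run anything.

```python
# Program names assumed to come from the BLAST+ command line suite.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def blast_command(program, query_path, database, evalue=10.0,
                  out_path=None, tabular=False):
    """Build an argument list for a standalone BLAST+ search."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    argv = [program, "-query", query_path, "-db", database,
            "-evalue", str(evalue)]
    if tabular:
        # Output format 6 is the tab-separated hit table in BLAST+.
        argv += ["-outfmt", "6"]
    if out_path is not None:
        argv += ["-out", out_path]
    return argv


if __name__ == "__main__":
    # Hypothetical query file and local database name.
    print(blast_command("blastp", "query.fasta", "nr",
                        evalue=1e-5, out_path="hits.tsv", tabular=True))
```

The returned list can be passed to `subprocess.run(argv, check=True)` on a machine where the programs and a formatted database are available; passing a list rather than a shell string avoids quoting problems with file names.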
BLAST databases The default database for nucleotide BLAST searches (‘Human genomic plus transcript’) contains human RefSeq transcript and genomic sequences arising from the NCBI annotation of the human genome. Searches of this database generate a tabular display that partitions the BLAST hits by sequence type (genomic or transcript) and allows sorting by BLAST score, percent identity within the alignment and the percent of the query sequence contained in the alignment. A similar database is available for mouse. Several other databases are also available and are described in links from the BLAST input form. Each of these databases can be limited to an arbitrary taxonomic node or those records satisfying any Entrez query. For proteins the default database (nr) is a nonredundant set of all CDS translations from GenBank along with all RefSeq, Swiss-Prot, PDB, PIR and PRF proteins. Subsets of this database are also available, such as the PDB or Swiss-Prot sequences, along with separate databases for sequences from patents and

environmental samples. Like the nucleotide databases, these collections can be limited by taxonomy or an arbitrary Entrez query. BLAST output formats Standard BLAST output formats include the default pairwise alignment, several query-anchored multiple sequence alignment formats, an easily parsable Hit Table and a report that organizes the BLAST hits by taxonomy. A ‘pairwise with identities’ mode better highlights differences between the query and a target sequence. A Tree View option for the Web BLAST service creates a dendrogram that clusters sequences according to their distances from the query sequence. Each alignment returned by BLAST is scored and assigned a measure of statistical significance, called the Expectation Value (E-value). The alignments returned can be limited by an E-value threshold or range. Genomic BLAST NCBI maintains Genomic BLAST pages for more than 100 organisms shown in the Map Viewer. By default, genomic BLAST searches the genomic sequence of an organism, but additional databases are also available, such as the nucleotide and protein RefSeqs annotated on the genomic sequence, as well as sets of sequences such as ESTs that are mapped to the genomic sequence. The default search program for the NCBI Genomic BLAST pages is MegaBLAST (20), a faster version of standard nucleotide BLAST designed to find alignments between nearly identical sequences, typically from the same species. For rapid cross-species nucleotide queries, NCBI offers Dis-contiguous MegaBLAST, which uses a non-contiguous word match (21) as the nucleus for its alignments. Dis-contiguous MegaBLAST is far more rapid than a translated search such as blastx, yet maintains a competitive degree of sensitivity when comparing coding regions. Primer-BLAST Primer-BLAST is a tool for designing and analyzing polymerase chain reaction (PCR) primers based on the existing program Primer3 (22) that designs PCR primers given a template DNA sequence. Primer-BLAST extends this functionality by running a BLAST search against a chosen database with the designed primers as queries, and then returns only those primer pairs specific to the input template DNA, in that they do not generate valid PCR products on sequences other than the template. Users can also specify a forward or reverse primer in addition to a DNA template, in which case the other primer will be designed and analyzed. If both primers are specified along with a template, the tool performs only the final BLAST analysis. Users may also enter two primers without a template, in which case the BLAST analysis will display those templates in the chosen database that best match the primer pair. The available databases range from RefSeq mRNA or genomic sets for one of twelve model organisms to the entire BLAST nr database.

Table 2. Selected NCBI software available for download
Software                  Available binaries          Category within this article
BLAST (stand alone)       Win, Mac, LINUX, Solaris    BLAST Sequence Analysis
BLAST (network client)    Win, Mac, LINUX, Solaris    BLAST Sequence Analysis
BLAST (web server)        Mac, LINUX, Solaris         BLAST Sequence Analysis
CD-Tree                   Win, Mac                    Domains and Structures
Cn3D                      Win, Mac, LINUX, Solaris    Domains and Structures
PC3D                      Win, Mac, LINUX             Chemicals and Bioassays
e-PCR                     Win, LINUX                  Genomes
gene2xml                  Win, Mac, LINUX, Solaris    Genes and Expression
Genome Workbench          Win, Mac, LINUX             Genomes
OMSSA                     Win, Mac, LINUX             Domains and Structures
splign                    LINUX, Solaris              Genomes
prosplign                 LINUX                       Genomes
tbl2asn                   Win, Mac, LINUX, Solaris    Genomes

GENES AND EXPRESSION Entrez Gene Entrez Gene (23) provides an interface to curated sequences and descriptive information about genes with links to NCBI’s Map Viewer, Evidence Viewer, Model Maker, BLink, protein domains from the Conserved Domain Database (CDD), and other gene-related resources. Gene contains data for almost 6.7 million genes from over 6700 organisms. These data are accumulated and maintained through several international collaborations in addition to curation by in-house staff. Links within Gene to the newest citations in PubMed are maintained by curators and provided as Gene References into Function (GeneRIF). The complete Entrez Gene data set, as well as organism-specific subsets, is available in the compact NCBI ASN.1 format on the NCBI FTP site. The gene2xml tool converts the native Gene ASN.1 format into XML and is available at ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml/. RefSeqGene In collaboration with Locus Reference Genomic (LRG) (www.lrg-sequence.org), RefSeqGene provides stable, standard genomic sequences annotated with standard mRNAs for well-characterized human genes (9). RefSeqGene records are part of the RefSeq collection and are created in consultation with authoritative locus-specific databases or other experts on particular loci and provide a stable genomic sequence for establishing numbering systems for exons and introns and for reporting and identifying genomic variants, especially those

of clinical importance (24). By default, a RefSeqGene record begins 5 kb upstream of the first exon of the gene and ends 2 kb downstream of the final exon, but those positions will be adjusted on request. A RefSeqGene sequence may differ from the current genomic build so as to reflect standard alleles. RefSeqGene records can be retrieved from Entrez Nucleotide using the query ‘refseqgene[keyword]’, are available on corresponding Entrez Gene reports and can be downloaded from ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene. The Conserved CDS database The Conserved CDS database (CCDS) project (www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort among NCBI, the European Bioinformatics Institute, the Wellcome Trust Sanger Institute and the University of California, Santa Cruz (UCSC) to identify a set of human and mouse protein coding regions that are consistently annotated and of high quality. To date, the CCDS database contains over 23 700 human and 17 700 mouse CDS annotations. The web interface to the CCDS allows searches by gene or sequence identifiers and provides links to Entrez Gene, record revision histories, transcript and protein sequences and gene views in Map Viewer, the Ensembl Genome Browser, the UCSC Genome Browser and the Sanger Institute Vega Browser. The CCDS sequence data are available at ftp.ncbi.nlm.nih.gov/pub/CCDS/. Gene Expression Omnibus Gene Expression Omnibus (GEO) (25) is a data repository and retrieval system for high-throughput functional genomic data generated by microarray and next-generation sequencing technologies. In addition to gene expression data, GEO accepts other categories of experiments including studies of genome copy number variation, genome-protein interaction surveys and methylation profiling studies. The repository can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as ‘Minimum Information About a Microarray Experiment’ (MIAME) (26,27).
Several data deposit options and formats are supported, including web forms, spreadsheets, XML and plain text. GEO data are housed in two Entrez databases: GEO Profiles, which contains quantitative gene expression measurements for one gene across an experiment, and GEO Datasets, which contains entire experiments. Currently, the GEO database hosts over 18 000 studies submitted by 8000 laboratories and comprising 460 000 samples and 33 billion individual abundance measurements for over 1300 organisms. UniGene and ProtEST UniGene (28) is a system for partitioning transcript sequences (including ESTs) from GenBank into a non-redundant set of clusters, each of which represents a potential gene locus. UniGene clusters are created for all organisms for which there are 70 000 or more ESTs in GenBank and include ESTs for 68 animals, 54 plants and fungi and another six eukaryotes. UniGene databases
are updated weekly with new EST sequences, and bimonthly with newly characterized sequences. As an aid to identifying a UniGene cluster, ProtEST presents precomputed BLAST alignments between protein sequences from model organisms and the six-frame translations of nucleotide sequences in UniGene. HomoloGene HomoloGene is a system that automatically detects homologs, including paralogs and orthologs, among the genes of 20 completely sequenced eukaryotic genomes. HomoloGene reports include homology and phenotype information drawn from Online Mendelian Inheritance in Man (OMIM) (29), Mouse Genome Informatics (MGI) (30), Zebrafish Information Network (ZFIN) (31), Saccharomyces Genome Database (SGD) (32), Clusters of Orthologous Groups (COG) (33) and FlyBase (34). The HomoloGene Downloader, appearing under the ‘Download’ link in HomoloGene displays, retrieves transcript, protein, or genomic sequences for the genes in a HomoloGene group; in the case of genomic sequence, upstream and downstream regions may be specified. GENSAT GENSAT (35–37) is a gene expression atlas of the mouse central nervous system produced with data supplied by the Rockefeller University and the St. Jude Children’s Research Hospital. GENSAT (www.ncbi.nlm.nih.gov/projects/gensat/) catalogs images of histological sections of the mouse brain in which biochemical tags have been used to visualize local gene expression. In addition to search tools, GENSAT provides download, zoom and comparison facilities for the more than 97 000 images in the collection. Probe The NCBI Probe database is a public registry of nucleic acid reagents designed for use in a wide variety of biomedical research applications, together with information on reagent distributors, probe effectiveness and computed sequence similarities. The Probe database archives 10.2 million probe sequences, among them probes for genotyping, single-nucleotide polymorphism (SNP) discovery, gene expression, gene silencing and gene mapping.
The Probe database also provides submission templates to simplify the process of depositing data (www.ncbi.nlm.nih.gov/genome/probe/doc/Submitting.shtml). Biosystems NCBI Biosystems (www.ncbi.nlm.nih.gov/biosystems/) collects molecules that interact in a biological system, such as a biochemical pathway or disease. Currently, Biosystems receives data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (38–40), BioCyc (41), Reactome (42) and the Pathway Interaction Database (43). These source databases provide diagrams of pathways that display the various components with
their substrates and products, as well as links to relevant literature. In addition to being linked to such literature in PubMed, each component within a Biosystem record is also linked to the corresponding records in Entrez Gene and Protein, while the substrates and products are linked to records in PubChem (see below) so that the Biosystem record centralizes NCBI data related to the pathway, greatly facilitating computation on such systems. GENOMES Databases Entrez Genome. Entrez Genome (44) provides access to genomic sequences from the RefSeq collection and is a convenient portal both for retrieving such sequences from multiple organisms and for viewing small genomes, such as those from prokaryotes. Currently, the database contains complete genomes for more than 1200 microbes and 3600 viruses, as well as for over 2400 eukaryotic organelles. For higher eukaryotes, the Genome database includes complete genomes for 39 species, as well as data from over 800 other genome sequencing projects. More than 11% of the 12 400 total sequences were added in the past year. For higher eukaryotes, Entrez Genome provides direct links to the NCBI Map Viewer; for prokaryotes, viruses and eukaryotic organelles, specialized viewers and BLAST pages are available. The Plant Genomes Central Web page serves as a portal to completed plant genomes, to information on plant genome sequencing projects or to other resources at NCBI such as the plant Genomic BLAST pages or Map Viewer. Entrez Genome Projects. The Entrez Genome Projects database, soon to be renamed Entrez BioProjects, provides an overview of the status of a variety of genomic and other biomedical projects, ranging from large-scale sequencing and assembly projects to projects focused on a particular locus, such as 16S ribosomal RNA, or a viral disease, such as SARS.
The scope of the database continues to expand so that only one-third of the more than 15 000 projects are traditional single-organism genome sequencing projects, while the other two-thirds are projects such as viral population projects, metagenome and environmental sampling projects, comparative genomics projects and transcriptome projects. Genome Projects links to project data in the other Entrez databases, such as Entrez Nucleotide and Genome, and to a variety of other NCBI and external resources. For prokaryotic organisms, Genome Projects indexes a number of characteristics of interest to biologists such as organism morphology and motility, pathogenicity and environmental requirements such as salinity, temperature, oxygen levels and pH range. NCBI encourages depositors to register their projects early in their development so that project data can be linked via the project ID to other NCBI-hosted data at the earliest opportunity. Influenza Genome resources. The Influenza Genome Sequencing Project (IGSP) (45) is providing researchers

with a growing collection of over 46 000 virus sequences essential to the identification of the genetic determinants of influenza pathogenicity. NCBI’s Influenza Virus Resource links the IGSP project data via PubMed to the most recent scientific literature on influenza as well as to a number of online analysis tools and databases. These databases include NCBI’s Influenza Virus Sequence Database, comprising over 150 000 influenza sequences in the GenBank and RefSeq databases, as well as other Entrez databases containing 167 000 influenza protein sequences, 170 influenza protein structures and 590 influenza population studies. An online influenza genome annotation tool analyzes a novel sequence and produces output in a ‘feature table’ format that can be used by NCBI’s GenBank submission tools such as tbl2asn (1). NCBI now also provides the Virus Variation resource (www.ncbi.nlm.nih.gov/genomes/VirusVariation/) that extends services available for Influenza to other viruses, such as the Dengue virus. Virus Variation provides a portal for retrieving, downloading, analyzing and annotating virus sequences using pages customized to unique aspects of viral sequence data, including genotype, severity of the resulting disease and the year a sample was collected. Analysis tools Map Viewer. The NCBI Map Viewer (www.ncbi.nlm.nih.gov/mapview/) displays genome assemblies, genetic and physical markers and the results of annotation and other analyses using sets of aligned maps for 110 organisms. The available maps vary by organism and may include cytogenetic maps, physical maps and a variety of sequence-based maps. Maps from multiple organisms or multiple assemblies for the same organism can be displayed in a single view. Map Viewer also can display previous genome builds and can produce convenient formats for downloading data. Genome Workbench.
NCBI’s Genome Workbench is a stand-alone application (Table 2) for sequence and genomic evaluation, offering tools for visualization and analysis, including integrated graphical views of sequences and alignments, text and tabular displays of annotation and common sequence analysis tools, including BLAST, MUSCLE and Splign. Genome Workbench offers the power of computation on a user’s own computer, and can easily mix private data with data available for public retrieval. The NCBI Genome Workbench Team has recently released an updated version, v.2.1.2. The new version contains many critical bug fixes, and all current users are encouraged to upgrade. Model Maker and Evidence Viewer. Model Maker is used to construct transcript models using combinations of putative exons derived from ab initio predictions or from the alignment of GenBank transcripts, including ESTs and RefSeqs, to the NCBI human genome assembly. The Evidence Viewer summarizes the sequence evidence supporting a gene annotation by displaying alignments of RefSeq and GenBank transcripts, along with ESTs, to genomic contigs. The tool also shows detailed alignments

for each exon, and highlights mismatches between the transcript and genomic sequences. Open Reading Frame Finder, Splign and ProSplign. NCBI provides several tools that assist in identifying coding sequences in genomic DNA. The Open Reading Frame (ORF) Finder (www.ncbi.nlm.nih.gov/projects/gorf/) performs a six-frame translation of a nucleotide sequence and returns the location of each ORF within a specified size range. Splign (46) (www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi) is a utility for computing cDNA-to-genomic sequence alignments that is accurate in determining splice sites, is tolerant of sequencing errors and supports cross-species alignments. Splign uses a version of the Needleman-Wunsch algorithm (47) that accounts for splice signals in combination with a compartmentalization algorithm to identify possible locations of genes and their copies. A link to download a standalone version designed for large-scale processing is provided on the Splign web page. Finally, ProSplign (www.ncbi.nlm.nih.gov/sutils/static/prosplign/prosplign.html) aligns protein sequences to genomic DNA sequences using an algorithm similar to that of Splign in that it accounts for introns and splice signals to yield optimal alignments. Standalone versions of the program are also available on the ProSplign web page. Electronic PCR. Forward electronic PCR (e-PCR) searches for matches to STS primer pairs in the UniSTS database of almost 530 000 markers. Reverse e-PCR is used to estimate the genomic binding site, amplicon size and specificity for sets of primer pairs by searching against genomic and transcript databases. Both e-PCR binaries and source code are available at ftp.ncbi.nlm.nih.gov/pub/schuler/e-PCR. TaxPlot, GenePlot and gMap. TaxPlot plots similarities in the proteomes of two organisms to that of a reference organism for complete prokaryotic and eukaryotic genomes.
A related tool, GenePlot, generates plots of protein similarity for a pair of complete microbial genomes to visualize deleted, transposed or inverted genomic segments. The gMap tool combines the results of pre-computed whole microbial genome comparisons with on-the-fly BLAST comparisons, clustering genomes with similar nucleotide sequences, and then graphically depicting the precomputed segments of similarity.
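The six-frame scan described above for the ORF Finder can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: it assumes ORFs start at ATG and end at one of the three standard stop codons, and the function names and the `min_len`/`max_len` parameters are hypothetical stand-ins for the tool's size-range option.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30, max_len=None):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and refer to the strand that was scanned, so minus-strand
    hits are positions on the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start  # includes the stop codon
                    if length >= min_len and (max_len is None or length <= max_len):
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # resume the search after this stop
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAAAAAAAATAAGG", min_len=15))
```

An ORF that is still open when the sequence ends is not reported in this sketch; a real tool would need a policy for such partial ORFs and for alternative start codons.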

GENETICS AND MEDICINE The Database of Genotypes and Phenotypes Within Entrez, the Database of Genotypes and Phenotypes (dbGaP) (48) (www.ncbi.nlm.nih.gov/gap/) archives, distributes and supports submission of data that correlate genomic characteristics with observable traits. This database is a designated NIH repository for NIH-funded genome-wide association study (GWAS) results (grants.nih.gov/grants/gwas/index.htm). The dbGaP collection contains over 240 studies, 33% of which were submitted in the past year, and each of which can be browsed by name or disease. To protect the confidentiality of study subjects, dbGaP accepts only de-identified data and requires investigators to go through an authorization process in order to access individual-level data. Study documents, protocols and subject questionnaires are available without restriction. Authorized access data distributed to primary investigators for use in approved research projects includes de-identified phenotypes and genotypes for individual study subjects, pedigrees and some precomputed associations between genotype and phenotype. Database of Single Nucleotide Polymorphisms Database of Single Nucleotide Polymorphisms (dbSNP) (49), a repository for single-base nucleotide substitutions and short deletion and insertion polymorphisms, contains over 30 million human records and 40 million more from a variety of other organisms. In addition to archiving the sequence that defines the variant, dbSNP maintains information about the validation status, population-specific allele frequencies, PubMed citations and individual genotypes for clustered reference records (rs#). These data are available on the dbSNP FTP site (ftp://ftp.ncbi.nih.gov/snp/organisms/) in XML-structured genotype and VCF reports that include information about cell lines, pedigree IDs, allele frequency and error flags for genotype inconsistencies and incompatibilities. In collaboration with Locus Specific Databases (LSDBs), dbSNP integrates information about rare genetic variants with clinical relevance. Two web submission forms were created to facilitate submission of LSDB/Clinical variant information and support variant descriptions using the HGVS standards with a RefSeq standard sequence. Users can search and annotate existing variations or submit novel ones, either as a single variation (http://www.ncbi.nlm.nih.gov/projects/SNP/tranSNP/tranSNP.cgi) or as a batch (http://www.ncbi.nlm.nih.gov/projects/SNP/tranSNP/VarBatchSub.cgi). GeneReviews and GeneTests NCBI hosts GeneReviews and GeneTests, two resources developed by a team led by Roberta A. Pagon, MD at the University of Washington. GeneReviews (www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=gene) is a compendium of continually updated, expert-authored and peer-reviewed disease descriptions that relate genetic testing to the diagnosis, management and genetic counseling of patients and families with specific inherited conditions (50,51). These reviews can be searched via the GeneReviews tab at the GeneTests home page (www.ncbi.nlm.nih.gov/sites/GeneTests/), NCBI’s Bookshelf site, NCBI’s All Databases interface, or major web search engines. The GeneTests Laboratory Directory and Clinic Directory list information voluntarily provided by laboratories about their tests and by genetics clinics about their clinical genetics services. As appropriate, users can search by a disease name, gene symbol, protein name, clinical genetics service and information

about a lab/clinic, such as its name, director and location. Clinics in the United States can also be found via a map-based search. Together, GeneReviews and the GeneTests directories support the integration of information on genetic disorders and genetic testing into a single resource to facilitate the care of patients and families with inherited conditions. OMIM NCBI provides as part of Entrez the online version of the Mendelian Inheritance in Man catalog of human genes and genetic disorders authored and edited by the late Victor A. McKusick and his staff at The Johns Hopkins University (29). The database contains information on disease phenotypes and genes, including extensive descriptions, gene names, inheritance patterns, map locations, gene polymorphisms and detailed bibliographies. Entrez OMIM contains over 21 000 entries, including data on over 13 100 established gene loci and phenotypic descriptions. Online Mendelian Inheritance in Animals Online Mendelian Inheritance in Animals (OMIA) is a database of genes, inherited disorders and traits in animal species other than human and mouse, and is authored by Professor Frank Nicholas of the University of Sydney, Australia and colleagues (52). The database holds 2600 records containing textual information and references, as well as links to relevant records from OMIM, PubMed and Entrez Gene. Cancer Chromosomes Cancer Chromosomes (53) contains data on human and mouse chromosomal aberrations, such as deletions and translocations, which are associated with cancer. Cancer Chromosomes consists of three databases: the NCI/NCBI SKY (Spectral Karyotyping)/M-FISH (Multiplex-FISH) and CGH (Comparative Genomic Hybridization) Database, the National Cancer Institute Mitelman Database of Chromosome Aberrations in Cancer (54) and the NCI Recurrent Chromosome Aberrations in Cancer database. 
Graphical schematics of each aberration in the SKY/M-FISH and CGH collections are available along with clinical case information and links to relevant literature. Cancer Chromosomes also provides similarity reports that list terms common to a group of records returned by a search, including similarities between CGH data and karyotypes. Database cluster for routine clinical applications: dbMHC, dbLRC and dbRBC dbMHC (www.ncbi.nlm.nih.gov/projects/gv/mhc/) focuses on the Major Histocompatibility Complex (MHC) and contains sequences and frequency distributions for alleles of the MHC, an array of genes that play a central role in the success of organ transplants and an individual’s susceptibility to infectious diseases. dbMHC also contains HLA genotype and clinical outcome information on hematopoietic cell transplants performed

worldwide. dbLRC offers a comprehensive collection of alleles of the leukocyte receptor complex with a focus on KIR genes. dbRBC represents data on genes and their sequences for red blood cell antigens or blood groups. It hosts the Blood Group Antigen Gene Mutation Database (55) and integrates it with resources at NCBI. dbRBC provides general information on individual genes and access to the ISBT allele nomenclature of blood group alleles. All three databases, dbMHC, dbLRC and dbRBC, provide multiple sequence alignments, analysis tools to interpret homozygous or heterozygous sequencing results (56) and tools for DNA probe alignments. DOMAINS AND STRUCTURES The Molecular Modeling Database The NCBI Molecular Modeling Database (MMDB) (57) contains experimentally determined coordinate sets from the Protein Data Bank (13), augmented with domain annotations and links to relevant literature, protein and nucleotide sequences, chemicals (PDB heterogens) and conserved domains in CDD (58). Compact structural domains within protein structures are stored in the 3D Domains database, and structural neighbors computed by the VAST algorithm (59,60) are available for structures containing these domains. Structure record summaries retrieved by text searches display thumbnail images of structures that link to interactive views of the data in Cn3D (61), the NCBI structure and alignment viewer. NCBI also provides precomputed BLAST results against the PDB database for all proteins in Entrez through the ‘Related Structures’ link. CDD and CDART The Conserved Domain Database (CDD) (58) contains over 37 000 PSI-BLAST-derived Position Specific Score Matrices representing domains taken from the Simple Modular Architecture Research Tool (Smart) (62), Pfam (63), TIGRFAM (64) and from domain alignments derived from COGs and Entrez Protein Clusters. 
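As a rough illustration of how a Position Specific Score Matrix of the kind stored in CDD is applied to a sequence, the Python sketch below slides a toy matrix along a protein and reports the best-scoring window. Everything in it is hypothetical: the three-column matrix, its scores, the default penalty and the function names are invented for the example, whereas real PSI-BLAST-derived matrices carry log-odds scores for all 20 residues at every column.

```python
def score_window(pssm, window, default=-2):
    """Sum the per-position scores of one sequence window.

    pssm is a list of dicts (one per alignment column) mapping a residue to
    its score; residues missing from a column receive the default penalty.
    """
    return sum(column.get(residue, default) for column, residue in zip(pssm, window))


def best_match(pssm, sequence, default=-2):
    """Return (score, offset) of the highest-scoring ungapped window."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        (score_window(pssm, sequence[i:i + width], default), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy matrix with made-up scores, for illustration only.
TOY_PSSM = [
    {"G": 5, "A": 1},
    {"K": 4, "R": 3},
    {"S": 5, "T": 4},
]

if __name__ == "__main__":
    print(best_match(TOY_PSSM, "MAGKSTL"))
```

The sketch scores ungapped windows only and, when two windows tie, returns the later one; search services such as CD-Search additionally handle gaps and attach statistical significance to each hit.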
In addition, CDD includes 3100 superfamily records, each of which contains a set of CDs from one or more source databases that generate overlapping annotation on the same protein sequences. The NCBI Conserved Domain Search (CD-Search) service locates conserved domains within a protein sequence, and these results are available for all proteins in Entrez through the ‘Identify Conserved Domains’ link in the upper right of a sequence record. Wherever possible, protein sequences with known 3D structures are included in CDD alignments, which can be viewed along with these structures and also edited within Cn3D. The Conserved Domain Architecture Retrieval Tool (CDART) allows searches of protein databases on the basis of a conserved domain and returns the domain architectures of database proteins containing the query domain. CD alignments can be viewed online, edited or created de novo using CDTree. CDTree uses PSI-BLAST to add new sequences to an existing CD alignment and provides an interface for exploring

phylogenetic trends in domain architecture and for building hierarchies of alignment-based protein domains. CHEMICALS AND BIOASSAYS PubChem (65) is the informatics backbone for the NIH Roadmap Initiative on molecular libraries and focuses on the chemical, structural and biological properties of small molecules, in particular their roles as diagnostic and therapeutic agents. A suite of three Entrez databases, PCSubstance, PCCompound and PCBioAssay, contains the structural and bioactivity data of the PubChem project. The databases hold records for 72 million substances containing 29 million unique structures. Nearly 1.8 million of these substances have bioactivity data in at least one of the 460 000 PubChem BioAssays. PubChem also provides a single, low-energy 3D conformer for about 90% of the records in the PubChem Compound database. A viewing application, PC3D, is available to view both individual conformers and overlays of similar conformers. The PubChem databases link not only to other Entrez databases such as PubMed and PubMed Central but also to Entrez Structure and Protein to provide a bridge between the macromolecules of genomics and the small organic molecules of cellular metabolism. The PubChem databases are searchable using text queries as well as structural queries based on chemical SMILES, formulas or chemical structures provided in a variety of formats. An online structure-drawing tool (pubchem.ncbi.nlm.nih.gov/search/search.cgi) provides a simple way to construct a structure-based search. FOR FURTHER INFORMATION The resources described here include documentation, other explanatory material and references to collaborators and data sources on the respective websites. The NCBI Help Manual and the NCBI Handbook, both available in the NCBI Bookshelf, describe the principal NCBI resources in detail. Several tutorials are also offered under the Training and Tutorials category link on the left side of the NCBI home page.
An alphabetical list of NCBI resources is available from a link in the upper left of the NCBI home page, and the About NCBI pages provide bioinformatics primers and other supplementary information. A user-support staff is available to answer questions at info@ncbi.nlm.nih.gov. Updates on NCBI resources and database enhancements are described in the NCBI News newsletter (www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=newsncbi). In addition, NCBI supports several mailing lists that provide updates (www.ncbi.nlm.nih.gov/Sitemap/Summary/email_lists.html), as well as RSS feeds (www.ncbi.nlm.nih.gov/feed/). FUNDING Funding for open access charge: the Intramural Research Program of the National Institutes of Health, National Library of Medicine.

Conflict of interest statement. None declared.

REFERENCES
1. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Sayers,E.W. (2010) GenBank. Nucleic Acids Res., doi:10.1093/nar/gkq1079.
2. Church,D.M., Lappalainen,I., Sneddon,T.P., Hinton,J., Maguire,M., Lopez,J., Garner,J., Paschall,J., Dicuccio,M., Yaschenko,E. et al. (2010) Public data archives for genomic structural variation. Nat. Genet., 42, 813–814.
3. Shoemaker,B.A., Zhang,D., Thangudu,R.R., Tyagi,M., Fong,J.H., Marchler-Bauer,A., Bryant,S.H., Madej,T. and Panchenko,A.R. (2010) Inferred Biomolecular Interaction Server–a web server to analyze and predict protein interacting partners and binding sites. Nucleic Acids Res., 38, D518–D524.
4. Cooper,P.S., Lipshultz,D., Matten,W.T., McGinnis,S.D., Pechous,S., Romiti,M.L., Tao,T., Valjavec-Gratian,M. and Sayers,E.W. (2010) Education resources of the National Center for Biotechnology Information. Brief Bioinform., doi:10.1093/bib/bbq022.
5. Papadopoulos,J.S. and Agarwala,R. (2007) COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics, 23, 1073–1079.
6. Schuler,G.D., Epstein,J.A., Ohkawa,H. and Kans,J.A. (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol., 266, 141–162.
7. Sewell,W. (1964) Medical subject headings in Medlars. Bull. Med. Libr. Assoc., 52, 164–170.
8. Sequeira,E. (2003) PubMed Central - three years old and growing stronger. ARL, 228, 5–9.
9. Pruitt,K.D., Tatusova,T., Klimke,W. and Maglott,D.R. (2009) NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res., 37, D32–D36.
10. Leinonen,R., Sugawara,H. and Shumway,M. (2011) The Sequence Read Archive. Nucleic Acids Res., 39, D19–D21.
11. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O’Donovan,C., Phan,I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
12. Wu,C.H., Yeh,L.S., Huang,H., Arminski,L., Castro-Alvear,J., Chen,Y., Hu,Z., Kourtesis,P., Ledley,R.S., Suzek,B.E. et al. (2003) The Protein Information Resource. Nucleic Acids Res., 31, 345–347.
13. Berman,H., Henrick,K., Nakamura,H. and Markley,J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–D303.
14. Klimke,W., Agarwala,R., Badretdin,A., Chetvernin,S., Ciufo,S., Fedorov,B., Kiryutin,B., O’Neill,K., Resch,W., Resenchuk,S. et al. (2009) The National Center for Biotechnology Information’s Protein Clusters Database. Nucleic Acids Res., 37, D216–D223.
15. Ji,L., Barrett,T., Ayanbule,O., Troup,D.B., Rudnev,D., Muertter,R.N., Tomashevsky,M., Soboleva,A. and Slotta,D.J. (2010) NCBI Peptidome: a new repository for mass spectrometry proteomics data. Nucleic Acids Res., 38, D731–D735.
16. Fu,W., Sanders-Beer,B.E., Katz,K.S., Maglott,D.R., Pruitt,K.D. and Ptak,R.G. (2009) Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res., 37, D417–D422.
17. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
18. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
19. Ye,J., McGinnis,S. and Madden,T.L. (2006) BLAST: improvements for better sequence analysis. Nucleic Acids Res., 34, W6–W9.

20. Zhang,Z., Schwartz,S., Wagner,L. and Miller,W. (2000) A greedy algorithm for aligning DNA sequences. J. Comput. Biol., 7, 203–214.
21. Ma,B., Tromp,J. and Li,M. (2002) PatternHunter: faster and more sensitive homology search. Bioinformatics, 18, 440–445.
22. Rozen,S. and Skaletsky,H.J. (2000) In Krawetz,S. and Misener,S. (eds), Bioinformatics Methods and Protocols: Methods in Molecular Biology. Totowa, NJ, Humana Press, pp. 365–386.
23. Maglott,D., Ostell,J., Pruitt,K.D. and Tatusova,T. (2007) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 35, D26–D31.
24. Gulley,M.L., Braziel,R.M., Halling,K.C., Hsi,E.D., Kant,J.A., Nikiforova,M.N., Nowak,J.A., Ogino,S., Oliveira,A., Polesky,H.F. et al. (2007) Clinical laboratory reports in molecular pathology. Arch. Pathol. Lab. Med., 131, 852–863.
25. Barrett,T., Troup,D.B., Wilhite,S.E., Ledoux,P., Rudnev,D., Evangelista,C., Kim,I.F., Soboleva,A., Tomashevsky,M., Marshall,K.A. et al. (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res., 37, D885–D890.
26. Brazma,A., Hingamp,P., Quackenbush,J., Sherlock,G., Spellman,P., Stoeckert,C., Aach,J., Ansorge,W., Ball,C.A., Causton,H.C. et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet., 29, 365–371.
27. Whetzel,P.L., Parkinson,H., Causton,H.C., Fan,L., Fostel,J., Fragoso,G., Game,L., Heiskanen,M., Morrison,N., Rocca-Serra,P. et al. (2006) The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics, 22, 866–873.
28. Schuler,G.D. (1997) Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med., 75, 694–698.
29. Amberger,J., Bocchini,C.A., Scott,A.F. and Hamosh,A. (2009) McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res., 37, D793–D796.
30. Eppig,J.T., Blake,J.A., Bult,C.J., Kadin,J.A. and Richardson,J.E. (2007) The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Res., 35, D630–D637.
31. Sprague,J., Bayraktaroglu,L., Clements,D., Conlin,T., Fashena,D., Frazer,K., Haendel,M., Howe,D.G., Mani,P., Ramachandran,S. et al. (2006) The Zebrafish Information Network: the zebrafish model organism database. Nucleic Acids Res., 34, D581–D585.
32. Hong,E.L., Balakrishnan,R., Dong,Q., Christie,K.R., Park,J., Binkley,G., Costanzo,M.C., Dwight,S.S., Engel,S.R., Fisk,D.G. et al. (2008) Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res., 36, D577–D581.
33. Tatusov,R.L., Fedorova,N.D., Jackson,J.D., Jacobs,A.R., Kiryutin,B., Koonin,E.V., Krylov,D.M., Mazumder,R., Mekhedov,S.L., Nikolskaya,A.N. et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4, 41.
34. Crosby,M.A., Goodman,J.L., Strelets,V.B., Zhang,P. and Gelbart,W.M. (2007) FlyBase: genomes by the dozen. Nucleic Acids Res., 35, D486–D491.
35. Geschwind,D. (2004) GENSAT: a genomic resource for neuroscience research. Lancet Neurol., 3, 82.
36. Gong,S., Zheng,C., Doughty,M.L., Losos,K., Didkovsky,N., Schambra,U.B., Nowak,N.J., Joyner,A., Leblanc,G., Hatten,M.E. et al. (2003) A gene expression atlas of the central nervous system based on bacterial artificial chromosomes. Nature, 425, 917–925.
37. Heintz,N. (2004) Gene expression nervous system atlas (GENSAT). Nat. Neurosci., 7, 483.
38. Kanehisa,M., Araki,M., Goto,S., Hattori,M., Hirakawa,M., Itoh,M., Katayama,T., Kawashima,S., Okuda,S., Tokimatsu,T. et al. (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Res., 36, D480–D484.
39. Kanehisa,M. and Goto,S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28, 27–30.
40. Kanehisa,M., Goto,S., Hattori,M., Aoki-Kinoshita,K.F., Itoh,M., Kawashima,S., Katayama,T., Araki,M. and Hirakawa,M. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34, D354–D357.

41. Keseler,I.M., Bonavides-Martinez,C., Collado-Vides,J., Gama-Castro,S., Gunsalus,R.P., Johnson,D.A., Krummenacker,M., Nolan,L.M., Paley,S., Paulsen,I.T. et al. (2009) EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res., 37, D464–D470.
42. Matthews,L., Gopinath,G., Gillespie,M., Caudy,M., Croft,D., de Bono,B., Garapati,P., Hemish,J., Hermjakob,H., Jassal,B. et al. (2009) Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res., 37, D619–D622.
43. Schaefer,C.F., Anthony,K., Krupa,S., Buchoff,J., Day,M., Hannay,T. and Buetow,K.H. (2009) PID: the Pathway Interaction Database. Nucleic Acids Res., 37, D674–D679.
44. Tatusova,T.A., Karsch-Mizrachi,I. and Ostell,J.A. (1999) Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics, 15, 536–543.
45. Ghedin,E., Sengamalay,N.A., Shumway,M., Zaborsky,J., Feldblyum,T., Subbu,V., Spiro,D.J., Sitz,J., Koo,H., Bolotov,P. et al. (2005) Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature, 437, 1162–1166.
46. Kapustin,Y., Souvorov,A., Tatusova,T. and Lipman,D. (2008) Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct, 3, 20.
47. Needleman,S.B. and Wunsch,C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443–453.
48. Manolio,T.A., Rodriguez,L.L., Brooks,L., Abecasis,G., Ballinger,D., Daly,M., Donnelly,P., Faraone,S.V., Frazer,K., Gabriel,S. et al. (2007) New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat. Genet., 39, 1045–1051.
49. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
50. Pagon,R.A. (2006) GeneTests: an online genetic information resource for health care providers. J. Med. Libr. Assoc., 94, 343–348.
51. Waggoner,D.J. and Pagon,R.A. (2009) Internet resources in medical genetics. Curr. Protoc. Hum. Genet., Chapter 9, Unit 9 12.
52. Lenffer,J., Nicholas,F.W., Castle,K., Rao,A., Gregory,S., Poidinger,M., Mailman,M.D. and Ranganathan,S. (2006) OMIA (Online Mendelian Inheritance in Animals): an enhanced platform and integration into the Entrez search interface at NCBI. Nucleic Acids Res., 34, D599–D601.
53. Knutsen,T., Gobu,V., Knaus,R., Padilla-Nash,H., Augustus,M., Strausberg,R.L., Kirsch,I.R., Sirotkin,K. and Ried,T. (2005) The interactive online SKY/M-FISH & CGH database and the Entrez cancer chromosomes search database: linkage of chromosomal aberrations with the genome sequence. Genes Chromosomes Cancer, 44, 52–64.
54. Mitelman,F., Mertens,F. and Johansson,B. (1997) A breakpoint map of recurrent chromosomal rearrangements in human neoplasia. Nat. Genet., 15, 417–474.
55. Blumenfeld,O.O. and Patnaik,S.K. (2004) Allelic genes of blood group antigens: a source of human mutations and cSNPs documented in the Blood Group Antigen Gene Mutation Database. Hum. Mutat., 23, 8–16.
56. Helmberg,W., Dunivin,R. and Feolo,M. (2004) The sequencing-based typing tool of dbMHC: typing highly polymorphic gene sequences. Nucleic Acids Res., 32, W173–W175.
57. Wang,Y., Addess,K.J., Chen,J., Geer,L.Y., He,J., He,S., Lu,S., Madej,T., Marchler-Bauer,A., Thiessen,P.A. et al. (2007) MMDB: annotating protein sequences with Entrez’s 3D-structure database. Nucleic Acids Res., 35, D298–D300.
58. Marchler-Bauer,A., Anderson,J.B., Chitsaz,F., Derbyshire,M.K., DeWeese-Scott,C., Fong,J.H., Geer,L.Y., Geer,R.C., Gonzales,N.R., Gwadz,M. et al. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res., 37, D205–D210.
59. Gibrat,J.F., Madej,T. and Bryant,S.H. (1996) Surprising similarities in structure comparison. Curr. Opin. Struct. Biol., 6, 377–385.

60. Madej,T., Gibrat,J.F. and Bryant,S.H. (1995) Threading a database of protein cores. Proteins, 23, 356–369.
61. Wang,Y., Geer,L.Y., Chappey,C., Kans,J.A. and Bryant,S.H. (2000) Cn3D: sequence and structure views for Entrez. Trends Biochem. Sci., 25, 300–302.
62. Letunic,I., Copley,R.R., Pils,B., Pinkert,S., Schultz,J. and Bork,P. (2006) SMART 5: domains in the context of genomes and networks. Nucleic Acids Res., 34, D257–D260.
63. Finn,R.D., Mistry,J., Schuster-Bockler,B., Griffiths-Jones,S., Hollich,V., Lassmann,T., Moxon,S., Marshall,M., Khanna,A., Durbin,R. et al. (2006) Pfam: clans, web tools and services. Nucleic Acids Res., 34, D247–D251.
64. Haft,D.H., Selengut,J.D. and White,O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Res., 31, 371–373.
65. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res., 37, W623–W633.
