Essentials of Genetics Chapter 18

genome
the complete set of DNA in a single cell of an organism

genomics
the study of genomes
-field of genetics that attempts to understand the content, organization, function, and evolution of genetic information contained in whole genomes

structural genomics
focuses on sequencing genomes and analyzing nucleotide sequences to identify genes and other important sequences such as gene-regulatory regions

whole genome-shotgun sequencing
restriction enzymes digest genomic DNA into fragments, which are sequenced; bioinformatics is then used to align the sequences, identify overlapping fragments based on sequence identity, and assemble them into contigs
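The overlap-and-merge idea behind assembling fragments into contigs can be sketched in Python. This is a toy greedy assembler that assumes error-free reads from a single strand; `overlap` and `greedy_assemble` are hypothetical helper names, and real assemblers also have to handle sequencing errors, repeats, and reverse complements.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored (returns 0)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Stops when no pair overlaps by at least min_len bases; the sequences
    that remain are the contigs."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        # Join the two fragments, writing the shared overlap only once
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]  # made-up overlapping fragments
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The three made-up fragments overlap by 4 and 5 bases, so they collapse into one contig that reconstructs the original 14-base sequence.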

The first computer-automated DNA sequencing instruments could process millions of base pairs per day. What type of sequencing were they designed for?
High-throughput sequencing

Bioinformatics uses computer-based approaches to organize and analyze data. What are some of the most important applications of Bioinformatics?

Whole-genome shotgun sequencing enables
scientists to assemble sequence maps of entire genomes and is the most widely used strategy for genome sequencing

GenBank
the largest and most important publicly available database of DNA sequences, maintained by the National Center for Biotechnology Information (NCBI)
-shares and acquires data with databases in Europe and Japan
-Each sequence deposited in GenBank receives an accession number: a unique identifier given to a DNA or protein sequence record to allow tracking of different versions of that sequence record

What are annotation and BLAST?
-annotation is the process of identifying genes, regulatory sequences, and their functions within a genome sequence

-BLAST (Basic Local Alignment Search Tool): a software application used to compare a segment of DNA to sequences throughout the major databases

-BLAST searches calculate a similarity score, also called the identity value, determined by the number of identical matches between aligned sequences divided by the total number of aligned bases.
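The identity-value arithmetic can be sketched as below. BLAST itself finds the alignment and reports more statistics than this; the sketch assumes an alignment already exists (equal-length strings, `-` for gaps) and that gap columns count toward the total. `identity_value` is a hypothetical name, not part of any BLAST interface.

```python
def identity_value(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned positions divided by the alignment length.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    A gap column never counts as a match but does count toward the length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return matches / len(aligned_a)


print(identity_value("ACGTACGT", "ACGTTCGA"))  # 0.75 (6 of 8 positions identical)
```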

Hallmark characteristics of genes (prokaryotic or eukaryotic) can be searched for with bioinformatics software. Give an example of such a characteristic.
Regulatory sequences found upstream are marked by identifiable sequences such as promoters, enhancers, and silencers.

functional genomics
the study of gene functions, based on the resulting RNAs or proteins they encode, and considers the functions of other components of the genome, such as gene-regulatory elements
-can confirm or refute computational predictions about genome functions, and it also considers how genes are expressed and how gene expression is regulated

Open reading frames (ORFs)
found in protein-coding genes, are sequences that are translated into the amino acid sequence of a protein
-begins with ATG
-Ends with TAA, TAG, or TGA
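The start and stop rules above can be turned into a small ORF scanner. This is a simplified sketch, assuming only the three forward reading frames (the reverse-complement strand is ignored) and a hypothetical minimum length `min_codons`; real gene finders add many more signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Scan the three forward reading frames for ATG...stop open reading frames.

    Returns (start, end, orf_sequence) with 0-based, end-exclusive coordinates.
    min_codons counts the codons from ATG up to, but not including, the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 'ATGAAATTTTAG')]
```

Only the third frame (offset 2) contains ATG followed in frame by a stop codon (TAG); an ATG with no downstream in-frame stop is not reported.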

How can you study or predict gene/protein function?
-by studying the resulting RNAs or possible proteins, as well as regulatory elements
-sequence analysis: a BLAST search for sequences homologous to newly sequenced DNA

Orthologs vs. Paralogs
Homologous genes from different species thought to have descended from a common ancestor are called orthologs. Homologous genes in the same species are called paralogs.

Gene sequence can be used to predict a polypeptide sequence then analyzed for protein domains and motifs to determine function. What are examples of protein domains and motifs?
-Protein domains: ion channels, membrane-spanning regions, binding, secretion, and export signals
– Motifs: helix-turn-helix, leucine zipper, or zinc finger motifs

Human Genome Project
-was a coordinated international effort to sequence and identify all the genes of the human genome
-The project began in 1990 under the direction of Dr. James Watson.
– Dr. Francis Collins later led the project under the coordination of the DOE and the National Center for Human Genome Research (NIH).
-Less than 2% of the genome codes for proteins (about 20,000 protein-coding genes).

comparative genomics
answers questions about genetics and other aspects of biology through the analysis of genomes from different organisms
-has both research and practical applications
-study of gene and genome evolution and of the relationship between organisms and their environment
-compares similarities and differences in gene content, function, and organization among genomes of different organisms

Celera Genomics
a privately funded human genome project that used whole-genome shotgun sequencing and computer-automated high-throughput DNA sequencers

Set up to safeguard personal genome information from being used in discrimination.

Alternative splicing
-many genes are able to code for multiple proteins
– generation of different protein molecules from the same pre-mRNA by incorporation of a different set and order of exons into the mRNA product
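One way one pre-mRNA yields several mRNAs can be sketched by enumerating exon-skipping patterns. This models only exon skipping (the first and last exons are always kept and exon order is preserved), not every kind of alternative splicing; the exon labels are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[tuple[str, ...]]:
    """List mRNA isoforms produced by skipping any subset of internal exons.

    The first and last exons are always kept and exon order is preserved,
    so this models exon skipping only (one form of alternative splicing)."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal), -1, -1):  # longest isoform first
        for kept in combinations(internal, k):
            isoforms.append((first, *kept, last))
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With n internal exons this gives 2^n possible isoforms, which shows how a modest number of genes can encode a much larger number of proteins.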

single-nucleotide polymorphisms (SNPs)
-a variation in a single nucleotide pair in DNA, usually detected during genomic analysis; present in at least 1 percent of a population, a SNP is useful as a genetic marker
-many are associated with disease conditions (examples: sickle-cell anemia and cystic fibrosis)
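The 1 percent rule can be expressed as a frequency check. This sketch assumes "present in at least 1 percent of a population" means a minor allele frequency of at least 0.01 (a common convention; the card's wording could also be read as carrier frequency), and it assumes diploid genotype calls written one letter per allele. The function names and sample data are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes: list[str]) -> float:
    """Frequency of the second most common allele at one site.

    genotypes are diploid calls such as "AA", "AG", or "GG" (one letter per allele)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic site: no variation
    return counts.most_common(2)[1][1] / total


def is_snp(genotypes: list[str], threshold: float = 0.01) -> bool:
    """Treat a site as a SNP if its minor allele reaches the 1 percent threshold."""
    return minor_allele_frequency(genotypes) >= threshold


sample = ["AA"] * 48 + ["AG"] * 2  # hypothetical 50 people, 100 alleles
print(minor_allele_frequency(sample))  # 0.02
print(is_snp(sample))                  # True
```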

copy number variations (CNVs)
DNA segments larger than 1 kb that are repeated a variable number of times in the genome

What is the Human Genome Project’s most valuable contribution?
-the identification of disease genes and the development of new treatment strategies.
– Extensive maps have been developed for genes implicated in human disease conditions (ALS, Alzheimer’s, cataracts, deafness, several cancers, etc.).

Areas of biological research having an “omics” connection are _______
> continually developing
– proteomics
– metabolomics
– glycomics
– toxicogenomics
– metagenomics
– pharmacogenomics
– transcriptomics
– nutrigenomics (a new field focusing on interactions between diet and genes)

What percentage of genes are the same regardless of racial or ethnic origin? Where do most genetic differences come from?
– single-nucleotide polymorphisms (SNPs) and copy number variations (CNVs)

Stone-age genomics? Examples?
-uses small amounts of ancient DNA from bone and other tissue
-used to study the evolutionary relatedness of various extinct and present-day species
– Egyptian mummy
– Mosses
– Platypus
– Mammoths

Encyclopedia of DNA Elements (ENCODE)
was created with the aim of using both experimental approaches and bioinformatics to identify and analyze functional elements that regulate expression of human genes

Personal Genome Project (PGP)
-As of 2010, many personal genomes had been sequenced.
-PGP sequences diploid genomes.

Human Microbiome Project
A $115 million, 5-year project to complete the genomes of 600 to 1000 microorganisms (bacteria, viruses, and yeasts) that live on and inside humans
– Has already revealed over 3.3 million gut microbe genes

Genome 10K
-Genome scientists and museum curators have proposed sequencing 10,000 vertebrate genomes (the Genome 10K plan)
– Will provide insight into genome evolution and speciation

Comparative genomics
compares the genomes of different organisms in order to answer questions about genetics and other aspects of biology
– Incorporates the study of gene and genomic evolution
– Explores the relationship between organisms and the environment
– Studies differences and similarities between organisms and how the differences contribute to phenotype, life cycles, etc.

Prokaryotes and Eukaryotes
-Many prokaryotic genomes have already been sequenced
-Bacteria have a single, circular chromosome, with substantial variation
-Gene density is very high in prokaryotes
-Bacterial DNA contains operons

-eukaryotic genomes share similarities across species but are highly variable
– gene density varies from chromosome to chromosome
-introns: variation in genomes and in genes
-repetitive sequences: about half of the human genome is repetitive DNA

Complete sequences of various organisms show that the number of genes humans share with other species is very high, ranging from about _____ percent of the genes in yeast to _____ percent in mice and _____ percent in chimpanzees
30, 80, 98

transcriptome
all the RNA molecules transcribed from a genome

Dog genome
-completed in 2005; dogs share 75 percent of our genes
-dogs share many genetic disorders with humans
– over 400 single-gene disorders
-sex-chromosome aneuploidies
-multifactorial diseases
-behavioral conditions

Chimpanzee Genome
-human & chimpanzee sequences differ by less than 2%
-speciation event that separated humans and chimps occurred less than 6.3 million years ago
-studies indicate that genome evolution, speciation, and gene expression are interconnected

Rhesus Monkey Genome
-is one of the most important model organisms in biomedical research
-central in our understanding of cardiovascular disease, aging, diabetes, cancer, depression, osteoporosis and many other aspects of human health
– their genome is suited for comparison to humans

Sea Urchin Genome
Sea urchins are shallow-water marine invertebrates

-Sequence alignment and homology searches demonstrate that the sea urchin contains many genes with important functions in humans.
– Have 23,500 genes, including representative genes for all major gene families
– Have genes involved in immunity
– Have nearly 1000 light-sensing genes

Neanderthal Genome
-A rough draft of the Neanderthal (Homo neanderthalensis) genome covers about two-thirds of the genome.

A comparative genomic analysis will help identify areas in the genome where humans have undergone rapid evolution since diverging from Neanderthals.
– 99 percent identical to the human genome
– 78 new protein-coding sequences since divergence

When do genomic studies suggest interbreeding took place between Neanderthals and modern humans?
-45,000 to 80,000 years ago
-The genome of non-African H. sapiens contains approximately 1-4 percent of sequences inherited from Neanderthals

Multigene families
– groups of genes that share similar but not identical DNA sequences
– a group of related multigene families forms a gene superfamily
– the globin gene family is one of the best-studied examples of a gene family

Globin gene family
An ancestral gene encoding an oxygen transport protein was duplicated (800 mya), producing two sister genes. One of the genes evolved into the modern-day myoglobin gene and the other into globin genes encoding hemoglobin (Figure 18-9).
– Myoglobin (oxygen-carrying protein) is found in muscle.
– Hemoglobin is made up of α- and β-globin (Figure 18-10).
– Arose due to gene duplication, nucleotide substitution, and chromosomal translocation

Metagenomics (environmental genomics)
is the use of whole-genome shotgun approaches to sequence genomes from entire communities of microbes in environmental samples of water, air, and soil.

The general method for metagenomics is to sequence genomes for all microbes in a given environment
-This will teach us more about millions of species of bacteria as well as viruses, particularly bacteriophages.
– This method has a great potential for identifying genes with novel functions, some of which may have valuable applications in medicine and biotechnology.

Transcriptome analysis (global analysis of gene expression)
Transcriptome analysis or global analysis of gene expression studies the expression of genes by a genome
– both qualitatively, by identifying which genes are expressed and which are not,
– and quantitatively, by measuring the varying levels of expression of different genes.
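The qualitative and quantitative sides of transcriptome analysis can be sketched as a comparison of two expression profiles. The "expressed" threshold, the pseudocount, and the log2 fold change are common conventions assumed here rather than taken from the text; gene names and levels are hypothetical and unitless.

```python
import math


def summarize_expression(
    levels_a: dict[str, float],
    levels_b: dict[str, float],
    threshold: float = 1.0,
    pseudocount: float = 1.0,
) -> dict[str, dict]:
    """Qualitative and quantitative comparison of two expression profiles.

    A gene counts as expressed if its level is at least `threshold` (qualitative);
    the log2 fold change compares condition B with condition A (quantitative),
    with a pseudocount so genes silent in one condition do not divide by zero."""
    summary = {}
    for gene in sorted(set(levels_a) | set(levels_b)):
        a = levels_a.get(gene, 0.0)
        b = levels_b.get(gene, 0.0)
        summary[gene] = {
            "expressed_in_a": a >= threshold,
            "expressed_in_b": b >= threshold,
            "log2_fold_change": math.log2((b + pseudocount) / (a + pseudocount)),
        }
    return summary


# Hypothetical expression levels for two genes in two tissues
tissue_a = {"geneX": 7.0, "geneY": 0.0}
tissue_b = {"geneX": 31.0, "geneY": 3.0}
for gene, stats in summarize_expression(tissue_a, tissue_b).items():
    print(gene, stats)
```

Here geneY is switched on only in tissue B (a qualitative difference), while geneX is expressed in both but about four times higher in B (a quantitative difference).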

Transcriptome analysis provides insights into
-normal patterns of gene expression to understand how a cell or tissue type differentiates during development.
– how gene expression dictates and controls the physiology of differentiated cells.
– mechanisms of disease development that result from or cause gene-expression changes in cells

DNA microarray analysis, gene chips
-enables researchers to analyze all of a sample’s expressed genes simultaneously
– Microarrays, also known as gene chips, consist of glass microscope slides onto which single-stranded DNA molecules are attached using a high-speed robotic arm called an arrayer
– A single microarray can have over 20,000 different spots of DNA

Proteomics, proteome
-is the identification, characterization, and quantitative analysis of all proteins encoded by the genome of a cell, tissue, or organism
– Can be used to reconcile differences between the number of genes in a genome and the number of different proteins produced
– Allows comparison of proteins in normal and diseased tissue

Protein Structure Initiative (PSI)
-is a 10-year project designed to analyze the three-dimensional structures of more than 4000 protein families.
-Proteins with potential therapeutic properties are top priority.

Mass spectrometry, Matrix-assisted laser desorption ionization
Mass spectrometry (MS) techniques analyze ionized samples in gaseous form and measure the mass-to-charge (m/z) ratio of different ions in a sample.
– Matrix-assisted laser desorption ionization (MALDI) is used for proteomic analysis of tissue samples treated under different conditions
-MALDI produces a peptide “fingerprint” that is characteristic of the protein being analyzed
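The m/z arithmetic and the fingerprint idea can be sketched as below. This assumes peptide ions formed by adding protons ([M + zH]^z+; MALDI mostly produces singly charged ions, z = 1), a hypothetical matching tolerance `tol`, and made-up peptide masses. Real peptide mass fingerprinting predicts masses from a protease digest of candidate proteins and searches databases.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (Da)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a peptide ion carrying `charge` extra protons ([M + zH]^z+)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def fingerprint_matches(observed: list[float], predicted: list[float], tol: float = 0.01) -> list[float]:
    """Predicted m/z values that have an observed peak within `tol`."""
    return [p for p in predicted if any(abs(p - o) <= tol for o in observed)]


# Hypothetical neutral peptide masses (Da) for one candidate protein
predicted = [mz(m, 1) for m in (1000.0, 1199.5)]
observed_peaks = [1001.01, 1500.2]
print(fingerprint_matches(observed_peaks, predicted))
```

The more predicted peptide peaks a candidate protein matches, the better that protein explains the observed fingerprint.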

Protein microarrays
are designed around the same basic concept as microarrays (gene chips) and are often constructed with antibodies that recognize and bind to different proteins.
– Used to examine protein-protein interactions, detect protein markers for disease diagnosis, and study biosensors to detect pathogenic microbes

Identification of Collagen in T. rex
Mass spectrometry analysis of bone tissue from a T. rex skeleton estimated at over 68 million years old demonstrated that fossilization does not fully destroy all protein in well-preserved fossils.
– Results suggest that the T. rex protein samples contained collagen, a major matrix component of bone, ligaments, tendons, and skin.

Systems biology, interactome
Systems biology incorporates data from genomics, transcriptomics, proteomics, and other areas of biology as well as engineering applications to further elucidate components of interacting pathways and the interrelationships of molecules.
– Interactome describes the interacting components of a cell.

network map
-is a sketch showing the interacting proteins, genes, and other molecules.
– Helps model intricate potential interactions of molecules involved in normal and disease processes
– a network map illustrating the complexity of interactions between genes involved in 22 different human diseases
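A network map is naturally represented as a graph. This sketch assumes an undirected graph stored as an adjacency map, with hypothetical protein names; `build_network` and `hubs` are illustrative helpers, and "hubs" here simply means the molecules with the most interaction partners.

```python
from collections import defaultdict


def build_network(interactions: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected network map: each molecule maps to the set of its partners."""
    network: dict[str, set[str]] = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def hubs(network: dict[str, set[str]], top: int = 1) -> list[str]:
    """Molecules with the most interaction partners (ties broken alphabetically)."""
    return sorted(network, key=lambda node: (-len(network[node]), node))[:top]


# Hypothetical protein-protein interactions
pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
net = build_network(pairs)
print(hubs(net, top=2))  # ['P1', 'P2']
```

Highly connected nodes such as P1 are the ones whose disruption would affect the most interactions in the map.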