Bioinformatics Final – Flashcards
question
| Three major DNA databases |
answer
| EMBL, GenBank, DDBJ |
question
| Flat-file database |
answer
| Simplest form of a database. Information such as nucleotide or amino acid (aa) sequences is stored either as a single large text file or as a collection of separate text files |
question
| Accession number |
answer
| Unique label used to identify a sequence record. Ex: X102275, a GenBank accession for a genomic DNA sequence |
question
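| FASTA |
answer
| Simple text-based sequence format used in flat-file databases. Each entry begins with a single header line starting with ">" (sequence identifier and optional description), followed by one or more lines of nucleotide or amino acid sequence |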
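A minimal sketch of reading FASTA text into a dictionary, assuming the identifier is the first whitespace-delimited token after ">" and that sequence lines are simply concatenated. The function name `parse_fasta` and the example records are hypothetical.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumption: the identifier is the first token of the header line;
    the rest of the header (the description) is ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # a sequence may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = """>seq1 hypothetical DNA fragment
ATGCGT
ACGT
>seq2 another fragment
GGCCTTAA
"""
print(parse_fasta(example))  # {'seq1': 'ATGCGTACGT', 'seq2': 'GGCCTTAA'}
```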
question
| PDB |
answer
| Protein Data Bank; also the name of its file format for 3D structures of macromolecules such as proteins |
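The classic PDB format is fixed-width: in ATOM/HETATM records, columns 31-38, 39-46 and 47-54 (1-based) hold the x, y and z coordinates in angstroms. A minimal sketch of pulling those fields out; the function name and the sample ATOM line are illustrative only.

```python
from typing import List, Tuple

Atom = Tuple[str, str, int, str, float, float, float]


def parse_atom_coords(pdb_text: str) -> List[Atom]:
    """Return (chain, residue name, residue number, atom name, x, y, z)
    for every ATOM/HETATM record, using fixed column positions."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if line[:6] in ("ATOM  ", "HETATM"):
            atoms.append((
                line[21],             # chain ID, column 22
                line[17:20].strip(),  # residue name, columns 18-20
                int(line[22:26]),     # residue sequence number, columns 23-26
                line[12:16].strip(),  # atom name, columns 13-16
                float(line[30:38]),   # x, columns 31-38
                float(line[38:46]),   # y, columns 39-46
                float(line[46:54]),   # z, columns 47-54
            ))
    return atoms


sample = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N"
print(parse_atom_coords(sample))  # [('A', 'THR', 1, 'N', 17.047, 14.099, 3.625)]
```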
question
| Structured Query Language |
answer
| Computer language used with relational databases |
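A small illustration of SQL against a relational table, using Python's built-in `sqlite3` module with an in-memory database. The table layout and the accession values are hypothetical.

```python
import sqlite3

# In-memory relational database with a hypothetical table of sequence records
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, molecule TEXT, length INTEGER)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [("SEQ001", "DNA", 1200), ("SEQ002", "protein", 350), ("SEQ003", "DNA", 800)],
)

# SQL query: DNA records longer than 1000 nucleotides
rows = conn.execute(
    "SELECT accession, length FROM sequences "
    "WHERE molecule = ? AND length > ? ORDER BY accession",
    ("DNA", 1000),
).fetchall()
print(rows)  # [('SEQ001', 1200)]
conn.close()
```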
question
| INDEL |
answer
| Insertion or deletion mutations |
question
| Block |
answer
| Highly conserved, ungapped local alignments of related protein sequences (from the BLOCKS database) that were used to derive the BLOSUM substitution matrices |
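A simplified sketch of how a substitution score can be derived from a block: count residue pairs within each column, convert them to observed pair frequencies q, compute expected frequencies from single-residue frequencies p, and take a rounded log-odds score in half-bit units. Real BLOSUM matrices first cluster sequences above an identity threshold (e.g. 62% for BLOSUM62); that weighting step is omitted here, and the toy block is hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def blosum_like_scores(block: List[str]) -> Dict[Tuple[str, str], int]:
    """Log-odds scores from one ungapped block (no sequence clustering)."""
    pair_counts: Counter = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):  # every pair of residues in the column
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Single-residue frequency: p_a = q_aa + sum over b != a of q_ab / 2
    p: Dict[str, float] = defaultdict(float)
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))  # half-bit units
    return scores


print(blosum_like_scores(["AS", "AS", "AT"]))  # {('A', 'A'): 2, ('S', 'S'): 1, ('S', 'T'): 3}
```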
question
| Multiple Sequence Alignment |
answer
| Collection of three or more sequences that are partially or completely aligned. Residues in the same column are inferred to be homologous |
question
| Feng-Doolittle |
answer
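| Progressive method of constructing MSAs: pairwise alignment scores are converted to distances, a guide tree is built from them, and sequences are aligned in the order given by the tree. Gaps introduced early are kept ("once a gap, always a gap") |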
question
| BLAST steps |
answer
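| 1: Compile a list of words (short fixed-length words from the query, plus similar words that score at least a threshold T against them) 2: Scan the database for entries that match the compiled list 3: When a word hit is found, the hit is extended in both directions until the score drops more than a set cutoff below the best score reached |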
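The three steps can be sketched for nucleotides as below. This is a simplification, not NCBI's implementation: it uses exact word matches only (no neighborhood words scored against a threshold T), a hypothetical +1/-1 match/mismatch scheme, and hypothetical defaults for word size `w` and the drop-off cutoff `x_drop`.

```python
from typing import Dict, List, Tuple

MATCH, MISMATCH = 1, -1  # hypothetical nucleotide scoring scheme


def word_index(query: str, w: int) -> Dict[str, List[int]]:
    """Step 1: compile every length-w word of the query with its start positions."""
    index: Dict[str, List[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index


def extend(a: str, b: str, x_drop: int) -> Tuple[int, int]:
    """Ungapped extension from the start of a and b; returns (best score, length).
    Stops once the running score falls more than x_drop below the best so far."""
    best = score = best_len = 0
    for k, (x, y) in enumerate(zip(a, b), start=1):
        score += MATCH if x == y else MISMATCH
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            break
    return best, best_len


def blast_like_hits(query: str, subject: str, w: int = 4,
                    x_drop: int = 2) -> List[Tuple[int, int, int, int]]:
    """Return (query_start, subject_start, length, score) for each extended hit."""
    index = word_index(query, w)
    hits = set()
    for sj in range(len(subject) - w + 1):  # Step 2: scan the subject for word hits
        for qi in index.get(subject[sj:sj + w], []):
            # Step 3: extend right, then left (on reversed prefixes)
            r_score, r_len = extend(query[qi + w:], subject[sj + w:], x_drop)
            l_score, l_len = extend(query[:qi][::-1], subject[:sj][::-1], x_drop)
            hits.add((qi - l_len, sj - l_len, w + l_len + r_len,
                      w * MATCH + l_score + r_score))
    return sorted(hits, key=lambda h: -h[3])


print(blast_like_hits("ACGTTGCA", "TTACGTTGCATT"))  # [(0, 2, 8, 8)]
```

Several seeds from the same region extend to the same alignment, so the hits are collected in a set to avoid duplicates.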
question
| Psi-BLAST |
answer
| Position-specific iterated BLAST: iteratively searches a protein sequence database, using the significant matches from each round to construct a PSSM that is used to search the database in the next round |
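A minimal sketch of the PSSM-building step: each column of the aligned hits becomes a row of log-odds scores (in bits). It assumes ungapped, equal-length hits, uniform background frequencies of 1/20 and a simple additive pseudocount. PSI-BLAST's actual sequence weighting and pseudocount scheme differ. The example hits are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Position-specific log-odds scores (bits) from ungapped aligned hits."""
    background = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for column in zip(*aligned_hits):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: List[Dict[str, float]], seq: str) -> float:
    """Score a sequence against the PSSM by summing per-position scores."""
    return sum(col[aa] for col, aa in zip(pssm, seq))


hits = ["MKV", "MRV", "MKI"]  # hypothetical matches from one search round
pssm = build_pssm(hits)
print(round(score(pssm, "MKV"), 2), round(score(pssm, "WWW"), 2))
```

Residues seen often at a position score positively there, while unseen residues score negatively, which is what lets the next round detect more distant members of the family.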
question
| Delta-BLAST |
answer
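| Domain Enhanced Lookup Time Accelerated BLAST. Searches a database of pre-constructed PSSMs (the Conserved Domain Database) with the query, then uses the resulting PSSM to search a protein database, yielding better homology detection |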
question
| HMM |
answer
| Hidden Markov Model |
question
| Pfam |
answer
| Database with a large collection of protein families, each represented by multiple sequence alignments (MSAs) and Hidden Markov Models (HMMs) |
question
| Profile Hidden Markov Model |
answer
| Can represent a sequence alignment profile similar to how a PSSM (position-specific scoring matrix) does. A profile HMM includes information on amino acid consensus at each position in the alignment like a PSSM. A profile HMM also has position-specific scores for gap insertions and deletions |
question
| Things needed to build an HMM |
answer
| Need to determine two things: 1: the structure (topology) of the HMM, i.e. its states and the transitions between them 2: the values of the parameters, i.e. the emission and transition probabilities |
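A toy example separating the two ingredients: the topology (two states, each allowed to stay or switch) and the parameters (start, transition and emission probabilities). All state names and probability values are hypothetical. The Viterbi algorithm then finds the most probable state path for a sequence, computed in log space to avoid underflow.

```python
import math
from typing import Dict, List, Tuple

# Topology: two states; all four transitions (stay/switch) are allowed
STATES = ["AT", "GC"]
# Parameters (hypothetical values)
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(seq: str) -> Tuple[List[str], float]:
    """Most probable state path for seq and its log probability."""
    v = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back: List[Dict[str, str]] = []
    for symbol in seq[1:]:
        prev = v[-1]
        col: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            col[s] = (prev[best_prev] + math.log(TRANS[best_prev][s])
                      + math.log(EMIT[s][symbol]))
            ptr[s] = best_prev
        v.append(col)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):  # trace back from the best final state
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[-1][last]


print(viterbi("ATATGCGCGC")[0])
```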
question
| How to build an HMM |
answer
| 1: Pick HMM structure/topology 2: Estimate initial parameters 3: Train the HMM by running sequences through it 4: Transitions that get used are given higher probabilities, those rarely used are given lower probabilities |
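Step 4 can be sketched as counting: every transition and emission used along a state path adds to a tally, and the tallies are normalized into probabilities, so frequently used transitions end up with higher probabilities. This sketch assumes the state paths of the training sequences are already known. When they are not, methods such as Baum-Welch (expected counts) or Viterbi training (decode, then count) repeat this counting step iteratively. The function name and training data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Table = Dict[str, Dict[str, float]]


def estimate_parameters(training: List[Tuple[str, List[str]]], states: List[str],
                        alphabet: str, pseudocount: float = 1.0) -> Tuple[Table, Table]:
    """Estimate transition and emission probabilities from known state paths."""
    trans_counts: Counter = Counter()
    emit_counts: Counter = Counter()
    for seq, path in training:
        for symbol, state in zip(seq, path):
            emit_counts[(state, symbol)] += 1  # emission used
        for a, b in zip(path, path[1:]):
            trans_counts[(a, b)] += 1          # transition used

    def normalize(counts: Counter, outcomes: Iterable[str]) -> Table:
        table: Table = {}
        for s in states:
            row = {o: counts[(s, o)] + pseudocount for o in outcomes}
            total = sum(row.values())
            table[s] = {o: n / total for o, n in row.items()}
        return table

    return normalize(trans_counts, states), normalize(emit_counts, alphabet)


trans, emit = estimate_parameters(
    [("AATGGC", ["AT", "AT", "AT", "GC", "GC", "GC"])], ["AT", "GC"], "ACGT"
)
print(trans)
```

The pseudocount keeps transitions that never appeared in training from being assigned probability zero.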
question
| Databases that use HMMs |
answer
| Pfam & SMART |
question
| Unrooted tree |
answer
| Fully resolved phylogenetic tree with each node connecting ancestors and descendants, but direction of evolution (which ancestor evolved from which) is undetermined |
question
| Rooted tree |
answer
| Phylogenetic tree in which one node is designated as the "root", representing the last common ancestor of all taxa in the tree, so the direction of evolution runs from the root toward the tips |
question
| Internal nodes |
answer
| Represent hypothetical ancestors of taxa |
question
| Terminal nodes |
answer
| Represent the taxa (genes, proteins, species) used to infer the phylogeny |
question
| Cladogram |
answer
| Branch lengths have no meaning |
question
| Additive tree |
answer
| Branch lengths are a measure of evolutionary divergence |
question
| Ultrametric tree |
answer
| Branch lengths are a measure of evolutionary divergence. The same constant rate of mutation (a molecular clock) is assumed along all branches, so all terminal nodes are equidistant from the root |
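Because every tip of an ultrametric tree is the same distance from the root, the pairwise tip-to-tip distances satisfy the three-point condition: in every triple of taxa, the two largest distances are equal. A small check of that condition on a distance matrix, with hypothetical example matrices:

```python
from itertools import combinations
from typing import List


def is_ultrametric(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple of taxa, the two largest
    pairwise distances must be equal (within tol)."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


clock_like = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
rates_differ = [[0, 3, 7], [3, 0, 6], [7, 6, 0]]
print(is_ultrametric(clock_like), is_ultrametric(rates_differ))  # True False
```

The second matrix could still fit an additive tree, but not one with a constant rate along all branches.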
question
| Ortholog |
answer
| Genes in different species that evolved from a common ancestral gene by speciation. Often retain the same function |
question
| Paralog |
answer
| Genes in the same species that evolved from a common ancestral gene through gene duplication. Often develop different functions, though these are frequently related to the original function |
question
| What can be learned from character analysis using phylogenies? |
answer
| When did specific episodes of positive Darwinian selection occur during evolutionary history? Which genetic changes are unique to the human lineage? What was the most likely geographical location of the common ancestor of the African apes and humans? |
question
| Bootstrap Procedure |
answer
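| Assigns a value to each internal branch indicating the percentage of bootstrap replicates in which that branch (clade) occurs. Each replicate is a dataset made by resampling alignment columns with replacement, from which a tree is rebuilt |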
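A sketch of the resampling step: each replicate draws alignment columns with replacement until it has the original length. A real analysis rebuilds a full tree from every replicate. Here a hypothetical stand-in (`closest_pair`, which picks the pair of taxa with the fewest differences) plays that role, just to show how a percentage support value is tallied. The alignment is hypothetical.

```python
import random
from typing import List, Tuple


def bootstrap_replicates(alignment: List[str], n_replicates: int,
                         seed: int = 0) -> List[List[str]]:
    """Pseudo-alignments made by sampling columns with replacement."""
    rng = random.Random(seed)
    length = len(alignment[0])
    replicates = []
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicates


def closest_pair(seqs: List[str]) -> Tuple[int, int]:
    """Hypothetical stand-in for tree building: the pair of taxa with the
    fewest differences (ties go to the first pair found)."""
    pairs = [(i, j) for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
    return min(pairs, key=lambda p: sum(a != b for a, b in zip(seqs[p[0]], seqs[p[1]])))


alignment = ["ACGTACGTAC", "ACGTACGTTC", "TCGAACTTAG", "TGGAACTAAG"]
replicates = bootstrap_replicates(alignment, 100)
support = 100 * sum(closest_pair(r) == (0, 1) for r in replicates) / len(replicates)
print(f"taxa 0 and 1 grouped together in {support:.0f}% of replicates")
```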
question
| Consensus tree |
answer
| Shows only features that are consistent between multiple possible trees |
question
| P-distance |
answer
| This distance is the proportion (p) of nucleotide sites at which two sequences being compared are different. It is obtained by dividing the number of nucleotide differences by the total number of nucleotides compared. |
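The p-distance as described (differences divided by sites compared), as a short function. Skipping sites with a gap in either sequence is an assumption of this sketch; the card does not say how gaps are handled.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped sites are excluded from the comparison
        compared += 1
        differences += a != b
    return differences / compared


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 2 differences / 10 sites = 0.2
```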
question
| Transition |
answer
| Changing a purine to a purine (A/G), or a pyrimidine to a pyrimidine (C/T). More common than transversions |
question
| Transversion |
answer
| Changing a purine to a pyrimidine, or a pyrimidine to a purine. Less common than transitions |
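Classifying each difference between two aligned DNA sequences as a transition or a transversion only requires knowing which bases are purines (A, G) and which are pyrimidines (C, T). A minimal sketch, with hypothetical sequences; non-ACGT characters such as gaps are ignored.

```python
from typing import Dict

PURINES, PYRIMIDINES = set("AG"), set("CT")


def classify(a: str, b: str) -> str:
    """Label a single-base change."""
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"  # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"    # purine<->pyrimidine


def count_substitutions(seq1: str, seq2: str) -> Dict[str, int]:
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "ACGT" and b in "ACGT" and a != b:
            counts[classify(a, b)] += 1
    return counts


print(count_substitutions("ACGTACGT", "GCATTCGC"))  # {'transition': 3, 'transversion': 1}
```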
question
| Positive selection |
answer
| More non-synonymous substitutions observed than expected (dN/dS > 1), indicating that amino-acid-changing mutations are favored and more likely to be retained |
question
| Negative selection |
answer
| Fewer non-synonymous substitutions observed than expected (dN/dS < 1), indicating that amino-acid-changing mutations are selected against and the sequence is conserved |
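The "observed" half of the comparison can be sketched by translating aligned codons with the standard genetic code and labeling each difference as synonymous (same amino acid) or non-synonymous. This simplification only classifies codons that differ at exactly one position. The "expected" half requires counting synonymous and non-synonymous sites (as in Nei-Gojobori-style methods), which is not shown here. The example sequences are hypothetical.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def count_differences(cds1: str, cds2: str) -> Tuple[int, int]:
    """Return (synonymous, non-synonymous) counts over aligned codons
    that differ at exactly one position."""
    syn = nonsyn = 0
    for i in range(0, min(len(cds1), len(cds2)) - 2, 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # simplification: skip gaps, ambiguous bases, multi-difference codons
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# CTG->CTA (Leu->Leu), AAA->AGA (Lys->Arg), GGT->GGC (Gly->Gly)
print(count_differences("CTGAAAGGT", "CTAAGAGGC"))  # (2, 1)
```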
question
| COGs |
answer
| Clusters of Orthologous Groups. Used to identify orthologs and paralogs. All genes in a species' genome are compared against each other and against all genes in other species. If a gene's best-scoring BLAST hit (BeT) is within its own genome, the two genes are paralogs. If the BeT is in another species, the genes are orthologs |
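A simplified sketch of the best-hit rule on precomputed BLAST scores. A gene whose overall best hit lies in its own genome is paired with it as a paralog, and genes that are each other's best hit across two genomes (reciprocal best hits) are paired as orthologs. The real COG procedure goes further, building clusters from consistent best hits across three or more genomes. Gene names, genomes and scores here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Scores = Dict[Tuple[str, str], float]
Pairs = Set[Tuple[str, ...]]


def best_hit(gene: str, candidates: List[str], scores: Scores) -> str:
    """Candidate with the highest score against gene (missing pairs count as -inf)."""
    return max(candidates, key=lambda other: scores.get((gene, other), float("-inf")))


def classify(genomes: Dict[str, List[str]], scores: Scores) -> Tuple[Pairs, Pairs]:
    genome_of = {g: name for name, genes in genomes.items() for g in genes}
    all_genes = list(genome_of)
    orthologs: Pairs = set()
    paralogs: Pairs = set()
    for gene in all_genes:
        bet = best_hit(gene, [g for g in all_genes if g != gene], scores)
        if (gene, bet) not in scores:
            continue  # gene has no hits at all
        pair = tuple(sorted((gene, bet)))
        if genome_of[bet] == genome_of[gene]:
            paralogs.add(pair)  # best hit lies in the same genome
        elif (bet, gene) in scores and best_hit(bet, genomes[genome_of[gene]], scores) == gene:
            orthologs.add(pair)  # reciprocal best hits between genomes
    return orthologs, paralogs


raw = {("e1", "e3"): 300, ("e1", "h1"): 200, ("e3", "h1"): 180, ("e2", "h2"): 150}
scores = {**raw, **{(b, a): s for (a, b), s in raw.items()}}  # make toy scores symmetric
genomes = {"E": ["e1", "e2", "e3"], "H": ["h1", "h2"]}
orthologs, paralogs = classify(genomes, scores)
print(sorted(orthologs), sorted(paralogs))
```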
question
| DSSP |
answer
| Method for the assignment of secondary structure in a protein, uses hydrogen bond patterns |
question
| STRIDE |
answer
| Method for the assignment of secondary structure in a protein, uses both hydrogen bond energy and backbone dihedral angles |
question
| DEFINE |
answer
| Method for the assignment of secondary structure in a protein, matches the interatomic distances within the protein to those from idealized secondary structures. |
question
| 1st method of protein attachment to membrane |
answer
| Attachment due to ionic interactions between protein and cytosolic face of the lipid bilayer |
question
| 2nd method of protein attachment to membrane |
answer
| Attachment via an anchor such as a lipid. Added to the protein post-translationally, meaning that these types of proteins have no specialized structural or sequence features that can be identified |
question
| 3rd method of protein attachment to membrane |
answer
| Bitopic membrane protein, in which the protein chain crosses the membrane exactly once |
question
| 4th method of protein attachment to membrane |
answer
| Polytopic membrane protein, in which the protein chain threads back and forth across the membrane multiple times. |
question
| X-ray crystallography |
answer
| Used to determine most protein structures. Requires the protein to be crystallized, which typically requires a highly concentrated protein sample |
question
| NMR |
answer
| Used to determine some protein structures. Limited to smaller proteins |
question
| Threading method |
answer
| Method of predicting protein structure by using a library of folds and comparing the energies of different folds for the target sequence. These folds are then scored and the best-scoring ones are used in the model. |
question
| Homology Method |
answer
| Based on the assumption that homologous proteins have similar structures. Uses structure of known homologue to model target protein. More closely related sequences give better models. |
question
| What do structurally reliable alignments depend on? |
answer
| Sequence identity and alignment length |
question
| SCR |
answer
| Structurally Conserved Region |
question
| Swiss Model |
answer
| Automated protein structure homology-modeling server, used to model protein structures. |
question
| Pearson Correlation Coefficient |
answer
| Simple and fundamental measure of similarity between gene expression profiles, commonly used as the distance metric when clustering microarray data |
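Pearson's r between two expression profiles is their covariance divided by the product of their standard deviations. For clustering it is often turned into a distance such as 1 - r (one common choice, assumed here). The expression values below are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """0 for perfectly co-expressed genes, 2 for perfectly anti-correlated ones."""
    return 1 - pearson(x, y)


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same shape, different magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite pattern
print(round(pearson(gene_a, gene_b), 6), round(pearson(gene_a, gene_c), 6))  # 1.0 -1.0
```

Because r ignores overall magnitude, genes with the same up/down pattern cluster together even if one is expressed at much higher levels. Python 3.10+ also offers `statistics.correlation` for the same coefficient.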
question
| SOM |
answer
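| Self-organizing map: a neural-network-based clustering method that assigns gene expression profiles to a user-specified number of nodes arranged in a grid |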
question
| 2D gel |
answer
| Separates proteins in two dimensions: by isoelectric point (pI, via isoelectric focusing in a pH gradient) and by molecular mass (via SDS-PAGE) |
question
| BIND |
answer
| Database of components and interactions, where each interaction includes information on cellular location, experimental conditions, conserved sequence, molecular location of interaction, and so on. |
question
| KEGG Pathway |
answer
| Collection of manually drawn pathway maps, used as a starting point for draft metabolic reconstructions |
question
| Steps in making KEGG Pathway |
answer
| 1. Draft reconstruction of the metabolic network 2. Curate the reconstruction (add and correct information) 3. Convert to a computable metabolic model |
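Step 3 is commonly done by encoding the reconstruction as a stoichiometric matrix S (rows = metabolites, columns = reactions) and requiring steady state, S·v = 0, for a flux vector v, as in constraint-based approaches such as flux balance analysis. A minimal sketch with a hypothetical three-reaction network:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy network: uptake -> A, A -> B, B -> secretion
reactions: Dict[str, Dict[str, float]] = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "secrete_B": {"B": -1},
}


def stoichiometric_matrix(rxns: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Rows are metabolites, columns are reactions; entries are coefficients
    (negative = consumed, positive = produced)."""
    metabolites = sorted({m for coeffs in rxns.values() for m in coeffs})
    names = list(rxns)
    S = [[rxns[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, S


def is_steady_state(S: List[List[float]], v: Sequence[float], tol: float = 1e-9) -> bool:
    """True if S . v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(abs(sum(s * flux for s, flux in zip(row, v))) <= tol for row in S)


metabolites, names, S = stoichiometric_matrix(reactions)
print(metabolites, S)                                             # ['A', 'B'] [[1, -1, 0], [0, 1, -1]]
print(is_steady_state(S, [2, 2, 2]), is_steady_state(S, [2, 1, 1]))  # True False
```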