Review Questions for the Bioinformatics Course
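I. Term Definitions
bioinformatics
Dotplot algorithm
molecular clock
hidden Markov model (HMM)
Gene Ontology (GO)
molecular phylogenetic tree
sequence alignment
gap penalty
linear gap penalty
constant gap penalty
multiple sequence alignment
relational database
Dayhoff mutation data matrix
BLOSUM matrix (blocks substitution matrix)
SCOP (Structural Classification of Proteins) protein structure classification database
CATH protein structure classification database
phylogenetic tree
species tree
gene tree
rooted tree, unrooted tree
maximum likelihood method
homology modeling for protein structure prediction
ab initio protein structure prediction
protein folding
FASTA-ALL
NCBI
EBI
GenBank
Entrez
SRS system
homology, identity, similarity
neutral theory of molecular evolution
least squares method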
neighbor-joining method
maximum parsimony
genome annotation
genomics
proteomics
PDB
MEGA software
PHYLIP software
dynamic programming algorithm
Smith-Waterman algorithm
Needleman-Wunsch algorithm
BLAST, BLASTn, BLASTp
II. Review Questions
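1. What is bioinformatics? What are its main applications?
2. Briefly describe the major landmark achievements in the history of bioinformatics.
3. Some say that biology will be the hotbed of the next technological revolution. In what ways do you think bioinformatics will contribute to the industrialization of biology?
4. What is a biological database? Give examples.
5. What is the difference between a primary database and a secondary database? Give examples.
6. What search routes does Entrez provide?
7. Why is sequence alignment performed? Using pairwise alignment of nucleic acid sequences as an example, briefly describe the basic principle of sequence alignment.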
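8. Given two sequences, catgt and acgctg, use dynamic programming to perform a global alignment: complete the alignment matrix and find the optimal alignment. Scoring: match = 2, mismatch = -1, gap penalty = -1.

          0    1    2    3    4    5    6
               a    c    g    c    t    g
0         0   -1   -2   -3   -4   -5   -6
1  c     -1
2  a     -2
3  t     -3
4  g     -4
5  t     -5

9. Given two sequences, CACGA and CGA, use the Smith-Waterman algorithm to align them: build the alignment matrix and find the optimal alignment. Scoring: match = 1, mismatch = 0, gap penalty = -1.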
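10. Given two sequences, CACGA and CGA, use the dynamic programming method of S. Needleman and C. Wunsch to align them: build the alignment matrix and find the optimal alignment. Scoring: match = 1, mismatch = 0, gap penalty = -1.
11. Briefly describe the workflow for predicting protein secondary structure.
12. Briefly describe the workflow for predicting protein tertiary structure by homology modeling.
13. Why is protein structure alignment performed? Briefly describe its basic principle.
14. Why is the higher-order structure of a protein said to be determined by its primary structure?
15. Briefly describe the workflow for predicting protein-coding genes.
16. Briefly describe the basic workflow of genome annotation.
17. How can protein-coding genes in eukaryotes be predicted ab initio?
18. Briefly describe the basic idea of building a phylogenetic tree with the neighbor-joining method.
19. Briefly describe the basic idea of building a phylogenetic tree with the UPGMA method.
20. Briefly describe the basic idea of building a phylogenetic tree with the maximum parsimony method.
21. Given four sequences, A: TAGG; B: TACG; C: AAGC; D: AGCC, build a phylogenetic tree using the UPGMA method.
22. Given four sequences, A: TAGG; B: TACG; C: AAGC; D: AGCC, build a phylogenetic tree using the neighbor-joining method.
23. What is the neutral theory? What influence has it had on the study of molecular evolution?
24. What foundations in computer science, biology, and mathematics do you think are needed to study bioinformatics?
25. What is the molecular clock hypothesis?
26. Briefly describe the steps for building a phylogenetic tree.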
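The alignment exercises above (questions 8 to 10) can be checked with a short script. What follows is a minimal Python sketch, not a model answer. The function names fill_matrix and global_traceback are made up for illustration, the gap value is treated as a score added once per gap position (a linear gap penalty), and the traceback returns only one of possibly several optimal alignments.

```python
def fill_matrix(a, b, match, mismatch, gap, local=False):
    """Fill the dynamic programming score matrix.

    Rows follow sequence `a`, columns follow sequence `b`.
    local=False gives the global (Needleman-Wunsch style) recurrence,
    local=True gives the local (Smith-Waterman style) recurrence.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are charged in the first row and column.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            best = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                       F[i - 1][j] + gap,     # gap in b
                       F[i][j - 1] + gap)     # gap in a
            # Local alignment never lets a cell drop below zero.
            F[i][j] = max(best, 0) if local else best
    return F


def global_traceback(F, a, b, match, mismatch, gap):
    """Walk back from the bottom-right cell to recover one optimal global alignment."""
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


# Question 8: global alignment, match 2, mismatch -1, gap -1.
F8 = fill_matrix("catgt", "acgctg", 2, -1, -1)
for row in F8:
    print(row)
print(global_traceback(F8, "catgt", "acgctg", 2, -1, -1))

# Question 9: local alignment, match 1, mismatch 0, gap -1.
F9 = fill_matrix("CACGA", "CGA", 1, 0, -1, local=True)
for row in F9:
    print(row)

# Question 10: global alignment with the same sequences and scoring.
F10 = fill_matrix("CACGA", "CGA", 1, 0, -1)
print(global_traceback(F10, "CACGA", "CGA", 1, 0, -1))
```

With local=True the first row and column stay at zero and every cell is clamped at zero, so the best local alignment ends at the largest cell in the matrix rather than at the bottom-right cell.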
III. Literature Reading
1.
Welcome to the UCSC Genome Browser website. This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.
We encourage you to explore these sequences with our tools. The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide. The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways. Blat quickly maps your sequence to the genome. The Table Browser provides convenient access to the underlying database. VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns. Genome Graphs allows you to upload and display genome-wide data sets.
2.
WebLogo is a web-based application designed to make the generation of sequence logos as easy and painless as possible. Click here to create your own sequence logos.
Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike Stephens. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.
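The relationship between stack height and symbol height described above can be illustrated numerically. The following Python sketch assumes the usual information-content definition for DNA: the stack height is log2(4) minus the Shannon entropy of the column, in bits. It omits the small-sample correction that logo tools may apply. The function name column_heights and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_heights(column, alphabet_size=4):
    """Return (stack_height, {symbol: symbol_height}) in bits for one alignment column."""
    counts = Counter(column)
    total = len(column)
    freqs = {sym: n / total for sym, n in counts.items()}
    # Shannon entropy of the observed symbol frequencies.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Conservation: maximum possible entropy minus observed entropy.
    stack = math.log2(alphabet_size) - entropy
    # Each symbol takes a share of the stack proportional to its frequency.
    return stack, {sym: f * stack for sym, f in freqs.items()}


# Hypothetical toy alignment of four DNA sequences (illustration only).
alignment = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
for pos in range(len(alignment[0])):
    column = [seq[pos] for seq in alignment]
    stack, symbols = column_heights(column)
    print(pos + 1, round(stack, 3), {s: round(h, 3) for s, h in symbols.items()})
```

A fully conserved column reaches the maximum of 2 bits for DNA, while a column with all four bases at equal frequency has a stack height of 0, which is why conserved positions stand out in a logo.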
3.
The National Center for Biotechnology Information (NCBI) is one of the world's premier Web sites for biomedical and bioinformatics research. Based within the National Library of Medicine at the National Institutes of Health, USA, the NCBI hosts many databases used by biomedical and research professionals. The services include PubMed, the bibliographic database; GenBank, the nucleotide sequence database; and the BLAST algorithm for sequence comparison, among many others.
4.
KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from genomic and molecular-level information. It is a computer representation of the biological system, consisting of molecular building blocks of genes and proteins (genomic information) and chemical substances (chemical information) that are integrated with the knowledge on molecular wiring diagrams of interaction, reaction and relation networks (systems information).
The KEGG website at www.kegg.jp has become the primary site of the KEGG database developed by Kanehisa Laboratories. The GenomeNet website at www.genome.jp operated by Kyoto University Bioinformatics Center will continue to mirror the KEGG database and provide additional KEGG-based analysis services.
5.
The GenBank sequence database is an annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced at the National Center for Biotechnology Information (NCBI) as part of an international collaboration with the European Molecular Biology Laboratory (EMBL) Data Library from the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. GenBank continues to grow at an exponential rate, doubling every 10 months. Release 134, produced in February 2003, contained over 29.3 billion nucleotide bases in more than 23.0 million sequences. GenBank is built by direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centers.
6.
PubMed is a database developed by the National Center for Biotechnology Information (NCBI) at