R package WGCNA: help documentation for the GOenrichmentAnalysis() function

loveR · posted 2012-10-1 21:13:26

GOenrichmentAnalysis(WGCNA)
GOenrichmentAnalysis() belongs to the R package: WGCNA

Calculation of GO enrichment (experimental)

Translator: LoveR, the translation robot of 生物统计家园网 (biostatistic.net)

----------Description----------

WARNING: This function should be considered experimental. The arguments and resulting values (in particular, the enrichment p-values) are not yet finalized and may change in the future. The function should only be used to get a quick and rough overview of GO enrichment in the modules in a data set; for a publication-quality analysis, please use an established tool.

Using Bioconductor's annotation packages, this function calculates enrichments and returns the terms with the best enrichment values.

----------Usage----------

GOenrichmentAnalysis(labels,
                  entrezCodes,
                  yeastORFs = NULL,
                  organism = "human",
                  ontologies = c("BP", "CC", "MF"),
                  evidence = c("IMP", "IGI", "IPI", "ISS", "IDA", "IEA", "TAS", "NAS", "ND", "IC"),
                  includeOffspring = TRUE,
                  backgroundType = "givenInGO",
                  removeDuplicates = TRUE,
                  leaveOutLabel = NULL,
                  nBestP = 10, pCut = NULL,
                  nBiggest = 0,
                  verbose = 2, indent = 0)

----------Arguments----------

Argument: labels
cluster (module, group) labels of the genes to be analyzed. Either a single vector or a matrix. In the matrix case, each column is analyzed separately; analyzing a collection of module assignments in one function call is faster than calling the function several times. For each row, the labels in all columns must correspond to the same gene specified in entrezCodes.

Argument: entrezCodes
Entrez (a.k.a. LocusLink) codes of the genes whose labels are given in labels. A single vector; the i-th entry corresponds to row i of the matrix labels (or to the i-th entry if labels is a vector).
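
To make the alignment between labels and entrezCodes concrete, here is a minimal sketch in base R. The gene IDs, module names and column names are made up for illustration and do not come from the original help page.

```r
# Hypothetical example input: eight Entrez IDs chosen only for illustration.
entrezCodes <- c(7157, 672, 675, 1956, 2064, 5728, 207, 5290)

# Single-vector form: one module label per gene, in the same order as entrezCodes.
labels <- c("blue", "blue", "blue", "brown", "brown", "grey", "brown", "brown")

# Matrix form: each column is a separate module assignment of the same genes,
# so row i of the matrix still describes entrezCodes[i].
labelMatrix <- cbind(
  cut1 = labels,
  cut2 = c("red", "red", "red", "red", "green", "green", "green", "green")
)

# Basic consistency checks worth running before calling the function.
stopifnot(length(labels) == length(entrezCodes),
          nrow(labelMatrix) == length(entrezCodes))
```

Checking the lengths up front is cheap and catches the most common mistake, which is a labels vector that was filtered or reordered independently of the gene identifiers.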

Argument: yeastORFs
if organism is "yeast" (see below), this argument can be used to input yeast open reading frame (ORF) identifiers instead of Entrez codes. Since the GO mappings for yeast are provided in terms of ORF identifiers, this may lead to a more accurate GO enrichment analysis. If given, the argument entrezCodes is ignored.

Argument: organism
character string specifying the organism for which to perform the analysis. Recognized values are (unique abbreviations of) "human", "mouse", "rat", "malaria", "yeast", "fly", "bovine", "worm", "canine", "zebrafish", "chicken".

Argument: ontologies
vector of character strings specifying the GO ontologies to be included in the analysis. Can be any subset of "BP", "CC", "MF". The result will contain the terms with the highest enrichment in each specified category, plus a separate list of the terms with the best enrichment in all ontologies combined.

Argument: evidence
vector of character strings specifying the admissible evidence for each gene in its specific term. GO uses the following codes: IMP: inferred from mutant phenotype; IGI: inferred from genetic interaction; IPI: inferred from physical interaction; ISS: inferred from sequence similarity; IDA: inferred from direct assay; IEP: inferred from expression pattern; IEA: inferred from electronic annotation; TAS: traceable author statement; NAS: non-traceable author statement; ND: no biological data available; IC: inferred by curator. The default is intended to cover all evidence types; note, however, that the default vector shown in Usage does not list IEP.

Argument: includeOffspring
logical: should genes belonging to the offspring of each term be included in the term? If FALSE, only genes annotated directly to each term are associated with the term. The default shown in Usage is TRUE. Note that the calculation of enrichments with offspring included can be quite slow for large data sets.

Argument: backgroundType
specification of the background to use. Recognized values are (unique abbreviations of) "allGiven", "allInGO", "givenInGO", meaning that the function will take as background all genes given in labels ("allGiven"), all genes present in any of the GO categories ("allInGO"), or the intersection of the given genes and the genes present in GO ("givenInGO"). The default is recommended for genome-wide enrichment studies.
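
The three background choices amount to simple set operations. The sketch below uses made-up gene identifiers and is only meant to illustrate the definitions above, not the function's internal code.

```r
# Hypothetical identifiers: genes supplied by the user and genes annotated in GO.
given <- c("g1", "g2", "g3", "g4", "g5")
inGO  <- c("g2", "g3", "g5", "g6", "g7")

# One background gene set per recognized value of backgroundType.
backgrounds <- list(
  allGiven  = given,                   # every supplied gene
  allInGO   = inGO,                    # every gene present in any GO category
  givenInGO = intersect(given, inGO)   # supplied genes that are also in GO
)

# Background sizes differ, which changes the resulting enrichment p-values.
lengths(backgrounds)
```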

Argument: removeDuplicates
logical: should duplicate entries in entrezCodes be removed? If TRUE, only the first occurrence of each unique Entrez code will be kept. The cluster labels in labels will be adjusted accordingly.
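
The effect of keeping only the first occurrence can be reproduced in base R with duplicated(). This is a sketch of the described behaviour on made-up data, under the assumption that "adjusted accordingly" means the labels are subset with the same index; it is not taken from the function's source.

```r
# Hypothetical input with repeated Entrez codes.
entrezCodes <- c(101, 102, 101, 103, 102)
labels      <- c("blue", "brown", "blue", "grey", "turquoise")

# TRUE for the first occurrence of each code, FALSE for later repeats.
keep <- !duplicated(entrezCodes)

# Subset codes and labels with the same index so they stay aligned.
entrezKept <- entrezCodes[keep]
labelsKept <- labels[keep]
```

Note that the fifth gene carries a different label than its first occurrence; with first-occurrence filtering, that second label is simply dropped.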

Argument: leaveOutLabel
optional specification of module labels for which the enrichment calculation is not desired. Can be a single label or a vector of labels to be ignored. However, if in any of the sets no labels are left for which to calculate enrichment, the function will stop with an error.

Argument: nBestP
specifies the number of terms with the highest enrichment whose detailed information will be returned.

Argument: pCut
alternative specification of the terms to be returned: all terms whose enrichment p-value is more significant than pCut will be returned. If pCut is given, nBestP is ignored.
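
The interplay of nBestP and pCut can be sketched with a small helper. The function selectTerms and the p-values are hypothetical, and "more significant than pCut" is assumed here to mean a p-value strictly below pCut.

```r
# Hypothetical enrichment p-values for five terms in one module.
p <- c(termA = 0.20, termB = 0.001, termC = 0.03, termD = 0.0004, termE = 0.6)

# Illustrative helper (not part of WGCNA): pCut, when given, overrides nBestP.
selectTerms <- function(p, nBestP = 10, pCut = NULL) {
  ordered <- sort(p)                       # most significant first
  if (!is.null(pCut)) {
    return(ordered[ordered < pCut])        # all terms below the cutoff
  }
  head(ordered, nBestP)                    # otherwise the nBestP best terms
}

bestTwo  <- selectTerms(p, nBestP = 2)     # termD, termB
belowCut <- selectTerms(p, pCut = 0.05)    # termD, termB, termC
```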

Argument: nBiggest
in addition to returning the terms with the highest enrichment, the terms that contain most of the genes in each cluster can be returned by specifying the number of biggest terms per cluster to be returned. This may be useful for development and testing purposes.

Argument: verbose
integer specifying the verbosity of the function. Zero means silent; positive values cause the function to print progress reports.

Argument: indent
integer specifying the indentation of the diagnostic messages. Zero means no indentation; each unit adds two spaces.
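
Putting the arguments together, a call might look like the sketch below. The input data are made up, and the call is guarded because it needs WGCNA plus the Bioconductor annotation packages, and because this experimental function may not be available or may behave differently in a given WGCNA version. Treat it as an assumption-laden illustration rather than a reference workflow.

```r
# Hypothetical input: Entrez IDs and module labels, aligned by position.
entrezCodes <- c(7157, 672, 675, 1956, 2064, 5728, 207, 5290)
labels      <- c("blue", "blue", "blue", "brown", "brown", "grey", "brown", "brown")

# Arguments named exactly as in the Usage section above.
args <- list(
  labels        = labels,
  entrezCodes   = entrezCodes,
  organism      = "human",
  ontologies    = "BP",     # restrict to one ontology to keep the run short
  leaveOutLabel = "grey",   # skip enrichment for this label
  nBestP        = 5,
  verbose       = 0
)

# Only attempt the call when all required packages can be loaded.
needed <- c("WGCNA", "GO.db", "AnnotationDbi", "org.Hs.eg.db")
canRun <- all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))

result <- NULL
if (canRun) {
  result <- tryCatch(
    do.call(WGCNA::GOenrichmentAnalysis, args),
    error = function(e) {
      message("GOenrichmentAnalysis could not be run: ", conditionMessage(e))
      NULL
    }
  )
}
```

With so few genes the result, if any, is not biologically meaningful; the point is only to show how the arguments fit together.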

----------Details----------

This function is basically a wrapper for the annotation packages available from Bioconductor. It requires the packages GO.db, AnnotationDbi, and org.xx.eg.db, where xx is the code corresponding to the organism that the user wishes to analyze (e.g., Hs for human, Homo sapiens; Mm for mouse, Mus musculus; etc.). For each cluster specified in the input, the function calculates all enrichments in the specified ontologies and collects information about the terms with the highest enrichment. The enrichment p-value is calculated using Fisher's exact test. By default (backgroundType = "givenInGO"), the background consists of all supplied genes that are present in at least one GO term (in any of the ontologies).
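
For a single module and a single GO term, a Fisher exact test on the 2 x 2 overlap table can be written in base R as follows. The counts are invented, and the one-sided alternative (over-representation) is an assumption; the help page does not say which alternative the function uses.

```r
# Hypothetical counts.
N <- 1000   # genes in the background
K <- 40     # background genes annotated to the GO term
n <- 50     # background genes in the module
k <- 8      # genes in both the module and the term

# 2 x 2 table, filled column by column: in-module column first.
tab <- matrix(
  c(k,     n - k,            # in module:     in term, not in term
    K - k, N - K - n + k),   # not in module: in term, not in term
  nrow = 2,
  dimnames = list(inTerm = c("yes", "no"), inModule = c("yes", "no"))
)

# One-sided test for over-representation of the term in the module.
pFisher <- fisher.test(tab, alternative = "greater")$p.value

# The same p-value as an upper hypergeometric tail, P(X >= k).
pHyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
```

The hypergeometric form is handy when many terms have to be tested at once, because phyper is vectorized over its arguments.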

For best results, the newest annotation libraries should be used. Because of the way Bioconductor is set up, getting the newest annotation libraries may require using the current version of R.
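
Before running an analysis, it can help to check which of the required packages are present. The sketch below builds the org.xx.eg.db name from the two organism codes mentioned above (Hs and Mm); the lookup vector orgCode is a hypothetical helper for this example, and codes for other organisms are deliberately left out.

```r
# Organism codes taken from the Details text; other organisms are not covered here.
orgCode <- c(human = "Hs", mouse = "Mm")

organism <- "human"
needed <- c("GO.db", "AnnotationDbi",
            paste0("org.", orgCode[[organism]], ".eg.db"))

# TRUE for each package that can be loaded, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

if (!all(installed)) {
  message("Missing packages: ", paste(needed[!installed], collapse = ", "))
}
```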

----------Value----------

A list with the following components:

Component: keptForAnalysis
logical vector with one entry per given gene. TRUE if the entry was used for the enrichment analysis. Depending on the setting of removeDuplicates above, only a single entry per gene may be used.

Component: inGO
logical vector with one entry per given gene. TRUE if the gene belongs to any GO term, FALSE otherwise. Also FALSE for genes not used for the analysis because of duplication.

If the input labels contained only one vector of labels, the list also has the following components:

Component: countsInTerms
a matrix whose rows correspond to the given clusters and whose columns correspond to GO terms, containing the number of genes in the intersection of the corresponding module and GO term. Row and column names are set appropriately.

Component: enrichmentP
a matrix whose rows correspond to the given clusters and whose columns correspond to GO terms, containing the enrichment p-values of each term in each cluster. Row and column names are set appropriately.

Component: bestPTerms
a list of lists, with each inner list corresponding to an ontology given in ontologies in the input, plus one component corresponding to all given ontologies combined. The name of each component is set appropriately. Each inner list contains two components: enrichment is a data frame containing the most highly enriched terms for each module; and forModule is a list of lists with one inner list per module, appropriately named. Each inner list contains one component per term. This component is yet another list and contains the components termName (term name), enrichmentP (enrichment p-value), termDefinition (GO term definition), termOntology (GO term ontology), geneCodes (Entrez codes of module genes in this term), and genePositions (indices of the genes listed in geneCodes within the given labels). Thus, to obtain information on, say, the second term of the 5th module in ontology BP, one can look at the appropriate row of bestPTerms$BP$enrichment, or one can reference bestPTerms$BP$forModule[[5]][[2]]. The author of the function apologizes for any confusion this structure of the output may cause.
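
Since the nesting is easy to get lost in, the following sketch builds a mock object with the same shape and walks through it. The term names, p-values, module names and the helper mockTerm are all invented for this example; only the component names come from the description above.

```r
# Illustrative helper (not part of WGCNA): one per-term list with the documented components.
mockTerm <- function(name, p) {
  list(termName = name, enrichmentP = p,
       termDefinition = "(definition text)", termOntology = "BP",
       geneCodes = c(101L, 102L), genePositions = c(3L, 7L))
}

# Five mock modules, each with two mock terms.
forModule <- lapply(1:5, function(m) {
  list(mockTerm(paste0("module", m, "_term1"), 0.001),
       mockTerm(paste0("module", m, "_term2"), 0.01))
})
names(forModule) <- paste0("module", 1:5)

# Mock of the BP branch only; the enrichment data frame is omitted here.
bestPTerms <- list(BP = list(forModule = forModule))

# Second term of the 5th module in ontology BP.
second <- bestPTerms$BP$forModule[[5]][[2]]
secondName <- second$termName

# All p-values of the terms reported for the 5th module.
pModule5 <- vapply(bestPTerms$BP$forModule[[5]],
                   function(term) term$enrichmentP, numeric(1))
```

On a real result, the same vapply pattern can be used to pull any single field (for example termName) out of every term of a module.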

Component: biggestTerms
a list of the same format as bestPTerms, containing information about the terms with the most genes in the module for each supplied ontology.

If the input labels contained more than one vector, the return value contains, instead of the above components, a list named setResults that has one component per given set; each component is a list containing the above components for the corresponding set.

----------Author(s)----------

Peter Langfelder

----------See Also----------

Bioconductor's annotation packages such as GO.db and the organism-specific annotation packages (org.xx.eg.db).

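When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the robot of 生物统计家园网. It is intended only as a personal reference for learning R; 生物统计家园网 retains the copyright.
Note 2: Because the translation is automatic, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find inaccuracies, please reply at the end of this thread, and we will revise the text over time.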
