
R tm package: weightSMART() function help documentation

Posted on 2012-10-1 10:56:17
weightSMART(tm)
R package containing weightSMART(): tm

                                        SMART Weightings

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

Weight a term-document matrix according to a combination of weights specified in SMART notation.


----------Usage----------


weightSMART(m, spec = "nnn", control = list())



----------Arguments----------

Argument: m
A TermDocumentMatrix in term frequency format.


Argument: spec
A character string consisting of three characters. The first letter specifies a term frequency schema, the second a document frequency schema, and the third a normalization schema. See Details for the available built-in schemata.


Argument: control
A list of control parameters. See Details.
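
As a rough illustration of how the three-character spec decomposes, the sketch below splits a string such as "ntc" into its three letters. The helper name split_spec is hypothetical and not part of tm; it only mirrors the description of the spec argument.

```r
# Hypothetical helper (not part of tm): split a SMART spec string into
# its term frequency, document frequency, and normalization letters.
split_spec <- function(spec = "nnn") {
  stopifnot(is.character(spec), length(spec) == 1L, nchar(spec) == 3L)
  parts <- strsplit(spec, "", fixed = TRUE)[[1L]]
  names(parts) <- c("term_frequency", "document_frequency", "normalization")
  parts
}

parts <- split_spec("ntc")
print(parts)
```

Here "ntc" reads as natural term frequency, idf document frequency, and cosine normalization.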


----------Details----------

Formally this function is of class WeightingFunction with the additional attributes Name and Acronym.

The first letter of spec specifies a weighting schema for the term frequencies of m:

"n" (natural) \mathit{tf}_{i,j} counts the number of occurrences n_{i,j} of a term t_i in a document d_j. The input term-document matrix m is assumed to be in this term frequency format.

"l" (logarithm) is defined as 1 + \log(\mathit{tf}_{i,j}).

"a" (augmented) is defined as 0.5 + \frac{0.5 * \mathit{tf}_{i,j}}{\max_i(\mathit{tf}_{i,j})}.

"b" (boolean) is defined as 1 if \mathit{tf}_{i,j} > 0 and 0 otherwise.

"L" (log average) is defined as \frac{1 + \log(\mathit{tf}_{i,j})}{1 + \log(\mathrm{ave}_{i \in j}(\mathit{tf}_{i,j}))}.
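
The five term frequency schemata can be sketched in base R on a toy matrix, without calling tm. This is an illustration under stated assumptions rather than the package's implementation: the logarithm is taken to base 2 (the help text writes only "log"), zero counts are left at weight 0, and the average in "L" is taken over the terms that actually occur in each document.

```r
# Toy term-document matrix: rows are terms, columns are documents.
m <- matrix(c(2, 0, 1,
              4, 1, 0),
            nrow = 3,
            dimnames = list(c("oil", "price", "opec"), c("d1", "d2")))

# Assumption: base-2 logarithm; absent terms (count 0) keep weight 0.
log_or_zero <- function(x) ifelse(x > 0, 1 + log2(x), 0)

tf_natural   <- m                      # "n"
tf_logarithm <- log_or_zero(m)         # "l"
tf_boolean   <- (m > 0) * 1            # "b"

# "a": scale each document (column) by its largest count.
col_max      <- apply(m, 2, max)
tf_augmented <- ifelse(m > 0, 0.5 + 0.5 * sweep(m, 2, col_max, "/"), 0)

# "L": divide by 1 + log of the mean count of the terms present per document.
doc_mean  <- apply(m, 2, function(col) mean(col[col > 0]))
tf_logavg <- sweep(log_or_zero(m), 2, 1 + log2(doc_mean), "/")

print(tf_augmented)
```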

The second letter of spec specifies a weighting schema for the document frequencies of m:

"n" (no) is defined as 1.

"t" (idf) is defined as \log \frac{N}{\mathit{df}_t}, where N is the number of documents and \mathit{df}_t is the document frequency of term t, that is, the number of documents in which t occurs.

"p" (prob idf) is defined as \max(0, \log(\frac{N - \mathit{df}_t}{\mathit{df}_t})).
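
The document frequency schemata can likewise be sketched in base R. The toy matrix and variable names are illustrative only, and the base-2 logarithm is again an assumption, since the help text writes only "log".

```r
# Toy term-document matrix: 3 terms (rows) in 4 documents (columns).
m <- matrix(c(2, 0, 1,
              4, 1, 0,
              1, 0, 0,
              3, 0, 2),
            nrow = 3,
            dimnames = list(c("oil", "price", "opec"), paste0("d", 1:4)))

n_docs <- ncol(m)            # N
df     <- rowSums(m > 0)     # number of documents containing each term

df_none  <- rep(1, nrow(m))                      # "n"
idf      <- log2(n_docs / df)                    # "t" (assumed base 2)
prob_idf <- pmax(log2((n_docs - df) / df), 0)    # "p", clamped at 0

print(rbind(df = df, idf = idf, prob_idf = prob_idf))
```

A term that occurs in every document ("oil" here) gets an idf of 0, and its prob idf is clamped to 0 instead of becoming negative infinity.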

The third letter of spec specifies a schema for the normalization of m:

"n" (none) is defined as 1.

"c" (cosine) is defined as \sqrt{\mathrm{col\_sums}(m ^ 2)}.

"u" (pivoted unique) is defined as \mathit{slope} * \sqrt{\mathrm{col\_sums}(m ^ 2)} + (1 - \mathit{slope}) * \mathit{pivot}, where both slope and pivot must be set via named tags in the control list.

"b" (byte size) is defined as \frac{1}{\mathit{CharLength}^\alpha}. The parameter \alpha must be set via the named tag alpha in the control list.

The final result is the product of the chosen term frequency component, the chosen document frequency component, and the chosen normalization component.
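
To see how the three components combine, here is a minimal base R sketch of an "ntc" weighting on a toy matrix. It does not call tm. Two assumptions are made: the logarithm is base 2, and the cosine quantity \sqrt{\mathrm{col\_sums}(m ^ 2)} is applied as a divisor (equivalently, the matrix is multiplied by its reciprocal), which is the usual reading of cosine normalization and gives every document column unit length.

```r
# Toy term-document matrix: 3 terms (rows) in 4 documents (columns).
m <- matrix(c(2, 0, 1,
              4, 1, 0,
              0, 1, 0,
              3, 0, 2),
            nrow = 3,
            dimnames = list(c("oil", "price", "opec"), paste0("d", 1:4)))

# "n": natural term frequency, the matrix as it is.
tf <- m

# "t": idf per term (assumed base-2 logarithm).
idf <- log2(ncol(m) / rowSums(m > 0))

# Multiply row i by idf[i]; the vector is recycled down each column.
weighted <- tf * idf

# "c": Euclidean length of each document column, used here as a divisor.
# Columns whose norm is 0 would need separate handling; none occur here.
col_norm <- sqrt(colSums(weighted^2))
ntc <- sweep(weighted, 2, col_norm, "/")

print(round(ntc, 3))
```

After this step each column of ntc has Euclidean length 1, so documents of different lengths become comparable.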


----------Value----------

The weighted matrix.


----------Author(s)----------


Ingo Feinerer



----------References----------

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, ISBN 0521865719.

----------Examples----------


library(tm)
data("crude")
# "ntc": natural tf, idf document frequency, cosine normalization
TermDocumentMatrix(crude, control = list(
  removePunctuation = TRUE, stopwords = TRUE,
  weighting = function(x) weightSMART(x, spec = "ntc")))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

