R tm package: weightSMART() function help documentation

loveR · posted on 2012-10-1 10:56:17

weightSMART(tm)
weightSMART() belongs to the R package: tm

                                    SMART Weightings

                                       Translator: LoveR, the biostatistic.net robot

----------Description----------

Weight a term-document matrix according to a combination of weights specified in SMART notation.

----------Usage----------

weightSMART(m, spec = "nnn", control = list())

----------Arguments----------

Argument: m
A TermDocumentMatrix in term frequency format.

Argument: spec
a character string consisting of three characters. The first letter specifies a term frequency schema, the second a document frequency schema, and the third a normalization schema. See Details for available built-in schemata.

Argument: control
a list of control parameters. See Details.

----------Details----------

Formally this function is of class WeightingFunction with the additional attributes Name and Acronym.

The first letter of spec specifies a weighting schema for term frequencies of m:

"n" (natural) \mathit{tf}_{i,j} counts the number of occurrences n_{i,j} of a term t_i in a document d_j. The input term-document matrix m is assumed to be in this format.

"l" (logarithm) is defined as 1 + \log(\mathit{tf}_{i,j}).

"a" (augmented) is defined as 0.5 + \frac{0.5 * \mathit{tf}_{i,j}}{\max_i(\mathit{tf}_{i,j})}.

"b" (boolean) is defined as 1 if \mathit{tf}_{i,j} > 0 and 0 otherwise.

"L" (log average) is defined as \frac{1 + \log(\mathit{tf}_{i,j})}{1 + \log(\mathrm{ave}_{i \in j}(\mathit{tf}_{i,j}))}.
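
The term frequency schemata above can be tried out on a small matrix in base R. This is only a sketch and does not call tm: the counts are made up, the natural logarithm is an assumption (the text does not state the base), zero counts are assumed to stay zero, and the "L" average is assumed to run over the non-zero counts of each document.

```r
# Hypothetical raw counts ("n", natural): rows are terms, columns are documents.
tf <- matrix(c(2, 0, 1,
               1, 4, 0,
               0, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("oil", "price", "market"),
                             c("d1", "d2", "d3")))

nonzero <- tf > 0

# "l" (logarithm): 1 + log(tf), applied to non-zero counts only (assumption)
tf_l <- tf
tf_l[nonzero] <- 1 + log(tf[nonzero])

# "a" (augmented): 0.5 + 0.5 * tf / (largest count in the same document)
col_max <- apply(tf, 2, max)
tf_a <- 0.5 + 0.5 * sweep(tf, 2, col_max, "/")
tf_a[!nonzero] <- 0  # assumption: absent terms keep weight 0

# "b" (boolean): 1 if the term occurs in the document, 0 otherwise
tf_b <- nonzero * 1

# "L" (log average): (1 + log(tf)) / (1 + log(average count in the document))
# assumption: the average is taken over the non-zero counts of each column
col_avg <- apply(tf, 2, function(x) mean(x[x > 0]))
tf_L <- tf
tf_L[nonzero] <- (1 + log(tf[nonzero])) / (1 + log(col_avg[col(tf)[nonzero]]))
```

Each result has the same shape as tf, so any of them could stand in for the term frequency component in the later steps.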

The second letter of spec specifies a weighting schema of document frequencies for m:

"n" (no) is defined as 1.

"t" (idf) is defined as \log \frac{N}{\mathit{df}_t} where \mathit{df}_t denotes how often term t occurs in all documents.

"p" (prob idf) is defined as \max(0, \log(\frac{N - \mathit{df}_t}{\mathit{df}_t})).
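
A base R sketch of the document frequency schemata, under stated assumptions: N is taken to be the number of documents (columns), \mathit{df}_t the number of documents containing term t, and the logarithm is natural. The counts are made up, and pmax() is used so that the maximum is applied per term.

```r
# Hypothetical raw counts: rows are terms, columns are documents.
tf <- matrix(c(2, 0, 0, 0,
               1, 4, 0, 0,
               3, 1, 1, 2),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("oil", "price", "market"),
                             paste0("d", 1:4)))

N  <- ncol(tf)          # number of documents (assumed meaning of N)
df <- rowSums(tf > 0)   # number of documents containing each term

idf_n <- rep(1, nrow(tf))              # "n" (no): constant weight 1
idf_t <- log(N / df)                   # "t" (idf)
idf_p <- pmax(0, log((N - df) / df))   # "p" (prob idf), clipped at 0 per term
```

A term that occurs in every document ("market" here) gets weight 0 under both "t" and "p", while "p" also assigns 0 to any term that occurs in at least half of the documents.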

The third letter of spec specifies a schema for normalization of m:

"n" (none) is defined as 1.

"c" (cosine) is defined as \sqrt{\mathrm{col\_sums}(m ^ 2)}.

"u" (pivoted unique) is defined as \mathit{slope} * \sqrt{\mathrm{col\_sums}(m ^ 2)} + (1 - \mathit{slope}) * \mathit{pivot} where both slope and pivot must be set via named tags in the control list.

"b" (byte size) is defined as \frac{1}{\mathit{CharLength}^\alpha}. The parameter \alpha must be set via the named tag alpha in the control list.
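
The "c" and "u" normalizers can be computed by hand as follows. This is a sketch in base R with a made-up matrix; the slope and pivot values are purely illustrative and are not defaults of any package. The "b" schema is left out because the text does not say how CharLength is obtained.

```r
# Hypothetical weighted matrix: rows are terms, columns are documents.
m <- matrix(c(3, 0, 1,
              4, 2, 0,
              0, 1, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "price", "market"),
                            c("d1", "d2", "d3")))

# "c" (cosine): Euclidean length of each document column
norm_c <- sqrt(colSums(m^2))

# "u" (pivoted unique): blend of the cosine length and a fixed pivot.
# slope and pivot are illustrative values chosen for this example only.
slope <- 0.25
pivot <- 4
norm_u <- slope * norm_c + (1 - slope) * pivot
```

With a slope below 1, the pivoted value moves each document's length toward the pivot, which softens the penalty that plain cosine normalization puts on long documents.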

The final result is the product of the chosen term frequency component, the chosen document frequency component, and the chosen normalization component.
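
To make the combination concrete, here is a hand-rolled "ntc" weighting in base R. It is a sketch under assumptions rather than a reproduction of tm's code: the logarithm is natural, the cosine length is computed on the tf-idf weighted columns and applied as a divisor (the textbook form of cosine normalization), and the counts are made up. tm's own implementation may differ in these details.

```r
# Hypothetical raw counts: rows are terms, columns are documents.
tf <- matrix(c(2, 0, 0,
               1, 4, 0,
               0, 1, 3),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("oil", "price", "market"),
                             c("d1", "d2", "d3")))

# First letter "n": use the raw counts unchanged.
# Second letter "t": idf = log(N / df), one value per term (row).
N   <- ncol(tf)
df  <- rowSums(tf > 0)
idf <- log(N / df)
w   <- tf * idf  # idf is recycled down each column, so row i is scaled by idf[i]

# Third letter "c": divide each document column by its Euclidean length
# (assumption: the norm is taken after idf weighting and used as a divisor).
norms <- sqrt(colSums(w^2))
norms[norms == 0] <- 1  # guard against empty documents
ntc <- sweep(w, 2, norms, "/")
```

After this step every non-empty document column has unit length, so the dot product of two columns gives their cosine similarity.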

----------Value----------

The weighted matrix.

----------Author(s)----------

Ingo Feinerer

----------References----------

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, ISBN 0521865719.

----------Examples----------

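library("tm")  # provides the crude corpus, TermDocumentMatrix() and weightSMART()
data("crude")
# spec = "ntc": natural term frequency, idf, cosine normalization
TermDocumentMatrix(crude, control = list(removePunctuation = TRUE, stopwords = TRUE, weighting = function(x) weightSMART(x, spec = "ntc")))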

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).
