Zipf_n_Heaps(tm)
Zipf_n_Heaps() is part of the R package tm
Explore Corpus Term Frequency Characteristics
Translator: LoveR robot, 生物统计家园网 (biostatistic.net)
----------Description----------
Explore Zipf's law and Heaps' law, two empirical laws in linguistics describing commonly observed characteristics of term frequency distributions in corpora.
----------Usage----------
Zipf_plot(x, type = "l", ...)
Heaps_plot(x, type = "l", ...)
----------Arguments----------
Argument: x
a document-term matrix or term-document matrix with unweighted term frequencies.
Argument: type
a character string indicating the type of plot to be drawn, see plot.
Argument: ...
further graphical parameters to be used for plotting.
----------Details----------
Zipf's law (e.g., http://en.wikipedia.org/wiki/Zipf%27s_law) states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. More generally, it states that the probability mass function (pmf) of the term frequencies is of the form c k^{-β}, where k is the rank of the term (ordered from the most to the least frequent one). Taking logarithms turns this power law into a straight line with slope -β, so we can conveniently explore the degree to which the law holds by plotting the logarithm of the frequency against the logarithm of the rank, and inspecting the goodness of fit of a linear model.
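The log-log fit described above can be sketched in base R without tm. This is only an illustration under stated assumptions, not the package's implementation: the count matrix, its values, and the variable names are made up for the example.

```r
# Hypothetical toy document-term counts (rows = documents, columns = terms).
# The numbers are invented purely for illustration.
counts <- matrix(c(8, 4, 2, 1, 1,
                   8, 4, 3, 2, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("the", "of", "law", "rank", "heap")))

# Corpus-wide term frequencies, ordered from most to least frequent.
freq <- sort(colSums(counts), decreasing = TRUE)
k <- seq_along(freq)                      # rank of each term

# log(freq) = log(c) - beta * log(k), so fit a straight line on the log-log scale.
zipf_fit <- lm(log(freq) ~ log(k))
zipf_coef <- coef(zipf_fit)               # intercept estimates log(c), slope estimates -beta

plot(log(k), log(freq), type = "l",
     xlab = "log(rank)", ylab = "log(frequency)")
abline(zipf_fit)                          # overlay the fitted line
```

A slope close to -1 would correspond to the classical form of the law; with so few terms the estimate here is of course not meaningful.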
Heaps' law (e.g., http://en.wikipedia.org/wiki/Heaps%27_law) states that the vocabulary size V (i.e., the number of different terms employed) grows polynomially with the text size T (the total number of terms in the texts), so that V = c T^β. On the logarithmic scale this becomes log(V) = log(c) + β log(T), so we can conveniently explore the degree to which the law holds by plotting log(V) against log(T), and inspecting the goodness of fit of a linear model.
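One way to obtain (T, V) pairs for such a fit is to accumulate documents one at a time and record the running text size and vocabulary size. The base-R sketch below assumes this cumulative scheme and uses an invented count matrix; it is not taken from tm's source and the package may compute the pairs differently.

```r
# Hypothetical toy document-term counts (rows = documents, columns = terms).
counts <- matrix(c(3, 1, 0, 0, 0, 0,
                   2, 1, 2, 1, 0, 0,
                   1, 0, 1, 1, 2, 1),
                 nrow = 3, byrow = TRUE)

# T: total number of term occurrences after reading documents 1..i.
text_size <- cumsum(rowSums(counts))

# seen[i, j] is TRUE if term j occurs anywhere in documents 1..i.
seen <- apply(counts > 0, 2, cumsum) > 0

# V: number of distinct terms after reading documents 1..i.
vocab_size <- rowSums(seen)

# log(V) = log(c) + beta * log(T), so fit a straight line on the log-log scale.
heaps_fit <- lm(log(vocab_size) ~ log(text_size))
heaps_coef <- coef(heaps_fit)             # intercept estimates log(c), slope estimates beta
```

The order in which documents are accumulated affects the individual (T, V) pairs, so the fitted coefficients can vary with document order.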
----------Value----------
The coefficients of the fitted linear model. As a side effect, the corresponding plot is produced.
----------Examples----------
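library("tm")                  # provides DocumentTermMatrix(), Zipf_plot(), Heaps_plot()
data("acq")                    # example corpus shipped with tm
m <- DocumentTermMatrix(acq)   # document-term matrix with unweighted term frequencies
Zipf_plot(m)                   # log(frequency) vs. log(rank), returns the fitted coefficients
Heaps_plot(m)                  # log(V) vs. log(T), returns the fitted coefficients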
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).