PCorpus(tm)
PCorpus() belongs to R package: tm
Permanent Corpus Constructor
Translator: 生物统计家园网 robot LoveR
----------Description----------
Construct a permanent corpus.
----------Usage----------
PCorpus(x,
readerControl = list(reader = x$DefaultReader, language = "en"),
dbControl = list(dbName = "", dbType = "DB1"),
...)
DBControl(x)
## S3 method for class 'PCorpus'
DMetaData(x)
----------Arguments----------
Argument: x
A Source object for PCorpus, and a corpus for the other functions.
Argument: readerControl
A list with the named components reader, a reading function capable of handling the file format found in x, and language, giving the text's language (preferably as an IETF language tag). The default language is assumed to be English ("en"). Use NA to avoid internal assumptions (e.g., when the language is unknown or is deliberately not set).
Argument: dbControl
A list with the named components dbName, giving the filename that holds the sourced-out documents (i.e., the database), and dbType, holding a valid database type supported by package filehash. With database support activated, the tm package tries to keep as few resources in memory as possible by relying on the database.
Argument: ...
Optional arguments for the reader.
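The arguments above can be combined as in the following minimal sketch. It assumes the tm and filehash packages are installed, uses the sample texts shipped with tm (Latin texts by Ovid, hence the language tag "la"), and writes to a temporary file; the name db_file is only illustrative and not part of the tm API.

```r
library(tm)

# Directory of sample plain-text files shipped with tm.
txt <- system.file("texts", "txt", package = "tm")

# Illustrative database location (an assumption for this sketch);
# a temporary file avoids leaving a database in the working directory.
db_file <- tempfile(fileext = ".db")

corpus <- PCorpus(
  DirSource(txt),
  # readPlain handles plain text; "la" is the IETF language tag for Latin.
  readerControl = list(reader = readPlain, language = "la"),
  # dbName is the file holding the documents, dbType a filehash database type.
  dbControl = list(dbName = db_file, dbType = "DB1")
)

length(corpus)        # number of documents read from the source
file.exists(db_file)  # the documents are stored in this file, outside of R
```

Passing language = NA instead of "la" would leave the language unset, as described for readerControl.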
----------Details----------
Permanent means that documents are physically stored outside of R (e.g., in a database) and R objects are only pointers to external structures. That is, changes in the underlying external representation can affect multiple R objects simultaneously.
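The pointer behaviour can be illustrated without tm by using the filehash package directly. This is only a sketch of the general idea, not a description of how PCorpus is implemented internally; the file name and the names handle_a and handle_b are made up for the example.

```r
library(filehash)

# Illustrative storage location (an assumption for this sketch).
db_file <- tempfile(fileext = ".db")
dbCreate(db_file, type = "DB1")

# Two independent R objects that point at the same external file.
handle_a <- dbInit(db_file, type = "DB1")
handle_b <- dbInit(db_file, type = "DB1")

# Write a record through one handle ...
dbInsert(handle_a, "doc1", "original text")
# ... and read the same record through the other.
dbFetch(handle_b, "doc1")

# An update made through handle_b is also what handle_a retrieves afterwards,
# because both objects only reference the data kept in db_file.
dbInsert(handle_b, "doc1", "revised text")
dbFetch(handle_a, "doc1")
```

Neither handle holds the document itself, which is why a change made through one is visible through the other.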
The constructed corpus object inherits from list and has three attributes containing meta and database management information:
CMetaData: Corpus Meta Data contains corpus-specific metadata in the form of tag-value pairs and information about children in the form of a binary tree. This information is useful for
DMetaData: Document Meta Data of class data.frame contains document-specific metadata for the corpus. This data frame typically holds clustering or classification results, which are essentially metadata for documents but form an entity of their own (e.g., with its name, the value range,
DBControl: The database control field is a list with two named components: dbName holds the path to the permanent database storage, and dbType stores the database type.
----------Value----------
An object of class PCorpus, which extends the classes Corpus and list, containing a permanent corpus.
----------Author(s)----------
Ingo Feinerer
----------Examples----------
txt <- system.file("texts", "txt", package = "tm")
## Not run:
PCorpus(DirSource(txt),
dbControl = list(dbName = "myDB.db", dbType = "DB1"))
## End(Not run)