
R: FactivaSource() function in the tm.plugin.factiva package, help documentation in Chinese (Chinese-English parallel text)

Posted on 2012-10-1 10:44:47
FactivaSource(tm.plugin.factiva)
R package containing FactivaSource(): tm.plugin.factiva

                                        Factiva Source

                                         Translator: biostatistic.net robot LoveR

----------Description----------

Construct a source for an input containing a set of articles exported from Factiva in XML or HTML format.


----------Usage----------


  FactivaSource(x, encoding = "UTF-8",
                format = c("auto", "XML", "HTML"))



----------Arguments----------

Argument: x
Either a character string identifying the file, or a connection.


Argument: encoding
A character string giving the encoding of x. It is ignored unless the XML or HTML input does not include this information, which should normally not happen with files exported from Factiva.


Argument: format
The format of the file or connection identified by x: one of "auto" (the default), "XML" or "HTML" (see "Details").


----------Details----------

This function can be used to import both XML and HTML files. If format is set to "auto" (the default), the file extension is used to guess the format: if the file name ends with ".xml" or ".XML", XML is assumed; otherwise, the file is assumed to be in HTML format.
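The extension rule for format = "auto" can be illustrated with a few lines of base R. This is a minimal sketch of the documented behaviour only. `guess_factiva_format()` is a hypothetical helper, not a tm.plugin.factiva function, and the package's actual implementation may differ, including in what it does when x is a connection rather than a file name.

```r
# Hypothetical helper (not part of tm.plugin.factiva) mirroring the documented
# rule for format = "auto": names ending in ".xml" or ".XML" are treated as
# XML, everything else as HTML. Vectorised over `path`.
guess_factiva_format <- function(path) {
  ifelse(grepl("\\.(xml|XML)$", path), "XML", "HTML")
}

guess_factiva_format("factiva_test.xml")   # "XML"
guess_factiva_format("export.XML")         # "XML"
guess_factiva_format("factiva_test.html")  # "HTML"
guess_factiva_format("articles.Xml")       # "HTML": only the two listed spellings count
```

Setting format explicitly to "XML" or "HTML" avoids relying on this guess, for example when an exported XML file was saved under a different extension.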

This function imports the body of the articles, and also sets several metadata variables on individual documents:

DateTimeStamp: The publication date.

Heading: The title of the article.

Origin: The newspaper the article comes from.

Edition: The (local) variant of the newspaper.

Section: The part of the newspaper containing the article.

Subject: One or several keywords defining the subject.

Coverage: One or several keywords identifying the covered regions.

WordCount: The number of words in the article.

Publisher: The publisher of the newspaper.

Rights: The copyright information associated with the article.

Language: This information is set automatically if readerControl = list(language = NA) is passed (see the example below). Otherwise, the manually specified language is applied to all articles. If omitted, the default, "en", is used.
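The three cases of the Language rule can be summarised as a small decision function. This is a hypothetical sketch, not package code. `resolve_language()` and its `article_language` argument are illustrative names. The sketch also assumes that the automatically set value is a per-article language code taken from the export, which the documentation above does not say.

```r
# Hypothetical sketch of the documented Language rule (not tm.plugin.factiva code).
# reader_control:   the list that would be passed as readerControl to Corpus()
# article_language: assumed per-article language code found in the export
resolve_language <- function(reader_control = list(),
                             article_language = NA_character_) {
  lang <- reader_control$language
  if (is.null(lang)) {
    "en"              # language omitted: the default "en" is used
  } else if (is.na(lang)) {
    article_language  # language = NA: set automatically for each article
  } else {
    lang              # manual value: applied to every article
  }
}

resolve_language(list())                           # "en"
resolve_language(list(language = NA), "de")        # "de"
resolve_language(list(language = "fr"), "de")      # "fr"
```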

When possible, it is advisable to export articles from Factiva in XML format rather than HTML, since HTML exports do not provide completely clean information. In particular, dates are not guaranteed to be parsed correctly if the machine from which the HTML file was exported uses a different locale from the machine on which it is read.


----------Value----------

An object of class XMLSource, which extends the class Source, representing a set of articles from Factiva.


----------Note----------

Some Factiva articles have been found to contain unescaped characters that are not allowed in XML files. If such articles are included in the input you are trying to import, the XML parser will fail with a few error messages, and the corpus will not be created at all.
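One way to locate such characters without a full XML validator is to scan the file for C0 control characters, which XML 1.0 does not allow. This sketch rests on an assumption: the documentation does not say which characters Factiva leaves unescaped, so control characters are only one plausible culprit. A bare `&` or `<` in article text would also break parsing and is not detected here. `find_invalid_xml_chars()` is a hypothetical helper, not part of tm.plugin.factiva.

```r
# Hypothetical helper (not part of tm.plugin.factiva): return the numbers of
# the lines in `path` that contain C0 control characters forbidden by XML 1.0.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are allowed and are
# excluded from the pattern. NUL bytes are not detected, because readLines()
# ends the line at an embedded NUL.
find_invalid_xml_chars <- function(path) {
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  # useBytes = TRUE: control bytes never occur inside multi-byte UTF-8
  # sequences, and byte matching avoids errors on invalid UTF-8 input.
  bad <- grepl("[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]", lines,
               perl = TRUE, useBytes = TRUE)
  which(bad)
}

# Usage (the file name is a placeholder):
# find_invalid_xml_chars("factiva_export.xml")
```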

If you experience this bug, please report it to Factiva Customer Service, who will fix the offending article; feel free to ask the maintainer of this package for help if needed. In the meantime, you can exclude the problematic article from the XML file. To identify it, export only one half of the original corpus at a time, as many times as needed, and note when the import fails; you will eventually find the culprit. (If you know XML, you can use an XML validator to find the relevant part of the file and fix it by hand.)
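The halving procedure described above is a binary search. This sketch assumes the articles can be handled individually in R (for example as a vector of article IDs or raw XML fragments) and that exactly one of them is problematic. `find_culprit()` and its `parses_ok` predicate are hypothetical names, not tm.plugin.factiva functions. In practice, `parses_ok` would wrap an import attempt in `tryCatch()` and return FALSE on error.

```r
# Hypothetical sketch (not part of tm.plugin.factiva): locate a single bad
# article by repeatedly halving the set of candidates.
# articles:  a vector or list of articles (e.g. article IDs)
# parses_ok: a function returning TRUE if a subset imports without error
find_culprit <- function(articles, parses_ok) {
  if (parses_ok(articles)) return(integer(0))  # nothing to find
  candidates <- seq_along(articles)
  while (length(candidates) > 1) {
    half <- candidates[seq_len(length(candidates) %/% 2)]
    if (parses_ok(articles[half])) {
      # The first half imports cleanly, so the culprit is in the other half.
      candidates <- setdiff(candidates, half)
    } else {
      candidates <- half
    }
  }
  candidates  # index of the problematic article
}
```

Each step halves the remaining candidates, so n articles need about log2(n) import attempts (10 for 1,000 articles), plus the initial check of the whole set.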


----------Author(s)----------



Milan Bouchet-Valat




----------See Also----------

readFactivaXML and readFactivaHTML for the functions that actually parse individual articles.

getSources to list available sources.


----------Examples----------


## Not run:
    ## Load an XML file
    library(tm)
    library(tm.plugin.factiva)  # provides FactivaSource()
    file <- system.file("factiva_test.xml", package = "tm.plugin.factiva")
    source <- FactivaSource(file)
    corpus <- Corpus(source, readerControl = list(language = NA))

    # See the contents of the documents
    inspect(corpus)

    # See the metadata associated with the first article
    meta(corpus[[1]])

## End(Not run)

    ## For an HTML file
    library(tm)
    library(tm.plugin.factiva)  # provides FactivaSource()
    file <- system.file("factiva_test.html", package = "tm.plugin.factiva")
    source <- FactivaSource(file)
    corpus <- Corpus(source, readerControl = list(language = NA))

    # See the contents of the documents
    inspect(corpus)

    # See the metadata associated with the first article
    meta(corpus[[1]])

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).


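Notes:
Note 1: For readers' convenience, this document was translated by the biostatistic.net robot LoveR. It is intended only as a personal reference for learning R, and biostatistic.net reserves the copyright.
Note 2: Because this is an automatic machine translation, some inaccuracies are unavoidable. When using it, carefully compare the Chinese and English content; working through both repeatedly can help in learning R.
Note 3: If you find any inaccuracies, please reply to this thread, and we will gradually revise it.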