R rms package: datadist() function help documentation (Chinese-English bilingual)

loveR · posted 2012-9-27 19:10:08

datadist(rms)
datadist() belongs to R package: rms

                                       Distribution Summaries for Predictor Variables

                                       Translator: LoveR, the Biostatistics Home (生物统计家园网) robot

----------Description----------

For a given set of variables or a data frame, datadist determines summaries of variables for effect and plotting ranges, values to adjust to, and overall ranges for Predict, plot.Predict, summary.rms, survplot, and nomogram.rms. If datadist is called before a model fit and the resulting object is pointed to with options(datadist="name"), the data characteristics will be stored with the fit by Design(), so that later predictions and summaries of the fit will not need to access the original data used in the fit. Alternatively, you can specify the values for each variable in the model when using these functions, or specify the values of some of them and let the functions look up the remainder (say, of adjustment levels) from an object created by datadist. The best method is probably to run datadist once before any models are fitted, storing the distribution summaries for all potential variables. Adjustment values are 0 for binary variables, the most frequent category (or optionally the first category level) for categorical (factor) variables, the middle level for ordered factor variables, and medians for continuous variables. See the descriptions of q.display and q.effect for how display and effect ranges are chosen for continuous variables.
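The recommended order of operations (summarise first, register the object by name, then fit) might look like the following sketch. The data are simulated and the names df, dd, and fit are illustrative assumptions; the block is guarded so that it does nothing where rms is not installed.

```r
# Sketch only: simulated data; df, dd, and fit are illustrative names.
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)

  set.seed(1)
  n  <- 300
  df <- data.frame(
    age = rnorm(n, mean = 50, sd = 10),
    sex = factor(sample(c("female", "male"), n, replace = TRUE))
  )
  df$y <- 0.05 * df$age + 0.5 * (df$sex == "male") + rnorm(n)

  # Summarise every candidate variable once, before any model is fitted.
  dd <- datadist(df)
  options(datadist = "dd")   # register the object by name

  fit <- ols(y ~ age + sex, data = df)

  # Neither call spells out ranges or adjustment values;
  # they come from the distribution summaries held in dd.
  print(dd)
  print(summary(fit))
  pred <- Predict(fit, age)
  print(range(pred$age))     # span of age used for display
}
```

Because the option stores a name rather than the object itself, dd presumably has to remain findable under that name for as long as models are being fitted.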

----------Usage----------

datadist(..., data, q.display, q.effect=c(0.25, 0.75),
      adjto.cat=c('mode','first'), n.unique=10)

## S3 method for class 'datadist'
print(x, ...)
# options(datadist="dd")
# used by summary, plot, survplot, sometimes predict
# For dd substitute the name of the result of datadist

----------Arguments----------

Argument: ...
a list of variable names, separated by commas, a single data frame, or a fit with Design information.  The first element in this list may also be an object created by an earlier call to datadist; then the later variables are added to this datadist object. For a fit object, the variables named in the fit are retrieved from the active data frame or from the location pointed to by data=frame number or data="data frame name". Ignored by print.

Argument: data
a data frame or a search position.  If data is a search position, it is assumed that a data frame is attached in that position, and all its variables are used.  If you specify both individual variables in ... and data, the two sets of variables are combined.  Unless the first argument is a fit object, data must be an integer.

Argument: q.display
set of two quantiles for computing the range of continuous variables to use in displaying regression relationships.  Defaults are q and 1-q, where q=10/max(n,200) and n is the number of non-missing observations.  Thus for n<200, the .05 and .95 quantiles are used.  For n≥200, the 10th smallest and 10th largest values are used.  If you specify q.display, those quantiles are used whether or not n<200.
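The default rule can be reproduced in base R. The helper default_q_display below is hypothetical (it is not part of rms) and only restates the formula q=10/max(n,200) from the text.

```r
# Hypothetical helper, not part of rms: default display quantiles per the text.
default_q_display <- function(x) {
  n <- sum(!is.na(x))        # number of non-missing observations
  q <- 10 / max(n, 200)
  c(q, 1 - q)
}

set.seed(2)
small <- rnorm(150)
large <- rnorm(1000)

default_q_display(small)     # n < 200: 0.05 and 0.95
default_q_display(large)     # n = 1000: 0.01 and 0.99

# Display range of a continuous variable under these defaults
quantile(large, probs = default_q_display(large), na.rm = TRUE)
```

For n=1000 the 0.01 and 0.99 quantiles sit roughly at the 10th smallest and 10th largest observations, which is what the description refers to.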

Argument: q.effect
set of two quantiles for computing the range of continuous variables to use in estimating regression effects.  Defaults are c(.25,.75), which yields inter-quartile-range odds ratios, etc.
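To make "inter-quartile-range odds ratio" concrete, here is a base-R sketch. The predictor x and the coefficient beta are made up for illustration, and the calculation assumes a term that is linear in x on the log-odds scale.

```r
# Illustrative values only; beta is a made-up log-odds coefficient per unit of x.
set.seed(3)
x    <- rexp(500, rate = 1 / 40)
beta <- 0.02

q_effect <- c(0.25, 0.75)                         # the documented default
lims <- quantile(x, probs = q_effect, names = FALSE)

# Odds ratio for moving x from its lower quartile to its upper quartile
iqr_or <- exp(beta * (lims[2] - lims[1]))
iqr_or
```

Passing q.effect=c(.1,.9) instead would compare the 10th and 90th percentiles, giving the inter-decile-range version of the same quantity.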

Argument: adjto.cat
default is "mode", indicating that the modal (most frequent) category for categorical (factor) variables is the adjust-to setting. Specify "first" to use the first level of factor variables as the adjustment values.  If several levels share the maximum frequency, the first such level is used for "mode".
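The two settings, including the tie-breaking rule, can be mimicked in base R. The helper adjust_to_level is hypothetical and is not the code rms uses; it only follows the rule as described.

```r
# Hypothetical helper, not part of rms: choose the adjust-to level of a factor.
adjust_to_level <- function(f, adjto.cat = c("mode", "first")) {
  adjto.cat <- match.arg(adjto.cat)
  lev <- levels(f)
  if (adjto.cat == "first") return(lev[1])
  counts <- table(f)           # counts in level order
  lev[which.max(counts)]       # which.max() keeps the first maximum, so ties go to the earlier level
}

f <- factor(c("b", "c", "c", "a", "b"), levels = c("a", "b", "c"))
adjust_to_level(f)             # "b": b and c tie at 2, and b comes first
adjust_to_level(f, "first")    # "a"
```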

Argument: n.unique
variables having n.unique or fewer unique values are considered to be discrete variables in that their unique values are stored in the values list.  This will affect how functions such as nomogram.Design determine whether variables are discrete or not.
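A small base-R sketch of the threshold follows. The helper is_discrete and the variables dose and serum are hypothetical; the check simply counts distinct non-missing values against n.unique.

```r
# Hypothetical helper, not part of rms.
is_discrete <- function(x, n.unique = 10) {
  length(unique(x[!is.na(x)])) <= n.unique
}

set.seed(4)
dose  <- sample(c(0, 5, 10, 20), 100, replace = TRUE)   # at most 4 distinct values
serum <- rnorm(100)                                       # 100 distinct values

is_discrete(dose)     # TRUE
is_discrete(serum)    # FALSE
sort(unique(dose))    # the kind of vector kept in the values list
```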

Argument: x
result of datadist

----------Details----------

For categorical variables, the 7 limits are set to character strings (factors) which correspond to c(NA,adjto.level,NA,1,k,1,k), where k is the number of levels. For ordered variables with numeric levels, the limits are set to c(L,M,H,L,H,L,H), where L is the lowest level, M is the middle level, and H is the highest level.
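For an unordered factor, the pattern c(NA,adjto.level,NA,1,k,1,k) can be written out in base R as below. The helper factor_limits is hypothetical and uses the "mode" rule for the adjust-to level; it is a sketch of the described layout, not the package's implementation.

```r
# Hypothetical helper, not part of rms: the 7 limits of an unordered factor, in the order
# effect-low, adjust-to, effect-high, display-low, display-high, overall-low, overall-high.
factor_limits <- function(f) {
  lev <- levels(f)
  k   <- length(lev)
  adj <- unname(which.max(table(f)))   # index of the modal level
  lev[c(NA, adj, NA, 1, k, 1, k)]      # an NA index yields NA
}

f <- factor(c("low", "mid", "mid", "high"), levels = c("low", "mid", "high"))
factor_limits(f)
```

The two NA entries reflect that a low-to-high effect range has no meaning for an unordered categorical variable.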

----------Value----------

a list of class "datadist" with the following components:

Component: limits
a 7 × k data frame, where k is the number of variables. The 7 rows correspond to the low value for estimating the effect of the variable, the value to adjust the variable to when examining other variables, the high value for effect, the low value for displaying the variable, the high value for displaying it, and the overall lowest and highest values.

Component: values
a named list, with one vector of unique values for each numeric variable having no more than n.unique unique values

----------Author(s)----------

Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu

----------See Also----------

rms, rms.trans, describe, Predict, summary.rms

----------Examples----------

## Not run:
d <- datadist(data=1)        # use all variables in search pos. 1
d <- datadist(x1, x2, x3)
page(d)                      # if your options(pager) leaves up a pop-up
                             # window, this is a useful guide in analyses
d <- datadist(data=2)        # all variables in search pos. 2
d <- datadist(data=my.data.frame)
d <- datadist(my.data.frame) # same as previous.  Run for all potential vars.
d <- datadist(x2, x3, data=my.data.frame) # combine variables
d <- datadist(x2, x3, q.effect=c(.1,.9), q.display=c(0,1))
# uses inter-decile range odds ratios,
# total range of variables for regression function plots
d <- datadist(d, z)          # add a new variable to an existing datadist
options(datadist="d")        # often a good idea, to store info with fit
f <- ols(y ~ x1*x2*x3)

options(datadist=NULL)       # default at start of session
f <- ols(y ~ x1*x2)
d <- datadist(f)             # info not stored in `f'
d$limits["Adjust to","x1"] <- .5 # reset adjustment level to .5
options(datadist="d")

f <- lrm(y ~ x1*x2, data=mydata)
d <- datadist(f, data=mydata)
options(datadist="d")

f <- lrm(y ~ x1*x2)          # datadist not used - specify all values for
summary(f, x1=c(200,500,800), x2=c(1,3,5))       # obtaining predictions
plot(Predict(f, x1=200:800, x2=3))

# Change reference value to get a relative odds plot for a logistic model
d$limits$age[2] <- 30 # make 30 the reference value for age
# Could also do: d$limits["Adjust to","age"] <- 30
fit <- update(fit) # make new reference value take effect
plot(Predict(fit, age, ref.zero=TRUE, fun=exp),
   ylab='Age=x:Age=30 Odds Ratio')

## End(Not run)

When reposting, please credit the source: Biostatistics Home (生物统计家园网, http://www.biostatistic.net).
