Robust Multinomial Regression(multinomRob)
R package: multinomRob
Multinomial Robust Estimation
Description
multinomRob fits the overdispersed multinomial regression model for grouped count data using the hyperbolic tangent (tanh) and least quartile difference (LQD) robust estimators.
Usage
multinomRob(model, data, starting.values=NULL, equality=NULL,
genoud.parms=NULL, print.level=0, iter = FALSE,
maxiter = 10, multinom.t=1, multinom.t.df=NA,
MLEonly=FALSE)
Arguments
Argument: model
The regression model specification. This is a list of formulas, with one formula for each category of outcomes for which counts have been measured for each observation. For example, in the following,
model=list(y1 ~ x1, y2 ~ x2, y3 ~ 0)
the outcome variables containing counts are y1, y2 and y3, and the linear predictor for y1 is a coefficient times x1 plus a constant, the linear predictor for y2 is a coefficient times x2 plus a constant, and the linear predictor for y3 is zero. Each formula has the format countvar ~ RHS, where countvar is the name of a vector, in the dataframe referenced by the data argument, that gives the counts for all observations for one category. RHS denotes the righthand side of a formula using the usual syntax for formulas, where each variable in the formula is the name of a vector in the dataframe referenced by the data argument. For example, a RHS specification of var1 + var2*var3 would specify that the regressors are to be var1, var2, var3, the terms generated by the interaction var2:var3, and the constant.
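To see which regressors a RHS specification generates, the formula can be expanded with base R's model.matrix(). The sketch below uses a made-up data frame whose column names (var1, var2, var3) simply mirror the example in the text; it illustrates standard formula expansion and does not call multinomRob itself.

```r
# Hypothetical toy data; the variable names mirror the RHS example in the text.
dat <- data.frame(var1 = c(1, 2, 3, 4),
                  var2 = c(0.5, 1.5, 2.5, 3.5),
                  var3 = c(2, 1, 4, 3))

# model.matrix() expands a one-sided formula into its regressor columns,
# including the constant and the var2:var3 interaction.
rhs_terms <- colnames(model.matrix(~ var1 + var2 * var3, data = dat))
print(rhs_terms)
# "(Intercept)" "var1" "var2" "var3" "var2:var3"
```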
The set of outcome alternatives may vary over observations: put a negative value in the count variable of any alternative that does not exist for a particular observation. If the value of an outcome variable is negative for an observation, then that outcome is considered not available for that observation. The predicted counts for that observation are defined only for the available outcomes and are based on the linear predictors for the available outcomes. The same set of coefficient parameter values is used for all observations. Any observation for which fewer than two outcomes are available is omitted.
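The following is a minimal sketch of what restricting predictions to the available outcomes could look like for a single observation. It assumes a multinomial-logit link (as in the Examples section), and the linear predictor values and counts are invented for illustration; it is not the package's internal code.

```r
# Hypothetical linear predictors for one observation with three outcomes;
# y3 plays the role of the reference category, so its predictor is 0.
eta <- c(y1 = 0.4, y2 = -0.2, y3 = 0)

# Observed counts; the negative entry flags y2 as unavailable for this observation.
counts <- c(y1 = 12, y2 = -1, y3 = 8)
available <- counts >= 0

# Logit probabilities renormalized over the available outcomes only;
# the unavailable outcome is left as NA.
prob <- rep(NA_real_, length(eta))
names(prob) <- names(eta)
prob[available] <- exp(eta[available]) / sum(exp(eta[available]))

# Predicted counts are based on the total over the available outcomes.
predicted <- prob * sum(counts[available])
print(round(predicted, 2))
```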
Observations with missing data (NA) in any outcome variable or regressor are omitted (listwise deletion).
In a model that has the same regressors for every category, except for one category for which there are no regressors in order to identify the model (the reference category), the RHS specification must be given for all the categories except the reference category. The formula for the reference category must include a RHS specification that explicitly omits the constant, e.g., countvar ~ -1 or countvar ~ 0. The number of coefficient parameters to be estimated equals the number of terms generated by all the formulas, subject to equality constraints that may be specified using the equality argument.
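As a rough check on the parameter count before any equality constraints, the terms generated by each formula can be tallied with base R's terms(). The model below is a hypothetical three-category specification with y3 as the reference category; count_terms is a helper name introduced here for illustration.

```r
# Hypothetical model: same regressors for y1 and y2, reference category y3.
model <- list(y1 ~ x1 + x2, y2 ~ x1 + x2, y3 ~ 0)

# Count the terms one formula generates: named terms plus the constant (if any).
count_terms <- function(f) {
  tt <- terms(f)
  length(attr(tt, "term.labels")) + attr(tt, "intercept")
}

n_params <- sapply(model, count_terms)
print(n_params)       # 3 3 0
print(sum(n_params))  # 6
```

Equality constraints supplied through the equality argument would reduce this total.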
Argument: data
The dataframe that contains all the variables referenced in the model argument, which are the data to be analyzed.
Argument: starting.values
Starting values for the regression coefficient parameters, as a vector. The parameter ordering matches the ordering of the formulas in the model argument: parameters for the terms in the first formula appear first, then come parameters for the terms in the second formula, etc. In practice it will usually be better to start by letting multinomRob find starting values by using the multinom.t option, then using the results from one run as starting values for a subsequent run done with, perhaps, a larger population of operators for rgenoud.
Argument: equality
List of equality constraints. This is a list of lists of formulas. Each formula has the same format as in the model specification, and must include only a subset of the outcomes and regressors used in the model specification formulas. All the coefficients specified by the formulas in each list will be constrained to have the same value during estimation. For example, in the following,
multinomRob(model=list(y1 ~ x1, y2 ~ x2, y3 ~ 0), data=dtf, equality=list(list(y1 ~ x1 + 0, y2 ~ x2 + 0)) );
the model to be estimated is
list(y1 ~ x1, y2 ~ x2, y3 ~ 0)
and the coefficients of x1 and x2 are constrained to be equal by
equality=list(list(y1 ~ x1 + 0, y2 ~ x2 + 0))
In the equality formulas it is necessary to say + 0 so the intercepts are not involved in the constraints. If a parameter occurs in two different lists in the equality= argument, then all the parameters in the two lists are constrained to be equal to one another. In the output this is described as consolidating the lists.
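The consolidation rule described above (constraint lists that share a parameter are merged into one) can be sketched in base R. This is only an illustration of the rule, not the package's implementation; the function name consolidate and the parameter labels such as "y1:x1" are hypothetical.

```r
# Merge constraint groups that share at least one parameter label.
consolidate <- function(groups) {
  merged <- list()
  for (g in groups) {
    # Which already-merged groups overlap with the incoming group?
    overlaps <- vapply(merged, function(m) any(g %in% m), logical(1))
    # Absorb every overlapping group into the incoming one.
    g <- unique(c(g, unlist(merged[overlaps])))
    merged <- c(merged[!overlaps], list(g))
  }
  merged
}

# Hypothetical labels: the first two groups share "y2:x2", so they merge.
groups <- list(c("y1:x1", "y2:x2"),
               c("y2:x2", "y3:x3"),
               c("y4:x4", "y5:x5"))
result <- consolidate(groups)
print(result)
```

After consolidation, the first group contains y1:x1, y2:x2 and y3:x3, all constrained to a single value, while the third group is left unchanged.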
Argument: genoud.parms
List of named arguments used to control the rgenoud optimizer, which is used to compute the LQD estimator.
Argument: print.level
Specify 0 for minimal printing, 1 to print more detailed information about LQD and other intermediate computations, 2 to print details about the tanh computations, or 3 to print details about starting values computations.
Argument: iter
TRUE means to iterate between LQD and tanh estimation steps until the algorithm converges, the number of iterations specified by the maxiter argument is reached, or an LQD step produces a larger value for the overdispersion scale parameter than the previous step did. This option often improves the fit of the model.
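The three stopping conditions can be pictured as a simple control loop. The sketch below is an assumption-laden illustration: lqd_step and tanh_step are hypothetical stand-in functions supplied by the caller, and the convergence test (largest coefficient change below tol) is an assumed criterion, not necessarily the one multinomRob uses.

```r
# Alternate a scale step and a coefficient step, stopping on convergence,
# on reaching maxiter, or when the scale estimate increases.
iterate_lqd_tanh <- function(lqd_step, tanh_step, start, maxiter = 10, tol = 1e-6) {
  coefs <- start
  prev_sigma2 <- Inf
  for (k in seq_len(maxiter)) {
    sigma2 <- lqd_step(coefs)                 # stand-in for the LQD scale step
    if (sigma2 > prev_sigma2) {
      return(list(coefficients = coefs, sigma2 = prev_sigma2,
                  iter = k, reason = "scale increased"))
    }
    new_coefs <- tanh_step(coefs, sigma2)     # stand-in for the tanh step
    converged <- max(abs(new_coefs - coefs)) < tol
    coefs <- new_coefs
    prev_sigma2 <- sigma2
    if (converged) {
      return(list(coefficients = coefs, sigma2 = sigma2,
                  iter = k, reason = "converged"))
    }
  }
  list(coefficients = coefs, sigma2 = prev_sigma2,
       iter = maxiter, reason = "maxiter reached")
}

# Toy stand-ins: the scale shrinks as the coefficient approaches 1.
toy_lqd  <- function(b) 1 + sum((b - 1)^2)
toy_tanh <- function(b, s2) b + 0.5 * (1 - b)

short_run <- iterate_lqd_tanh(toy_lqd, toy_tanh, start = 0)
long_run  <- iterate_lqd_tanh(toy_lqd, toy_tanh, start = 0, maxiter = 50)
print(short_run$reason)
print(long_run$reason)
```

With these toy steps the default maxiter = 10 is exhausted before the change falls below tol, while a larger maxiter lets the loop converge.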
Argument: maxiter
The maximum number of iterations to be done between LQD and tanh estimation steps.
Argument: multinom.t
1 means use the multinomial multivariate-t model to compute starting values for the coefficient parameters. But if the MNL results are better (as judged by the LQD fit), MNL values will be used instead. 0 means use nonrobust maximum likelihood estimates for a multinomial regression model. 2 forces the use of the multivariate-t model for starting values even if the MNL estimates provide better starting values for the LQD. Note that with multinom.t=1 or multinom.t=2, multivariate-t starting values will not be used if the model cannot generate valid standard errors. To force the use of multivariate-t estimates even in this circumstance, see the multinom.t.df argument.
If the starting.values argument is not NULL, the starting values given in that argument are used and the multinom.t argument is ignored. Multinomial multivariate-t starting values are not available when the number of outcome alternatives varies over the observations.
Argument: multinom.t.df
NA means that the degrees of freedom (DF) for the multivariate-t model (when used) should be estimated. If multinom.t.df is a number, that number will be used for the degrees of freedom and the DF will not be estimated. Only a positive number should be used. Setting multinom.t.df to a number also implies that, if multinom.t=1 or multinom.t=2, the multivariate-t starting values will be used (depending on the comparison with the MNL estimates if multinom.t=1 is set) even if the standard errors are not defined.
Argument: MLEonly
If TRUE, then only the standard maximum-likelihood MNL model is estimated. No robust estimation model and no overdispersion parameter is estimated.
Details
The tanh estimator is a redescending M-estimator, and the LQD estimator is a generalized S-estimator. The LQD is used to estimate the scale of the overdispersion. Given that scale estimate, the tanh estimator is used to estimate the coefficient parameters of the linear predictors of the multinomial regression model.
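To give some intuition for why a quartile-of-differences scale resists outliers, here is a deliberately simplified sketch: it takes the lower quartile of all pairwise absolute differences of a residual vector. It omits the consistency constant and the exact order statistic used by the actual LQD estimator, so it should be read as an illustration of the idea only; lqd_scale_sketch is a name introduced here.

```r
# Simplified scale: lower quartile of pairwise absolute differences.
# (Not the package's estimator; no consistency constant is applied.)
lqd_scale_sketch <- function(r) {
  d <- abs(outer(r, r, "-"))
  pairwise <- d[upper.tri(d)]
  unname(quantile(pairwise, 0.25))
}

clean <- seq(-1, 1, length.out = 21)
contaminated <- c(clean, 50, 60)   # two gross outliers added

# The standard deviation is inflated by the outliers; the quartile-based
# scale changes much less.
print(c(sd = sd(clean), lqd = lqd_scale_sketch(clean)))
print(c(sd = sd(contaminated), lqd = lqd_scale_sketch(contaminated)))
```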
If starting values are not supplied, they are computed using a multinomial multivariate-t model. The program also computes and reports nonrobust maximum likelihood estimates for the multinomial regression model, reporting sandwich estimates for the standard errors that are adjusted for a nonrobust estimate of the error dispersion.
Value
multinomRob returns a list of 15 objects. The returned objects are:
<table summary="R valueblock">
<tr valign="top"><td>coefficients</td> <td> The tanh coefficient estimates in matrix format. The matrix has one column for each formula specified in the model argument. The name of each column is the name used for the count variable in the corresponding formula. The label for each row of the matrix gives the names of the regressors to which the coefficient values in the row apply. The regressor names in each label are separated by a forward slash (/), and NA is used to denote that no regressor is associated with the corresponding value in the matrix. The value 0 is used in the matrix to fill in for values that do not correspond to a model formula regressor.</td></tr>
<tr valign="top"><td>se</td> <td> The tanh coefficient estimate standard errors in matrix format. The format and labelling used for the matrix are the same as for the coefficients. The standard errors are derived from the estimated asymptotic sandwich covariance estimate.</td></tr>
<tr valign="top"><td>LQDsigma2</td> <td> The LQD dispersion (variance) parameter estimate. This is the LQD estimate of the scale value, squared.</td></tr>
<tr valign="top"><td>TANHsigma2</td> <td> The tanh dispersion parameter estimate.</td></tr>
<tr valign="top"><td>weights</td> <td> The matrix of tanh weights for the orthogonalized residuals. The matrix has one row for each observation in the data and as many columns as there are formulas specified in the model argument. The first column of the matrix has names for the observations, and the remaining columns contain the weights. Each of the latter columns has a name derived from the name of one of the count variables named in the model argument. If count1 is the name of the count variable used in the first formula, then the second column in the matrix is named weights:count1, etc.
If an observation has negative values specified for some outcome variables, indicating that those outcome alternatives are not available for that observation, then values of NA appear in the weights matrix for that observation, as many NA values as there are unavailable alternatives. The NA values will be the last values in the affected row of the weights matrix, regardless of which outcome alternatives were unavailable for the observation.</td></tr>
<tr valign="top"><td>Hdiag</td> <td> Weights used to fully studentize the orthogonalized residuals. The matrix has one row for each observation in the data and as many columns as there are formulas specified in the model argument. The first column of the matrix has names for the observations, and the remaining columns contain the weights. Each of the latter columns has a name derived from the name of one of the count variables named in the model argument. If count1 is the name of the count variable used in the first formula, then the second column in the matrix is named Hdiag:count1, etc.
If an observation has negative values specified for some outcome variables, indicating that those outcome alternatives are not available for that observation, then values of 0 appear in the weights matrix for that observation, as many 0 values as there are unavailable alternatives. Values of 0 that are created for this reason will be the last values in the affected row of the weights matrix, regardless of which outcome alternatives were unavailable for the observation.</td></tr>
<tr valign="top"><td>prob</td> <td> The matrix of predicted probabilities for each category for each observation based on the tanh coefficient estimates.</td></tr>
<tr valign="top"><td>residuals.rotate</td> <td> Matrix of studentized residuals which have been made comparable by rotating each choice category to the first position. These residuals, unlike the student and standard residuals below, are no longer orthogonalized because of the rotation. These are the residuals displayed in Table 6 of the reference article.</td></tr>
<tr valign="top"><td>residuals.student</td> <td> Matrix of fully studentized orthogonalized residuals.</td></tr>
<tr valign="top"><td>residuals.standard</td> <td> Matrix of orthogonalized residuals, standardized by dividing by the overdispersion scale.</td></tr>
<tr valign="top"><td>mnl</td> <td> List of nonrobust maximum likelihood estimation results from function multinomMLE.</td></tr>
<tr valign="top"><td>multinomT</td> <td> List of multinomial multivariate-t estimation results from function multinomT.</td></tr>
<tr valign="top"><td>genoud</td> <td> List of LQD estimation results obtained by rgenoud optimization, from function genoudRob.</td></tr>
<tr valign="top"><td>mtanh</td> <td> List of tanh estimation results from function mGNtanh.</td></tr>
<tr valign="top"><td>error</td> <td> Exit error code, usually from function mGNtanh.</td></tr>
<tr valign="top"><td>iter</td> <td> Number of LQD-tanh iterations.</td></tr>
</table>
Author(s)
Walter R. Mebane, Jr., University of Michigan, wmebane@umich.edu, http://www-personal.umich.edu/~wmebane
Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/
References
Walter R. Mebane, Jr. and Jasjeet Singh Sekhon. 2004. “Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data.” American Journal of Political Science 48 (April): 391–410. http://sekhon.berkeley.edu/multinom.pdf
For additional documentation please visit http://sekhon.berkeley.edu/robust/.
Examples
# load the package (provides multinomRob and rmultinomial)
library(multinomRob)

# make some multinomial data
x1 <- rnorm(50)
x2 <- rnorm(50)
p1 <- exp(x1) / (1 + exp(x1) + exp(x2))
p2 <- exp(x2) / (1 + exp(x1) + exp(x2))
p3 <- 1 - (p1 + p2)
y <- matrix(0, 50, 3)
for (i in 1:50) {
  y[i, ] <- rmultinomial(1000, c(p1[i], p2[i], p3[i]))
}

# perturb the first 5 observations by permuting their outcome columns
y[1:5, c(1, 2, 3)] <- y[1:5, c(3, 1, 2)]
y1 <- y[, 1]
y2 <- y[, 2]
y3 <- y[, 3]

# put data into a dataframe
dtf <- data.frame(x1, x2, y1, y2, y3)

# set parameters for genoud
zz.genoud.parms <- list(pop.size = 1000,
                        wait.generations = 10,
                        max.generations = 100,
                        scale.domains = 5,
                        print.level = 0)

# estimate a model, with "y3" being the reference category
# true coefficient values are: (Intercept) = 0, x = 1
# impose an equality constraint: coefficients of x1 and x2 are equal
mulrobE <- multinomRob(list(y1 ~ x1, y2 ~ x2, y3 ~ 0),
                       dtf,
                       equality = list(list(y1 ~ x1 + 0, y2 ~ x2 + 0)),
                       genoud.parms = zz.genoud.parms,
                       print.level = 3, iter = FALSE)
summary(mulrobE, weights = TRUE)

# Do only MLE estimation. The following model is NOT identified if we
# try to estimate the overdispersed MNL.
dtf <- data.frame(y1 = c(1, 1), y2 = c(2, 1), y3 = c(1, 2), x = c(0, 1))
summary(multinomRob(list(y1 ~ 0, y2 ~ x, y3 ~ x), data = dtf, MLEonly = TRUE))