R 程式設計/二項式模型

在本節中，我們介紹了二項式模型。我們有一個二進位制的輸出結果和一組解釋變數。

這種模型可以使用線性機率模型進行分析。但是，這種模型對於伯努利分佈引數的一個缺點是，除非對 $\beta$ 進行限制，否則估計的係數可能意味著機率超出單位區間 $[0,1]$ 。因此，更常使用諸如Logit 模型或Probit 模型之類的模型。如果你想估計線性機率模型，可以檢視線性模型頁面。

Logit 模型

模型的形式為： $y_{i}\sim Bernoulli(\pi _{i})$ ，逆連結函式為： $\pi _{i}={\frac {exp(x_{i}'\beta )}{(1+exp(x_{i}'\beta ))}}$ 。可以使用最大似然估計或貝葉斯方法進行估計。

虛假資料模擬

> x <- 1 + rnorm(1000,1) 
> xbeta <- -1  + (x* 1)
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> table(y)
> df <- data.frame(y,x)

最大似然估計

估計 Logit 模型的標準方法是glm()函式，使用 family 為binomial以及 link 為logit.
lrm()(Design) 是邏輯迴歸模型的另一種實現。
Zelig 包^[1]中也包含了實現。

在這個例子中，我們模擬了一個包含一個連續預測變數的模型，並使用glm()函式來估計該模型。

> res <- glm(y ~ x , family  = binomial(link=logit))
> summary(res) # results
> confint(res) # confindence intervals
> names(res) 
> exp(res$coefficients) # odds ratio
> exp(confint(res)) # Confidence intervals for odds ratio (delta method)
> predict(res) # prediction on a linear scale
> predict(res, type = "response") # predicted probabilities
> plot(x, predict(res, type = "response")) # plot the predicted probabilities

Zelig

Zelig' 包使計算所有感興趣的量變得容易。

我們開發了一個新的例子。首先，我們模擬了一個包含兩個連續解釋變數的新資料集，然後使用zelig()函式，使用model = "logit"選項來估計模型。

然後，我們觀察在 x1 和 x2 均值下的 y 的預測值。
然後，我們觀察在 x1 = 0 和 x2 = 0 時的預測值。
我們還觀察了當 x1 從第 3 個四分位數變為第 1 個四分位數時會發生什麼。

> x1 <- 1 + rnorm(1000)
> x2 <- -1 + x1 + rnorm(1000)
> xbeta <- -1  + x1 + x2
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> mydat <- data.frame(y,x1,x2)
> table(y)
> 
> z.out <- zelig(y ~ x1 + x2, model = "logit", data = mydat) # estimating the model
> summary(z.out)
> x.out <- setx(z.out, x1 = mean(x1), x2 = mean(x2)) # setting values for the explanatory variables
> s.out <- sim(z.out, x = x.out) # simulating the quantities of interest
> summary(s.out)
> plot(s.out) # plot the quantities of interest

> # the same with other values
> x.out <- setx(z.out, x1 = 0, x2 = 0)
> s.out <- sim(z.out, x = x.out)
> summary(s.out)

> # What happens if x1 change from the 3rd quartile to the 1st quartile ? 
> x.high <- setx(z.out, x1 = quantile(mydat$x1,.75), x2 = mean(mydat$x2)) 
> x.low <- setx(z.out, x1 = quantile(mydat$x1,.25), x2 = mean(x2)) 
> s.out2<-sim(z.out, x=x.high, x1=x.low) 
> plot(s.out2)

verification 包中的 ROC 曲線。
Zelig 包含rocplot()函式來估計該模型。

請檢視UCLA 統計計算示例

貝葉斯估計

bayesglm()函式，使用 arm 包。
MCMClogit()函式，使用 MCMCpack 包來進行 Logit 模型的貝葉斯估計。

> # Data generating process
> x <- 1 + rnorm(1000,1) 
> xbeta <- -1  + (x* 1)
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> table(y)
> 
> library(MCMCpack)
> res <- MCMClogit(y ~ x)
> summary(res)

> library("arm")
> res <- bayesglm(y ~ x, family = binomial(link=logit))
> summary(res)

Probit 模型

Probit 模型是一種二元模型，其中我們假設連結函式是正態分佈的累積分佈函式。

我們模擬虛假資料。首先，我們在任何分佈中（這無關緊要）抽取兩個隨機變數 x1 和 x2。然後，我們將 xbeta 作為 x1 和 x2 的線性組合來建立向量。我們將連結函式應用於該向量，並將二元變數 y 作為伯努利隨機變數來抽取。

> x1 <- 1 + rnorm(1000)
> x2 <- -1 + x1 + rnorm(1000)
> xbeta <- -1  + x1 + x2
> proba <- pnorm(xbeta)
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> mydat <- data.frame(y,x1,x2)
> table(y)

最大似然

我們可以使用glm()函式，使用family=binomial(link=probit)選項，或者使用 sampleSelection 包中的probit()函式，它是前者的包裝器。

> res <- glm(y ~ x1 + x2 , family = binomial(link=probit), data = mydat)
> summary(res)
> 
> library("sampleSelection")
> probit(y ~ x1 + x2, data = mydat)
> summary(res)

貝葉斯估計

MCMCprobit()(MCMCpack)

> library("MCMCpack")
> post <- MCMCprobit(y ~ x1 + x2 , data = mydat)
> summary(post)
> plot(post)

另請參見

UCLA 統計計算網站^[2]上有一個關於使用 R 的 Probit 模型的示例。

半引數模型

Klein 和 Spady 估計器^[3]在 np 包^[4]中實現（請檢視 npindex()，使用method = "kleinspady"選項）。

參考文獻

↑ Kosuke Imai, Gary King 和 Olivia Lau。2008。“logit：二元因變數的邏輯迴歸”在 Kosuke Imai、Gary King 和 Olivia Lau，“Zelig：每個人都可以使用的統計軟體”中，http://gking.harvard.edu/zelig
↑ UCLA 統計計算 Probit 示例 http://www.ats.ucla.edu/stat/R/dae/probit.htm
↑ Klein, R. W. 和 R. H. Spady（1993），“用於二元響應模型的有效半引數估計器”，Econometrica，61，387-421。
↑ Tristen Hayfield 和 Jeffrey S. Racine（2008）。非引數計量經濟學：np 包。統計軟體雜誌 27(5)。URL http://www.jstatsoft.org/v27/i05/。

前一節：分位數迴歸

索引

下一節：多項式有序模型

[zeliglogit-1] Kosuke Imai, Gary King 和 Olivia Lau。2008。“logit：二元因變數的邏輯迴歸”在 Kosuke Imai、Gary King 和 Olivia Lau，“Zelig：每個人都可以使用的統計軟體”中，http://gking.harvard.edu/zelig

[uclaprobit-2] UCLA 統計計算 Probit 示例 http://www.ats.ucla.edu/stat/R/dae/probit.htm

[3] Klein, R. W. 和 R. H. Spady（1993），“用於二元響應模型的有效半引數估計器”，Econometrica，61，387-421。

[4] Tristen Hayfield 和 Jeffrey S. Racine（2008）。非引數計量經濟學：np 包。統計軟體雜誌 27(5)。URL http://www.jstatsoft.org/v27/i05/。

[1]

[2]

[3]

[4]