R Programming/Probability Distributions

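This page reviews the main probability distributions and describes the R functions that work with them.

R provides many probability functions. Their names follow a common prefix convention:

- r is the prefix for random number generators, such as runif() and rnorm().
- d is the prefix for probability density functions (probability mass functions for discrete distributions), such as dunif() and dnorm().
- p is the prefix for cumulative distribution functions, such as punif() and pnorm().
- q is the prefix for quantile functions, such as qunif() and qnorm().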
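
As a minimal sketch of the prefix convention, the following applies all four prefixes to the standard normal distribution. The seed and the variable names (draws, dens, prob, quant) are chosen only for illustration.

```r
# The four prefixes applied to the standard normal distribution
set.seed(42)                 # arbitrary seed, only for reproducibility
draws <- rnorm(5)            # r: five random draws
dens  <- dnorm(0)            # d: density at 0, equal to 1/sqrt(2*pi)
prob  <- pnorm(1.96)         # p: P(X <= 1.96), about 0.975
quant <- qnorm(prob)         # q: inverse of p, recovers 1.96
print(c(density = dens, probability = prob, quantile = quant))
```

The same four prefixes apply to the other distributions on this page, for example rbinom(), dbinom(), pbinom() and qbinom().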

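Discrete distributions

Benford distribution

The Benford distribution describes the leading digit of numbers. It was described by Newcomb in 1881[2] and by Benford in 1938[1]. The dbenf() function in the VGAM package returns its probabilities.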

> library(VGAM)
> dbenf(c(1:9))
[1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679 0.05799195 0.05115252 0.04575749
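
The same probabilities can be computed in base R from the closed form P(d) = log10(1 + 1/d) for leading digits d = 1, ..., 9. This is a sketch that does not rely on VGAM; the names d and benford are illustrative.

```r
# Benford probabilities for the leading digits 1..9: P(d) = log10(1 + 1/d)
d <- 1:9
benford <- log10(1 + 1 / d)
print(round(benford, 8))
print(sum(benford))          # the probabilities sum to log10(10) = 1
```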

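Bernoulli

We can draw from a Bernoulli distribution with sample(), with runif(), or with rbinom() using size = 1.

> n <- 1000
> x <- sample(c(0,1), n, replace=TRUE)                     # P(1) = 0.5
> x <- sample(c(0,1), n, replace=TRUE, prob=c(0.3,0.7))    # P(1) = 0.7
> x <- runif(n) > 0.3                                      # logical vector, P(TRUE) = 0.7
> x <- rbinom(n, size=1, prob=0.2)                         # P(1) = 0.2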

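Binomial

We can draw from a binomial distribution with rbinom(). Its arguments are n, the number of values to draw; size, the number of trials; and prob, the probability of success on each trial.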

> x <- rbinom(n=100,size=10,prob=0.5)
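
As a rough check, the sketch below compares a simulated sample with the probability mass function from dbinom() and the cumulative distribution from pbinom(). The sample size of 10000 and the seed are arbitrary choices for illustration.

```r
size <- 10
prob <- 0.5
k <- 0:size
pmf <- dbinom(k, size = size, prob = prob)   # P(X = k)
cdf <- pbinom(k, size = size, prob = prob)   # P(X <= k), the running sum of pmf
mean_theory <- sum(k * pmf)                  # equals size * prob

set.seed(1)                                  # arbitrary seed
x <- rbinom(n = 10000, size = size, prob = prob)
print(c(theoretical = mean_theory, empirical = mean(x)))
```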

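Hypergeometric distribution

We can draw from a hypergeometric distribution with rhyper(). The number of values to draw is the nn argument; m and n are the numbers of white and black balls in the urn, and k is the number of balls drawn. Because n names the number of black balls, the number of values should be passed as nn (or positionally), not as n.

> x <- rhyper(nn=1000, m=15, n=5, k=5)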
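
The argument names of rhyper() are easy to mix up. The sketch below shows what happens when the number of values is passed as n. The names bad and good are illustrative.

```r
# rhyper(nn, m, n, k): nn values, m white balls, n black balls, k balls drawn
set.seed(1)                                      # arbitrary seed
bad  <- rhyper(n = 1000, 15, 5, 5)               # n = 1000 black balls, so nn = 15 values
good <- rhyper(nn = 1000, m = 15, n = 5, k = 5)  # 1000 values, as intended
print(c(length_bad = length(bad), length_good = length(good)))
print(mean(good))                                # theoretical mean is k * m / (m + n) = 3.75
```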

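Geometric distribution

We can draw from a geometric distribution with rgeom(). In R, the geometric distribution counts the number of failures before the first success, where prob is the probability of success on each trial.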

> N <- 10000
> x <- rgeom(N, .5)
> x <- rgeom(N, .01)
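
Under this parameterization the mean is (1 - p)/p and the probability of zero failures is p. The sketch below checks both; the seed and the variable names are illustrative.

```r
set.seed(1)                    # arbitrary seed
N <- 10000
p <- 0.5
x <- rgeom(N, prob = p)        # number of failures before the first success
theory <- (1 - p) / p          # theoretical mean
print(c(theoretical = theory, empirical = mean(x)))
print(dgeom(0, prob = p))      # P(X = 0) equals p
```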

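Multinomial

The multinomial distribution generalizes the binomial distribution to more than two outcomes. A sequence of single trials can be simulated with sample(); the example below rolls a fair six-sided die 100 times.

> sample(1:6, 100, replace=TRUE, prob=rep(1/6,6))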
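
Base R also provides rmultinom(), which returns the counts per category directly. The sketch below draws one vector of counts for 100 rolls of a fair die and, for comparison, tabulates individual rolls drawn with sample(). The names counts, rolls and tab are illustrative.

```r
set.seed(1)                                             # arbitrary seed
# One multinomial draw: counts of each face in 100 rolls (a 6 x 1 matrix)
counts <- rmultinom(n = 1, size = 100, prob = rep(1/6, 6))
print(t(counts))

# Equivalent kind of result obtained by tabulating individual rolls
rolls <- sample(1:6, 100, replace = TRUE, prob = rep(1/6, 6))
tab <- table(factor(rolls, levels = 1:6))
print(tab)
```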

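Negative binomial distribution

The negative binomial distribution is the distribution of the number of failures before the k-th success in a sequence of Bernoulli trials. In rnbinom(), the size argument is the target number of successes k and prob is the probability of success.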

> N <- 100000
> x <- rnbinom(N, 10, .25)

Poisson distribution

We can draw n values from a Poisson distribution whose mean is given by the lambda argument.

> x <- rpois(n=100, lambda=3)
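
For a Poisson distribution the mean and the variance both equal lambda, and the probability mass function is exp(-lambda) * lambda^k / k!. The sketch below compares this formula with dpois() and checks the sample moments; the seed and the sample size are arbitrary.

```r
lambda <- 3
k <- 0:10
pmf_formula <- exp(-lambda) * lambda^k / factorial(k)  # closed form
pmf_r <- dpois(k, lambda = lambda)                     # built-in mass function
print(all.equal(pmf_formula, pmf_r))

set.seed(1)                                            # arbitrary seed
x <- rpois(n = 10000, lambda = lambda)
print(c(mean = mean(x), variance = var(x)))            # both should be close to lambda
```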

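Zipf's law

The distribution of word frequencies is known as Zipf's law. It also describes the distribution of city sizes well[3]. The VGAM package provides dzipf() and pzipf().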

> library(VGAM)
> dzipf(x=2, N=1000, s=2)

Continuous distributions

Beta and Dirichlet distributions

- Beta distribution
- Dirichlet distribution in gtools, bayesm and MCMCpack

> library(gtools)
> ?rdirichlet
> library(bayesm)
> ?rdirichlet
> library(MCMCpack)
> ?Dirichlet

Cauchy

We can draw n values from a Cauchy distribution with a given location parameter $x_{0}$ (default 0) and scale parameter $\gamma$ (default 1) using rcauchy().

> x <- rcauchy(n=100, location=0, scale=1)

Chi-squared distribution

Quantiles of the chi-squared ($\chi ^{2}$) distribution:

> qchisq(.95,1)
[1] 3.841459
> qchisq(.95,10)
[1] 18.30704
> qchisq(.95,100)
[1] 124.3421
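
The quantile function qchisq() is the inverse of the cumulative distribution function pchisq(). With one degree of freedom, a chi-squared variable is the square of a standard normal variable, so its 0.95 quantile equals qnorm(0.975)^2. A minimal sketch of both facts, with illustrative variable names:

```r
df <- c(1, 10, 100)
q <- qchisq(0.95, df = df)   # the quantiles shown above
p <- pchisq(q, df = df)      # applying the CDF recovers 0.95
print(rbind(df = df, quantile = q, probability = p))

# One degree of freedom: square of the two-sided normal quantile
print(c(qchisq(0.95, df = 1), qnorm(0.975)^2))
```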

Exponential

We can draw n values from an exponential distribution with a given rate (default 1) using rexp().

> x <- rexp(n=100, rate=1)

Fisher-Snedecor

We can plot the density of the Fisher-Snedecor distribution (F distribution):

> par(mar=c(3,3,1,1))
> x <- seq(0,5,len=1000)
> plot(range(x),c(0,2),type="n")
> grid()
> lines(x,df(x,df1=1,df2=1),col="black",lwd=3)
> lines(x,df(x,df1=2,df2=1),col="blue",lwd=3)
> lines(x,df(x,df1=5,df2=2),col="green",lwd=3)
> lines(x,df(x,df1=100,df2=1),col="red",lwd=3)
> lines(x,df(x,df1=100,df2=100),col="grey",lwd=3)
> legend(2,1.5,legend=c("n1=1, n2=1","n1=2, n2=1","n1=5, n2=2","n1=100, n2=1","n1=100, n2=100"),col=c("black","blue","green","red","grey"),lwd=3,bty="n")

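Gamma

We can draw n values from a gamma distribution with a given shape parameter and scale parameter $\theta$ using rgamma(). Alternatively, the shape parameter can be given together with the rate parameter $\beta =1/\theta$. The shape argument is required, and only one of scale and rate should be supplied.

> x <- rgamma(n=10, shape=0.4, scale=1)
> x <- rgamma(n=100, shape=0.4, rate=0.8)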
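
The scale and rate parameterizations describe the same distribution when rate = 1/scale. The sketch below checks this for the random generator and for the cumulative distribution function. The values of shape and theta are arbitrary illustrations.

```r
shape <- 0.4
theta <- 1.25                # scale
beta  <- 1 / theta           # rate

set.seed(1)                  # same seed for both calls
a <- rgamma(5, shape = shape, scale = theta)
set.seed(1)
b <- rgamma(5, shape = shape, rate = beta)
print(all.equal(a, b))       # same draws under both parameterizations

pa <- pgamma(2, shape = shape, scale = theta)
pb <- pgamma(2, shape = shape, rate = beta)
print(c(pa, pb))
```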

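Lévy

We can draw n values from a Lévy distribution with a given location parameter $\mu$ (the m argument, default 0) and scale parameter (the s argument, default 1) using rlevy(). This function is not part of base R; it is provided, for example, by the rmutil package.

> library(rmutil)
> x <- rlevy(n=100, m=0, s=1)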

Log-normal distribution

We can draw n values from a log-normal distribution with a given meanlog (default 0) and sdlog (default 1) using rlnorm().

> x <- rlnorm(n=100, meanlog=0, sdlog=1)

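Normal and related distributions

We can draw n values from a normal (Gaussian) distribution with a given mean (default 0) and sd (default 1) using rnorm().

> x <- rnorm(n=100, mean=0, sd=1)

Quantiles of the normal distribution: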

> qnorm(.95)
[1] 1.644854
> qnorm(.975)
[1] 1.959964
> qnorm(.99)
[1] 2.326348
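
These quantiles are the critical values used for two-sided intervals. For example, the interval from -qnorm(.975) to qnorm(.975) contains 95% of the probability mass of a standard normal distribution. A short sketch with illustrative names:

```r
q <- qnorm(0.975)                    # about 1.96
coverage <- pnorm(q) - pnorm(-q)     # probability mass between -q and q
print(c(quantile = q, coverage = coverage))
print(qnorm(0.025))                  # by symmetry, equal to -q
```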

The mvtnorm package contains functions for the multivariate normal distribution.
- rmvnorm() draws from a multivariate normal distribution.

> library(mvtnorm)
> sig <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
> r <- rmvnorm(1000, sigma = sig)
> cor(r) 
          [,1]      [,2]
[1,] 1.0000000 0.8172368
[2,] 0.8172368 1.0000000
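
For readers without mvtnorm, correlated normal draws can also be sketched in base R with a Cholesky factor. This illustrates the idea only and is not presented as how rmvnorm() is implemented; the names z and r and the seed are illustrative.

```r
set.seed(1)                              # arbitrary seed
sig <- matrix(c(1, 0.8, 0.8, 1), 2, 2)   # target covariance matrix
z <- matrix(rnorm(2000), ncol = 2)       # 1000 pairs of independent standard normals
U <- chol(sig)                           # upper triangular, t(U) %*% U equals sig
r <- z %*% U                             # rows now have covariance sig
print(cor(r))
```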

Pareto distribution

- Generalized Pareto: dgpd() in evd
- dpareto(), ppareto(), rpareto() and qpareto() in actuar
- The VGAM package also contains functions for the Pareto distribution.

Student's t distribution

Quantiles of the Student's t distribution:

> qt(.975,30)
[1] 2.042272
> qt(.975,100)
[1] 1.983972
> qt(.975,1000)
[1] 1.962339

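The following lines plot the 0.975 quantile of the t distribution as a function of the degrees of freedom. The horizontal line marks the corresponding quantile of the normal distribution.

> curve(qt(.975,x), from = 2 , to = 100, ylab = "Quantile 0.975 ", xlab = "Degrees of freedom", main = "Student t distribution")
> abline(h=qnorm(.975), col = 2)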
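
The convergence seen in the plot can also be checked numerically. The sketch below computes the gap between the t quantile and the normal quantile for a few arbitrarily chosen degrees of freedom.

```r
df <- c(2, 5, 30, 100, 1000, 1e6)
gap <- qt(0.975, df) - qnorm(0.975)   # positive, shrinking as df grows
print(data.frame(df = df, t_quantile = qt(0.975, df), gap = gap))
```

The t quantile stays above the normal quantile and approaches it as the degrees of freedom increase.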

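Uniform distribution

We can draw n values from a uniform distribution (also called the rectangular distribution) between two values (0 and 1 by default) using runif().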

> runif(n=100, min=0, max=1)

Weibull distribution

We can draw n values from a Weibull distribution with a given shape parameter and scale parameter $\mu$ (default 1) using rweibull().

> x <- rweibull(n=100, shape=0.5, scale=1)
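
To see how the two parameters act, note that a Weibull distribution with shape = 1 reduces to an exponential distribution with rate = 1/scale. The sketch below compares the two cumulative distribution functions on an arbitrary grid; the variable names are illustrative.

```r
x <- seq(0, 5, by = 0.5)
scale <- 2
pw <- pweibull(x, shape = 1, scale = scale)  # Weibull CDF with shape 1
pe <- pexp(x, rate = 1 / scale)              # exponential CDF with rate 1/scale
print(all.equal(pw, pe))
```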

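Extreme value and related distributions

- Gumbel distribution
- Logistic distribution: the distribution of the difference of two independent Gumbel random variables with the same scale; see plogis(), qlogis(), dlogis() and rlogis()

Functions in contributed packages:
- Fréchet: dfrechet() in evd
- Generalized extreme value: dgev() in evd
- Gumbel: dgumbel() in evd
- Burr: dburr(), pburr(), qburr() and rburr() in actuar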
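
The link between the Gumbel and logistic distributions can be checked by simulation in base R. The sketch below draws standard Gumbel values by inverse transform, using the distribution function F(x) = exp(-exp(-x)), and compares the empirical distribution of their difference with plogis(). The sample size, seed and variable names are arbitrary.

```r
set.seed(1)                           # arbitrary seed
n <- 100000
# Standard Gumbel draws by inverse transform of F(x) = exp(-exp(-x))
g1 <- -log(-log(runif(n)))
g2 <- -log(-log(runif(n)))
d <- g1 - g2                          # difference of two independent Gumbel draws

x <- c(-2, -1, 0, 1, 2)
empirical <- sapply(x, function(v) mean(d <= v))
theoretical <- plogis(x)              # standard logistic CDF
print(round(rbind(empirical, theoretical), 3))
```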

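Distributions in circular statistics

The CircStats package contains functions for circular statistics.
- dvm(): density of the von Mises distribution (also known as the circular normal or Tikhonov distribution)
- dtri(): triangular density
- dmixedvm(): mixed von Mises density
- dwrpcauchy(): wrapped Cauchy density
- dwrpnorm(): wrapped normal density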

See also

Packages VGAM, SuppDists, actuar, fBasics, bayesm and MCMCpack.

References

[1] Benford, F. (1938). "The Law of Anomalous Numbers". Proceedings of the American Philosophical Society 78: 551–572.
[2] Newcomb, S. (1881). "Note on the Frequency of Use of the Different Digits in Natural Numbers". American Journal of Mathematics 4: 39–40.
[3] Gabaix, Xavier (August 1999). "Zipf's Law for Cities: An Explanation". Quarterly Journal of Economics 114 (3): 739–67. doi:10.1162/003355399556133. ISSN 0033-5533. http://pages.stern.nyu.edu/~xgabaix/papers/zipf.pdf.
