Normal Distribution Theory: Chi-squared Distribution, Sample Variance, t-Distribution, F-Distribution


Normal Distribution Theory

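We saw earlier that the sum of two independent normally distributed random variables is again normally distributed.

Let us now extend this and check whether a sum of $n$ random variables is also normal.

For $n$ independent random variables $X_i \sim N(\mu_i, \sigma_i^2)$, let $Y=(\sum_i a_iX_i)+b$. Then
\[ Y \sim N\left( (\sum_i a_i \mu_i)+b,\ \sum_i a_i^2\sigma_i^2 \right) \]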

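Proof

We prove this using the moment generating function (mgf).

Because the $X_i$ are independent, the mgf of the sum is the product of the individual mgfs, with each $M_{X_i}$ evaluated at $a_i t$:

$M_Y(t) = e^{bt} \cdot \prod_i M_{X_i}(a_i t) = e^{bt} \cdot \text{exp}[\sum_i(a_i\mu_i)t + \frac{1}{2}\sum_i(a_i\sigma_i)^2 t^2] = \text{exp}[(\sum_i(a_i \mu_i)+ b)t + \frac{1}{2}(\sum_i(a_i\sigma_i)^2)t^2]$

This is the mgf of $N\left( (\sum_i a_i \mu_i)+b,\ \sum_i a_i^2\sigma_i^2 \right)$, which completes the proof.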
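As a quick numerical sanity check of the result above, the sketch below simulates $Y=(\sum_i a_iX_i)+b$ and compares the sample mean and variance with $(\sum_i a_i\mu_i)+b$ and $\sum_i a_i^2\sigma_i^2$. The coefficients, means, and standard deviations are made-up values, and `simulate_linear_combination` is a hypothetical helper introduced only for this illustration.

```python
import random
import statistics

def simulate_linear_combination(a, mus, sigmas, b, n_draws=100_000, seed=0):
    """Draw Y = sum_i a_i * X_i + b with independent X_i ~ N(mu_i, sigma_i^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        y = b + sum(ai * rng.gauss(mu, sd) for ai, mu, sd in zip(a, mus, sigmas))
        draws.append(y)
    return draws

# Illustrative (assumed) parameters; sigmas are standard deviations.
a, mus, sigmas, b = [2.0, -1.0, 0.5], [1.0, 0.0, 4.0], [1.0, 2.0, 3.0], 3.0

theory_mean = sum(ai * mu for ai, mu in zip(a, mus)) + b       # sum a_i mu_i + b
theory_var = sum((ai * sd) ** 2 for ai, sd in zip(a, sigmas))  # sum a_i^2 sigma_i^2

ys = simulate_linear_combination(a, mus, sigmas, b)
print(theory_mean, statistics.fmean(ys))
print(theory_var, statistics.variance(ys))
```

Matching the first two moments does not by itself show normality; that part is what the mgf argument establishes.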

Zero covariance is equivalent to independence

\[ \sum_{i=1}^n a_iX_i \perp\!\!\!\perp \sum_{i=1}^n b_iX_i \Leftrightarrow \sum_{i=1}^n a_ib_i\sigma_i^2=0 \]

Proof

As before, let the $X_i \sim N(\mu_i, \sigma_i^2)$ be independent. Write $\mathbf{X} = [X_1, \cdots, X_n]^\top,\ \mathbf{a} = [a_1, \cdots, a_n]^\top,\ \mathbf{b}=[b_1, \cdots, b_n]^\top$,

and define new random variables $V,\ W$ as follows.

$V = \sum_i a_iX_i = \mathbf{a}^\top \mathbf{X}$

$W = \sum_i b_iX_i = \mathbf{b}^\top \mathbf{X}$

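Since the $X_i$ are independent with $V(X_i)=\sigma_i^2$, the covariance is

$Cov(V, W) = \sum_i a_i b_i \sigma_i^2$, which reduces to $\mathbf{a}^\top \mathbf{b}\, \sigma^2$ when all the variances equal $\sigma^2$.

Because $(V, W)$ is jointly normal, the two random variables $V$ and $W$ are independent if and only if their covariance is zero. In the equal-variance case this reads

$V \perp\!\!\!\perp W \ \text{iff} \ \mathbf{a}^\top \mathbf{b}=0$

In other words, when the $X_i$ are independent normal random variables with a common variance, orthogonality of the coefficient vectors is equivalent to statistical independence.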
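The covariance formula $\sum_i a_ib_i\sigma_i^2$ can be checked numerically. The following is a minimal sketch with made-up coefficient vectors; `cov_of_linear_combos` and `sample_cov` are hypothetical helper names, not part of the original post.

```python
import random

def cov_of_linear_combos(a, b, sigmas):
    """Cov(sum a_i X_i, sum b_i X_i) for independent X_i with Var(X_i) = sigma_i^2."""
    return sum(ai * bi * sd ** 2 for ai, bi, sd in zip(a, b, sigmas))

def sample_cov(xs, ys):
    """Sample covariance (n-1 divisor) of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

a, b = [1.0, 1.0], [1.0, -1.0]   # orthogonal coefficient vectors (assumed example)
sigmas = [2.0, 2.0]              # common standard deviation, so sigma^2 = 4

rng = random.Random(1)
vs, ws = [], []
for _ in range(50_000):
    x = [rng.gauss(0.0, sd) for sd in sigmas]
    vs.append(sum(ai * xi for ai, xi in zip(a, x)))   # V = a^T X
    ws.append(sum(bi * xi for bi, xi in zip(b, x)))   # W = b^T X

print(cov_of_linear_combos(a, b, sigmas))   # exact covariance from the formula
print(sample_cov(vs, ws))                   # simulated covariance, close to zero
```

With unequal variances, $\mathbf{a}^\top\mathbf{b}=0$ is no longer enough: `cov_of_linear_combos([1.0, 1.0], [1.0, -1.0], [1.0, 2.0])` gives $1 - 4 = -3$, so the condition that matters is $\sum_i a_ib_i\sigma_i^2=0$.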

The Chi-squared Distribution

Let $X_1, \dots, X_n$ be i.i.d. $N(0, 1)$ and let $Y = \sum_{i=1}^{n}X_i^2$. Then
\[ Y = \sum_{i=1}^{n}X_i^2 \sim \chi^2(n) = \Gamma(\frac{n}{2}, \frac{1}{2}) \]
Here $n$ is called the degrees of freedom of the chi-squared distribution.

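Mean and variance of the chi-squared distribution

Since the mean and variance of $\Gamma(\alpha, \lambda)$ are $\alpha / \lambda$ and $\alpha / \lambda^2$ respectively,

$E(Y) = (n/2)/(1/2) = n, \quad V(Y) = (n/2)/(1/2)^2 = 2n$.

When $n$ is sufficiently large, $\chi^2(n) \approx N(n, 2n)$ by the central limit theorem, since $Y$ is a sum of $n$ i.i.d. terms.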
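The definition and the two moments can be illustrated by simulation. This is only a sketch: `chi2_draws` is a hypothetical helper, and $n=5$ is an arbitrary choice.

```python
import random
import statistics

def chi2_draws(n, n_draws=20_000, seed=2):
    """Simulate chi-squared(n) draws as sums of n squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(n_draws)]

n = 5
ys = chi2_draws(n)
print(statistics.fmean(ys))      # theory: E(Y) = n = 5
print(statistics.variance(ys))   # theory: V(Y) = 2n = 10
```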

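Sum of chi-squared random variables

For two independent random variables $U \sim \Gamma(\alpha, \lambda), V \sim \Gamma(\beta, \lambda)$,

$U+V \sim \Gamma(\alpha + \beta, \lambda)$, so

the sum of two independent chi-squared random variables is again chi-squared, with degrees of freedom equal to the sum of the two. That is,

if $Y \sim \chi^2(n), Z \sim \chi^2(m)$ are independent, then $Y+Z \sim \chi^2(n+m)$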
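One way to see the additivity is again through mgfs. The sketch below quotes the standard fact that the mgf of $\chi^2(n)$ is $(1-2t)^{-n/2}$ for $t < 1/2$; that formula is assumed here rather than derived in this post. For independent $Y$ and $Z$,
\[ M_{Y+Z}(t) = M_Y(t)\,M_Z(t) = (1-2t)^{-n/2}(1-2t)^{-m/2} = (1-2t)^{-(n+m)/2}, \quad t < \tfrac{1}{2} \]
which is the mgf of $\chi^2(n+m)$.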

Sample mean and sample variance

Let $X_1, \dots, X_n$ be i.i.d. $N(\mu, \sigma^2)$, and define the sample mean $\overline{X}$ and the sample variance $S^2$ by
\[ \overline{X} = \cfrac{1}{n}\sum_{i=1}^{n}X_i \]
\[ S^2 = \cfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \overline{X})^2 \]

Then the following two properties hold.
\[ \overline{X} \perp\!\!\!\perp S^2 \]
\[ (n-1)S^2/\sigma^2 \sim \chi^2(n-1) \]
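Both properties can be probed with a small simulation. The sketch below uses arbitrary values $n=6$, $\mu=10$, $\sigma=2$, and the helpers `mean_and_scaled_var` and `corr` are hypothetical names. A near-zero sample correlation is only consistent with independence; it does not prove it.

```python
import random
import statistics

def mean_and_scaled_var(n, mu, sigma, rng):
    """Return (sample mean, (n-1) * S^2 / sigma^2) for one normal sample of size n."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, (n - 1) * s2 / sigma ** 2

def corr(xs, ys):
    """Pearson sample correlation of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
n, mu, sigma = 6, 10.0, 2.0
pairs = [mean_and_scaled_var(n, mu, sigma, rng) for _ in range(20_000)]
xbars = [p[0] for p in pairs]
qs = [p[1] for p in pairs]

print(statistics.fmean(qs))      # theory for chi2(n-1): mean n-1 = 5
print(statistics.variance(qs))   # theory for chi2(n-1): variance 2(n-1) = 10
print(corr(xbars, qs))           # close to 0 if the sample mean and S^2 are independent
```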

Why

https://www2.stat.duke.edu/courses/Fall18/sta611.01/Lecture/lec12_mean_var_indep.pdf

https://trivia-starage.tistory.com/250 (Why is the sample variance divided by n-1? Unbiased estimators and degrees of freedom)

Sample variance

\begin{align*} E[\sum_{i=1}^n(X_i - \overline{X})^2] &= E[\sum_{i=1}^nX_i^2 -n\overline{X}^2] \\ &= nE(X_1^2) - nE(\overline{X}^2) \\ &= n\{ V(X_1)+E(X_1)^2 \} - n\{ V(\overline{X})+E(\overline{X})^2\} \\ &= n(\sigma^2 + \mu^2) -n(\cfrac{\sigma^2}{n} + \mu^2) \\ &= (n-1)\sigma^2 \end{align*}

Therefore $E(S^2)=E[\frac{1}{n-1}\sum_i(X_i - \overline{X})^2] = \sigma^2$.

As Chapter 5 discusses, this property means that $S^2$ is an unbiased estimator of $\sigma^2$, regardless of the underlying distribution (as long as the variance is finite).
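The effect of the divisor can be seen in a short simulation. As an illustration that normality is not needed for unbiasedness, this sketch draws from an exponential distribution with rate 1 (true variance 1), which is an assumed choice; `average_variance_estimates` is a hypothetical helper.

```python
import random

def average_variance_estimates(n, n_reps=40_000, seed=4):
    """Average the divide-by-(n-1) and divide-by-n variance estimates over many samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(n_reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]   # Exp(1): true variance is 1
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)           # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / n_reps, total_biased / n_reps

unbiased_avg, biased_avg = average_variance_estimates(n=5)
print(unbiased_avg)   # theory: sigma^2 = 1
print(biased_avg)     # theory: (n-1)/n * sigma^2 = 0.8
```

Dividing by $n$ underestimates $\sigma^2$ by the factor $(n-1)/n$ on average, which matches the $(n-1)\sigma^2$ result of the derivation.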

The $t$ Distribution

Let $Z \sim N(0, 1)$ and $W \sim \chi^2(n)$, with $Z \perp\!\!\!\perp W$. Then
\[ T = \cfrac{Z}{\sqrt{W/n}} \sim t(n) \]

Note: the independence condition is required.

Figure: densities of the $t$ distribution with 1 and 30 degrees of freedom.

Example

For i.i.d. $X_1, X_2, X_3 \sim N(\mu, \sigma^2)$, let $Z = \cfrac{X_1 - \mu}{\sigma}$ and $W = \left( \cfrac{X_2-\mu}{\sigma} \right)^2 + \left( \cfrac{X_3 - \mu}{\sigma} \right)^2$.

Then $Z \sim N(0, 1)$, $W \sim \chi^2(2)$, and $Z \perp\!\!\!\perp W$, so $T = \cfrac{Z}{\sqrt{W/2}} \sim t(2)$.

Sample mean and the $t$ distribution

Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ be i.i.d. Then

$Z = \cfrac{\overline{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$,

$\cfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$, and the two are independent (see the section on the sample mean and sample variance above).

Therefore $T$ can be defined as follows.

$T = \cfrac{\cfrac{\overline{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\cfrac{(n-1)S^2}{\sigma^2} /(n-1)} } = \cfrac{\overline{X} - \mu}{S / \sqrt{n}} \sim t(n-1)$
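Note that $\sigma$ cancels, so $T$ can be computed from the data and a hypothesized $\mu$ alone. A minimal sketch, with made-up data and a hypothetical function name `t_statistic`:

```python
import math
import statistics

def t_statistic(sample, mu):
    """Compute T = (xbar - mu) / (S / sqrt(n)), where S^2 uses the n-1 divisor."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # square root of the n-1 sample variance
    return (xbar - mu) / (s / math.sqrt(n))

sample = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data for illustration
# Here xbar = 3 and S^2 = 2.5, so T = (3 - 2) / sqrt(2.5 / 5) = sqrt(2).
print(t_statistic(sample, mu=2.0))
```

Under the normal i.i.d. assumptions above, this value would be compared against the $t(n-1)$ distribution, here $t(4)$.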

Standard normal approximation of the $t$ distribution

When $n$ is sufficiently large, $t(n) \approx N(0, 1)$.
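The approximation can be made concrete by comparing densities. The sketch below assumes the standard closed form of the $t(n)$ density, $\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1+\frac{x^2}{n}\right)^{-(n+1)/2}$, which is not stated in this post; `t_pdf` and `normal_pdf` are hypothetical helper names.

```python
import math

def t_pdf(x, n):
    """Density of t(n), computed with log-gamma so that large n does not overflow."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c - (n + 1) / 2 * math.log1p(x * x / n))

def normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

for n in (1, 5, 30, 1000):
    # Largest absolute difference between the two densities on a grid over [-4, 4].
    gap = max(abs(t_pdf(x / 10, n) - normal_pdf(x / 10)) for x in range(-40, 41))
    print(n, gap)   # the gap shrinks as n grows
```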

The $F$ Distribution

Let $W \sim \chi^2(m)$ and $V \sim \chi^2(n)$ be two chi-squared random variables with $W \perp\!\!\!\perp V$. Then
\[ Y = \cfrac{W/m}{V/n} \sim F(m,n) \]

Figure: densities of the $F$ distribution for $F(2,1)$ and $F(3, 10)$.

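Relationship between the $t$ and $F$ distributions

If $X \sim t(n)$, then $X^2 \sim F(1, n)$.

Proof

Write $X = \cfrac{Z}{\sqrt{W/n}}$ with $Z \sim N(0, 1)$, $W \sim \chi^2(n)$, and $Z \perp\!\!\!\perp W$. Since $Z^2 \sim \chi^2(1)$ and $Z^2 \perp\!\!\!\perp W$,

$X^2 = \left( \cfrac{Z}{\sqrt{W/n}} \right)^2 = \cfrac{Z^2 / 1}{W/n} \sim F(1, n)$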
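The relationship can be illustrated by building both sides from their definitions and comparing empirical quantiles. This is a rough sketch with $n=10$ chosen arbitrarily; `chi2`, `t_squared_draw`, and `f_draw` are hypothetical helpers that follow the constructions given in this post.

```python
import random
import statistics

def chi2(df, rng):
    """One chi-squared(df) draw as a sum of squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def t_squared_draw(n, rng):
    """Square of T = Z / sqrt(W/n) with independent Z ~ N(0,1) and W ~ chi2(n)."""
    z = rng.gauss(0.0, 1.0)
    return z * z / (chi2(n, rng) / n)

def f_draw(m, n, rng):
    """One F(m, n) draw as (W/m) / (V/n) with independent chi-squared W and V."""
    return (chi2(m, rng) / m) / (chi2(n, rng) / n)

rng = random.Random(5)
n = 10
t2 = sorted(t_squared_draw(n, rng) for _ in range(40_000))
f1n = sorted(f_draw(1, n, rng) for _ in range(40_000))

for q in (0.25, 0.5, 0.75, 0.9):
    k = int(q * len(t2))
    print(q, t2[k], f1n[k])   # empirical quantiles of the two samples should roughly agree
print(statistics.fmean(t2), statistics.fmean(f1n))
```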
