[Ensemble] AdaBoost

728x90

AdaBoost

adaptive boosting 이다.

Algorithm in Overview

Given: $d$개의 class-labeled tuple이 input으로 주어진다. $(\mathbf{X}_1, y_1), \dots, (\mathbf{X}_d, y_d)$

맨 처음, 모든 tuple은 uniformly weighed 된다. 즉 $j$번째 tuple의 weight는 $\frac{1}{d}$이다.

$T$ round동안 $T$개의 classifier를 생성한다. 그리고 $i$번째 round에서,

$\mathcal{D}$로부터 복원추출(sampling with replacement)하여 training set $D_i$를 얻는다.
각 tuple은 각 weight에 기반하여 selected 확률을 지닌다.
$D_i$로 classifier $C_i$를 학습하고, 다시 $D_i$를 testset으로 간주하여 error rate를 구한다.
tuple이 misclassified되면 그 weight는 증가하고 그렇지 않으면(잘 classified되면) weight를 줄인다.

Details of AdaBoost

Given $\mathcal{D} = \{ (\mathbf{X}_1, y_1), \dots, (\mathbf{X}_d, y_d) \}$

$D_1$의 weight는 uniform하게 한다: $w_1 = [\frac{1}{m}, \dots, \frac{1}{m}]$. 즉 $w_1(i) = \frac{1}{m} (i=1, \dots, d)$

for $t = 1, \dots, T$

$D_t$를 이용하여 분류기 $C_t$를 학습한다.
- error rate $\epsilon_t$를 계산한다.
  - $ \epsilon_t = \cfrac{\sum_{i=1}^{d}w_t(i) \delta(C_t(\mathbf{X}_i) \neq y_i) }{\sum_{i=1}^{d}w_t } $
- importance of classifier $\alpha_t$를 계산한다.
  - $\alpha_t = \cfrac{1}{2} \ln \left( \cfrac{1 - \epsilon_t}{\epsilon_t} \right)$
  - error rate $\epsilon_t$가 작으면 $\alpha_t$는 커진다.
- (길이가 $d$인) Weight $w_t$를 다음 단계 $(t+1)$의 weight로 update한다.
  - $w_{t+1}(i) = \cfrac{w_t(i)}{Z_t} \text{exp}(-\alpha_t y_i C_t(x_i) )$
  - $Z_t$는 normalization factor이므로 $Z_t = \sum_{i=1}^{d} w_t(i)$ 이다.
  - $y_i \in \{-1, 1 \}$이므로 $y_i = C_t(x_i)$이면 $\text{exp}$ 내부는 $\text{exp}(-\alpha_t)$이고 예측이 틀렸으면 $\text{exp}$ 내부는 $\text{exp}(\alpha_t)$이다.
$C^{*} = \underset{y}{\text{argmax}}\sum_{t=1}^{T} \alpha_t \delta\left( C_t(x) = y \right)$

Toy Example

(예시라서 그런지 소수점 반올림/올림이 난무하다.)

	10개의 인스턴스가 있고, 5개의 (+)와 5개의 (-)를 분류해보자. 처음 $D_1$은 모든 튜플 $(\mathbf{X}_i, y_i)$의 weight는 균일하다. $w_1 = [\frac{1}{10}, \dots, \frac{1}{10}]$
	[Round 1] 첫번째 분류기($h_1$)가 다음과 같이 분류했다고 하자. 3개의 인스턴스가 잘못 분류되었다. 따라서 잘못 분류된 3개의 weight를 더하여 error rate를 구한다. $\epsilon_1 = 0.1 + 0.1 + 0.1 = 0.3$ 이어서 $\alpha_1$를 구하면 $\alpha_1= \frac{1}{2}\ln \frac{1-0.3}{0.3} = \frac{1}{2} (0.84)=0.42$ weight를 업데이트하면 ($Z_1 = 1$이다.) correctly => $(0.1)(e^{-0.42}) = 0.06578 \approx 0.07$ wrong => $(0.1)(e^{0.42}) = 1.52$
	[Round 2] 두번째 분류기($h_2$)가 다음과 같이 분류했다고 하자. 3개의 인스턴스가 잘못 분류되었다. 따라서 잘못 분류된 3개의 weight를 더하여 error rate를 구한다. $\epsilon_2 = 0.07 + 0.07 + 0.07 = 0.21$ 이어서 $\alpha_2$를 구하면 $\alpha_2 = \frac{1}{2}\ln \frac{1-0.21}{0.21} = \frac{1}{2} (1.32) \approx 0.65$
	[Round 3] 세번째 분류기($h_3$)가 다음과 같이 분류했다고 하자. 3개의 인스턴스가 잘못 분류되었다. 따라서 잘못 분류된 3개의 weight를 더하여 error rate를 구한다. $\epsilon_3 = 0.14$ 이어서 $\alpha_3$를 구하면 $\alpha_3 =\frac{1}{2}\ln \frac{1-0.14}{0.14} = \frac{1}{2} (1.81) \approx 0.92$
	[Final Hypotheis] 각 label (1, -1)에 맞게 partition마다 부호를 곱하고, max voting을 통해 최종 label을 예측한다. 예를 들어 6개의 파티션 중 2번째 파티션(12시 방향)의 경우, $-0.42+0.65+0.92>0$이므로 (+)로 예측한다. 또 4번째 파티션(6시 방향)의 경우, $-0.42+0.65-0.92 < 0$이므로 (-)로 예측한다.

728x90

'스터디 > 인공지능, 딥러닝, 머신러닝' 카테고리의 다른 글

[Clustering] Overview, Approach, Cluster Analysis (0)	2023.05.17
[Ensemble] AdaBoost in Python (scikit-learn) (0)	2023.05.16
[Ensemble] Random Forests in Python (scikit-learn) (0)	2023.05.13
[Ensemble] Random Forests (0)	2023.05.12
[Ensemble] Basic Concept of Ensemble Methods, Bagging, Boosting (0)	2023.05.12

궁금한게많은joon

[Ensemble] AdaBoost

AdaBoost