The Meaning of Stationarity in Time Series


씩씩한 IT블로그 2020. 12. 23. 12:14

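1. Meaning

What is stationarity (a stationary process)? => The mean, variance, covariance, and every other distributional property stay constant.

What does it mean for a time series to be stationary? => Its "statistical properties (mean, variance, covariance)" do not change as time passes.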

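<Example of a stationary series>

<Examples of non-stationary series>

From left to right: (non-constant mean, non-constant variance, time-varying autocovariance (the dependence between observations changes over time))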
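The first two failure modes above can be reproduced with synthetic data. The following is a minimal sketch, not the series from the original figures: the drift rate, the variance growth rate, the seed, and the helper name `half_stats` are all assumptions chosen for illustration. It compares the mean and variance of the first and second half of each series.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 2000
t = np.arange(n)
noise = rng.normal(0.0, 1.0, size=n)

# Illustrative series (parameters are assumptions, not from the post)
series = {
    "stationary": noise,                          # constant mean and variance
    "drifting mean": 0.01 * t + noise,            # mean rises with time
    "growing variance": (1 + 0.002 * t) * noise,  # spread widens with time
}

def half_stats(x):
    """Return (mean, variance) for the first and the second half of x."""
    first, second = x[: len(x) // 2], x[len(x) // 2 :]
    return (first.mean(), first.var()), (second.mean(), second.var())

for name, x in series.items():
    (m1, v1), (m2, v2) = half_stats(x)
    print(f"{name:>16}: mean {m1:6.2f} -> {m2:6.2f}, variance {v1:6.2f} -> {v2:6.2f}")
```

For the stationary series both halves should report roughly the same numbers, while the other two show a clear shift in the mean or in the variance.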

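2. Weak Stationarity and Strong Stationarity

(1) Weak stationarity: the series looks stationary when any two time points are compared, i.e. only the mean, variance, and covariance are required to be stable.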

(Intuitive view)
if $\{X_{it}\}_{t=-\infty}^{t=+\infty}$ is a weakly stationary process,
1) $X_{i1}$, $X_{i2}$, $X_{i3}$, ... have the same mean and the same finite variance.
2) $(X_{i1}, X_{i3})$, $(X_{i5}, X_{i7})$, $(X_{i9}, X_{i11})$, ... have the same covariance. That's it.
(Mathematical view)
if $\{X_{it}\}_{t=-\infty}^{t=+\infty}$ is a weakly stationary process,
1) $E(X_{it}) = \mu$, for all time $t$ (the first-moment condition)
2) $Var(X_{it}) = E(X_{it}^{2}) - E(X_{it})^{2} < \infty$, for all time $t$ (the second-moment condition)
3) $Cov(X_{is}, X_{ik}) = Cov(X_{i(s+h)}, X_{i(k+h)}) = f(s - k)$, for all time $s, k, h$ (the cross-moment condition)
=> the covariance depends only on the lag $s - k$, not on where the pair sits in time.
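The covariance condition can be checked numerically. The sketch below assumes a stationary AR(1) process with coefficient 0.6 and unit-variance Gaussian noise (an illustrative choice, not part of the original post) and estimates the covariance across many simulated paths; `cov_at` is a hypothetical helper name. Pairs of time points with the same lag should give about the same covariance wherever they sit in time.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
a, n_paths, n_steps = 0.6, 20000, 50

# Each row is one realization; start from the stationary distribution
# N(0, 1 / (1 - a^2)) so that every time point has the same variance.
x = np.empty((n_paths, n_steps))
x[:, 0] = rng.normal(0.0, np.sqrt(1.0 / (1.0 - a**2)), size=n_paths)
for t in range(1, n_steps):
    x[:, t] = a * x[:, t - 1] + rng.normal(size=n_paths)

def cov_at(s, k):
    """Sample covariance between time points s and k, taken across paths."""
    return np.cov(x[:, s], x[:, k])[0, 1]

theory = a**2 / (1.0 - a**2)  # lag-2 autocovariance of this AR(1)
print("Cov(X_10, X_12):", cov_at(10, 12))
print("Cov(X_40, X_42):", cov_at(40, 42))
print("theoretical lag-2 value:", theory)
```

Both estimates use lag 2 and should land close to the theoretical value, which is the sense in which the covariance is a function of the lag alone.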

(2) Strong stationarity: stationary for every collection of time points (the joint distribution depends only on the time differences)

(Intuitive view)
if $\{X_{it}\}_{t=-\infty}^{t=+\infty}$ is a strongly stationary process,
1) $X_{i1}$, $X_{i2}$, $X_{i3}$, ... have the same distribution.
2) $(X_{i1}, X_{i3})$, $(X_{i5}, X_{i7})$, $(X_{i9}, X_{i11})$, ... have the same joint distribution.
3) $(X_{i1}, X_{i3}, X_{i5})$, $(X_{i7}, X_{i9}, X_{i11})$, $(X_{i13}, X_{i15}, X_{i17})$, ... must have the same joint distribution.
4) In general, the joint distribution of any finite collection $(X_{it_1}, X_{it_2}, ..., X_{it_n})$ is invariant under all time translations.

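(3) Inclusion relationship

Strong stationarity implies weak stationarity (provided the variance is finite), but weak stationarity does not imply strong stationarity (strong => weak, weak =/=> strong)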
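A small counterexample makes the one-way implication concrete. The construction below is an assumption introduced for illustration, not something from the original post: an independent sequence that draws from N(0, 1) at even time points and from Uniform(-sqrt(3), sqrt(3)) at odd time points. Both have mean 0 and variance 1, so the sequence is weakly stationary, but the distributions differ, so it is not strongly stationary.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 200_000

# Draws standing in for "even" and "odd" time points of the sequence
even_draws = rng.normal(0.0, 1.0, size=n)
odd_draws = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)

def moments(x):
    """Return the sample mean, variance, and fourth moment of x."""
    return x.mean(), x.var(), np.mean(x**4)

even_mean, even_var, even_m4 = moments(even_draws)
odd_mean, odd_var, odd_m4 = moments(odd_draws)

print(f"even t: mean {even_mean:.3f}, var {even_var:.3f}, 4th moment {even_m4:.3f}")
print(f"odd  t: mean {odd_mean:.3f}, var {odd_var:.3f}, 4th moment {odd_m4:.3f}")
```

The first two moments agree, which is all weak stationarity asks for. The fourth moments differ (3 for the normal, 9/5 for this uniform in theory), so the marginal distribution changes with time.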

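3. White Noise and Random Walk

(1) White noise

An example of a strongly stationary process. It satisfies the conditions below (this is Gaussian white noise):

- the residuals are normally distributed

- mean 0

- constant variance

- the residuals are uncorrelated over time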
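These conditions can be checked on simulated data. The sketch below assumes unit-variance Gaussian noise and uses a hypothetical helper `sample_autocorr`; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
e = rng.normal(0.0, 1.0, size=5000)  # Gaussian white noise (assumed sigma = 1)

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

print("mean:", e.mean())
print("variance:", e.var())
for lag in range(1, 6):
    print(f"autocorrelation at lag {lag}:", sample_autocorr(e, lag))
```

The mean should be near 0, the variance near 1, and the autocorrelations near 0 at every lag.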

(2) Random walk

An example of non-stationary data. Differencing it once turns it into white noise.
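The differencing claim can be verified directly. In this minimal sketch (Gaussian shocks and the seed are assumptions), a random walk is built as the cumulative sum of white noise, and taking first differences recovers that noise.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed
e = rng.normal(0.0, 1.0, size=1000)  # white-noise shocks

walk = np.cumsum(e)          # random walk: Y_t = Y_{t-1} + e_t
differenced = np.diff(walk)  # first difference: Y_t - Y_{t-1}

# The differences are the original shocks (from the second one onward)
print(np.allclose(differenced, e[1:]))
```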

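4. Why Make a Series Stationary

(1) Time series models assume the data are stationary => the analysis works better when the data are stationary

(2) Residual diagnostics are also carried out on the assumption of stationarity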

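5. Stationarity as a Function of the Coefficient

Let e_t be a white-noise random number, and assume the following equation.

$Y_{it} = a \cdot Y_{i(t-1)} + e_t$

We change the value of a across (0, 0.6, 0.9, 1) and check up to which point stationarity holds.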
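As a worked check on why a=1 is the boundary, the variance of the process can be written out directly. This is a sketch under two assumptions that are not in the original post: the series starts at $Y_{i0} = 0$, and $e_t$ is uncorrelated over time with variance $\sigma^2$.

$$
Y_{it} = \sum_{j=0}^{t-1} a^{j} e_{t-j}
\quad\Rightarrow\quad
Var(Y_{it}) = \sigma^2 \sum_{j=0}^{t-1} a^{2j} =
\begin{cases}
\sigma^2 \dfrac{1 - a^{2t}}{1 - a^{2}} \;\to\; \dfrac{\sigma^2}{1 - a^{2}} & (|a| < 1,\ t \to \infty) \\[2ex]
t\,\sigma^2 & (a = 1)
\end{cases}
$$

For |a| < 1 the variance settles to a constant, which is compatible with stationarity. For a = 1 it grows linearly in t, so the variance is never constant and the series cannot be stationary.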

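(1) a=0

The equation becomes ($Y_{it} = e_t$), so the series is white noise itself. The ADF p-value is also effectively 0, so the statistical test likewise indicates stationarity.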

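# Generate the series and run the ADF test (rho=0)
# Assumed imports. The course helper stationarity_adf_test is not shown in this post,
# so statsmodels' adfuller is called directly here; index 1 of its result is the p-value.
from random import seed, random
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

plt.figure(figsize=(10, 4))
seed(1)
rho = 0
random_walk = [-1 if random() < 0.5 else 1]  # first value: a single +/-1 shock
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1  # e_t: +/-1 with equal probability
    value = rho * random_walk[i-1] + movement
    random_walk.append(value)
plt.plot(random_walk)
plt.title('Rho: {}\n ADF p-value: {}'.format(rho, adfuller(random_walk)[1]))
plt.tight_layout()
plt.show()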

(2) a=0.6

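# Generate the series and run the ADF test (rho=0.6)
# Assumed imports. The course helper stationarity_adf_test is not shown in this post,
# so statsmodels' adfuller is called directly here; index 1 of its result is the p-value.
from random import seed, random
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

plt.figure(figsize=(10, 4))
seed(1)
rho = 0.6
random_walk = [-1 if random() < 0.5 else 1]  # first value: a single +/-1 shock
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1  # e_t: +/-1 with equal probability
    value = rho * random_walk[i-1] + movement
    random_walk.append(value)
plt.plot(random_walk)
plt.title('Rho: {}\n ADF p-value: {}'.format(rho, adfuller(random_walk)[1]))
plt.tight_layout()
plt.show()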

(3) a=0.9

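# Generate the series and run the ADF test (rho=0.9)
# Assumed imports. The course helper stationarity_adf_test is not shown in this post,
# so statsmodels' adfuller is called directly here; index 1 of its result is the p-value.
from random import seed, random
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

plt.figure(figsize=(10, 4))
seed(1)
rho = 0.9
random_walk = [-1 if random() < 0.5 else 1]  # first value: a single +/-1 shock
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1  # e_t: +/-1 with equal probability
    value = rho * random_walk[i-1] + movement
    random_walk.append(value)
plt.plot(random_walk)
plt.title('Rho: {}\n ADF p-value: {}'.format(rho, adfuller(random_walk)[1]))
plt.tight_layout()
plt.show()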

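(4) a=1

The equation becomes ($Y_{it} = 1 \cdot Y_{i(t-1)} + e_t$), so the series is a random walk. The p-value also comes out very large, so the ADF test cannot reject non-stationarity.

In particular, the case a=1 is described as "having a unit root".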

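# Generate the random walk and run the ADF test (rho=1)
# Assumed imports. The course helper stationarity_adf_test is not shown in this post,
# so statsmodels' adfuller is called directly here; index 1 of its result is the p-value.
from random import seed, random
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

plt.figure(figsize=(10, 4))
seed(1)
rho = 1
random_walk = [-1 if random() < 0.5 else 1]  # first value: a single +/-1 shock
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1  # e_t: +/-1 with equal probability
    value = rho * random_walk[i-1] + movement
    random_walk.append(value)
plt.plot(random_walk)
plt.title('Rho: {}\n ADF p-value: {}'.format(rho, adfuller(random_walk)[1]))
plt.tight_layout()
plt.show()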
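The four runs can also be compared side by side without plotting. The sketch below is not the ADF p-value reported in the plots above: it computes the plain Dickey-Fuller t-statistic (no constant, no lagged differences) with NumPy only, and compares it with -1.95, the standard 5% critical value for that no-constant case. The helper names `simulate_ar1` and `df_t_stat` are hypothetical, and NumPy's generator produces a different random stream from `random.seed(1)`, so the paths are not identical to the ones plotted.

```python
import numpy as np

def simulate_ar1(rho, n=1000, seed=1):
    """Simulate Y_t = rho * Y_{t-1} + e_t with +/-1 shocks, as in the post."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1.0, 1.0], size=n)
    y = np.empty(n)
    y[0] = steps[0]
    for t in range(1, n):
        y[t] = rho * y[t - 1] + steps[t]
    return y

def df_t_stat(y):
    """t-statistic of gamma in the regression dY_t = gamma * Y_{t-1} + u_t."""
    y_lag, dy = y[:-1], np.diff(y)
    gamma = np.dot(y_lag, dy) / np.dot(y_lag, y_lag)   # OLS slope, no intercept
    resid = dy - gamma * y_lag
    sigma2 = np.dot(resid, resid) / (len(dy) - 1)      # residual variance
    return gamma / np.sqrt(sigma2 / np.dot(y_lag, y_lag))

DF_CRIT_5PCT = -1.95  # 5% Dickey-Fuller critical value, no-constant case

t_stats = {rho: df_t_stat(simulate_ar1(rho)) for rho in (0, 0.6, 0.9, 1)}
for rho, stat in t_stats.items():
    verdict = "reject unit root" if stat < DF_CRIT_5PCT else "cannot reject unit root"
    print(f"rho={rho}: DF t-statistic {stat:8.2f} -> {verdict}")
```

The statistic is strongly negative for rho = 0, 0.6, and 0.9 and moves toward zero as rho approaches 1, which mirrors the pattern of the p-values in the plots.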

*Source: Fast Campus, "파이썬을 활용한 시계열 데이터분석 A-Z" (Time Series Data Analysis with Python A-Z)
