3. Samsung Electronics Stock Data Analysis - Prediction (LSTM, RNN models)

씩씩한 IT블로그 2020. 10. 4. 20:14

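Series

This series analyzes and predicts Samsung Electronics stock data.

1. Samsung Electronics Stock Data Analysis - Analysis => sosoeasy.tistory.com/332

2. Samsung Electronics Stock Data Analysis - Prediction (MLP model) => sosoeasy.tistory.com/333

3. Samsung Electronics Stock Data Analysis - Prediction (LSTM, RNN models)

1~3. Same as the previous post

The following three sections are identical to those in ( 2. Samsung Electronics Stock Data Analysis - Prediction (MLP model) => sosoeasy.tistory.com/333 ).

1. Download the stock price data
2. Import the required packages
3. Load the data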

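4. Data preprocessing

0. Import the required libraries

# numpy, pandas, and MinMaxScaler are used below; they are repeated here
# in case the imports from the previous post are not already in scope
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense, SimpleRNN
from tensorflow.keras.layers import LSTM
import matplotlib.pyplot as plt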

1. Add a mid-price attribute

# Compute the mid price as the average of the day's high and low
high_prices = df['High'].values
low_prices = df['Low'].values
mid_prices = (high_prices+low_prices)/2
mid_prices

# Add the mid price as a new column
df['Mid'] = mid_prices
df

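2. Compute the moving average

# Compute the 5-day moving average of the adjusted close
ma5 = df['Adj Close'].rolling(window=5).mean()
df['MA5'] = ma5
df = df.fillna(0) # replace missing values (NaN) with 0; the first 4 MA5 rows have no full 5-day window
df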
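To make the effect of `rolling(window=5).mean()` followed by `fillna(0)` concrete, here is a minimal NumPy-only sketch on made-up prices. The helper `rolling_mean` and the toy values are illustrative assumptions and are not part of the original notebook. The point is only that the first `window - 1` rows have no full window, so they come out as NaN and end up as 0 after filling.

```python
import numpy as np

def rolling_mean(values, window):
    """Trailing moving average; positions without a full window stay NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    if len(values) >= window:
        kernel = np.ones(window) / window
        # "valid" keeps only positions where the whole window fits
        out[window - 1:] = np.convolve(values, kernel, mode="valid")
    return out

toy_prices = [10, 20, 30, 40, 50, 60]      # made-up values, not real quotes
ma5 = rolling_mean(toy_prices, window=5)
ma5_filled = np.nan_to_num(ma5, nan=0.0)   # same effect as fillna(0) on this column

print(ma5)
print(ma5_filled)
```

If the leading zeros are unwanted, dropping those first rows (for example with `df.dropna()`) is one alternative to filling them.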

3. Normalization (min-max normalization)

# Scale every column to the range [0, 1]
min_max_scaler = MinMaxScaler()
fitted = min_max_scaler.fit(df)

output = min_max_scaler.transform(df)
output = pd.DataFrame(output, columns=df.columns, index=list(df.index.values))
print(output.head())
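Min-max scaling maps each column value x to (x - min) / (max - min). The code above fits the scaler on the whole DataFrame, so the minimum and maximum of the later (validation and test) rows also shape the scaling. The sketch below shows one possible variant that takes the statistics from the training rows only. The helper names `fit_min_max` and `apply_min_max` and the toy array are assumptions made for illustration.

```python
import numpy as np

def fit_min_max(train):
    """Per-column minimum and range, computed from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Apply (x - min) / (max - min) column by column."""
    return (data - col_min) / col_range

# toy data: 5 days, 2 columns (made-up numbers)
data = np.array([[10.0, 100.0],
                 [20.0, 300.0],
                 [30.0, 200.0],
                 [40.0, 500.0],
                 [50.0, 400.0]])
train_rows = 3                        # first 60% of the 5 rows

col_min, col_range = fit_min_max(data[:train_rows])
scaled = apply_min_max(data, col_min, col_range)
print(scaled)
```

With this variant the training rows land in [0, 1], while later rows can fall outside that range when prices move past the training-period extremes.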

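4. Split into train, validation, and test sets

# Set the split boundaries: first 60% train, next 30% validation, remaining 10% test
train_size = int(len(output)*0.6)
validation_size = int(len(output)*0.3)+train_size # end index of the validation block, not its length

# Build the inputs and labels for each split
# The closing price is the prediction target, so 'Close' is used as the label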
train_x = np.array(output[:train_size])
train_y = np.array(output['Close'][:train_size])

validation_x =np.array(output[train_size:validation_size])
validation_y = np.array(output['Close'][train_size:validation_size])

test_x = np.array(output[validation_size:])
test_y = np.array(output['Close'][validation_size:])

print(len(train_x))
print(len(validation_x))
print(len(test_x))
print(train_x.shape)
print(train_y.shape)
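In the split above, each input row contains that same row's `Close`, which is also the label. If the goal is to forecast the following day, one common variant is to pair the features of day t with the close of day t+1. The sketch below illustrates that pairing together with the same 60/30/10 chronological split. The helper `make_next_day_dataset`, the column layout, and the toy numbers are assumptions, not the author's method.

```python
import numpy as np

def make_next_day_dataset(features, close_col):
    """Pair each day's feature row with the NEXT day's close value."""
    x = features[:-1]               # days 0 .. n-2
    y = features[1:, close_col]     # closes of days 1 .. n-1
    return x, y

# toy scaled data, columns = [Open, Close] (made-up numbers)
toy = np.array([[0.1, 0.2],
                [0.2, 0.3],
                [0.3, 0.5],
                [0.5, 0.4],
                [0.4, 0.6]])
x, y = make_next_day_dataset(toy, close_col=1)

# chronological split, same proportions as in the post
train_end = int(len(x) * 0.6)
val_end = int(len(x) * 0.3) + train_end
train_x_demo, val_x_demo, test_x_demo = x[:train_end], x[train_end:val_end], x[val_end:]
train_y_demo, val_y_demo, test_y_demo = y[:train_end], y[train_end:val_end], y[val_end:]

print(x.shape, y)
print(len(train_x_demo), len(val_x_demo), len(test_x_demo))
```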

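5. Set the parameters

learning_rate = 0.01 # defined here, but not passed to the optimizer below (it is created from the string 'rmsprop')
training_cnt = 100 # number of epochs
batch_size = 200
input_size = train_x.shape[1] # number of feature columns

time_step = 1 # each sample holds a single day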

6. Match the dimensions

RNN models need to take time into account, so we add one dimension. The shape changes from (number of samples, features) to (number of samples, time steps, features).

# reshape into (number of samples, time steps, input features)
train_x = train_x.reshape(train_x.shape[0], 1, input_size)
validation_x = validation_x.reshape(validation_x.shape[0], 1, input_size)
test_x = test_x.reshape(test_x.shape[0], 1, input_size)
train_x.shape, test_x.shape
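With a time step of 1, each sample holds a single day, so the reshape only adds an axis of length 1. If a longer history per sample were wanted, the rows would have to be stacked into sliding windows instead of reshaped. A minimal sketch under that assumption follows. The helper `make_windows` and the toy array are hypothetical, and here the label is taken from the last row of each window.

```python
import numpy as np

def make_windows(features, labels, time_step):
    """Stack `time_step` consecutive rows per sample.

    The label of each sample is the one aligned with the window's last row.
    """
    n_samples = len(features) - time_step + 1
    x = np.stack([features[i:i + time_step] for i in range(n_samples)])
    y = labels[time_step - 1:]
    return x, y

features = np.arange(12, dtype=float).reshape(6, 2)   # 6 days, 2 features (toy data)
labels = features[:, 1]

x1, y1 = make_windows(features, labels, time_step=1)  # same result as the reshape above
x3, y3 = make_windows(features, labels, time_step=3)  # 3 consecutive days per sample
print(x1.shape, x3.shape, y3.shape)
```

A model fed with `x3` would need `input_shape=(3, input_size)` in its first recurrent layer.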

5. Training (RNN, LSTM)

* After finishing the preprocessing in section 4, run "1. Training with an RNN" if you want to use an RNN, or "2. Training with an LSTM" if you want to use an LSTM.

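1. Training with an RNN

(1) Define the model architecture and train

# Model architecture
model = Sequential()
model.add(SimpleRNN(512,input_shape=(time_step,input_size))) # 512 units; other sizes also work
# model.add(Dense(512, activation='tanh'))
model.add(Dropout(0.2)) # dropout goes before the output layer so the prediction itself is never zeroed out during training
model.add(Dense(1,activation='tanh')) # the target is the closing price, a single value, so the output layer has 1 unit

# Set the loss and the optimizer
model.compile(loss='mse',optimizer='rmsprop',metrics=['mae','mape'])
model.summary()

# Train
history = model.fit(train_x, train_y, epochs=training_cnt, batch_size=batch_size, verbose=1)
# Evaluate on the test set (the values are test metrics, so they are named test_*)
test_mse, test_mae, test_mape = model.evaluate(test_x, test_y, verbose=0)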
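A simple recurrent layer updates its hidden state as h_t = tanh(x_t W + h_{t-1} U + b), starting from a zero state. With a single time step there is only one update and the recurrent term drops out, so the layer computes the same thing as a dense layer with tanh activation. The NumPy sketch below illustrates this. The weights are random and all names are hypothetical; it is a hand-written illustration of the formula, not the Keras implementation.

```python
import numpy as np

def simple_rnn_forward(x_seq, W, U, b):
    """Run h_t = tanh(x_t W + h_{t-1} U + b) over a sequence, starting from h = 0."""
    h = np.zeros(U.shape[0])
    for x_t in x_seq:
        h = np.tanh(x_t @ W + h @ U + b)
    return h

rng = np.random.default_rng(0)
n_features, units = 4, 3                        # toy sizes
W = rng.normal(size=(n_features, units))        # input weights
U = rng.normal(size=(units, units))             # recurrent weights
b = rng.normal(size=units)                      # bias

x_single = rng.normal(size=(1, n_features))     # one sample with a single time step
h_rnn = simple_rnn_forward(x_single, W, U, b)
h_dense = np.tanh(x_single[0] @ W + b)          # dense layer with tanh, same W and b

print(np.allclose(h_rnn, h_dense))
```

The recurrent weights `U` only start to matter once a sample contains two or more time steps.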

(2) Compare actual and predicted values on the validation set

# Predict on the validation set to check how well the model learned
pred = model.predict(validation_x)

fig = plt.figure(facecolor='white', figsize=(20, 10))
ax = fig.add_subplot(111)
ax.plot(validation_y, label='True')
ax.plot(pred, label='Prediction')
ax.legend()
plt.show()

(3) Compare actual and predicted values on the test set

pred = model.predict(test_x)

fig = plt.figure(facecolor='white', figsize=(20, 10))
ax = fig.add_subplot(111)
ax.plot(test_y, label='True')
ax.plot(pred, label='Prediction')

ax.legend()
plt.show()

2. Training with an LSTM

(1) Define the model architecture and train

# Model architecture
model = Sequential()
model.add(LSTM(512,input_shape=(time_step,input_size))) # 512 units; other sizes also work
model.add(Dropout(0.2))
model.add(Dense(1,activation='tanh')) # the target is the closing price, a single value, so the output layer has 1 unit

# Set the loss and the optimizer
model.compile(loss='mse',optimizer='rmsprop',metrics=['mae','mape'])
model.summary()


# Train
history = model.fit(train_x,train_y,epochs=training_cnt, batch_size=batch_size, verbose=1)
# Evaluate on the test set (the values are test metrics, so they are named test_*)
test_mse, test_mae, test_mape = model.evaluate(test_x, test_y, verbose=0)
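Both models are compiled with an `mse` loss and the `mae` and `mape` metrics. The sketch below spells out these three quantities in NumPy on made-up numbers. The function names and the `eps` guard are assumptions for illustration. One thing worth noticing is that MAPE divides by the true value, and min-max scaled targets include values at or near 0, so MAPE can become extremely large even when the absolute error is small.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error; eps guards against division by zero."""
    ratio = np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps)
    return float(100.0 * np.mean(ratio))

# made-up scaled targets and predictions
y_true = np.array([0.5, 0.8, 1.0])
y_pred = np.array([0.4, 0.9, 0.9])
print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))

# same size of error, but one target is exactly 0 after scaling
y_true_zero = np.array([0.0, 0.8, 1.0])
y_pred_zero = np.array([0.1, 0.9, 0.9])
print(mape(y_true_zero, y_pred_zero))
```

For scaled targets like these, `mse` and `mae` are usually the easier numbers to interpret.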

(2) Compare actual and predicted values on the validation set

# Plot the predictions over the validation period
pred = model.predict(validation_x)

fig = plt.figure(facecolor='white', figsize=(20, 10))
ax = fig.add_subplot(111)
ax.plot(validation_y, label='True')
ax.plot(pred, label='Prediction')
ax.legend()
plt.show()

(3) Compare actual and predicted values on the test set

pred = model.predict(test_x)

fig = plt.figure(facecolor='white', figsize=(20, 10))
ax = fig.add_subplot(111)
ax.plot(test_y, label='True')
ax.plot(pred, label='Prediction')
ax.legend()
plt.show()
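The plots above compare values on the scaled 0 to 1 axis. To read them as prices, the min-max transform of the `Close` column can be undone with price = scaled * (max - min) + min. The sketch below uses made-up minimum and maximum values. With the scaler fitted earlier, these would presumably be read from `min_max_scaler.data_min_` and `min_max_scaler.data_max_` at the position of the `Close` column, which is an assumption about how the notebook would be extended.

```python
import numpy as np

def inverse_min_max(scaled, col_min, col_max):
    """Undo min-max scaling for a single column."""
    return np.asarray(scaled, dtype=float) * (col_max - col_min) + col_min

close_min, close_max = 40000.0, 60000.0           # made-up bounds, not real quotes
scaled_pred = np.array([[0.0], [0.25], [1.0]])    # predictions come back with shape (n, 1)

pred_price = inverse_min_max(scaled_pred.ravel(), close_min, close_max)
print(pred_price)
```

The same function can be applied to `test_y` so that both curves in the plot share a price axis.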
