TensorFlow Tutorial 7

Overfitting and Underfitting

  • If a model's performance on the validation set peaks and then starts to degrade, the model is overfitting.
  • Training a model for too long makes overfitting likely.
  • Ways to prevent overfitting:
    • Use more training data.
    • Apply techniques such as regularization, which constrain the quantity and type of information the model can store.
  • Causes of underfitting:
    • The model is too simple.
    • The model is over-regularized.
    • The model simply has not been trained long enough.
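Since "training too long" is one of the causes listed above, a common companion technique is to watch the validation loss and stop once it stops improving for a few epochs (Keras ships this as the EarlyStopping callback). Below is a minimal, framework-free sketch of that patience rule; the loss curve and patience value are made up for illustration:

```python
def best_stopping_epoch(val_losses, patience=3):
    """Return the epoch to stop at: the last epoch whose validation
    loss improved, once `patience` epochs pass with no improvement."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop
    return best_epoch

# A made-up validation-loss curve that bottoms out at epoch 3:
curve = [0.42, 0.35, 0.33, 0.32, 0.33, 0.34, 0.37, 0.41]
print(best_stopping_epoch(curve))  # → 3
```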

Improving an IMDB movie-review model with weight regularization and dropout

Importing the required modules

import tensorflow as tf
from tensorflow import keras

import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)
2.7.0

Downloading the IMDB dataset

  • The sentences are multi-hot encoded so that the model deliberately overfits; we then work through the fixes.
  • Multi-hot encoding turns each integer sequence into a vector of 0s and 1s.
NUM_WORDS = 1000

(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS)

def multi_hot_sequences(sequences, dimension):
    # Create an all-zeros matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0  # set only the given indices of results[i] to 1
    return results
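As a quick sanity check, the same logic can be run on a tiny toy batch (a standalone copy of the function above, so the snippet is self-contained):

```python
import numpy as np

def multi_hot(sequences, dimension):
    # one row per sequence; 1.0 at every word index that occurs
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0
    return results

# word indices [1, 3] and [0, 2] in a 4-word vocabulary:
print(multi_hot([[1, 3], [0, 2]], dimension=4))
# → [[0. 1. 0. 1.]
#    [1. 0. 1. 0.]]
```

Note that duplicate indices collapse to a single 1, so word counts (and word order) are lost by this encoding.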


train_data = multi_hot_sequences(train_data, dimension=NUM_WORDS)
test_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)
plt.plot(train_data[0])
[<matplotlib.lines.Line2D at 0x177be0370>]

[figure: multi-hot encoding of the first training example]

Overfitting example

  • The simplest way to prevent overfitting is to shrink the model (i.e., reduce the number of learnable parameters).
  • The appropriate size or architecture of a model can only be found through experiment.
  • Start with a small number of layers and parameters, then add new layers or make the layers bigger until you see diminishing returns on the validation loss.

Building a baseline model

baseline_model = keras.Sequential([
    # `input_shape` is required here so that `.summary` works
    keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

baseline_model.compile(optimizer='adam',
                       loss='binary_crossentropy',
                       metrics=['accuracy', 'binary_crossentropy'])

baseline_model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 16)                16016

 dense_1 (Dense)             (None, 16)                272

 dense_2 (Dense)             (None, 1)                 17

=================================================================
Total params: 16,305
Trainable params: 16,305
Non-trainable params: 0
_________________________________________________________________
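The Param # column above can be reproduced by hand: a Dense layer has inputs × units weights plus one bias per unit. A quick check against the numbers in the summary:

```python
def dense_params(n_inputs, n_units):
    # weight matrix (n_inputs * n_units) plus one bias per unit
    return n_inputs * n_units + n_units

print(dense_params(1000, 16))  # first Dense layer  → 16016
print(dense_params(16, 16))    # second Dense layer → 272
print(dense_params(16, 1))     # output layer       → 17
```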

Training

baseline_history = baseline_model.fit(train_data,
                                      train_labels,
                                      epochs=20,
                                      batch_size=512,
                                      validation_data=(test_data, test_labels),
                                      verbose=2)
Epoch 1/20
49/49 - 3s - loss: 0.5579 - accuracy: 0.7141 - binary_crossentropy: 0.5579 - val_loss: 0.4192 - val_accuracy: 0.8234 - val_binary_crossentropy: 0.4192 - 3s/epoch - 60ms/step
Epoch 2/20
49/49 - 0s - loss: 0.3598 - accuracy: 0.8492 - binary_crossentropy: 0.3598 - val_loss: 0.3455 - val_accuracy: 0.8529 - val_binary_crossentropy: 0.3455 - 350ms/epoch - 7ms/step
Epoch 3/20
49/49 - 0s - loss: 0.3185 - accuracy: 0.8677 - binary_crossentropy: 0.3185 - val_loss: 0.3303 - val_accuracy: 0.8605 - val_binary_crossentropy: 0.3303 - 350ms/epoch - 7ms/step
Epoch 4/20
49/49 - 0s - loss: 0.3043 - accuracy: 0.8746 - binary_crossentropy: 0.3043 - val_loss: 0.3285 - val_accuracy: 0.8600 - val_binary_crossentropy: 0.3285 - 377ms/epoch - 8ms/step
Epoch 5/20
49/49 - 0s - loss: 0.3022 - accuracy: 0.8752 - binary_crossentropy: 0.3022 - val_loss: 0.3286 - val_accuracy: 0.8603 - val_binary_crossentropy: 0.3286 - 354ms/epoch - 7ms/step
Epoch 6/20
49/49 - 0s - loss: 0.2938 - accuracy: 0.8797 - binary_crossentropy: 0.2938 - val_loss: 0.3293 - val_accuracy: 0.8597 - val_binary_crossentropy: 0.3293 - 357ms/epoch - 7ms/step
Epoch 7/20
49/49 - 0s - loss: 0.2902 - accuracy: 0.8800 - binary_crossentropy: 0.2902 - val_loss: 0.3311 - val_accuracy: 0.8595 - val_binary_crossentropy: 0.3311 - 353ms/epoch - 7ms/step
Epoch 8/20
49/49 - 0s - loss: 0.2859 - accuracy: 0.8833 - binary_crossentropy: 0.2859 - val_loss: 0.3313 - val_accuracy: 0.8585 - val_binary_crossentropy: 0.3313 - 352ms/epoch - 7ms/step
Epoch 9/20
49/49 - 0s - loss: 0.2817 - accuracy: 0.8831 - binary_crossentropy: 0.2817 - val_loss: 0.3343 - val_accuracy: 0.8581 - val_binary_crossentropy: 0.3343 - 364ms/epoch - 7ms/step
Epoch 10/20
49/49 - 0s - loss: 0.2790 - accuracy: 0.8852 - binary_crossentropy: 0.2790 - val_loss: 0.3320 - val_accuracy: 0.8583 - val_binary_crossentropy: 0.3320 - 365ms/epoch - 7ms/step
Epoch 11/20
49/49 - 0s - loss: 0.2712 - accuracy: 0.8877 - binary_crossentropy: 0.2712 - val_loss: 0.3345 - val_accuracy: 0.8572 - val_binary_crossentropy: 0.3345 - 358ms/epoch - 7ms/step
Epoch 12/20
49/49 - 0s - loss: 0.2648 - accuracy: 0.8910 - binary_crossentropy: 0.2648 - val_loss: 0.3374 - val_accuracy: 0.8554 - val_binary_crossentropy: 0.3374 - 353ms/epoch - 7ms/step
Epoch 13/20
49/49 - 0s - loss: 0.2594 - accuracy: 0.8928 - binary_crossentropy: 0.2594 - val_loss: 0.3390 - val_accuracy: 0.8548 - val_binary_crossentropy: 0.3390 - 379ms/epoch - 8ms/step
Epoch 14/20
49/49 - 0s - loss: 0.2537 - accuracy: 0.8957 - binary_crossentropy: 0.2537 - val_loss: 0.3423 - val_accuracy: 0.8538 - val_binary_crossentropy: 0.3423 - 357ms/epoch - 7ms/step
Epoch 15/20
49/49 - 0s - loss: 0.2464 - accuracy: 0.8982 - binary_crossentropy: 0.2464 - val_loss: 0.3444 - val_accuracy: 0.8534 - val_binary_crossentropy: 0.3444 - 354ms/epoch - 7ms/step
Epoch 16/20
49/49 - 0s - loss: 0.2408 - accuracy: 0.9020 - binary_crossentropy: 0.2408 - val_loss: 0.3489 - val_accuracy: 0.8506 - val_binary_crossentropy: 0.3489 - 358ms/epoch - 7ms/step
Epoch 17/20
49/49 - 0s - loss: 0.2351 - accuracy: 0.9040 - binary_crossentropy: 0.2351 - val_loss: 0.3537 - val_accuracy: 0.8514 - val_binary_crossentropy: 0.3537 - 360ms/epoch - 7ms/step
Epoch 18/20
49/49 - 0s - loss: 0.2269 - accuracy: 0.9086 - binary_crossentropy: 0.2269 - val_loss: 0.3605 - val_accuracy: 0.8483 - val_binary_crossentropy: 0.3605 - 353ms/epoch - 7ms/step
Epoch 19/20
49/49 - 0s - loss: 0.2199 - accuracy: 0.9109 - binary_crossentropy: 0.2199 - val_loss: 0.3639 - val_accuracy: 0.8492 - val_binary_crossentropy: 0.3639 - 353ms/epoch - 7ms/step
Epoch 20/20
49/49 - 0s - loss: 0.2144 - accuracy: 0.9157 - binary_crossentropy: 0.2144 - val_loss: 0.3706 - val_accuracy: 0.8477 - val_binary_crossentropy: 0.3706 - 353ms/epoch - 7ms/step

Building a smaller model

smaller_model = keras.Sequential([
    keras.layers.Dense(4, activation='relu', input_shape=(NUM_WORDS,)),
    keras.layers.Dense(4, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

smaller_model.compile(optimizer='adam',
                      loss='binary_crossentropy',
                      metrics=['accuracy', 'binary_crossentropy'])

smaller_model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense_3 (Dense)             (None, 4)                 4004

 dense_4 (Dense)             (None, 4)                 20

 dense_5 (Dense)             (None, 1)                 5

=================================================================
Total params: 4,029
Trainable params: 4,029
Non-trainable params: 0
_________________________________________________________________
smaller_history = smaller_model.fit(train_data,
                                    train_labels,
                                    epochs=20,
                                    batch_size=512,
                                    validation_data=(test_data, test_labels),
                                    verbose=2)
Epoch 1/20
49/49 - 1s - loss: 0.6380 - accuracy: 0.6622 - binary_crossentropy: 0.6380 - val_loss: 0.5566 - val_accuracy: 0.7825 - val_binary_crossentropy: 0.5566 - 996ms/epoch - 20ms/step
Epoch 2/20
49/49 - 0s - loss: 0.4834 - accuracy: 0.8157 - binary_crossentropy: 0.4834 - val_loss: 0.4341 - val_accuracy: 0.8294 - val_binary_crossentropy: 0.4341 - 342ms/epoch - 7ms/step
Epoch 3/20
49/49 - 0s - loss: 0.3932 - accuracy: 0.8475 - binary_crossentropy: 0.3932 - val_loss: 0.3768 - val_accuracy: 0.8464 - val_binary_crossentropy: 0.3768 - 346ms/epoch - 7ms/step
Epoch 4/20
49/49 - 0s - loss: 0.3513 - accuracy: 0.8587 - binary_crossentropy: 0.3513 - val_loss: 0.3522 - val_accuracy: 0.8532 - val_binary_crossentropy: 0.3522 - 342ms/epoch - 7ms/step
Epoch 5/20
49/49 - 0s - loss: 0.3310 - accuracy: 0.8654 - binary_crossentropy: 0.3310 - val_loss: 0.3400 - val_accuracy: 0.8565 - val_binary_crossentropy: 0.3400 - 365ms/epoch - 7ms/step
Epoch 6/20
49/49 - 0s - loss: 0.3180 - accuracy: 0.8698 - binary_crossentropy: 0.3180 - val_loss: 0.3337 - val_accuracy: 0.8583 - val_binary_crossentropy: 0.3337 - 374ms/epoch - 8ms/step
Epoch 7/20
49/49 - 0s - loss: 0.3107 - accuracy: 0.8730 - binary_crossentropy: 0.3107 - val_loss: 0.3304 - val_accuracy: 0.8602 - val_binary_crossentropy: 0.3304 - 369ms/epoch - 8ms/step
Epoch 8/20
49/49 - 0s - loss: 0.3063 - accuracy: 0.8758 - binary_crossentropy: 0.3063 - val_loss: 0.3295 - val_accuracy: 0.8601 - val_binary_crossentropy: 0.3295 - 351ms/epoch - 7ms/step
Epoch 9/20
49/49 - 0s - loss: 0.3032 - accuracy: 0.8752 - binary_crossentropy: 0.3032 - val_loss: 0.3293 - val_accuracy: 0.8597 - val_binary_crossentropy: 0.3293 - 341ms/epoch - 7ms/step
Epoch 10/20
49/49 - 0s - loss: 0.3009 - accuracy: 0.8761 - binary_crossentropy: 0.3009 - val_loss: 0.3287 - val_accuracy: 0.8601 - val_binary_crossentropy: 0.3287 - 342ms/epoch - 7ms/step
Epoch 11/20
49/49 - 0s - loss: 0.2995 - accuracy: 0.8774 - binary_crossentropy: 0.2995 - val_loss: 0.3283 - val_accuracy: 0.8596 - val_binary_crossentropy: 0.3283 - 339ms/epoch - 7ms/step
Epoch 12/20
49/49 - 0s - loss: 0.2989 - accuracy: 0.8783 - binary_crossentropy: 0.2989 - val_loss: 0.3292 - val_accuracy: 0.8600 - val_binary_crossentropy: 0.3292 - 369ms/epoch - 8ms/step
Epoch 13/20
49/49 - 0s - loss: 0.2974 - accuracy: 0.8781 - binary_crossentropy: 0.2974 - val_loss: 0.3300 - val_accuracy: 0.8598 - val_binary_crossentropy: 0.3300 - 375ms/epoch - 8ms/step
Epoch 14/20
49/49 - 0s - loss: 0.2969 - accuracy: 0.8782 - binary_crossentropy: 0.2969 - val_loss: 0.3298 - val_accuracy: 0.8597 - val_binary_crossentropy: 0.3298 - 380ms/epoch - 8ms/step
Epoch 15/20
49/49 - 0s - loss: 0.2968 - accuracy: 0.8781 - binary_crossentropy: 0.2968 - val_loss: 0.3309 - val_accuracy: 0.8585 - val_binary_crossentropy: 0.3309 - 361ms/epoch - 7ms/step
Epoch 16/20
49/49 - 0s - loss: 0.2971 - accuracy: 0.8772 - binary_crossentropy: 0.2971 - val_loss: 0.3341 - val_accuracy: 0.8580 - val_binary_crossentropy: 0.3341 - 341ms/epoch - 7ms/step
Epoch 17/20
49/49 - 0s - loss: 0.2959 - accuracy: 0.8788 - binary_crossentropy: 0.2959 - val_loss: 0.3304 - val_accuracy: 0.8600 - val_binary_crossentropy: 0.3304 - 339ms/epoch - 7ms/step
Epoch 18/20
49/49 - 0s - loss: 0.2959 - accuracy: 0.8781 - binary_crossentropy: 0.2959 - val_loss: 0.3315 - val_accuracy: 0.8592 - val_binary_crossentropy: 0.3315 - 344ms/epoch - 7ms/step
Epoch 19/20
49/49 - 0s - loss: 0.2958 - accuracy: 0.8786 - binary_crossentropy: 0.2958 - val_loss: 0.3313 - val_accuracy: 0.8592 - val_binary_crossentropy: 0.3313 - 362ms/epoch - 7ms/step
Epoch 20/20
49/49 - 0s - loss: 0.2962 - accuracy: 0.8790 - binary_crossentropy: 0.2962 - val_loss: 0.3335 - val_accuracy: 0.8580 - val_binary_crossentropy: 0.3335 - 371ms/epoch - 8ms/step

Building a bigger model

bigger_model = keras.models.Sequential([
    keras.layers.Dense(512, activation='relu', input_shape=(NUM_WORDS,)),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

bigger_model.compile(optimizer='adam',
                     loss='binary_crossentropy',
                     metrics=['accuracy','binary_crossentropy'])

bigger_model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense_6 (Dense)             (None, 512)               512512

 dense_7 (Dense)             (None, 512)               262656

 dense_8 (Dense)             (None, 1)                 513

=================================================================
Total params: 775,681
Trainable params: 775,681
Non-trainable params: 0
_________________________________________________________________
bigger_history = bigger_model.fit(train_data, train_labels,
                                  epochs=20,
                                  batch_size=512,
                                  validation_data=(test_data, test_labels),
                                  verbose=2)
Epoch 1/20
49/49 - 1s - loss: 0.4582 - accuracy: 0.7766 - binary_crossentropy: 0.4582 - val_loss: 0.3320 - val_accuracy: 0.8575 - val_binary_crossentropy: 0.3320 - 1s/epoch - 22ms/step
Epoch 2/20
49/49 - 0s - loss: 0.2967 - accuracy: 0.8783 - binary_crossentropy: 0.2967 - val_loss: 0.3290 - val_accuracy: 0.8567 - val_binary_crossentropy: 0.3290 - 447ms/epoch - 9ms/step
Epoch 3/20
49/49 - 0s - loss: 0.2350 - accuracy: 0.9045 - binary_crossentropy: 0.2350 - val_loss: 0.3337 - val_accuracy: 0.8564 - val_binary_crossentropy: 0.3337 - 451ms/epoch - 9ms/step
Epoch 4/20
49/49 - 0s - loss: 0.1216 - accuracy: 0.9613 - binary_crossentropy: 0.1216 - val_loss: 0.3974 - val_accuracy: 0.8477 - val_binary_crossentropy: 0.3974 - 452ms/epoch - 9ms/step
Epoch 5/20
49/49 - 0s - loss: 0.0294 - accuracy: 0.9947 - binary_crossentropy: 0.0294 - val_loss: 0.5020 - val_accuracy: 0.8463 - val_binary_crossentropy: 0.5020 - 463ms/epoch - 9ms/step
Epoch 6/20
49/49 - 0s - loss: 0.0055 - accuracy: 0.9998 - binary_crossentropy: 0.0055 - val_loss: 0.5950 - val_accuracy: 0.8504 - val_binary_crossentropy: 0.5950 - 450ms/epoch - 9ms/step
Epoch 7/20
49/49 - 0s - loss: 0.0013 - accuracy: 1.0000 - binary_crossentropy: 0.0013 - val_loss: 0.6423 - val_accuracy: 0.8518 - val_binary_crossentropy: 0.6423 - 454ms/epoch - 9ms/step
Epoch 8/20
49/49 - 0s - loss: 6.8116e-04 - accuracy: 1.0000 - binary_crossentropy: 6.8116e-04 - val_loss: 0.6703 - val_accuracy: 0.8516 - val_binary_crossentropy: 0.6703 - 451ms/epoch - 9ms/step
Epoch 9/20
49/49 - 0s - loss: 4.6930e-04 - accuracy: 1.0000 - binary_crossentropy: 4.6930e-04 - val_loss: 0.6917 - val_accuracy: 0.8522 - val_binary_crossentropy: 0.6917 - 452ms/epoch - 9ms/step
Epoch 10/20
49/49 - 0s - loss: 3.5740e-04 - accuracy: 1.0000 - binary_crossentropy: 3.5740e-04 - val_loss: 0.7092 - val_accuracy: 0.8521 - val_binary_crossentropy: 0.7092 - 448ms/epoch - 9ms/step
Epoch 11/20
49/49 - 0s - loss: 2.8219e-04 - accuracy: 1.0000 - binary_crossentropy: 2.8219e-04 - val_loss: 0.7248 - val_accuracy: 0.8526 - val_binary_crossentropy: 0.7248 - 449ms/epoch - 9ms/step
Epoch 12/20
49/49 - 0s - loss: 2.2992e-04 - accuracy: 1.0000 - binary_crossentropy: 2.2992e-04 - val_loss: 0.7383 - val_accuracy: 0.8525 - val_binary_crossentropy: 0.7383 - 460ms/epoch - 9ms/step
Epoch 13/20
49/49 - 0s - loss: 1.9081e-04 - accuracy: 1.0000 - binary_crossentropy: 1.9081e-04 - val_loss: 0.7510 - val_accuracy: 0.8525 - val_binary_crossentropy: 0.7510 - 449ms/epoch - 9ms/step
Epoch 14/20
49/49 - 0s - loss: 1.6138e-04 - accuracy: 1.0000 - binary_crossentropy: 1.6138e-04 - val_loss: 0.7619 - val_accuracy: 0.8523 - val_binary_crossentropy: 0.7619 - 449ms/epoch - 9ms/step
Epoch 15/20
49/49 - 0s - loss: 1.3817e-04 - accuracy: 1.0000 - binary_crossentropy: 1.3817e-04 - val_loss: 0.7723 - val_accuracy: 0.8523 - val_binary_crossentropy: 0.7723 - 470ms/epoch - 10ms/step
Epoch 16/20
49/49 - 0s - loss: 1.1985e-04 - accuracy: 1.0000 - binary_crossentropy: 1.1985e-04 - val_loss: 0.7823 - val_accuracy: 0.8527 - val_binary_crossentropy: 0.7823 - 468ms/epoch - 10ms/step
Epoch 17/20
49/49 - 0s - loss: 1.0468e-04 - accuracy: 1.0000 - binary_crossentropy: 1.0468e-04 - val_loss: 0.7910 - val_accuracy: 0.8525 - val_binary_crossentropy: 0.7910 - 473ms/epoch - 10ms/step
Epoch 18/20
49/49 - 0s - loss: 9.2427e-05 - accuracy: 1.0000 - binary_crossentropy: 9.2427e-05 - val_loss: 0.7992 - val_accuracy: 0.8524 - val_binary_crossentropy: 0.7992 - 458ms/epoch - 9ms/step
Epoch 19/20
49/49 - 0s - loss: 8.1998e-05 - accuracy: 1.0000 - binary_crossentropy: 8.1998e-05 - val_loss: 0.8079 - val_accuracy: 0.8524 - val_binary_crossentropy: 0.8079 - 471ms/epoch - 10ms/step
Epoch 20/20
49/49 - 0s - loss: 7.3255e-05 - accuracy: 1.0000 - binary_crossentropy: 7.3255e-05 - val_loss: 0.8152 - val_accuracy: 0.8525 - val_binary_crossentropy: 0.8152 - 445ms/epoch - 9ms/step

Plotting the training and validation loss

def plot_history(histories, key='binary_crossentropy'):
  plt.figure(figsize=(16,10))

  for name, history in histories:
    val = plt.plot(history.epoch, history.history['val_'+key],
                   '--', label=name.title()+' Val')
    plt.plot(history.epoch, history.history[key], color=val[0].get_color(),
             label=name.title()+' Train')

  plt.xlabel('Epochs')
  plt.ylabel(key.replace('_',' ').title())
  plt.legend()

  plt.xlim([0,max(history.epoch)])


plot_history([('baseline', baseline_history),
              ('smaller', smaller_history),
              ('bigger', bigger_history)])

[figure: binary cross-entropy for the baseline, smaller, and bigger models (train solid, validation dashed)]


Strategies to prevent overfitting

Weight regularization

  • A simple model is less likely to overfit than a complex one (Occam's razor).
  • Weight regularization:
    • Constrains the network's complexity by forcing its weights to take only small values, which makes the distribution of weight values more regular.
    • Works by adding a cost associated with large weights to the network's loss function.
    • L1 regularization:
      • Adds a cost proportional to the absolute value of the weights (i.e., the "L1 norm" of the weights).
      • Can drive some weights exactly to 0.
    • L2 regularization:
      • Adds a cost proportional to the square of the weights (i.e., the squared "L2 norm" of the weights).
      • Also known as weight decay in neural networks.
      • Never drives weights all the way to 0 (one reason it is used more often).
  • These penalties are applied only during training.

l2(0.001) means that every value in the layer's weight matrix adds 0.001 * weight_coefficient_value**2 to the network's total loss.
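The penalty terms themselves are simple to compute by hand. The sketch below evaluates both for a small, made-up weight vector, using the same 0.001 coefficient passed to l2(0.001) above:

```python
import numpy as np

w = np.array([0.5, -0.25, 1.0])  # illustrative weight values

l1_penalty = 0.001 * np.sum(np.abs(w))  # L1: proportional to |w|
l2_penalty = 0.001 * np.sum(w ** 2)     # L2: proportional to w**2

print(l1_penalty)  # ≈ 0.00175
print(l2_penalty)  # ≈ 0.0013125
```

In Keras, each regularized layer contributes exactly such a term, exposed through the layer's `losses` attribute and added to the total loss during training.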

l2_model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation='relu', input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

l2_model.compile(optimizer='adam',
                 loss='binary_crossentropy',
                 metrics=['accuracy', 'binary_crossentropy'])

l2_model_history = l2_model.fit(train_data, train_labels,
                                epochs=20,
                                batch_size=512,
                                validation_data=(test_data, test_labels),
                                verbose=2)
Epoch 1/20
49/49 - 1s - loss: 0.5972 - accuracy: 0.7176 - binary_crossentropy: 0.5536 - val_loss: 0.4636 - val_accuracy: 0.8218 - val_binary_crossentropy: 0.4218 - 1s/epoch - 23ms/step
Epoch 2/20
49/49 - 0s - loss: 0.4045 - accuracy: 0.8483 - binary_crossentropy: 0.3628 - val_loss: 0.3858 - val_accuracy: 0.8554 - val_binary_crossentropy: 0.3444 - 366ms/epoch - 7ms/step
Epoch 3/20
49/49 - 0s - loss: 0.3607 - accuracy: 0.8660 - binary_crossentropy: 0.3199 - val_loss: 0.3703 - val_accuracy: 0.8612 - val_binary_crossentropy: 0.3302 - 368ms/epoch - 8ms/step
Epoch 4/20
49/49 - 0s - loss: 0.3490 - accuracy: 0.8724 - binary_crossentropy: 0.3096 - val_loss: 0.3730 - val_accuracy: 0.8580 - val_binary_crossentropy: 0.3345 - 366ms/epoch - 7ms/step
Epoch 5/20
49/49 - 0s - loss: 0.3436 - accuracy: 0.8747 - binary_crossentropy: 0.3058 - val_loss: 0.3647 - val_accuracy: 0.8606 - val_binary_crossentropy: 0.3277 - 367ms/epoch - 7ms/step
Epoch 6/20
49/49 - 0s - loss: 0.3382 - accuracy: 0.8756 - binary_crossentropy: 0.3019 - val_loss: 0.3634 - val_accuracy: 0.8597 - val_binary_crossentropy: 0.3279 - 393ms/epoch - 8ms/step
Epoch 7/20
49/49 - 0s - loss: 0.3356 - accuracy: 0.8769 - binary_crossentropy: 0.3006 - val_loss: 0.3626 - val_accuracy: 0.8602 - val_binary_crossentropy: 0.3283 - 370ms/epoch - 8ms/step
Epoch 8/20
49/49 - 0s - loss: 0.3324 - accuracy: 0.8763 - binary_crossentropy: 0.2986 - val_loss: 0.3636 - val_accuracy: 0.8591 - val_binary_crossentropy: 0.3305 - 389ms/epoch - 8ms/step
Epoch 9/20
49/49 - 0s - loss: 0.3333 - accuracy: 0.8754 - binary_crossentropy: 0.3005 - val_loss: 0.3694 - val_accuracy: 0.8552 - val_binary_crossentropy: 0.3370 - 365ms/epoch - 7ms/step
Epoch 10/20
49/49 - 0s - loss: 0.3277 - accuracy: 0.8768 - binary_crossentropy: 0.2956 - val_loss: 0.3577 - val_accuracy: 0.8603 - val_binary_crossentropy: 0.3259 - 399ms/epoch - 8ms/step
Epoch 11/20
49/49 - 0s - loss: 0.3240 - accuracy: 0.8796 - binary_crossentropy: 0.2924 - val_loss: 0.3589 - val_accuracy: 0.8594 - val_binary_crossentropy: 0.3276 - 399ms/epoch - 8ms/step
Epoch 12/20
49/49 - 0s - loss: 0.3216 - accuracy: 0.8795 - binary_crossentropy: 0.2905 - val_loss: 0.3584 - val_accuracy: 0.8592 - val_binary_crossentropy: 0.3275 - 376ms/epoch - 8ms/step
Epoch 13/20
49/49 - 0s - loss: 0.3171 - accuracy: 0.8828 - binary_crossentropy: 0.2862 - val_loss: 0.3561 - val_accuracy: 0.8595 - val_binary_crossentropy: 0.3254 - 373ms/epoch - 8ms/step
Epoch 14/20
49/49 - 0s - loss: 0.3138 - accuracy: 0.8837 - binary_crossentropy: 0.2831 - val_loss: 0.3592 - val_accuracy: 0.8595 - val_binary_crossentropy: 0.3286 - 367ms/epoch - 7ms/step
Epoch 15/20
49/49 - 0s - loss: 0.3107 - accuracy: 0.8835 - binary_crossentropy: 0.2800 - val_loss: 0.3571 - val_accuracy: 0.8590 - val_binary_crossentropy: 0.3265 - 400ms/epoch - 8ms/step
Epoch 16/20
49/49 - 0s - loss: 0.3061 - accuracy: 0.8865 - binary_crossentropy: 0.2754 - val_loss: 0.3586 - val_accuracy: 0.8589 - val_binary_crossentropy: 0.3278 - 416ms/epoch - 8ms/step
Epoch 17/20
49/49 - 0s - loss: 0.3026 - accuracy: 0.8884 - binary_crossentropy: 0.2717 - val_loss: 0.3559 - val_accuracy: 0.8605 - val_binary_crossentropy: 0.3249 - 383ms/epoch - 8ms/step
Epoch 18/20
49/49 - 0s - loss: 0.2987 - accuracy: 0.8908 - binary_crossentropy: 0.2675 - val_loss: 0.3659 - val_accuracy: 0.8553 - val_binary_crossentropy: 0.3346 - 363ms/epoch - 7ms/step
Epoch 19/20
49/49 - 0s - loss: 0.2969 - accuracy: 0.8913 - binary_crossentropy: 0.2654 - val_loss: 0.3626 - val_accuracy: 0.8576 - val_binary_crossentropy: 0.3309 - 362ms/epoch - 7ms/step
Epoch 20/20
49/49 - 0s - loss: 0.2906 - accuracy: 0.8951 - binary_crossentropy: 0.2587 - val_loss: 0.3631 - val_accuracy: 0.8560 - val_binary_crossentropy: 0.3311 - 371ms/epoch - 8ms/step
plot_history([('baseline', baseline_history),
              ('l2', l2_model_history)])

[figure: binary cross-entropy for the baseline vs. the L2-regularized model]

Adding dropout

  • One of the most effective and widely used regularization techniques for neural networks.
  • Applying dropout to a layer randomly zeroes out ("drops") a fraction of the layer's output features during training.
  • The "dropout rate" is the fraction of features that are zeroed out, usually set between 0.2 and 0.5.
  • At test time no units are dropped; instead the layer's output values are scaled down by a factor equal to the dropout rate, to balance for the fact that more units are active than during training.
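The training/test behavior described above can be sketched in plain numpy. This is the classic formulation that matches the text (mask at training time, scale down at test time); note that Keras' Dropout layer uses the equivalent "inverted" formulation, scaling activations up during training instead:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def dropout_train(x, rate, rng):
    # training: randomly zero out a `rate` fraction of the features
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask

def dropout_test(x, rate):
    # test: keep every unit, but scale the outputs down so their
    # expected magnitude matches what training produced
    return x * (1.0 - rate)

x = np.ones(100_000)
trained = dropout_train(x, rate=0.5, rng=rng)
# on average, training-time output matches test-time output:
print(trained.mean())               # ≈ 0.5
print(dropout_test(x, 0.5).mean())  # 0.5
```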
dpt_model = keras.models.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation='sigmoid')
])

dpt_model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy','binary_crossentropy'])

dpt_model_history = dpt_model.fit(train_data, train_labels,
                                  epochs=20,
                                  batch_size=512,
                                  validation_data=(test_data, test_labels),
                                  verbose=2)
Epoch 1/20
49/49 - 1s - loss: 0.6754 - accuracy: 0.5668 - binary_crossentropy: 0.6754 - val_loss: 0.6073 - val_accuracy: 0.7556 - val_binary_crossentropy: 0.6073 - 1s/epoch - 23ms/step
Epoch 2/20
49/49 - 0s - loss: 0.5809 - accuracy: 0.6927 - binary_crossentropy: 0.5809 - val_loss: 0.4830 - val_accuracy: 0.8256 - val_binary_crossentropy: 0.4830 - 395ms/epoch - 8ms/step
Epoch 3/20
49/49 - 0s - loss: 0.5046 - accuracy: 0.7648 - binary_crossentropy: 0.5046 - val_loss: 0.4086 - val_accuracy: 0.8441 - val_binary_crossentropy: 0.4086 - 425ms/epoch - 9ms/step
Epoch 4/20
49/49 - 0s - loss: 0.4542 - accuracy: 0.7994 - binary_crossentropy: 0.4542 - val_loss: 0.3610 - val_accuracy: 0.8526 - val_binary_crossentropy: 0.3610 - 387ms/epoch - 8ms/step
Epoch 5/20
49/49 - 0s - loss: 0.4217 - accuracy: 0.8217 - binary_crossentropy: 0.4217 - val_loss: 0.3435 - val_accuracy: 0.8560 - val_binary_crossentropy: 0.3435 - 382ms/epoch - 8ms/step
Epoch 6/20
49/49 - 0s - loss: 0.3939 - accuracy: 0.8314 - binary_crossentropy: 0.3939 - val_loss: 0.3342 - val_accuracy: 0.8590 - val_binary_crossentropy: 0.3342 - 386ms/epoch - 8ms/step
Epoch 7/20
49/49 - 0s - loss: 0.3829 - accuracy: 0.8367 - binary_crossentropy: 0.3829 - val_loss: 0.3278 - val_accuracy: 0.8604 - val_binary_crossentropy: 0.3278 - 366ms/epoch - 7ms/step
Epoch 8/20
49/49 - 0s - loss: 0.3677 - accuracy: 0.8487 - binary_crossentropy: 0.3677 - val_loss: 0.3247 - val_accuracy: 0.8604 - val_binary_crossentropy: 0.3247 - 371ms/epoch - 8ms/step
Epoch 9/20
49/49 - 0s - loss: 0.3571 - accuracy: 0.8552 - binary_crossentropy: 0.3571 - val_loss: 0.3234 - val_accuracy: 0.8606 - val_binary_crossentropy: 0.3234 - 397ms/epoch - 8ms/step
Epoch 10/20
49/49 - 0s - loss: 0.3493 - accuracy: 0.8543 - binary_crossentropy: 0.3493 - val_loss: 0.3233 - val_accuracy: 0.8598 - val_binary_crossentropy: 0.3233 - 411ms/epoch - 8ms/step
Epoch 11/20
49/49 - 0s - loss: 0.3439 - accuracy: 0.8616 - binary_crossentropy: 0.3439 - val_loss: 0.3230 - val_accuracy: 0.8579 - val_binary_crossentropy: 0.3230 - 397ms/epoch - 8ms/step
Epoch 12/20
49/49 - 0s - loss: 0.3342 - accuracy: 0.8643 - binary_crossentropy: 0.3342 - val_loss: 0.3250 - val_accuracy: 0.8580 - val_binary_crossentropy: 0.3250 - 370ms/epoch - 8ms/step
Epoch 13/20
49/49 - 0s - loss: 0.3279 - accuracy: 0.8677 - binary_crossentropy: 0.3279 - val_loss: 0.3256 - val_accuracy: 0.8581 - val_binary_crossentropy: 0.3256 - 371ms/epoch - 8ms/step
Epoch 14/20
49/49 - 0s - loss: 0.3204 - accuracy: 0.8701 - binary_crossentropy: 0.3204 - val_loss: 0.3288 - val_accuracy: 0.8558 - val_binary_crossentropy: 0.3288 - 379ms/epoch - 8ms/step
Epoch 15/20
49/49 - 0s - loss: 0.3139 - accuracy: 0.8735 - binary_crossentropy: 0.3139 - val_loss: 0.3295 - val_accuracy: 0.8574 - val_binary_crossentropy: 0.3295 - 398ms/epoch - 8ms/step
Epoch 16/20
49/49 - 0s - loss: 0.3055 - accuracy: 0.8786 - binary_crossentropy: 0.3055 - val_loss: 0.3321 - val_accuracy: 0.8562 - val_binary_crossentropy: 0.3321 - 382ms/epoch - 8ms/step
Epoch 17/20
49/49 - 0s - loss: 0.3017 - accuracy: 0.8793 - binary_crossentropy: 0.3017 - val_loss: 0.3317 - val_accuracy: 0.8564 - val_binary_crossentropy: 0.3317 - 361ms/epoch - 7ms/step
Epoch 18/20
49/49 - 0s - loss: 0.2961 - accuracy: 0.8832 - binary_crossentropy: 0.2961 - val_loss: 0.3391 - val_accuracy: 0.8566 - val_binary_crossentropy: 0.3391 - 357ms/epoch - 7ms/step
Epoch 19/20
49/49 - 0s - loss: 0.2943 - accuracy: 0.8826 - binary_crossentropy: 0.2943 - val_loss: 0.3418 - val_accuracy: 0.8548 - val_binary_crossentropy: 0.3418 - 369ms/epoch - 8ms/step
Epoch 20/20
49/49 - 0s - loss: 0.2918 - accuracy: 0.8863 - binary_crossentropy: 0.2918 - val_loss: 0.3381 - val_accuracy: 0.8545 - val_binary_crossentropy: 0.3381 - 419ms/epoch - 9ms/step
plot_history([('baseline', baseline_history),
              ('dropout', dpt_model_history)])

[figure: binary cross-entropy for the baseline vs. the dropout model]

Summary

To prevent overfitting:

  • Get more training data.
  • Reduce the capacity of the network.
  • Add weight regularization.
  • Add dropout.