Image Recognition with Keras - MNIST - The ReLU Function
Let's take the Keras MNIST sample program and change the activation function from sigmoid to ReLU.
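Only the Activation layers change. As a minimal sketch of the modification (assuming the previous article's model applied Activation('sigmoid') after each hidden Dense layer):

# previous article: sigmoid activation
model.add(Activation('sigmoid'))

# this article: ReLU activation
model.add(Activation('relu'))

The full program after the change looks like this.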
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 28x28 => 784
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# one-hot ex: 3 => [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(50, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=32, validation_data=(x_test, y_test))
When we run the program, training does not seem to go well.
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 10s - loss: 14.5407 - acc: 0.0977 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 2/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 3/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 4/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 5/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 6/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 7/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 8/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 9/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 10/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
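The loss stays stuck around 14.5 and accuracy stays around 9.7%, which is roughly chance level for 10 classes. One likely cause is the scale of the inputs: the raw pixel values range from 0 to 255, and since ReLU passes large positive values through unchanged, the pre-softmax outputs become extreme and training stalls. A quick diagnostic sketch, separate from the program above, to confirm the raw input range:

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# The raw images are stored as unsigned 8-bit integers in the range 0-255
print(x_train.dtype, x_train.min(), x_train.max())  # => uint8 0 255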
Normalizing the Input Data
The MNIST image vectors contain values from 0 to 255. To help training proceed smoothly, we convert these values to the range 0-1.
# 0-255 => 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
This kind of preprocessing is called normalization.
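Note that the astype('float32') conversion comes first for a reason: the images are loaded as uint8 arrays, and an in-place division like x_train /= 255 on integer data either raises an error or truncates the values, depending on the NumPy version. A quick sketch to verify the range after normalization:

# After astype('float32') and /= 255, the values should lie in [0, 1]
print(x_train.dtype, x_train.min(), x_train.max())  # => float32 0.0 1.0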
Let's implement the program as follows and run it again.
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 28x28 => 784
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# 0-255 => 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# one-hot ex: 3 => [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(50, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=32, validation_data=(x_test, y_test))
The output of the program is as follows.
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 6s - loss: 0.6895 - acc: 0.8073 - val_loss: 0.3421 - val_acc: 0.9029
Epoch 2/10
60000/60000 [==============================] - 6s - loss: 0.3159 - acc: 0.9085 - val_loss: 0.2759 - val_acc: 0.9218
Epoch 3/10
60000/60000 [==============================] - 8s - loss: 0.2626 - acc: 0.9243 - val_loss: 0.2375 - val_acc: 0.9306
Epoch 4/10
60000/60000 [==============================] - 14s - loss: 0.2282 - acc: 0.9338 - val_loss: 0.2081 - val_acc: 0.9388
Epoch 5/10
60000/60000 [==============================] - 8s - loss: 0.2027 - acc: 0.9413 - val_loss: 0.1904 - val_acc: 0.9437
Epoch 6/10
60000/60000 [==============================] - 7s - loss: 0.1833 - acc: 0.9469 - val_loss: 0.1774 - val_acc: 0.9468
Epoch 7/10
60000/60000 [==============================] - 6s - loss: 0.1682 - acc: 0.9520 - val_loss: 0.1669 - val_acc: 0.9516
Epoch 8/10
60000/60000 [==============================] - 7s - loss: 0.1560 - acc: 0.9541 - val_loss: 0.1575 - val_acc: 0.9531
Epoch 9/10
60000/60000 [==============================] - 6s - loss: 0.1452 - acc: 0.9578 - val_loss: 0.1476 - val_acc: 0.9548
Epoch 10/10
60000/60000 [==============================] - 6s - loss: 0.1360 - acc: 0.9606 - val_loss: 0.1412 - val_acc: 0.9563
This time training seems to be going well. The sigmoid run in the previous article reached around 90%, so the result has improved by nearly 5 percentage points.
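The fit() call returns a History object whose history attribute holds the per-epoch metrics, so the learning curve can be plotted. A minimal sketch, assuming matplotlib is installed (the metric keys 'acc' and 'val_acc' match the Keras version used in the logs above):

import matplotlib.pyplot as plt

# Plot training and validation accuracy over the 10 epochs
plt.plot(history.history['acc'], label='train acc')
plt.plot(history.history['val_acc'], label='val acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()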
He Initialization
By default, Keras initializes the weights of Dense layers with glorot_uniform (a uniform distribution based on Glorot (Xavier) initialization). Glorot initialization is considered a good fit for the sigmoid function, but when using ReLU, the He normal distribution is said to work better. Let's try this as well.
Weight initialization is implemented as follows.
from keras.initializers import he_normal

model.add(Dense(20, kernel_initializer=he_normal()))
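The initializer can also be specified by its string name, in which case the import is not needed:

# Equivalent: pass the initializer by name instead of instantiating it
model.add(Dense(20, kernel_initializer='he_normal'))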
The available initializers are described on the Keras documentation page:
https://keras.io/ja/initializers/
Let's modify the previous program accordingly.
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.datasets import mnist
from keras.initializers import he_normal

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 28x28 => 784
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# 0-255 => 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# one-hot ex: 3 => [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(50, input_dim=784, kernel_initializer=he_normal()))
model.add(Activation('relu'))
model.add(Dense(20, kernel_initializer=he_normal()))
model.add(Activation('relu'))
model.add(Dense(10, kernel_initializer=he_normal()))
model.add(Activation('softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

history2 = model.fit(x_train, y_train, batch_size=32, validation_data=(x_test, y_test))
The output of the program is as follows.
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 12s - loss: 0.7723 - acc: 0.7700 - val_loss: 0.3472 - val_acc: 0.8991
Epoch 2/10
60000/60000 [==============================] - 15s - loss: 0.3223 - acc: 0.9079 - val_loss: 0.2798 - val_acc: 0.9179
Epoch 3/10
60000/60000 [==============================] - 9s - loss: 0.2673 - acc: 0.9229 - val_loss: 0.2414 - val_acc: 0.9291
Epoch 4/10
60000/60000 [==============================] - 6s - loss: 0.2325 - acc: 0.9336 - val_loss: 0.2169 - val_acc: 0.9371
Epoch 5/10
60000/60000 [==============================] - 8s - loss: 0.2070 - acc: 0.9406 - val_loss: 0.1937 - val_acc: 0.9427
Epoch 6/10
60000/60000 [==============================] - 14s - loss: 0.1872 - acc: 0.9458 - val_loss: 0.1786 - val_acc: 0.9484
Epoch 7/10
60000/60000 [==============================] - 9s - loss: 0.1706 - acc: 0.9509 - val_loss: 0.1644 - val_acc: 0.9520
Epoch 8/10
60000/60000 [==============================] - 9s - loss: 0.1561 - acc: 0.9547 - val_loss: 0.1552 - val_acc: 0.9542
Epoch 9/10
60000/60000 [==============================] - 9s - loss: 0.1443 - acc: 0.9582 - val_loss: 0.1445 - val_acc: 0.9576
Epoch 10/10
60000/60000 [==============================] - 10s - loss: 0.1344 - acc: 0.9611 - val_loss: 0.1369 - val_acc: 0.9589
There is a slight improvement, but no major change was observed. With the MNIST data used here, the effect may simply be hard to see. I would like to look into weight initialization in more depth when I have time.
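To compare the two runs more directly, the validation accuracy from both History objects can be plotted on one chart. A sketch assuming both history (glorot_uniform) and history2 (he_normal) are still in memory from the runs above and matplotlib is available:

import matplotlib.pyplot as plt

# Overlay the validation accuracy of the default and He-initialized runs
plt.plot(history.history['val_acc'], label='glorot_uniform (default)')
plt.plot(history2.history['val_acc'], label='he_normal')
plt.xlabel('epoch')
plt.ylabel('validation accuracy')
plt.legend()
plt.show()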