Murayama blog.

Democratizing AI.

Image Recognition with Keras - MNIST Edition - The ReLU Function

Let's take the Keras MNIST sample program and change its activation function from sigmoid to ReLU.
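Before diving in, here is a minimal NumPy sketch (not part of the sample program) of what the two activation functions compute:

```python
import numpy as np

def sigmoid(x):
    # Squashes any input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive inputs through unchanged; clips negatives to 0.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))  # values strictly between 0 and 1
print(relu(x))     # [0. 0. 3.]
```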

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 28x28 => 784
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# one-hot ex: 3 => [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(50, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))

When we run the program, training does not seem to go well:

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 10s - loss: 14.5407 - acc: 0.0977 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 2/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 3/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 4/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 5/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 6/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 7/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 8/10
60000/60000 [==============================] - 6s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 9/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982
Epoch 10/10
60000/60000 [==============================] - 7s - loss: 14.5487 - acc: 0.0974 - val_loss: 14.5353 - val_acc: 0.0982

Normalizing the Input Data

MNIST image vectors hold pixel values from 0 to 255. To help training converge, we convert these values to the 0-1 range.

# 0-255 => 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

This kind of preprocessing is called normalization.
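As a quick sanity check (a standalone sketch, not part of the sample program), the same scaling applied to a toy array:

```python
import numpy as np

# A toy stand-in for MNIST pixel data: uint8 values in the 0-255 range.
pixels = np.array([0, 64, 128, 255], dtype='uint8')

# Cast to float first; dividing uint8 values in place would not work as intended.
scaled = pixels.astype('float32') / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```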

Update the program as follows and run it again.

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 28x28 => 784
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# 0-255 => 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# one-hot ex: 3 => [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(50, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))

Running the program produces the following output:

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 6s - loss: 0.6895 - acc: 0.8073 - val_loss: 0.3421 - val_acc: 0.9029
Epoch 2/10
60000/60000 [==============================] - 6s - loss: 0.3159 - acc: 0.9085 - val_loss: 0.2759 - val_acc: 0.9218
Epoch 3/10
60000/60000 [==============================] - 8s - loss: 0.2626 - acc: 0.9243 - val_loss: 0.2375 - val_acc: 0.9306
Epoch 4/10
60000/60000 [==============================] - 14s - loss: 0.2282 - acc: 0.9338 - val_loss: 0.2081 - val_acc: 0.9388
Epoch 5/10
60000/60000 [==============================] - 8s - loss: 0.2027 - acc: 0.9413 - val_loss: 0.1904 - val_acc: 0.9437
Epoch 6/10
60000/60000 [==============================] - 7s - loss: 0.1833 - acc: 0.9469 - val_loss: 0.1774 - val_acc: 0.9468
Epoch 7/10
60000/60000 [==============================] - 6s - loss: 0.1682 - acc: 0.9520 - val_loss: 0.1669 - val_acc: 0.9516
Epoch 8/10
60000/60000 [==============================] - 7s - loss: 0.1560 - acc: 0.9541 - val_loss: 0.1575 - val_acc: 0.9531
Epoch 9/10
60000/60000 [==============================] - 6s - loss: 0.1452 - acc: 0.9578 - val_loss: 0.1476 - val_acc: 0.9548
Epoch 10/10
60000/60000 [==============================] - 6s - loss: 0.1360 - acc: 0.9606 - val_loss: 0.1412 - val_acc: 0.9563

This time training goes well. When we ran the same network with the sigmoid function last time, accuracy was around 90%, so the result improved by nearly 5 points.
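One common explanation for this gap (a sketch of the usual argument, not something measured in this post) is gradient size: the sigmoid's derivative is at most 0.25 and shrinks rapidly away from zero, while ReLU's derivative stays at 1 for every positive input, so gradients propagate through the layers without being repeatedly damped.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s). Peaks at 0.25 when x == 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype('float32')

x = np.array([0.0, 2.0, 5.0])
print(sigmoid_grad(x))  # starts at 0.25 and shrinks quickly
print(relu_grad(x))     # [0. 1. 1.]
```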

He Initialization

By default, Keras initializes the weights of Dense layers with glorot_uniform (a uniform distribution based on Glorot (Xavier) initialization). Glorot initialization is said to work well with the sigmoid function, but when using ReLU, the He normal distribution is considered the better choice. Let's try it.
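The idea behind He initialization is simple: draw weights from a normal distribution whose standard deviation is sqrt(2 / fan_in), where fan_in is the number of inputs to the layer. A rough NumPy sketch (Keras's implementation additionally truncates samples beyond two standard deviations, which is omitted here):

```python
import numpy as np

def he_normal_sketch(fan_in, fan_out, seed=0):
    # He initialization: zero-mean normal with stddev sqrt(2 / fan_in).
    rng = np.random.default_rng(seed)
    stddev = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

# For the first layer below: 784 inputs, 50 units.
w = he_normal_sketch(784, 50)
print(w.std())  # close to sqrt(2/784), roughly 0.05
```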

In Keras, the weight initializer is specified like this:

from keras.initializers import he_normal

model.add(Dense(20, kernel_initializer=he_normal()))

The Keras documentation lists the available initializers:

https://keras.io/ja/initializers/

Let's modify the previous program accordingly.

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.datasets import mnist
from keras.initializers import he_normal

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 28x28 => 784
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# 0-255 => 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# one-hot ex: 3 => [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(50, input_dim=784, kernel_initializer=he_normal()))
model.add(Activation('relu'))
model.add(Dense(20, kernel_initializer=he_normal()))
model.add(Activation('relu'))
model.add(Dense(10, kernel_initializer=he_normal()))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

history2 = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))

The program's output is as follows:

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 12s - loss: 0.7723 - acc: 0.7700 - val_loss: 0.3472 - val_acc: 0.8991
Epoch 2/10
60000/60000 [==============================] - 15s - loss: 0.3223 - acc: 0.9079 - val_loss: 0.2798 - val_acc: 0.9179
Epoch 3/10
60000/60000 [==============================] - 9s - loss: 0.2673 - acc: 0.9229 - val_loss: 0.2414 - val_acc: 0.9291
Epoch 4/10
60000/60000 [==============================] - 6s - loss: 0.2325 - acc: 0.9336 - val_loss: 0.2169 - val_acc: 0.9371
Epoch 5/10
60000/60000 [==============================] - 8s - loss: 0.2070 - acc: 0.9406 - val_loss: 0.1937 - val_acc: 0.9427
Epoch 6/10
60000/60000 [==============================] - 14s - loss: 0.1872 - acc: 0.9458 - val_loss: 0.1786 - val_acc: 0.9484
Epoch 7/10
60000/60000 [==============================] - 9s - loss: 0.1706 - acc: 0.9509 - val_loss: 0.1644 - val_acc: 0.9520
Epoch 8/10
60000/60000 [==============================] - 9s - loss: 0.1561 - acc: 0.9547 - val_loss: 0.1552 - val_acc: 0.9542
Epoch 9/10
60000/60000 [==============================] - 9s - loss: 0.1443 - acc: 0.9582 - val_loss: 0.1445 - val_acc: 0.9576
Epoch 10/10
60000/60000 [==============================] - 10s - loss: 0.1344 - acc: 0.9611 - val_loss: 0.1369 - val_acc: 0.9589

There was a slight improvement, but no major change. Perhaps the effect is simply hard to see on a dataset like MNIST with a network this small. I'll take a closer look at weight initialization when I have more time.