[Machine Learning/Deep Learning] MNIST Dataset

Deep Learning | 2019. 12. 18. 22:28

※ Notes on Prof. Sung Kim's lecture series [Deep Learning for Everyone] (모두를 위한 딥러닝)

- https://www.youtube.com/watch?reload=9&v=BS6O0zOGX4E&feature=youtu.be&list=PLlMkM4tgfjnLSOjrEJN31gZATbcj_MpUm&fbclid=IwAR07UnOxQEOxSKkH6bQ8PzYj2vDop_J0Pbzkg3IVQeQ_zTKcXdNOwaSf_k0

- References: Andrew Ng's ML class

1) https://class.coursera.org/ml-003/lecture

2) http://holehouse.org/mlclass/ (note)

1. MNIST Dataset

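- A collection of handwritten images of the digits 0 through 9

- Consists of 60,000 training examples and 10,000 test examples (by default, the TensorFlow loader in section 2 holds out 5,000 of the training examples as a validation set, so `mnist.train` contains 55,000)

- Each image is 28 x 28 pixels in grayscale: raw pixel values range from 0 (white background) to 255 (black foreground), and the TensorFlow loader rescales them to floats in [0, 1]

- Preprocessing and formatting are already done

- Download: http://yann.lecun.com/exdb/mnist/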
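To make the image shape and value range concrete, here is a minimal sketch in plain Python. The tiny 2 x 3 `image` and the helper name `flatten_and_scale` are made up for illustration. The loader does the equivalent work for real 28 x 28 images, giving 784 features per image.

```python
def flatten_and_scale(image):
    """Flatten a 2-D grid of 0-255 pixel values into one row of floats in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


# Hypothetical 2 x 3 "image"; a real MNIST image is 28 x 28.
image = [
    [0, 128, 255],
    [255, 0, 51],
]

features = flatten_and_scale(image)
print(len(features))  # rows * columns values; 28 * 28 = 784 for MNIST
print(features)
```

Flattening discards the 2-D layout, which is acceptable for the softmax classifier in this post because it treats every pixel as an independent feature.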

2. Loading the MNIST Dataset in TensorFlow using input_data.py

- Download: https://github.com/tensorflow/tensorflow/blob/r0.7/tensorflow/examples/tutorials/mnist/input_data.py

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)
from tensorflow.examples.tutorials.mnist import input_data  # removed in TensorFlow 2.x

# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
# Downloads the data into MNIST_data/ on first use; labels are one-hot encoded
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

nb_classes = 10

# MNIST data image of shape 28 * 28 = 784
X = tf.placeholder(tf.float32, [None, 784])
# 0 - 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, [None, nb_classes])

W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))
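The `one_hot=True` argument is why `Y` has `nb_classes` columns: each label becomes a vector with a single 1 at the index of the digit. The sketch below shows the idea in plain Python. The helpers `one_hot` and `decode` are hypothetical names used only for illustration and are not part of `input_data`.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def decode(vector):
    """Recover the class index, i.e. the position of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


encoded = one_hot(3)
print(encoded)          # 1.0 only at index 3
print(decode(encoded))  # back to the digit
```

`decode` plays the same role for a single row that `tf.argmax(..., 1)` plays for a whole batch in the later sections.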

3. Softmax

# Hypothesis (using softmax)
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

# Cross-entropy cost, averaged over the batch
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Test model
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))

# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
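For readers who want to see what the softmax and cross-entropy lines compute, here is a minimal plain-Python sketch for a single example. The three-class `logits` and `label` values are made up, and the function names are illustrative. Subtracting the maximum logit is a common numerical-stability trick that the TensorFlow code above does not show.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, probs):
    """-sum(y * log(p)); only the true class (y == 1) contributes."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


logits = [2.0, 1.0, 0.1]  # hypothetical scores, like one row of X.W + b
label = [1.0, 0.0, 0.0]   # one-hot label: the true class is 0

probs = softmax(logits)
loss = cross_entropy(label, probs)
predicted = probs.index(max(probs))

print(probs)      # the first entry is the largest, roughly 0.66
print(loss)       # equals -log(probs[0])
print(predicted)  # index of the most probable class
```

The cost is small when the probability assigned to the true class is close to 1 and grows without bound as that probability approaches 0.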

4. Training epoch/batch

- epoch: one full pass over the entire training set

- batch: the number of examples processed in one step, so that one epoch is split into several smaller iterations
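The relationship between the two terms can be sketched in plain Python. The dataset size of 1,000 and the helper `iterate_minibatches` are hypothetical. Unlike `mnist.train.next_batch`, this sketch only slices index ranges in order.

```python
def iterate_minibatches(num_examples, batch_size):
    """Yield (start, end) index pairs covering one epoch.

    A final partial batch is dropped, matching the integer division
    used to compute the number of iterations.
    """
    num_iterations = num_examples // batch_size
    for i in range(num_iterations):
        yield i * batch_size, (i + 1) * batch_size


num_examples = 1000  # hypothetical training-set size
batch_size = 100
num_epochs = 3

for epoch in range(num_epochs):
    batches = list(iterate_minibatches(num_examples, batch_size))
    print("epoch", epoch + 1, "->", len(batches), "iterations")
```

With a batch size of 100, each epoch takes `num_examples / 100` iterations, which is exactly what `num_iterations` holds in the training code.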

# parameters
num_epochs = 15
batch_size = 100
num_iterations = mnist.train.num_examples // batch_size  # batches per epoch

with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(num_epochs):
        avg_cost = 0

        for i in range(num_iterations):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, cost_val = sess.run([train, cost], feed_dict={X: batch_xs, Y: batch_ys})
            avg_cost += cost_val / num_iterations

        print("Epoch: {:04d}, Cost: {:.9f}".format(epoch + 1, avg_cost))

    print("Learning finished")

5. Report results on test dataset

    # Test the model using test sets
    # (this runs inside the `with tf.Session() as sess:` block from section 4)
    print(
        "Accuracy: ",
        accuracy.eval(
            session=sess, feed_dict={X: mnist.test.images, Y: mnist.test.labels}
        ),
    )
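The accuracy reported here is the fraction of test examples whose most probable predicted class matches the label. A minimal plain-Python sketch of that calculation follows. The four three-class `predictions` and `labels` are made up, and the helper names are illustrative.

```python
def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Mean of per-example correctness, like reduce_mean(cast(is_correct))."""
    correct = [argmax(p) == argmax(y) for p, y in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Hypothetical softmax outputs and one-hot labels for four examples.
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
]

score = accuracy(predictions, labels)
print(score)  # three of the four predictions match their labels
```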

6. Sample image show and prediction

    # Get one and predict
    # (still inside the session block; needs `import random` and
    # `import matplotlib.pyplot as plt`, see the full source in section 7)
    r = random.randint(0, mnist.test.num_examples - 1)
    print("Label: ", sess.run(tf.argmax(mnist.test.labels[r : r + 1], 1)))
    print(
        "Prediction: ",
        sess.run(tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r : r + 1]}),
    )

    plt.imshow(
        mnist.test.images[r : r + 1].reshape(28, 28),
        cmap="Greys",
        interpolation="nearest",
    )
    plt.show()

7. Full source code

import tensorflow as tf  # TensorFlow 1.x API
import matplotlib.pyplot as plt  # matplotlib: plotting library, used here to display an image
import random  # random: used to pick a random test example for the sample prediction

tf.set_random_seed(777)  # for reproducibility

from tensorflow.examples.tutorials.mnist import input_data

# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # download the MNIST dataset

nb_classes = 10  # classify into 10 labels (0-9)

# MNIST data image of shape 28 * 28 = 784 (784 features per image)
X = tf.placeholder(tf.float32, [None, 784])
# 0 - 9 digits recognition = 10 classes (one-hot label matched to each image)
Y = tf.placeholder(tf.float32, [None, nb_classes])

# W, b: initialized with random values drawn by tf.random_normal
W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (using softmax): pass the linear model XW + b through softmax
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

# Cost based on cross-entropy
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
# Minimize the cost with the gradient descent algorithm
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Test model: compare the predicted class with the label
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# parameters
num_epochs = 15  # go through the whole training set 15 times
batch_size = 100  # the training set is large, so fetch 100 images at a time
num_iterations = mnist.train.num_examples // batch_size  # batches per epoch

with tf.Session() as sess:
    # Initialize TensorFlow variables at the start of the session
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(num_epochs):
        avg_cost = 0

        for i in range(num_iterations):
            # Fetch the next batch of training data to feed into X and Y
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run one training step and get the cost for this batch
            _, cost_val = sess.run([train, cost], feed_dict={X: batch_xs, Y: batch_ys})
            avg_cost += cost_val / num_iterations

        # Print the average cost for this epoch
        print("Epoch: {:04d}, Cost: {:.9f}".format(epoch + 1, avg_cost))

    print("Learning finished")

    # Test the model using test sets: compare predictions with the labels
    # and compute the accuracy
    print(
        "Accuracy: ",
        accuracy.eval(
            session=sess, feed_dict={X: mnist.test.images, Y: mnist.test.labels}
        ),
    )

    # Get one and predict: pick a random example from the test data
    r = random.randint(0, mnist.test.num_examples - 1)
    print("Label: ", sess.run(tf.argmax(mnist.test.labels[r : r + 1], 1)))
    print(
        "Prediction: ",
        sess.run(tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r : r + 1]}),
    )

    # Display the image that was picked from the test data
    plt.imshow(
        mnist.test.images[r : r + 1].reshape(28, 28),
        cmap="Greys",
        interpolation="nearest",
    )
    plt.show()

License: Attribution, NonCommercial, NoDerivatives

Posted by CCIBOMB
