Using GANs to generate realistic images

Small, but nonetheless realistic

Photo by Cristofer Jeschke on Unsplash

GANs are one of the most promising new algorithms in the field of machine learning, with uses ranging from detecting glaucomatous images to reconstructing an image of a person’s face after listening to their voice. I wanted to try GANs out for myself, so I constructed a GAN using Keras to generate realistic images.


The dataset that I will be using is the CIFAR10 Dataset. CIFAR is an acronym that stands for the Canadian Institute For Advanced Research, and the CIFAR-10 dataset was developed along with the CIFAR-100 dataset by researchers at the CIFAR institute.

The dataset is comprised of 60,000 32×32 pixel color photographs of objects from 10 classes, such as frogs, birds, cats, ships, airplanes, etc.

These are very small images, much smaller than a typical photograph, and the dataset is intended for computer vision research.

What are GANs?

Generative Adversarial Networks, or GANs for short, are a type of neural network for generative machine learning. They are able to create content that is similar, but not identical, to the data they are trained on.

How do GANs work?

A GAN consists of two parts: A generator and a discriminator.

The generator is a Neural Network that takes in random values and returns a long array of pixel values that can be reshaped to form images. The discriminator is a separate Neural Network that looks at “real” and “fake” images and tries to guess whether each one is real or fake.

The adversarial part of the GAN is how they work together and feed into each other: When training the GAN, the loss value for the generator is how accurate the discriminator is. The worse the discriminator performs, the better the generator is performing. On the other hand, the loss value of the discriminator is based on the accuracy of the predictions.

This means that the two Neural Networks are competing against each other: One is trying to trick the other, while the other tries to avoid being tricked.
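To make the two loss values concrete, here is a minimal NumPy sketch, separate from the Keras code later in this article. It assumes the usual binary cross-entropy loss; the function name binary_crossentropy and the example probabilities are made up for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1.0 - y_true) * np.log(1.0 - y_pred)))

# Hypothetical discriminator outputs: the probability that each image is real.
d_on_real = np.array([0.9, 0.8, 0.95])   # real images, mostly judged real
d_on_fake = np.array([0.1, 0.2, 0.05])   # generated images, mostly judged fake

# Discriminator loss: real images should score 1, fakes should score 0.
d_loss = (binary_crossentropy(np.ones(3), d_on_real)
          + binary_crossentropy(np.zeros(3), d_on_fake))

# Generator loss: the generator wants its fakes to be labeled 1 (real),
# so a confident discriminator means a high generator loss.
g_loss = binary_crossentropy(np.ones(3), d_on_fake)

# If the discriminator were fooled instead, the generator loss would be low.
g_loss_fooled = binary_crossentropy(np.ones(3), np.array([0.9, 0.8, 0.95]))

print('d_loss=%.3f g_loss=%.3f g_loss_fooled=%.3f' % (d_loss, g_loss, g_loss_fooled))
```

The same discriminator outputs give a low loss to one network and a high loss to the other, which is the competition described above.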

Advantages of GANs:

  • Unsupervised Learning

GANs learn from unlabeled data. The discriminator is trained like a supervised classifier, but its “real” and “fake” labels are created automatically during training, so no hand-labeled dataset is required.

  • Highly applicable

Since the generator and discriminator in this article are built from convolutional layers, the data for GANs usually comes in the form of images. Since images are just arrays of numbers, most numerical data can be arranged into image-like arrays and is therefore compatible with GANs.
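As a rough illustration of that last point, the sketch below packs a flat vector of numbers into an image-shaped array with NumPy. The helper name vector_to_image and the zero-padding strategy are assumptions for this example, not part of the GAN built in this article.

```python
import numpy as np

def vector_to_image(values, height=32, width=32, channels=3):
    """Zero-pad a 1-D array and reshape it to (height, width, channels)."""
    values = np.asarray(values, dtype='float32')
    size = height * width * channels
    if values.size > size:
        raise ValueError('too many values for a %dx%dx%d image' % (height, width, channels))
    padded = np.zeros(size, dtype='float32')
    padded[:values.size] = values
    return padded.reshape(height, width, channels)

readings = np.linspace(-1.0, 1.0, 3000)   # 3,000 made-up measurements
image = vector_to_image(readings)
print(image.shape)
```

Whether such a layout gives the convolutional layers any useful spatial structure depends on the data, so treat this as a starting point only.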

Disadvantages of GANs:

  • Long Computation Time

Because a GAN trains two neural networks at the same time, it can take a long time to train. A good GPU is a necessity for training GANs.

  • Possible Collapse

The balance between the generator and the discriminator is very fragile. If the generator gets stuck in a local minimum, it might start creating unrecognizable generations that, by coincidence, perfectly fool the discriminator. This happens more often for images in which there is no clear pattern, as the generator picks up on false signals.

  • No True Way to Evaluate Model

Because a GAN generates original images, there is no ground truth to compare them against, and therefore no simple, objective number that says how good its creations are. One can only inspect the results and hope that the GAN will do its job.
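One crude, partial check for the collapse problem described above is to measure how different the generated images are from one another. The sketch below is an assumption of mine rather than a standard metric: mean_pairwise_distance is a hypothetical helper, and random arrays stand in for generator output.

```python
import numpy as np

def mean_pairwise_distance(images):
    """Average Euclidean distance between all pairs of images in a batch."""
    flat = np.asarray(images, dtype='float64').reshape(len(images), -1)
    sq_norms = np.sum(flat ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for every pair at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    sq_dists = np.maximum(sq_dists, 0.0)       # clamp small negative rounding errors
    upper = np.triu_indices(len(flat), k=1)    # count each pair once
    return float(np.mean(np.sqrt(sq_dists[upper])))

rng = np.random.default_rng(0)
varied = rng.uniform(-1.0, 1.0, size=(16, 32, 32, 3))             # stand-in for a varied batch
collapsed = np.repeat(varied[:1], 16, axis=0)                      # the same image 16 times
collapsed = collapsed + rng.normal(0.0, 0.01, collapsed.shape)     # plus a little noise

print(mean_pairwise_distance(varied), mean_pairwise_distance(collapsed))
```

A batch whose distance drops towards zero during training suggests that the generator is producing nearly the same image for every input. A high value does not prove that the images look realistic.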

Now that you have a basic understanding of how GANs should theoretically work, let’s look into the code.


Step 1| Import Prerequisites:

from numpy import expand_dims
from numpy import zeros
from numpy import ones
from numpy import vstack
from numpy.random import randn
from numpy.random import randint
from keras.datasets.cifar10 import load_data
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Dropout
from matplotlib import pyplot

These are all the prerequisites necessary for the program to function. There is no need to manually download external files from the internet to get the CIFAR10 dataset. CIFAR10 is one of the few featured datasets on Keras, along with the MNIST dataset and the Boston Housing dataset. The first time the load_data function is called, the dataset is downloaded to your computer. After that, it is loaded from that file.

Step 2| Define Discriminator:

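def define_discriminator(in_shape=(32,32,3)):
    model = Sequential()
    # LeakyReLU, Flatten and Dropout are imported in Step 1 but were missing here;
    # the slope 0.2 and the dropout rate 0.4 are assumed values
    model.add(Conv2D(64, (3,3), padding='same', input_shape=in_shape))
    model.add(LeakyReLU(0.2))
    # each strided convolution halves the resolution: 32 -> 16 -> 8 -> 4
    model.add(Conv2D(128, (3,3), strides=(2,2), padding='same'))
    model.add(LeakyReLU(0.2))
    model.add(Conv2D(128, (3,3), strides=(2,2), padding='same'))
    model.add(LeakyReLU(0.2))
    model.add(Conv2D(256, (3,3), strides=(2,2), padding='same'))
    model.add(LeakyReLU(0.2))
    # flatten the feature maps so that Dense produces one score per image
    model.add(Flatten())
    model.add(Dropout(0.4))
    model.add(Dense(1, activation='sigmoid'))
    # learning_rate is the current name of the older lr argument
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model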

To let the discriminator judge whether an image is fake or real, its final layer must use a sigmoid function, which limits the predictions to between 0 and 1. The in_shape argument is the shape of the input image: 32x32 pixels with the classic 3 color channels.
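The shape arithmetic behind the discriminator can be checked without Keras. The sketch below assumes the standard rule that a convolution with padding='same' produces ceil(input / stride) outputs per spatial dimension; same_conv_output and sigmoid are illustrative helper names.

```python
import math

def same_conv_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

size = 32
sizes = [size]
for stride in (1, 2, 2, 2):           # strides of the four Conv2D layers in Step 2
    size = same_conv_output(size, stride)
    sizes.append(size)

print(sizes)                           # spatial size after each layer
print(size * size * 256)               # number of values in the last feature map
print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))
```

Large negative scores map close to 0 (fake) and large positive scores map close to 1 (real), which is why the sigmoid suits a two-class decision.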

Step 3| Define Generator:

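def define_generator(latent_dim):
    model = Sequential()
    # LeakyReLU (imported in Step 1) supplies the nonlinearity between layers;
    # the slope 0.2 is an assumed value
    # foundation for a 4x4 feature map with 256 channels
    n_nodes = 256 * 4 * 4
    model.add(Dense(n_nodes, input_dim=latent_dim))
    model.add(LeakyReLU(0.2))
    model.add(Reshape((4, 4, 256)))
    # each transposed convolution doubles the resolution: 4 -> 8 -> 16 -> 32
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(0.2))
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(0.2))
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(0.2))
    # tanh keeps the output pixel values in [-1, 1]
    model.add(Conv2D(3, (3,3), activation='tanh', padding='same'))
    return model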

The generator is given a set of random points, which is reshaped into a 4x4 feature map and upscaled by each Conv2DTranspose layer until it reaches the 32x32x3 resolution. The output can therefore be displayed as an image using matplotlib.
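Two details of the generator are easy to verify with NumPy alone. The sketch assumes that a Conv2DTranspose layer with padding='same' multiplies each spatial dimension by its stride, and that the tanh output in [-1, 1] has to be rescaled to [0, 1] before display. The helper to_displayable is hypothetical, and a random array stands in for real generator output.

```python
import numpy as np

# Spatial size after the Reshape and each stride-2 Conv2DTranspose in Step 3
size = 4
sizes = [size]
for stride in (2, 2, 2):
    size = size * stride
    sizes.append(size)
print(sizes)

def to_displayable(generated):
    """Map tanh output in [-1, 1] to floats in [0, 1]."""
    return (np.asarray(generated, dtype='float32') + 1.0) / 2.0

fake_batch = np.tanh(np.random.randn(5, 32, 32, 3))   # stand-in for g_model.predict(...)
images = to_displayable(fake_batch)
print(images.shape, images.min() >= 0.0, images.max() <= 1.0)
```

A single array such as images[0] can then be passed to pyplot.imshow.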

Step 4| Define GAN:

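def define_gan(g_model, d_model):
    # freeze the discriminator so that only the generator is updated through this model
    d_model.trainable = False
    model = Sequential()
    # stack the generator on top of the discriminator
    model.add(g_model)
    model.add(d_model)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model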

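The GAN simply stacks the generator on top of the discriminator, so that the output of the generator is fed directly into the discriminator. The discriminator is marked as not trainable inside this combined model, so only the generator's weights change when the GAN is trained. The loss is binary_crossentropy, as there are only two possible outputs from the discriminator: fake or real.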
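To see what training the generator through a frozen discriminator means, here is a toy NumPy sketch with one weight per network. It is not the Keras model above: the linear generator, the logistic discriminator, the weights w_g and w_d, and the learning rate are all assumptions chosen to keep the chain rule visible.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator_loss(w_g, w_d, z):
    """Cross-entropy of D(G(z)) against the label 1 ("real")."""
    p = sigmoid(w_d * (w_g * z))         # D(G(z)) with G(z) = w_g * z
    return float(np.mean(-np.log(p)))

z = np.array([0.5, 1.0, 1.5])             # made-up latent points
w_g, w_d = 0.1, 2.0                       # toy generator and discriminator weights
w_d_before = w_d

loss_before = generator_loss(w_g, w_d, z)

# Gradient of the loss with respect to w_g, passing through the discriminator:
# dL/dw_g = mean((p - 1) * w_d * z)
p = sigmoid(w_d * w_g * z)
grad_w_g = float(np.mean((p - 1.0) * w_d * z))

w_g = w_g - 0.1 * grad_w_g                # update the generator only; w_d stays frozen
loss_after = generator_loss(w_g, w_d, z)

print(loss_before, loss_after)
```

The discriminator's weight shapes the gradient but is never changed by this step, which mirrors the effect of d_model.trainable = False.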

Step 5| Configure inputs:

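def load_real_samples():
    # only the training images are needed; the class labels are discarded
    (trainX, _), (_, _) = load_data()
    X = trainX.astype('float32')
    # scale pixel values from [0, 255] to [-1, 1] to match the generator's tanh output
    X = (X - 127.5) / 127.5
    return X

def generate_real_samples(dataset, n_samples):
    # pick random real images and label them 1 (real)
    ix = randint(0, dataset.shape[0], n_samples)
    X = dataset[ix]
    y = ones((n_samples, 1))
    return X, y

def generate_latent_points(latent_dim, n_samples):
    # draw Gaussian noise and arrange it as one row per sample
    x_input = randn(latent_dim * n_samples)
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

def generate_fake_samples(g_model, latent_dim, n_samples):
    # generate images from noise and label them 0 (fake)
    x_input = generate_latent_points(latent_dim, n_samples)
    X = g_model.predict(x_input)
    y = zeros((n_samples, 1))
    return X, y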

Before training a network, make sure that everything needed for a full pass through it is in place. The necessary inputs are the set of latent points for the generator and the fake and real samples for the discriminator.
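The scaling used in load_real_samples and the shape produced by generate_latent_points can be checked on small arrays. This sketch uses NumPy only, and the sample pixel values are made up.

```python
import numpy as np
from numpy.random import randn

pixels = np.array([0, 64, 127.5, 255], dtype='float32')   # made-up pixel intensities
scaled = (pixels - 127.5) / 127.5                          # same formula as load_real_samples
print(scaled)

restored = scaled * 127.5 + 127.5                          # inverse mapping back to [0, 255]

latent_dim, n_samples = 100, 64
x_input = randn(latent_dim * n_samples).reshape(n_samples, latent_dim)
print(x_input.shape)                                       # one row of latent values per sample
```

Scaling the real images to [-1, 1] puts them in the same range as the generator's tanh output, so the discriminator cannot tell the two apart by value range alone.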

Step 6| Train GAN:

def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=200, n_batch=128):
    bat_per_epo = int(dataset.shape[0] / n_batch)
    half_batch = int(n_batch / 2)
    for i in range(n_epochs):
        for j in range(bat_per_epo):
            # update the discriminator on half a batch of real images
            X_real, y_real = generate_real_samples(dataset, half_batch)
            d_loss1, _ = d_model.train_on_batch(X_real, y_real)
            # update the discriminator on half a batch of generated images
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            d_loss2, _ = d_model.train_on_batch(X_fake, y_fake)
            # update the generator through the combined model, labeling its fakes as real
            X_gan = generate_latent_points(latent_dim, n_batch)
            y_gan = ones((n_batch, 1))
            g_loss = gan_model.train_on_batch(X_gan, y_gan)
            print('>%d, %d/%d, d1=%.3f, d2=%.3f g=%.3f' %
                  (i+1, j+1, bat_per_epo, d_loss1, d_loss2, g_loss))
        # summarize performance every 10 epochs
        if (i+1) % 10 == 0:
            summarize_performance(i, g_model, d_model, dataset, latent_dim)

When the GAN is trained, the generator's loss is computed from the discriminator's output, and the gradients are propagated backwards through the frozen discriminator into the generator. The batch size can be adjusted to trade off the fit of the model against computation time. A basic report of the losses is printed after every batch, and the performance is summarized every 10 epochs.
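The loop bounds are easy to work out by hand. The sketch assumes the 50,000-image CIFAR-10 training split that load_real_samples returns, together with the default arguments of train.

```python
n_images = 50000      # size of the CIFAR-10 training split
n_epochs = 200        # default in train()
n_batch = 128         # default in train()

bat_per_epo = int(n_images / n_batch)     # full batches per epoch
half_batch = int(n_batch / 2)             # real (or fake) images per discriminator update
total_updates = bat_per_epo * n_epochs    # generator updates over the whole run
summaries = n_epochs // 10                # calls to summarize_performance

print(bat_per_epo, half_batch, total_updates, summaries)
```

With tens of thousands of generator updates, each preceded by two discriminator updates, the need for a GPU becomes clear.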

Step 7| Summarize Performance:

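def summarize_performance(epoch, g_model, d_model, dataset, latent_dim, n_samples=150):
    # accuracy of the discriminator on real images
    X_real, y_real = generate_real_samples(dataset, n_samples)
    _, acc_real = d_model.evaluate(X_real, y_real, verbose=0)
    # accuracy of the discriminator on generated images
    x_fake, y_fake = generate_fake_samples(g_model, latent_dim, n_samples)
    _, acc_fake = d_model.evaluate(x_fake, y_fake, verbose=0)
    print('>Accuracy real: %.0f%%, fake: %.0f%%' % (acc_real*100, acc_fake*100))
    # save_plot is not defined in this article and must be supplied separately;
    # it is expected to write the generated images to an image file
    save_plot(x_fake, epoch)
    # save the generator so that training can be resumed or the model reused
    filename = 'generator_model_%03d.h5' % (epoch+1)
    g_model.save(filename)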

Summarizing the performance prints the accuracy of the discriminator on real and fake images and saves the generator to a file in the same directory, so that training can be spread out over time. I executed this program in a Google Colab notebook. Since the dataset ships with the Keras API, there was no need to work out how to load local files into Colab.
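The save_plot function called in Step 7 is not defined anywhere in this article. A minimal sketch of what it might look like follows, assuming it should tile the first n*n generated images into a grid and write them to a PNG file. The helper make_image_grid, the grid size n=7, and the file name pattern are my assumptions.

```python
import numpy as np

def make_image_grid(examples, n=7):
    """Tile the first n*n images (values in [-1, 1]) into one array in [0, 1]."""
    examples = (np.asarray(examples[:n * n], dtype='float32') + 1.0) / 2.0
    _, h, w, c = examples.shape
    grid = examples.reshape(n, n, h, w, c)    # (grid row, grid col, h, w, c)
    grid = grid.transpose(0, 2, 1, 3, 4)      # (grid row, h, grid col, w, c)
    return grid.reshape(n * h, n * w, c)

def save_plot(examples, epoch, n=7):
    """Write an n x n grid of generated images to a PNG file and return its name."""
    from matplotlib import pyplot             # imported here so the grid helper works without it
    grid = np.clip(make_image_grid(examples, n), 0.0, 1.0)
    filename = 'generated_plot_e%03d.png' % (epoch + 1)   # hypothetical naming scheme
    pyplot.imsave(filename, grid)
    return filename
```

With the default n_samples=150 from summarize_performance, the first 49 generated images end up in the grid.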

Step 8| Execute Program:

latent_dim = 100
d_model = define_discriminator()
g_model = define_generator(latent_dim)
gan_model = define_gan(g_model, d_model)
dataset = load_real_samples()
train(g_model, d_model, gan_model, dataset, latent_dim)

Now that all of the functions have been configured, we just have to call upon them with the relevant parameters.


The program is actually reasonably successful. Observe the results below!

Generated Images by Author

I hope that I have piqued your interest in GANs and given you a sound argument for their place in the future of machine learning.

Thank you for reading my article!

Using GANs to generate realistic images using Keras and the CIFAR10 Dataset was originally published in Towards Data Science on Medium.