Super-Resolution using Deep Learning
Deep Learning has been applied to a huge range of problems to get better results, and as we have been witnessing, it has made a great leap in Computer Vision in particular. One of those examples is image quality upscaling: the Deep Learning model takes a low-quality image as input and produces a higher-resolution version of it as output.
What are SRGANs?
SRGAN stands for (as you may have noticed) Super-Resolution GAN. SRGANs are another family of Deep Learning algorithms. The first thing we should know is what a GAN is. A GAN acts like an artist, drawing/making content from scratch. A GAN consists of two main networks, a generator and a discriminator, where the first creates the content and the second judges how far the predicted result differs from the ground truth.
Why use SRGANs?
The problem is that we want to upscale a low-resolution image (or even video) to better quality. There are many classic methods that interpolate the image, but they still give us a quality-reduced and distorted result. With SRGANs we can get better results than with classical interpolation methods.
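For reference, the classical interpolation baseline mentioned above fits in a few lines. This is a minimal sketch using Pillow's bicubic resampling; the function name and the 4x factor are illustrative choices, not anything prescribed by SRGAN.

```python
from PIL import Image

def upscale_bicubic(img: Image.Image, factor: int = 4) -> Image.Image:
    """Upscale an image with classical bicubic interpolation.

    This is the baseline SRGANs are compared against: it is fast, but it
    cannot recover high-frequency detail lost at low resolution, so edges
    come out soft and textures look smeared.
    """
    w, h = img.size
    return img.resize((w * factor, h * factor), Image.BICUBIC)
```

Usage would be something like `upscale_bicubic(Image.open("low_res.png"))`; the limitation is inherent to interpolation, which only averages existing pixels instead of hallucinating plausible detail the way a trained generator can.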
Here are some examples:
How do SRGANs work?
In a GAN there are a generator and a discriminator: the generator, based on the training dataset, creates new images from scratch, and the discriminator judges whether a given output comes from the training dataset or is a generated fake. The discriminator's feedback then pushes the generator to output more realistic results.
The discriminator and generator learn simultaneously, and once the generator is trained, it knows enough about the distribution of the training set to generate new samples that are very similar to it.
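The adversarial loop described above can be compressed into a short sketch. The two `nn.Sequential` stacks below are toy stand-ins for real convolutional SR networks (the shapes are illustrative only); what matters is the two-phase update: the discriminator learns to separate real from generated, then the generator learns to fool it.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real convolutional networks: the generator maps a
# low-resolution input to a "super-resolved" output; the discriminator maps
# an image to a single real/fake logit. Sizes are illustrative assumptions.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64))
D = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(lr_batch: torch.Tensor, hr_batch: torch.Tensor):
    # 1) Discriminator: score real HR images as 1, generated ones as 0.
    fake = G(lr_batch).detach()  # detach: do not update G in this phase
    d_loss = bce(D(hr_batch), torch.ones(hr_batch.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator: produce outputs the discriminator scores as real.
    fake = G(lr_batch)
    g_loss = bce(D(fake), torch.ones(fake.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

The `detach()` call is the key design choice: it stops the discriminator's loss from flowing back into the generator, keeping the two updates independent even though both networks sit in one computation graph.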
The SRGAN’s typical workflow is:
- Takes in an image as input
- Passes it through a trained model
- Outputs an image/result of the same size or larger that is an improvement over the input.
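The three workflow steps above can be sketched as a single inference function. The generator here is assumed to be any already-trained SR model; nothing about its architecture is prescribed.

```python
import torch

def super_resolve(generator: torch.nn.Module, lr_image: torch.Tensor) -> torch.Tensor:
    """Take an image, pass it through a trained model, output the
    enhanced (usually larger) result -- the three workflow steps above.
    `lr_image` is a CHW tensor with values in [0, 1]."""
    generator.eval()                    # inference mode: no dropout/BN updates
    with torch.no_grad():               # no gradients needed at inference
        sr_image = generator(lr_image.unsqueeze(0))  # add batch dimension
    return sr_image.squeeze(0).clamp(0.0, 1.0)       # back to CHW, valid range
```

For example, with a trained SRGAN generator loaded via `torch.load`, `super_resolve(generator, lr_image)` would return the super-resolved tensor ready to be converted back to an image.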
In other words:
1. We take a high-quality image and manually downscale it, so the dataset contains matched high- and low-quality images.
2. We feed the low-quality image to the generator, which outputs a super-resolution image.
3. The discriminator compares the generator's output with the ground-truth image, the GAN losses are backpropagated, and both the discriminator and the generator are trained.
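Step 1 of this recipe, building the (low-res input, high-res ground truth) pairs, might look like the sketch below. Bicubic downsampling is assumed here as the degradation model, which matches common SRGAN setups.

```python
import torch
import torch.nn.functional as F

def make_training_pair(hr_image: torch.Tensor, factor: int = 4):
    """Manually downscale a high-quality CHW image so each training
    sample is a (low-res input, high-res ground truth) pair."""
    lr_image = F.interpolate(
        hr_image.unsqueeze(0),          # interpolate expects a batch dim
        scale_factor=1.0 / factor,
        mode="bicubic",
        align_corners=False,
    ).squeeze(0).clamp(0.0, 1.0)        # bicubic can overshoot [0, 1]
    return lr_image, hr_image
```

Because the low-resolution input is derived from the high-resolution original, the ground truth for every generated image is available for free; this is what makes super-resolution training data so easy to produce at scale.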
Some things to remember from the GAN architecture above:
- Residual blocks: in a network with residual blocks, each layer feeds into the next layer and also directly into layers 2–3 hops away, via skip connections.
- PixelShuffler x2: a layer that upscales feature maps by a factor of 2.
- k3n64s1 — kernel size of 3, 64 channels, and stride of 1.
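The three building blocks listed above can be sketched in PyTorch. This is a minimal, illustrative version: the layer ordering follows the common SRGAN generator layout (conv–BN–PReLU with k3n64s1 convolutions), but exact details vary between implementations.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One SRGAN-style residual block: two k3n64s1 convolutions
    (kernel 3, 64 channels, stride 1), with a skip connection that
    adds the input straight to the output a couple of layers ahead."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)         # the residual "hop"

class UpsampleBlock(nn.Module):
    """PixelShuffler x2: a conv expands 64 channels to 256, then
    PixelShuffle rearranges those channels into a feature map twice
    as wide and tall, back at 64 channels."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 4, kernel_size=3, stride=1, padding=1)
        self.shuffle = nn.PixelShuffle(2)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))
```

Note that `PixelShuffle` adds no parameters of its own: the preceding convolution does the learning, and the shuffle merely trades channel depth (64 × 4) for spatial resolution (×2 in each dimension), which avoids the checkerboard artifacts of transposed convolutions.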
There are many uses of SRGANs that could solve important problems — for example, downloading images and videos over a bad internet connection. I think applying this technology to get upscaled images/videos on the receiving end is a nice use of it. The same can be applied to video calls: since we usually don't mind the background while on a call, we could apply the SRGAN (using perceptual losses) only to people's faces.