AI image and video generation with GAN-style algorithms

AI-generated images and videos are making it harder and harder for humans to tell whether what they see is real or fake. The images look remarkably real and natural. The technology behind them is built on derivatives of Generative Adversarial Networks (GANs). If you are not interested in the development history, feel free to skip ahead to the demo use cases.

If you follow AI technology, Deep Learning (also called Deep Neural Networks, DNN) and Reinforcement Learning are the two major breakthroughs of the modern era. DNN-derived models outperformed human beings in the ImageNet competition in 2015 and 2016; after that, humans were never again able to beat machines in image-classification accuracy. Since then, DNN derivatives keep producing huge progress in both research and real AI applications. One example is the Convolutional Neural Network (CNN), which is used heavily in face recognition. The Chinese company SenseTime (商湯科技), which raised a landmark $600 million USD Series C financing valuing the company at over $4.5 billion USD, is a very good example.

Among the derivatives of DNNs, there is a famous one: the Generative Adversarial Network (GAN). It was introduced by Ian Goodfellow in 2014. Facebook AI guru Yann LeCun said on Quora,

This (GAN), and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.

The number of research papers on variations of GANs has skyrocketed.


Fig. The number of research papers on variations of GANs. Source: the GAN Zoo

So, what is a GAN? Quoting blogger Adit Deshpande's explanation,

The basic idea of these networks is that you have 2 models, a generative model and a discriminative model. The discriminative model has the task of determining whether a given image looks natural (an image from the dataset) or looks like it has been artificially created. The task of the generator is to create natural looking images that are similar to the original data distribution. This can be thought of as a zero-sum or minimax two player game. The analogy used in the paper is that the generative model is like “a team of counterfeiters, trying to produce and use fake currency” while the discriminative model is like “the police, trying to detect the counterfeit currency”. The generator is trying to fool the discriminator while the discriminator is trying to not get fooled by the generator. As the models train through alternating optimization, both methods are improved until a point where the “counterfeits are indistinguishable from the genuine articles”.
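The zero-sum game described above can be written down concretely. Below is a minimal NumPy sketch of the two loss functions being alternately minimized (function names are mine, not from any particular library): the discriminator's binary cross-entropy over real and fake samples, and the non-saturating generator loss commonly used in practice.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(x) -> 1 on real images and
    # D(G(z)) -> 0 on generated ones; this is the negative of
    # the GAN value function V(D, G) from the 2014 paper.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: push D(G(z)) toward 1,
    # i.e. try to fool the discriminator.
    return -np.mean(np.log(d_fake))

# A confident, correct discriminator has a loss close to 0:
print(discriminator_loss(np.array([0.99]), np.array([0.01])))
# At the game's equilibrium D outputs 0.5 everywhere, giving
# the discriminator loss 2*log(2) (about 1.386):
print(discriminator_loss(np.array([0.5]), np.array([0.5])))
```

Training alternates between a gradient step on `discriminator_loss` and one on `generator_loss`, which is exactly the "counterfeiters versus police" loop in the quote.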

Use case #1: Removing and inpainting a person in any part of an image

Sometimes when we take a photo, there is someone in the background we want to remove. Adobe Photoshop could already do this back in 2012, but the algorithms for it keep improving. Now we can use a variation of GAN to do it.


Fig. Thanks to Arunabh Sharma for sharing the inpainting result of editing his own photo. Source: Inpainting Arunabh Sharma

Use case #2: Food image inpainting

This year, PIXNET hosted a hackathon on AI food-image generation. An irregularly shaped region was cut out of a food image, and the teams were asked to use AI image generation to fill in the hole. The judges were the audience: they decided which AI-generated food images looked more natural and more appetizing, and the team with the most votes won. I implemented my entry with a Partial Convolutional Neural Network in Keras (PConv-Keras), and I purposely picked strange outputs here to show that not every AI is smart enough to inpaint an image. Of course, with better selection of training data and more training iterations, the output images become better and better.
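As preparation for a model like PConv-Keras, the "hole" is represented as a binary mask paired with the masked image; the mask is 1 where pixels are valid and 0 where the model must inpaint. A small sketch of that preprocessing step (the helper function is my own, not part of PConv-Keras):

```python
import numpy as np

def cut_hole(image, mask):
    # Zero out the masked pixels. The resulting (masked image, mask)
    # pair is the usual input convention for partial-convolution
    # inpainting models: mask == 1 keeps a pixel, mask == 0 is the hole.
    return image * mask[..., np.newaxis]

# Toy 4x4 RGB image with an irregular region removed.
img = np.ones((4, 4, 3))
mask = np.ones((4, 4))
mask[1:3, 1:4] = 0          # the region the model must fill in
holed = cut_hole(img, mask)
```

The inpainting network then receives both `holed` and `mask`, so its convolutions can ignore the invalid pixels instead of treating the hole as real black content.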

Given an image like the following,


Fig. Image to be filled. Source: here

The generated output could be the following:


Fig. Food images generated by AI. Source: here

Which one do you think looks more real?

Use case #3: Simple line drawing becomes colorful image in real-time

Using a simple line drawing as its reference point, Pix2Pix converts it into a colorful image based on its understanding of shapes, human drawings and the real world.
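Under the hood, Pix2Pix is a conditional GAN whose generator is trained with an adversarial term plus an L1 term that keeps the colored output close to a ground-truth image. A minimal sketch of that combined objective (function name and toy inputs are mine; the L1 weight of 100 is the value commonly cited for Pix2Pix, used here as an assumption):

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake_img, target_img, lam=100.0):
    # Adversarial term: fool the discriminator on the generated image.
    adv = -np.mean(np.log(d_fake))
    # L1 term: keep the colorization close to the ground truth,
    # which is what makes the outputs sharp and faithful to the sketch.
    l1 = np.mean(np.abs(fake_img - target_img))
    return adv + lam * l1

# With a perfect reconstruction the L1 term vanishes and only the
# adversarial term remains.
loss = pix2pix_generator_loss(np.array([0.5]), np.zeros(3), np.zeros(3))
```

The line drawing is the conditioning input; the L1 term is why the model can color it in real time without drifting away from the sketched shapes.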



Fig. Pix2Pix in real-time. Source: here

Use case #4: Motion transfer in video (letting people who can't dance appear to dance in a video)

A research team at UC Berkeley published their work on YouTube. Given a source video of a person dancing, the algorithm can transfer that performance to a novel (amateur) target after only a few minutes of footage of the target subject performing standard moves.

The use cases above are the ones I found interesting; of course, there are many others. One example is style transfer: given two images, one by an artist such as Vincent van Gogh and the other an ordinary photo, a model can easily transfer Van Gogh's style to the other image. Another example is high-resolution image generation: through a series of generation and discrimination steps, a model can be trained to turn an image into a higher-resolution one.

AI4quant CEO Jason Chuang welcomes companies that would like to leverage AI to improve the quality or efficiency of their internal processes. Please feel free to contact us.


PConv-Keras GitHub link:

Pix2Pix GitHub link:

Is it possible to predict typhoons accurately through AI?


This time, New Taipei City decided to take the day off, while Taipei and Keelung decided not to. How confident is the Central Weather Bureau in predicting the progress and direction of a typhoon? From the result, it seems that Taipei's mayor, 柯P, made the correct call. Are our typhoon model algorithms and computing power good enough to predict it? How can we leverage AI or machine learning to predict typhoons?

AI can understand an image

I went to Google Cloud OnBoard yesterday, where the instructor, Eefy, demoed the Vision API. I was totally shocked by how well the AI could understand the meaning of an image, and I wanted to test it with a real case. I remember that Taiwan's Ministry of Foreign Affairs made a mistake last December: they put an image of Washington Dulles International Airport on the second-generation passport. Remedying this mistake cost about 16.5 million NT dollars in stickers. How different are Washington Dulles International Airport and Taiwan Taoyuan International Airport? I used the Google Cloud Vision API to find out.


As we can see, the Vision API can perfectly tell which image is which airport. If the government leveraged AI to check images like this, it could save a huge amount of money on remedying mistakes.
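One simple way to quantify "how different" the two airport photos are is to compare the label sets an image-labeling API returns for each. A small sketch using Jaccard similarity; note that the label lists below are made-up placeholders for illustration, not real Vision API output.

```python
def label_overlap(labels_a, labels_b):
    # Jaccard similarity between two label sets:
    # |intersection| / |union|, ranging from 0 (disjoint) to 1 (identical).
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b)

# Hypothetical label lists for the two airport photos.
dulles = ["airport", "terminal", "architecture", "roof"]
taoyuan = ["airport", "terminal", "airline", "building"]
print(label_overlap(dulles, taoyuan))  # 2 shared labels out of 6 total
```

A low overlap score would flag the two photos as depicting visibly different buildings, which is exactly the automated sanity check a passport-design workflow could have used.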

I was also curious about when image recognition and classification had such a breakthrough, and when AI beat human judgment. It turned out to be the ImageNet classification contest, around 2015; after that, AI consistently beat human eyes. The last ImageNet classification contest was held in 2017. There was no need for further contests, because machine AI already outperformed human beings.