An Interactive Introduction to Autoencoders

Learn autoencoders by training one right in your browser!

Published: VISxAI Workshop 2021

Created: July 2021

Autoencoders have many different applications, but most notably they have been used for dimensionality reduction and as generative models. Even though autoencoders are used to learn complex representations, the neural network architecture itself is very simple! Autoencoders simply learn to deconstruct an input down to a smaller representation, then reconstruct it back up.

The rest of the article dives into the Structure of an autoencoder in the context of Figure 1. More specifically, it is broken up into the Encoder, Latent Space, and Decoder to explain each piece of the puzzle. The article then ends with a Conclusion that extends to applications of autoencoders elsewhere.

Structure

Encoder

The Encoder is the first half of the neural network: it takes an input of a higher dimension and outputs a lower dimension, thereby creating a bottleneck. More precisely, it takes an input of some dimension m and creates a bottleneck by reducing the dimensions down to n, where m > n. As in other data compression problems, we are going from a larger to a smaller representation. Hence the design of the Encoder in Figure 1 as a trapezoid, starting with a larger base and moving to a smaller one: going from 3-dimensional (3D) to 2-dimensional (2D) data.

Figure 2: Hover over data points in the 3D Input Data or 2D Latent Space to see the encoder mapping from 3D to 2D.

You can see from Figure 2 that, after training the entire autoencoder, the Encoder is just a learned function that maps the input to a lower dimension. In Figure 2, the 2D Latent Space looks like the 3D Input Data with the vertical dimension ignored. This is exactly what we should expect given the bottleneck from 3D to 2D.
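To make the bottleneck concrete, here is a minimal sketch of an encoder in TensorFlow.js (the library used for this article's models). The single dense layer and its size are illustrative assumptions, not the exact architecture behind the figures.

```ts
import * as tf from '@tensorflow/tfjs';

// Encoder: map m = 3 input dimensions down to n = 2, creating the bottleneck.
const encoder = tf.sequential({
  layers: [tf.layers.dense({units: 2, inputShape: [3]})],
});

// One 3D point in, one 2D latent vector out.
const point3d = tf.tensor2d([[0.5, -1.2, 0.3]]);
const latent = encoder.predict(point3d) as tf.Tensor;
latent.print(); // a 1x2 tensor; values depend on the (here untrained) weights
```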

Latent Space

The Latent Space is composed of all outputs from the Encoder. In other words, one output is a latent vector, and all outputs together constitute the Latent Space. Beyond enabling vector arithmetic to find connections and combinations (sketched below), visualizing this space gives insight into the structure of the data.
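As a hedged sketch of that vector arithmetic, consider interpolating between two latent vectors. A stand-in decoder is defined here so the snippet runs; in practice it would be the trained decoder half of the autoencoder introduced later.

```ts
import * as tf from '@tensorflow/tfjs';

// Stand-in decoder (2D latent -> 3D); in practice, the trained decoder half.
const decoder = tf.sequential({
  layers: [tf.layers.dense({units: 3, inputShape: [2]})],
});

// Two latent vectors and their midpoint: (zA + zB) / 2.
const zA = tf.tensor2d([[0.1, -0.4]]);
const zB = tf.tensor2d([[0.8, 0.2]]);
const midpoint = zA.add(zB).div(2);

// Decoding the midpoint reconstructs a point "between" the two inputs.
const blended = decoder.predict(midpoint) as tf.Tensor;
blended.print();
```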

One way to visualize the structure that forms is through Opposite Gradient trails. By understanding what direction each point in the Latent Space is tending towards, we can get an idea of where the training is headed.

After computing the gradient of the loss with respect to the latent output (the partial derivatives $\frac{\partial \text{loss}}{\partial \text{latent}_0}$ and $\frac{\partial \text{loss}}{\partial \text{latent}_1}$), we have the direction of steepest ascent: the direction that increases loss. Since the goal is to lower loss, we negate that direction (this negation is where the "Opposite" in "Opposite Gradient" comes from). In Figure 3, each point has a trail that represents its Opposite Gradient: the direction the point needs to move to lower loss.
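A minimal sketch of that computation in TensorFlow.js, with stand-ins for the decoder and one encoded point (in the article these come from the trained model): tf.grad gives the direction of steepest ascent, and negating it gives the Opposite Gradient.

```ts
import * as tf from '@tensorflow/tfjs';

// Stand-ins: a tiny decoder (2D -> 3D), one original 3D point, and its
// 2D latent encoding.
const decoder = tf.sequential({
  layers: [tf.layers.dense({units: 3, inputShape: [2]})],
});
const original = tf.tensor2d([[0.5, -1.2, 0.3]]);
const latent = tf.tensor2d([[0.1, -0.4]]);

// Reconstruction loss as a function of the latent vector alone.
const lossOfLatent = (z: tf.Tensor) =>
  tf.losses.meanSquaredError(original, decoder.apply(z) as tf.Tensor);

// d(loss)/d(latent) is the direction of steepest ascent; its negation is
// the Opposite Gradient: the direction that lowers the loss.
const oppositeGradient = tf.grad(lossOfLatent)(latent).neg();
oppositeGradient.print(); // a 1x2 vector, drawn as a trail in Figure 3
```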

Figure 4: Drag the slider to see the 2D Latent Space during training.

Just by observing the trails, you can see the structure that takes shape over training. Notice how each point doesn't move exactly in the direction of its trail; the trails are more an indicator of the gravity of the structure: larger and more numerous trails will pull the data in that direction, while uniformly distributed trails will not affect the structure at all. This method of visualization can be applied to other outputs, as demonstrated by Kahng et al. in GAN Lab. By applying it specifically to the 2D Latent Space in Figure 4, it becomes much easier to see where the points want to move and to understand the underlying structure.

Decoder

After the Encoder deconstructs the original input down to the Latent Space, the Decoder reconstructs it back up to the original dimension. Hence the trapezoid design for the Decoder, starting with a smaller base and moving to a larger one. The loss function, aptly named "reconstruction loss," is computed from the original input and the reconstructed output. Now we can backpropagate from the reconstruction loss and optimize! All the pieces are now present to train the autoencoder, as the sketch below shows.
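Putting the halves together, here is a hedged sketch of the whole training setup in TensorFlow.js; the layer sizes, optimizer, and epoch count are assumptions, not the article's exact settings. The key detail is that the input is also the target.

```ts
import * as tf from '@tensorflow/tfjs';

// Full autoencoder: encoder (3D -> 2D bottleneck) then decoder (2D -> 3D).
const autoencoder = tf.sequential({
  layers: [
    tf.layers.dense({units: 2, inputShape: [3]}), // Encoder
    tf.layers.dense({units: 3}),                  // Decoder
  ],
});

// "Reconstruction loss": mean squared error between input and output.
autoencoder.compile({optimizer: 'adam', loss: 'meanSquaredError'});

// Train the network to reproduce its own input through the bottleneck.
// `data` is an [N, 3] tensor of 3D points.
async function train(data: tf.Tensor2D) {
  await autoencoder.fit(data, data, {epochs: 200, batchSize: 32});
}
```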

Figure 5: Hover over data points in the 2D Latent Space or 3D Reconstruction to see the decoder mapping from 2D to 3D.

You can see from Figure 5 that, after training the entire autoencoder, the Decoder is just a learned function that reconstructs from the bottleneck. In Figure 5, the 3D Reconstruction looks like the 2D Latent Space with a dimension added back.

Conclusion

Autoencoders are not fixed to 3D data like the previous examples. They can be used on other kinds of data too!

In fact, in addition to being applied to many different shapes and sizes of data, the autoencoder structure can be used to tackle real problems like denoising images, removing imperfections or watermarks from images, learning complex or emergent structures in data, and many more.

To give one final example: if we wanted to visualize the structure of the MNIST digits dataset, which consists of 28-by-28 handwritten digits (0-9), we could use an autoencoder!
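The same structure scales up by flattening each 28-by-28 digit into 784 values and bottlenecking down to 2, so the Latent Space can be plotted directly. The hidden layer sizes and activations below are illustrative assumptions, not the article's exact architecture.

```ts
import * as tf from '@tensorflow/tfjs';

// MNIST autoencoder: 784 pixels -> 2D bottleneck -> 784 pixels.
const mnistAutoencoder = tf.sequential({
  layers: [
    tf.layers.dense({units: 64, inputShape: [784], activation: 'relu'}),
    tf.layers.dense({units: 2}),                          // 2D Latent Space
    tf.layers.dense({units: 64, activation: 'relu'}),
    tf.layers.dense({units: 784, activation: 'sigmoid'}), // pixels in [0, 1]
  ],
});
mnistAutoencoder.compile({optimizer: 'adam', loss: 'meanSquaredError'});
```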

Figure 6: Hover over the 2D Latent Space to see the Reconstruction of a digit.

In Figure 6, after training the autoencoder with a 2D bottleneck, we can see clusters form in the Latent Space! Also notice the similarity between the digits: see how the 9s are mixed in with the 7s and 4s in the Latent Space.

Acknowledgments

This article was heavily influenced and inspired by the amazing works of GAN Lab and Understanding UMAP. Visualizing 3D data reduced down to 2D, the model view, the layered distributions, and the gradient lines/trails were key ideas used in the main autoencoder visualization.

The article was styled with the Distill HTML Template.

Understanding UMAP, Communicating with Interactive Articles, and GAN Lab were used as CSS styling references for the article and controls.

Libraries used: plotting and visualization were done in Svelte, with the help of d3.js and ScatterGL. The autoencoder was created and trained with TensorFlow.js.

Who created this?

Donald "Donny" R. Bertucci implemented all of the visualizations and wrote the article. Donny is an undergraduate student at Oregon State University and a member of the Data Interaction and Visualization (DIV) Lab .