Learn autoencoders by training one right in your browser!
Autoencoders have many different applications, but most notably they have been used for dimensionality reduction and as generative models.
The rest of the article will dive into the Structure of an autoencoder in the context of Figure 1. More specifically, it will be broken up into the Encoder, Latent Space, and Decoder to explain each piece of the puzzle. Then, the article will end with a Conclusion that extends to applications of autoencoders elsewhere.
The Encoder is the first half of the neural network: it takes an input with a higher dimension and outputs a representation with a lower dimension, thereby creating a bottleneck. In Figure 2, the Encoder takes 3 Dimensional (3D) data down to 2 Dimensional (2D) data.
Figure 2: the Encoder, 3D to 2D.
You can see from Figure 2 that, after training the entire autoencoder, the Encoder is just a learned function that takes the input to a lower dimension. In Figure 2, the 2D Latent Space looks like the 3D input data if we ignored the vertical dimension. This is exactly what we should expect given the bottleneck defined from 3D to 2D.
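To make that learned function concrete, here is a minimal sketch of an Encoder in TensorFlow.js (the library the demo is trained with). The single dense layer and linear activation are illustrative assumptions, not the demo's exact architecture.

```js
import * as tf from '@tensorflow/tfjs';

// A minimal Encoder: one dense layer mapping a 3D input
// down to a 2D latent vector, forming the bottleneck.
const encoder = tf.sequential({
  layers: [
    tf.layers.dense({ inputShape: [3], units: 2, activation: 'linear' }),
  ],
});

// One 3D point in, one 2D latent vector out.
const point3d = tf.tensor2d([[0.5, -1.2, 0.8]]);
const latent = encoder.predict(point3d); // shape [1, 2]
latent.print();
```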
The Latent Space is composed of all outputs from the Encoder. In other words, one output is a latent vector, and all outputs together constitute a Latent Space. Besides enabling vector arithmetic to find connections and combinations, visualizing this space gives insight into the structure of the data.
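As a hypothetical illustration of that vector arithmetic, assuming already-trained `encoder` and `decoder` models and two inputs `pointA` and `pointB` (names invented here), the midpoint of two latent vectors decodes to a combination of the two inputs:

```js
// Encode two inputs into the Latent Space.
const zA = encoder.predict(pointA);
const zB = encoder.predict(pointB);

// Vector arithmetic: take the midpoint of the two latent vectors.
const zMid = zA.add(zB).div(2);

// Decoding the midpoint yields a blend of the two original inputs.
const blend = decoder.predict(zMid);
```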
One way to visualize the structure that forms is through Opposite Gradients. By understanding what direction each point in the Latent Space is tending towards, we can get an idea of where the training is headed.
After computing the gradient of the loss with respect to the latent output, each point is given a trail pointing in the opposite direction of its gradient (the direction of steepest descent).
Just by observing the trails, you can see the structure that forms over training. Notice how each point doesn't move exactly in the direction of its trail; the trail is more of an indicator of the gravity of the structure: larger and more numerous trails will pull the data in that direction, and uniformly distributed trails will not affect the structure at all. This method of visualization can be applied to other outputs, as demonstrated by Kahng et al. in GAN Lab.
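For readers curious how those trails could be computed, here is a sketch in TensorFlow.js. It assumes trained `encoder` and `decoder` models and an `inputs` tensor of data points, and is an assumption about the approach rather than the article's exact implementation.

```js
import * as tf from '@tensorflow/tfjs';

// Reconstruction loss as a function of the latent vectors alone,
// so we can differentiate with respect to the Latent Space.
const reconstructionLoss = (z) =>
  tf.losses.meanSquaredError(inputs, decoder.apply(z));

// tf.grad gives d(loss)/d(latent) for all latent vectors at once.
const dLossDLatent = tf.grad(reconstructionLoss);

const latent = encoder.predict(inputs); // [n, 2] latent vectors
const gradients = dLossDLatent(latent); // [n, 2] gradient per point
const trails = gradients.neg();         // the "Opposite Gradients"
```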
After the Encoder deconstructs the original input down to the Latent Space, the Decoder reconstructs it back up to the original dimension. Hence the trapezoid for the Decoder design: it starts with a smaller base and moves to a larger one. The loss function, aptly named "reconstruction loss," is computed from the original input and the reconstructed input. Now we can backpropagate from the reconstruction loss and optimize! All the pieces are now present to train the autoencoder.
Figure 5: the Decoder, 2D to 3D.
You can see from Figure 5 that, after training the entire autoencoder, the Decoder is just a learned function that reconstructs from the bottleneck. In Figure 5, the 3D Reconstruction looks like the 2D Latent Space if we added a dimension.
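Putting the pieces together, a minimal end-to-end sketch in TensorFlow.js might look like the following. The layer sizes mirror the 3D-to-2D example, while the optimizer, epoch count, and a `data` tensor of [n, 3] points are illustrative assumptions.

```js
import * as tf from '@tensorflow/tfjs';

const autoencoder = tf.sequential({
  layers: [
    // Encoder: 3D input down to the 2D bottleneck.
    tf.layers.dense({ inputShape: [3], units: 2 }),
    // Decoder: 2D bottleneck back up to a 3D reconstruction.
    tf.layers.dense({ units: 3 }),
  ],
});

// Reconstruction loss: mean squared error between the original
// input and its reconstruction, so the input is also the target.
autoencoder.compile({ optimizer: 'adam', loss: 'meanSquaredError' });

// Train (inside an async context): the model learns to reproduce
// its own input through the bottleneck.
await autoencoder.fit(data, data, { epochs: 50 });
```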
Autoencoders are not fixed to 3D data like the previous examples; they can be used on other data too! In fact, in addition to being applied to many different shapes and sizes of data, the autoencoder structure can be used to tackle real problems like denoising images, removing imperfections or watermarks from images, learning complex or emergent structures in data, and many more.
To give one final example, if we wanted to visualize the structure of the MNIST digits dataset, which consists of 28 by 28 pixel images of handwritten digits (0-9), we could use an autoencoder!
In Figure 6, after training the autoencoder with a 2D bottleneck, we can see clusters form in the Latent Space! Also notice the similarity between the digits: see how the 9s are mixed in with the 7s and 4s in the Latent Space.
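A sketch of that MNIST setup is below, assuming `mnist` is an [n, 784] tensor of flattened 28 by 28 images scaled to [0, 1]. The hidden layer sizes are illustrative assumptions; only the 2D bottleneck is essential to the visualization.

```js
import * as tf from '@tensorflow/tfjs';

const mnistAutoencoder = tf.sequential({
  layers: [
    tf.layers.dense({ inputShape: [784], units: 128, activation: 'relu' }),
    tf.layers.dense({ units: 2 }), // the 2D bottleneck
    tf.layers.dense({ units: 128, activation: 'relu' }),
    tf.layers.dense({ units: 784, activation: 'sigmoid' }),
  ],
});

mnistAutoencoder.compile({ optimizer: 'adam', loss: 'meanSquaredError' });
await mnistAutoencoder.fit(mnist, mnist, { epochs: 20 });

// To plot the Latent Space, run only the encoder half and scatter
// the resulting 2D points, colored by digit label.
const mnistEncoder = tf.model({
  inputs: mnistAutoencoder.inputs,
  outputs: mnistAutoencoder.layers[1].output,
});
const latent2d = mnistEncoder.predict(mnist); // [n, 2] points
```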
The outcome was heavily influenced and inspired by the amazing works of GAN Lab and Understanding UMAP.
The article was styled with the Distill HTML Template.
Understanding UMAP, Communicating with Interactive Articles, and GAN Lab were used as a CSS styling reference for the article and controls.
Libraries used: plotting and visualization were done in Svelte, with the help of d3.js and ScatterGL. The autoencoder was created and trained with TensorFlow.js.
Donald "Donny" R. Bertucci implemented all of the visualizations and wrote the article. Donny is an undergraduate student at Oregon State University and a member of the Data Interaction and Visualization (DIV) Lab.