Learn autoencoders by training one right in your browser!

Autoencoders have many different applications, but most notably
they have been used for dimensionality reduction and as
generative models. The idea is **very simple**! Autoencoders just learn to
*deconstruct* the input down, then learn to *reconstruct* it back
up.

The rest of the article will dive into the structure of an autoencoder in the context of Figure 1. More specifically, it will be broken up into the Encoder, Latent Space, and Decoder to explain each piece of the puzzle. Then, the article will end with a Conclusion that extends to applications of autoencoders elsewhere.

The Encoder is the first half of the neural network: it takes an input with a **higher** dimension, then outputs to a **lower** dimension, thereby creating a bottleneck from 3 Dimensional (`3D`) to 2 Dimensional (`2D`) data.

Figure 2: `3D` to `2D`.
You can see from Figure 2 that after training the entire autoencoder, the Encoder is just a learned function that takes the input to a lower dimension. In Figure 2, the 2D Latent Space looks like the 3D input data if we ignored the vertical dimension. This is exactly what we should expect given the bottleneck defined from `3D` to `2D`.
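As a hedged sketch of what that learned function might look like (the layer sizes and random weights below are illustrative assumptions, not the article's trained demo), the Encoder is just a function from `3D` down to `2D`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights; in the real Encoder these are learned by training.
W1 = rng.normal(size=(3, 8))   # 3D input -> 8 hidden units
W2 = rng.normal(size=(8, 2))   # 8 hidden units -> 2D bottleneck

def encoder(x):
    """Map the higher-dimensional (3D) input down to the 2D Latent Space."""
    hidden = np.tanh(x @ W1)   # nonlinearity lets the encoder bend the data
    return hidden @ W2

points = rng.normal(size=(100, 3))  # a batch of 3D data points
latent = encoder(points)
print(latent.shape)                 # (100, 2): every point lands in 2D
```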

The Latent Space is composed of all outputs from the Encoder. In other words, one output is a latent vector, and all outputs together constitute a Latent Space. Beyond enabling vector arithmetic to find connections and combinations, visualizing this space gives insight into the structure of the data.
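That vector arithmetic can be as simple as averaging two latent vectors and decoding the result. A minimal sketch, assuming a tiny linear decoder with made-up weights (nothing here comes from the article's demo):

```python
import numpy as np

rng = np.random.default_rng(2)
W_dec = rng.normal(size=(2, 3))  # illustrative linear decoder: 2D latent -> 3D

z_a = np.array([1.0, 0.0])       # latent vector of one data point
z_b = np.array([0.0, 1.0])       # latent vector of another

# Simple latent arithmetic: the midpoint combines the two points.
z_mid = 0.5 * (z_a + z_b)
blended = z_mid @ W_dec          # decode the combination back up to 3D

print(blended.shape)             # (3,)
```

Because this toy decoder is linear, decoding the latent midpoint gives exactly the midpoint of the two decoded points; a real nonlinear Decoder would blend them in a more interesting way.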

One way to visualize the structure that forms is through Opposite Gradients. By understanding what direction each point in the Latent Space is tending towards, we can get an idea of where the training is headed.

After computing the gradient of loss with respect to the latent output, we have the direction that would **increase** loss. Then, since the goal is to lower loss, we negate the gradient (this is what the "**Opposite**" in "**Opposite** Gradient" refers to) to get the direction that would **decrease** loss. In Figure 3, the point has a trail that represents the Opposite Gradient: what direction the point needs to move to lower loss.
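To make that concrete, here is a hedged sketch of the computation for a single latent point, using an illustrative linear decoder and squared-error loss (the real demo's network and autodiff are more involved):

```python
import numpy as np

rng = np.random.default_rng(1)
W_dec = rng.normal(size=(2, 3))  # illustrative linear decoder: 2D latent -> 3D

x = rng.normal(size=(3,))        # the original 3D input
z = rng.normal(size=(2,))        # its 2D latent vector from the Encoder

# Reconstruction loss: squared error between the decoded latent and the input.
residual = z @ W_dec - x
loss = np.sum(residual ** 2)

# The gradient of loss w.r.t. the latent output points toward HIGHER loss...
grad_z = 2.0 * residual @ W_dec.T

# ...so the Opposite Gradient (the trail in Figure 3) is its negation.
opposite = -grad_z

# A small step along the Opposite Gradient lowers the loss.
z_step = z + 0.01 * opposite
new_loss = np.sum((z_step @ W_dec - x) ** 2)
assert new_loss < loss
```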

Just by observing the trails, you can see the structure that takes place over training. Notice how each point doesn't move exactly in the direction of its trail; the trail is more of an indicator of the gravity of the structure: larger and more numerous trails will pull the data in that direction, and uniformly distributed trails will not affect the structure at all. This method of visualization can be applied to other outputs, as demonstrated by Kahng *et al.* in GAN Lab.

After the Encoder **deconstructs** the original input down to the Latent Space, the Decoder **reconstructs** back up to the original input. Hence the trapezoid for the Decoder design, starting with a smaller base and moving to a larger one. The loss function, adequately named "reconstruction loss," is computed with the original input and the reconstructed input. Now we can backpropagate from the reconstruction loss and optimize! All the pieces are now present to train the autoencoder.
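Those pieces can be sketched in a few lines. This is only an illustrative numpy version with a linear Encoder and Decoder trained by plain gradient descent (the article's demo uses TensorFlow JS, and its actual architecture differs):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))           # toy 3D dataset

W_enc = rng.normal(size=(3, 2)) * 0.1   # Encoder: 3D -> 2D bottleneck
W_dec = rng.normal(size=(2, 3)) * 0.1   # Decoder: 2D -> 3D reconstruction
lr = 0.5

for step in range(1000):
    Z = X @ W_enc                        # deconstruct down to the Latent Space
    X_hat = Z @ W_dec                    # reconstruct back up
    residual = X_hat - X
    loss = np.mean(residual ** 2)        # the "reconstruction loss"

    # Backpropagate the reconstruction loss through both halves and optimize.
    grad_out = 2.0 * residual / X.size
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

print(loss)  # the reconstruction loss falls as training proceeds
```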

Figure 5: `2D` to `3D`.

You can see from Figure 5 that after training the entire autoencoder, the Decoder is just a learned function that reconstructs from the bottleneck. In Figure 5, the 3D Reconstruction looks like the 2D Latent Space if we added a dimension.

Autoencoders are not just fixed to `3D` data like the previous examples. They can be used on other examples too!

In fact, in addition to being applied to many different shapes and sizes of data, the autoencoder structure can be used to tackle real problems like denoising images, removing imperfections or watermarks from images, learning complex or emergent structures in data, and many more.

To give one final example, if we wanted to visualize the structure of the `MNIST` digits dataset that consists of `28 by 28` handwritten digits (0-9), we could use an autoencoder!

In Figure 6, after training the autoencoder with a `2D` bottleneck, we can see clusters form in the Latent Space! Also notice the similarity between the digits: see how the 9s are mixed in with the 7s and 4s in the Latent Space.
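Only the shapes change: each flattened `28 by 28` digit has 784 values, and the bottleneck still squeezes it into a 2D point. A hedged sketch with random data standing in for MNIST and made-up layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for a batch of flattened 28x28 MNIST digits.
batch = rng.random(size=(32, 28 * 28))

# Illustrative (untrained) encoder weights: 784 -> 64 -> 2 bottleneck.
W1 = rng.normal(size=(784, 64)) * 0.1
W2 = rng.normal(size=(64, 2)) * 0.1
# Illustrative decoder weights: 2 -> 64 -> 784.
W3 = rng.normal(size=(2, 64)) * 0.1
W4 = rng.normal(size=(64, 784)) * 0.1

latent = np.tanh(batch @ W1) @ W2           # each digit becomes a 2D point
reconstruction = np.tanh(latent @ W3) @ W4  # and is reconstructed to 784 values

print(latent.shape, reconstruction.shape)   # (32, 2) (32, 784)
```

Plotting the `(32, 2)` latent array after training is exactly how the clusters in Figure 6 become visible.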

The outcome was heavily influenced and inspired by the amazing works of GAN Lab and Understanding UMAP.
The article was styled with the Distill HTML Template.

Understanding UMAP, Communicating with Interactive Articles, and GAN Lab were used as CSS styling references for the article and controls.

**Libraries used:** Plotting and visualization done in Svelte, with the help of d3.js and ScatterGL. Autoencoder created and trained with TensorFlow.js.

Donald "Donny" R. Bertucci implemented all of the visualizations and wrote the article. Donny is an undergraduate student at Oregon State University and a member of the Data Interaction and Visualization (DIV) Lab.