Holy shit – I thought it was lost forever. While Googling my dead name out of morbid curiosity I found a copy of one of my old Triangles™ related blag posts on Medium. The images I originally made for the project are on a white background which doesn’t really *jive* with my hip new dark themed blag. Please forgive me for any eyestrain encountered.

# Going the Distance

The goal of this project is to be able to create *good* representation of an image using only **N** triangles.

## What is Good?

Defining what **good** and **good enough** means is important.

If we don’t know what a **good** image is:

We can’t tell the difference between a bad representation and a good representation. We wouldn’t be able to search in the correct direction, or search at all.

If we don’t know what **good enough** is:

We have to enumerate every possible representation and take the best one. While possible, this is intractable for almost all image sizes and number of triangles.

Good is also relative to the context of the goal. After all, a list of triangles that is a good representation of an image of a fruit bowl is not a good representation of an image of the Eiffel Tower.

So the first step is to convert our list of triangles into an image through a process called **rasterization**. How I do this in my project will be the subject of a future blog post. Now we have two images, the **original** and an image created from our representation of it.

## Distance

Before we talk about the ‘*goodness*‘ of an image, lets talk about the distance between two images. A good distance function should take two images as inputs, and output a number such that the less similar the images, the higher the number. If the images are identical, the distance should be 0 as there is no way to make them ‘closer’.

An image can be thought of as a 2-dimensional grid, with each ‘square’ on the grid representing a pixel. Each pixel can be thought of as 3 colors, Red, Green and Blue that are mixed in different proportions to create the appearance of a single color to your brain.

For example, here is a simple 3 pixel by 3 pixel image of a green and red checkerboard pattern.

There are many different ways to represent the color of pixel, however in this project I will be using RGB (Red, Green, Blue). Each color channel can take on a value from 0 to 255 for a total of \(256³ = 16,777,216\) possible colors per pixel.

#### Simple Distance Function

A simple way to compute the distance between two images is to compute the sum of the distances between each pixel. Define pixel distance as the sum of the absolute value of the differences of two pixels’ color channels.

\begin{align} imageDistance(img1, img2) & = \sum_{i=0}^\text{pixels} pixelDistance(img1[i], img2[i]) \\\\ pixelDistance(p1, p2) & = \sum_{color}^\text{r,g,b} |p1.color – p2.color|\end{align}

Using these formulas we can calculate the distance between these two images.

To start we compare each corresponding pixel of the images. We’ll go left-right, top-bottom and start with the upper left pixel.

The distance between the first pixel of img1.png and img2.png:

\begin{align} pixelDistance(img1[1], img2[1]) & = | 214 – 182 | + | 84 – 42 | + | 84 – 88| \\\\ & = 32 + 42 + 4 \\\\ & = 78 \end{align}

The distance between the second pixel of img1.png and img2.png:

\begin{align} pixelDistance(img1[2], img2[2]) & = | 160 – 160 | + | 204 – 204 | + | 88 – 88| \\\\ & = 0 + 0 + 0 \\\\ & = 0 \end{align}

Since all the other pixels are comparisons between the same colors, the final distance looks like this:

\begin{align} imageDistance(img1, img2) & = (5 \times 32) + (4 \times 0) \\\\ & = 160 \end{align}

#### Mean Squared Error Distance

The simple distance we used above is a good start but we can make an improvement by switching our **pixelDistance** function from using the sum of the differences to using the sum of the differences squared.

\[pixelDistance(p1, p2) = \sum_{color}^\text{r,g,b} (p1.color – p2.color)^2 \]

The new distance between the first pixels:

\begin{align} pixelDistance(img1[1], img2[1]) & = (214 – 182)^2 + (84 – 42)^2 + (84 – 88)^2 \\\\ & = 1{,}024 + 1{,}764 + 16 \\\\ & = 2{,}804 \end{align}

The new distance between the second pixels:

\begin{align} pixelDistance(img1[2], img2[2]) & = (160 – 160)^2 + (204 – 204)^2 + (88 – 88)^2 \\\\ & = 0 + 0 + 0 \\\\ & = 0 \end{align}

\begin{align} imageDistance(img1, img2) & = (5 \times 2{,}804) + (4 \times 0) \\\\ & = 14{,}020 \end{align}

Why is this better?

First, it’s important to note that although it appears the distance has increased from 160 to 14,020, that you cannot directly compare distances derived from two different distance functions.

Second, take a look at the color swatches below. You’ll notice that although both swatches on the right differ from the base color by 48, the center swatch distributes it evenly amongst all 3 channels, while the far right swatch has all the difference in the red channel.

The simple pixel distance function would assign both pixels a distance of 48. The mean squared error pixel distance function would assign the center swatch a distance of 768 and the right swatch a distance of 2,304. The mean squared error distance function favors spreading out the difference among all the colors channels. This is preferable because if the difference is spread out amongst all three channels it appears to be the same image, only lightened or darkened. If the distance is all in one channel, the hue shifts and the colors are all off. I would favor a dimmer/brighter version of an image to a hue shifted version of an image so this is an improvement.

Here’s our red panda friend again. In the first image, every pixel had its red channel lowered by 75. In the second, every pixel had all channels lowered by 25. The third picture is the original for comparison.

## Colors and Shapes

There is still a weakness with our image distance functions, and that is it only takes into account the distance between colors, and not the distance between the overall form of the image.

To illustrate, the below image is the red panda image where every color channel has been darkened by 35.

Where as this plain brown image, where every pixel has been set to the average RGB color of the original, scores about **8%** better.

I hope you would agree with me when I assert that the first image is more similar to the original than latter. The latter could be almost anything, where as the first just needs to be lightened a bit to be an exact match.

### Grayscale

One attempt would be to convert the image to grayscale, removing color data and leaving only brightness.

The 3 color channels can be converted into a single brightness value using the following formula.

\[brightness = 0.299r + 0.587g + 0.11b\]

The reason each channel isn’t represented in equal proportion is that the human eye doesn’t see each color as equally bright.

However, our darkened grayscale image still loses out to the grayscale average color:

In fact, the distance has widened in the average grayscale image’s favor, beating the darkened grayscale image by 13%.

### Compression

In a similar vein, we could try compressing the image. In this case we will transform the image into an 8×8 square of pixels. Each pixel in the 8×8 image is the average color of the pixels it subsumed.

Unsurprisingly, this also fails to rank the darkened image as closer to the original.

### Hamming Distance and Image Hashes

Even though the two approaches above fail to properly consider the form of the image there is a way to combine them to produce an approach that consistently ranks the darkened image as closer to the original than the average color image.

#### Image Hash Algorithm

A hash is a function that takes a higher dimensional input and maps it to a lower dimensional output. In this case, our dimensions are the individual color channels of the image. The original image is 320 pixels wide by 240 pixels tall, and has 3 color channels per pixel, and 256 possible values per color channel. This means we start with a input feature space of \(256^{3^{(320\times240)}}\). This is a mind bogglingly large number. To make it more manageable we are going to map all potential images down to only \(2^{64}\) possible hashes using the algorithm below:

- Compress each image into an 8×8 pixel image.
- Convert each 8×8 image into grayscale.
- Calculate the average brightness of the image.
- For each pixel in the image, if the pixel is darker than average set it to white, if it is brighter than average, set it to black.

The 64 pixels of the third images can be thought of as a string of 1’s and 0’s where each black pixel represents a 1 and each white pixel represents a 0. Going left-right, top-down we can construct a string that represents each image. The original image’s hash has been converted below as an example.

\[100000010010000000001111000111001111111000101110000101000000000\]

Then we can compare each string of 1’s and 0’s to the original, and count each the number of times they differ. The number of places two strings differ is known as the **Hamming Distance**.

The Hamming Distance between the original and darkened image is **3 **while the Hamming Distance between the original and average color is **24** or **40**depending if you say pixels equal to the average are black or white. In either case, it’s clear that this approach suggests that the darkened image is closer to the original than a flat brown.

##### Author’s Note

I picked 8×8 pixels because it 64 bits fits nicely in my blog’s column width, and 64 bits fits nicely into most computer architecture’s word size. If you need a more fine grained approach you can always bump up the compressed images’ resolution. For instance, here is a 1,024 bit hash (32×32) representation of our red panda friend.

#### Discrete Cosine Transformation

I have also considered another approach, similar to the one above, but instead of using the mean brightness, I would compute the Discrete Cosine Transformation. DCT translates an image into a matrix of coefficients for a series of horizontal and vertical frequencies, that when summed, reproduce an approximation of the image. Similar images will have similar coefficients in their DCT matrix and so we can compute a distance between them to tell how similar they are. You can compute the DCT matrix using the algorithm below:

**DCT Image Hash Algorithm**

- Compress each image into an 32×32 pixel image.
- Convert each 32×32 image into grayscale.
- Compute the Discrete Cosine Transform for each 32×32 image. Round very small coefficients to 0.
- Save the first 64 coefficients (top-left 8×8 sub-matrix) of each image. These represent the lowest frequencies and are the most important to the image’s structure.
- Calculate the average coefficient of the 8×8 matrix, but don’t include the first term as it contributes no detail to the image.
- Then compare each coefficient in the DCT Matrix against the mean coefficient value and assign it a 1 if it was above the mean and a 0 if it was below.

Again you can see that this approach fairs quite well. The darkened image’s hash appears quite similar to the original’s, and the average color’s hash appears quite different. The darkened panda’s image has a Hamming Distance of 6 from the original, and the average color’s Hamming distance is 32. I will be comparing both approaches over multiple images during real trials to get a handle on which one is most effective.

##### Potential Issue

There is still room to improve this fitness function. For instance, it scores this image very poorly, despite it looking a great deal closer to the red panda image than the average color image.

This is not helped by taking the image’s DCT hash. Take a look at the image hashes produced below (32×32).

Although they appear to be very similar, their bits are inverse! This gives the max distance image a Hamming Distance of **59** which is very large despite their similarities in form. To counter this corner case, take the minimum of Hamming Distance and (64 – Hamming Distance). The idea being that if almost all your bits are different, then your arrangement must be very similar.

**Computing DCT**

I glossed over step **#3** above. Given its eponymous nature, is a rather important step so here is how you compute the Discrete Cosine Transform:

\[G_{u,v}=\frac 14 \alpha(u)\alpha(v) \sum_{x=0}^{32} \sum_{y=0}^{32} cos\left[ \frac {(2x + 1) \times u}{64}\right]+cos\left[ \frac {(2y + 1) \times v}{64}\right]\]

Where \(G\) is the DCT matrix and \(G_{u,v}\) is the coefficient at location \((u,v)\) in \(G\). \(\alpha()\) is a function described thusly:

\[ \alpha(u) = \left\{ \begin{align} u = 0&{;} \quad \frac{1}{\sqrt{2}} \\ otherwise&{;} \quad 1 \end{align} \right\} \]

## Fitness

There is a fundamental issue with using distance as the sole measure of a representation’s “goodness.” Although all images have a minimum distance of 0, their maximum distance depends on their size and coloring. A distance of 123,456 might be relatively close for a large high contrast image, but relatively far for a small image. We need a way to compare the closeness of representations among images of different sizes and palettes.

Enter the fitness function. The fitness function normalizes the distance to a value between 0 and 1. This is important because the program will be used on images of many different dimensions and colors and we need a way to rate the success of the output that is consistent among all of them.

Here’s my current fitness function:

\[fitness = \frac{1}{2}\left(\frac{Max\,Distance – Distance}{Max\,Distance}\right) + \frac{1}{2}\left(\frac{\vert 32- Hamming\,Distance\vert}{32}\right)\]

##### Author’s Note

Revisiting this several years later – I’m pretty sure there are better ways to normalize a function than a simple average, like adapting a Logistic function.

**Max distance** represents the furthest image (according to the mean squared error color distance formula) possible from the original. For our red panda friend, that’s this frightful image below.

The distance is almost **10x** greater than that of the average color or darkened images. To create this image, every color channel was set to the furthest possible value from the source. So therefore every color channel was set to either 0 or 255, making this image essentially 8 color.

The left term divides the difference between the representation and the worst image. If the distance is 0, then the **max distance** term divides itself giving 1, which is the highest possible fitness.

The right term represents the Hamming Distance between the two image hashes, whether using the mean brightness method, or the discrete cosine transformation method. If the hashes are identical the highest possible fitness is assigned.

Both terms are multiplied by one-half to give them equal weight, although proper weighting would need to be determined by experimentation.

### Good Enough

So now that we have a way to judge the goodness, or **fitness** of an image’s representation, how do we know when the fitness is **good enough?**

Honestly it’s a subjective/arbitrary threshold. For this project, I will qualify ‘good enough’, i.e. the minimum acceptable fitness, as **0.985** or **98.5% **similar to the original image.

If you remember the red pandas from the previous blog post you may have noticed a small black box with white numbers in the lower left hand corner. That number was the representation’s fitness score. The third red panda, that used 500 triangles had a fitness of just over **98.5%** using only mean squared error distance function as a measure of fitness. I feel that that representation is sufficiently detailed that you would recognize it instantly as the original image’s representation even among other similar images.

That’s all for now! I haven’t worked on the Triangles project in years, but finding this post has reinvigorated my interest. Look for more Triangle related posts in the future.