The Value of Crossover

In my last (real) post I wrote about the effects of elitism on a population, and concluded that a small number of elites, who cloned themselves, produced the greatest increase in population fitness. Just like in real life.

I also wrote:

I also don’t have the time to run everything multiple times (takes days at the moment).

I am pleased to say that I rewrote the program a weekend ago using the CUDA runtime library, allowing me to use my GPU and speed things up quite a bit. Which is good, because I screwed up running this test 3 times and had to restart it – if I were on the old version I would not have had the time for such second chances.

This program is mostly similar to the first, with a few differences:

No Elitism

Using the findings of the previous post, I decided to set this one up using the “Only the Best” method. As a reminder, in this heuristic parents and kids are both evaluated and only the top N are passed on to the next generation. Flexible elitism.

Pixel Fitness

The fitness metric was changed slightly – it’s still RMSE – but at the pixel level instead of the color channel level. Conceptually there is almost no difference; what’s important to know is that fitness still increases without bound as the images become more similar, and that the fitness levels in this post cannot be directly compared to the CPU version.

Haploid Chromosomes

In the previous model, the “chromosomes” had two genomes each, a “dominant” one and a “recessive” one. During crossover, the dominant and recessive genomes would cross over with each other. This all happens within a single artist. Babies are made by combining the dominant genome from one parent with the recessive genome of the other.

In the haploid model of crossover, the parents combine their genomes at the crossover point to make a new, child genome. i.e. the child might get the first 20% of its genome from parent 1 and the remaining 80% from parent 2.
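To make that concrete, here's a minimal sketch of single-point haploid crossover (C++; the byte-level genome representation is my assumption, not necessarily how the CUDA version stores things):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Genome = std::vector<uint8_t>;  // a haploid genome stored as raw bytes

// Single-point crossover: the child takes everything before the cut point
// from parent 1 and everything after it from parent 2.
Genome crossover(const Genome& p1, const Genome& p2, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> cut(0, p1.size());
    std::size_t point = cut(rng);  // assumes p1 and p2 are the same length

    Genome child(p1.begin(), p1.begin() + point);
    child.insert(child.end(), p2.begin() + point, p2.end());
    return child;
}
```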

I also changed genome initialization so that all triangles are set to invisible in the beginning (previously the first triangle was made visible).

No Bit Level Crossover

I didn’t have time to implement it and I don’t think it makes a difference. Bit level crossover is basically byte level crossover with an extra point mutation.

No Location Simulation

In this version genomes are free to mate with any other genome (in stochastic proportion to their fitness).

Randomness Fixed

The random seed actually works in this version – I can guarantee that all runs started with the same set of genomes and all runs are repeatable.

Effort

Another difference is that you no longer specify the number of generations you want to run, but instead the effort you want the algorithm to put into searching for the best image.

$$effort = \text{population size} + \left(\text{number of children} \times \text{number of generations}\right)$$

This allows me to compare settings with different numbers of population and children on even footing.
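Rearranged, that formula also tells you how many generations a given budget buys. For the settings used below (population 100, 100 children per generation, 100,000 effort):

$$\text{number of generations} = \frac{\text{effort} - \text{population size}}{\text{number of children}} = \frac{100{,}000 - 100}{100} = 999$$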

Methodology

This time I ran the program on three different types of images: a picture, a drawing and a drawing with a transparent background. I wanted to see if a different type of image would affect the results.

From left to right: a picture of a robotic girl, a picture of a red panda, a drawing of Lina from DotA 2 with a transparent background.

I ran the program with 100,000 effort and a population of 100 that produces 100 children each round, with 100 triangles in their genomes, and adjusted the crossover chance between [0,1] in 0.1 increments. I ran the program 10 times per image per increment to produce the averages used in the graph below:



The error bars represent the 95% confidence interval.

From this light testing it seems that crossover does indeed improve fitness. This is a relief because, despite there being theory on why it should improve fitness, I had never personally discerned a difference. This is probably because the difference is only dramatic between 0 and 0.25. It seems that it’s more important to have some crossover than to worry about exactly how much.

I’ll be using a 100% crossover chance going forward and I’ll try to tease out the difference between crossover at the byte level and crossover at the triangle level.

 

Triangles Revamped

Last weekend I finished v1.0.0 of my new triangles program.

It was depressing to see that there were 3 other versions collecting dust on my neglected GitHub so I deleted them. I choose to forget. I live the noble lie that existence is worth the cost.

What Does It Do?

It takes an image as input, and tries to draw it using a fixed number of triangles to the best of its abilities. It uses a genetic algorithm to arrange the size, shape and color of the triangles. It also takes in a bunch of different parameters, but we’ll get into those later.

At least I tried

How Well Does It Work?

Not that well, which is supremely disappointing. I have two theories on why:

  1. Pure Genetic Algorithms may not work well for this use case
  2. The hyper-parameters might need tuning (which is step II)

I’ll be trying to optimize the hyper parameters as step two of this program, and I’ll try a pseudo-GA approach to see if the output can be improved; I’m certain that it can.

How Does It Work?

The way genetic algorithms work is that they encode the solution space as a string of bits, and manipulate these bits to search through the solution space for a satisficing solution. In other words, genetic algorithms are a nature-inspired version of parallel hill climbing/gradient descent. To get started, we have to know how to encode our solution space as a string of bits, and how to manipulate this string of bits to improve our solution.

Encoding the Solution Space

The end goal is to have an image drawn with a finite number of translucent triangles. I chose to represent this as a background color, followed by a list of triangles. This would then be drawn on a “blank” image of the same size. Encoding it this way reduces the number of bits needed to represent the image, and thus the solution space we need to search. Another thing important to the success of genetic algorithms is that the bit locations themselves have meaning. Since the triangles will be drawn in order, and order matters when you have transparency, a list is an ideal way to provide these bit-location semantics.

Basic representation of triangle image genome

Background Color

The first 32 bits of my image encode the RGBA value of the image background color. Each byte (8 bits) represents one channel: red, green, blue or translucency. That gives us 8-bit color depth, or ~16.7 million colors to choose from (not including transparency). Unfortunately, this also means finding the right background color has ~4 billion possible solutions.

Background Color Genome Representation

For those not in the know, “alpha” is how the translucency parameter is typically referred to. Adding an alpha channel means that this program can handle transparent images in addition to images with a defined background.

The program will read this value first, transform it into a color and set every pixel in the image to that color. For example, if the bit string looked like this:

\[11000100011000010110111111111111\]

We’d break it up into the component RGBA bytes:

\begin{align} r &= [11000100] \\ &= 196 \\\\ g &= [01100001] \\ &=  97 \\\\ b &= [01101111] \\ &= 111 \\\\ a &= [11111111] \\ &= 255 \end{align}

The rgb value is (196,97,111) which renders a nice red color. Opacity is determined as a value between 0 and 1, with 0 being invisible and 1 being completely opaque. The alpha value can be any number [0,255]. To derive opacity we simply divide by the maximum number. In this case:

$$255 \div 255 = 1$$

So the background is totally opaque. If our source image was 500×500 pixels, this bit string would produce an image that looks like this:

Red RGBA example image

This RGBA encoding is also how each triangle’s color will be represented.
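As a sketch (assuming the genome is stored as a byte array with the channels in R, G, B, A order), decoding the background color looks something like this:

```cpp
#include <cstdint>
#include <vector>

struct Rgba {
    uint8_t r, g, b, a;
};

// The first four bytes of the genome hold the background color,
// one byte per channel in R, G, B, A order.
Rgba decodeBackground(const std::vector<uint8_t>& genome) {
    return Rgba{genome[0], genome[1], genome[2], genome[3]};
}

// Opacity in [0, 1] is the alpha byte divided by its maximum value.
double opacity(const Rgba& color) {
    return color.a / 255.0;
}
```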

Triangles

Triangles are encoded as a set of 3 (x,y) points, an RGBA color and a bit that encodes whether or not the triangle is active. Each triangle can be encoded in 11 bytes, or 88 bits.

Triangle bit encoding

Points

As you may know, triangles have three points; the question is how to render them onto an image. Images can be thought of as a grid of pixels. Using discrete Cartesian coordinates, we can address pixels on this 2-dimensional grid.

In this program, the origin is at the upper-left corner of an image, with the x-axis indicating the width offset and the y-axis the height offset. This means that the upper-left pixel can be addressed by the point (0,0). If the image were 100 pixels tall and 100 pixels wide, the bottom-right pixel could be addressed by the point (99,99).

Here is some pixel art I made that is 16×16 pixels (scaled up to 128×128):

Amy 16x16 Pixel Art

The point (0,0) is purple. The point (0, 2) is gray. The point (1,4) is yellow. You get the idea.

Each point is represented by an (x,y) value:

Point Encoding

This is actually a lie. I originally tried to directly encode the points using ten bits each, allowing me to directly address pixels in images up to 1024×1024 in size. I later switched to encoding the points as percentages of the image’s width and height. Why?

  1. If I use images smaller than that the bits are wasted (while still expanding the search space).
  2. If I use images smaller than 1024×1024 the addressing of pixels would no longer be distributed evenly, distorting the algorithm.
  3. I could not use the algorithm on images of arbitrary size.
  4. Making the numbers 8 bit gives nice byte alignment (without limiting me to 256×256 images, which are quite small).

So instead the (x, y) point indicates a percentage of the image’s width and height. Since each coordinate is 8 bits, it addresses the image in increments of 1/256th of its width or height. For example, if the (x,y) values were (123, 16) and the source image was 512×512 pixels:

\begin{align} x &= [01111011] \\ &= 123 \\\\ y &= [00010000] \\ &= 16 \\\\  pixel_{x} &= (123 \div 256) \times 512 \\ &= (\sim 0.48) \times 512  \\ &= 246 \\\\  pixel_{y} &= (16 \div 256) \times 512 \\ &= (0.0625) \times 512  \\ &= 32 \\\\ pixel_{x,y} &= (246,32) \end{align}

Using these 3 points, we can define which pixels are inside of the triangle, and therefore color them appropriately to “draw” them.
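In code, the decoding step is a one-liner (a sketch; truncating after the multiply is my assumption):

```cpp
#include <cstdint>

// Convert an 8-bit genome coordinate (a fraction of the image in 1/256
// steps) into a pixel coordinate for a given image dimension.
int toPixel(uint8_t encoded, int imageDimension) {
    return static_cast<int>((encoded / 256.0) * imageDimension);
}

// Matches the worked example above:
//   toPixel(123, 512) == 246
//   toPixel(16, 512)  == 32
```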

Color

Color is done much the same as the background color, with one major difference.

Triangle color encoding

Like before, triangles have an 8-bit RGB color profile. Unlike before, they only have 7 bits to determine their opacity. I did this so I could sneak in a bool value while maintaining byte alignment. I don’t think the triangles suffer from only having 128 opacity states versus 256.

The “visible” bit determines if the triangle is drawn at all. Although the same effect could be achieved by setting all the opacity bits to 0, we will see below why a dedicated bit is necessary.
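Unpacking that last byte might look like this (a sketch; which bit holds the visible flag is an implementation detail I’m guessing at here):

```cpp
#include <cstdint>

struct AlphaVisible {
    uint8_t alpha;  // 7-bit opacity, 0-127
    bool visible;   // whether the triangle is drawn at all
};

// Assumed layout: the low bit is the visible flag,
// the remaining 7 bits are the alpha value.
AlphaVisible unpackAlphaByte(uint8_t b) {
    return AlphaVisible{static_cast<uint8_t>(b >> 1), (b & 1u) != 0};
}
```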

Putting it all together

Here’s what a bit string – or “genome” – of a red image with a centered, 50% transparent, light blue triangle would look like:

\[11000100011000010110111111111111010000001100000010000000\\01000000110000001100000000110000001100001111111101111111\]

Again, notice how the dimensions aren’t specified. They depend on the source image. The encoding is dimension agnostic – which is a nice feature.

The size of a genome, in bytes: \(4 + ( \text{number of triangles} \times 11)\)

Multiply by 8 to get the number of bits. This means a drawing with 2 triangles has \(2^{208}\) possible configurations.

Manipulating the Bits

So now we know how to encode solutions as bit strings, which I will affectionately name “genomes.” Given that even simple genomes can represent an incredibly large search space, we need an intelligent way to manipulate the strings and narrow the search. Unfortunately, this is not an AI post, this is a genetic algorithms post – we only have the power of natural selection. Natural selection is not very smart. It can be effective at finding solutions, but only if you harness the power of the only thing it cares about – reproductive fitness. Eliezer Yudkowsky puts it nicely:

within each species, the blind idiot god is purely obsessed with inclusive genetic fitness. No quality is valued, not even survival, except insofar as it increases reproductive fitness. There’s no point in an organism with steel skin if it ends up having 1% less reproductive capacity.

from his blog post: Thou Art Godshatter

 

This means that the most important part of the algorithm, besides the encoding, is the fitness function. We’ll get there eventually, but first we need to know the life of an Artist.

Artists

To keep the nature analogy going, artists are the creatures produced by the genome. The two are very similar, since an artist is defined by its genome, but they are not the same. An artist is an extension of the genome in three ways.

  1. Artists are finite. They, not their genomes, are the ones that live, die and reproduce. The same genome may appear in multiple artists, in different places and at different times. Each artist lives only once.
  2. Artists live in a place. They have a location within their environment.
  3. Artists have two copies of their genome, a dominant and recessive strand.

When an artist is created, it is given two equal-length, random bit strings. One we’ll label “dominant” and the other “recessive.” These random strings are modified so that the “visible” bit of each triangle, except the first, is set to 0.

Population & Location

The first thing we do is create a population of artists. As stated above, these artists start with totally random bit-strings, or genotypes.

These artists are distributed on a map. This map is really an adjacency matrix of degree \(N\): each artist is adjacent to \(N\) other artists. For example, if \(N=4\) it simulates a 2D wraparound grid. Artists are only allowed to mate with artists that are adjacent to them.

Crossover

Each artist has a chance \((\sim 70\%)\) to cross over. A random point in the genome is picked, and the dominant and recessive genomes swap their bits after that point.

Genome Crossover

This crossover can be set to happen at bit, byte or triangle alignment; i.e. if set to triangle alignment, triangles are never split and are always swapped intact.
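Here's a sketch of how the crossover point might be snapped to the chosen granularity (bit indices, using the 32-bit background and 88-bit triangles from the encoding above; the snapping scheme itself is my assumption):

```cpp
#include <cstddef>
#include <random>

enum class Alignment { Bit, Byte, Triangle };

// Pick a crossover point as a bit index, then snap it to the requested
// granularity so that bytes or whole triangles are never split.
std::size_t pickCrossoverPoint(std::size_t genomeBits, Alignment align,
                               std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, genomeBits);
    std::size_t point = dist(rng);

    const std::size_t kBackgroundBits = 32;  // RGBA background
    const std::size_t kTriangleBits = 88;    // 11 bytes per triangle

    switch (align) {
        case Alignment::Bit:
            return point;                      // split anywhere
        case Alignment::Byte:
            return (point / 8) * 8;            // snap down to a byte boundary
        case Alignment::Triangle:
            if (point <= kBackgroundBits) return kBackgroundBits;
            return kBackgroundBits +
                   ((point - kBackgroundBits) / kTriangleBits) * kTriangleBits;
    }
    return point;
}
```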

Why?

Good question. The reason seems to be “because nature does it.” To be fair, that’s why anybody does genetic algorithms in the first place, but there is another reason why crossover might be valuable. Since GAs operate on bit strings, it stands to reason that there would be valuable subsequences of bits. Crossover allows these subsequences to be recombined and swapped around.

Mutation

Each bit in an artist’s genome has a chance \((\sim 0.05\%)\) to mutate. If a mutation occurs, the bit is flipped; i.e. if it is a 1 it becomes a 0, and if it is a 0 it becomes a 1.
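A per-bit mutation pass over a byte-packed genome might look like this (a sketch; the rate is a probability per bit, e.g. 0.0005 for ~0.05%):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Flip each bit of the genome independently with probability `rate`.
void mutate(std::vector<uint8_t>& genome, double rate, std::mt19937& rng) {
    std::bernoulli_distribution flip(rate);
    for (uint8_t& byte : genome) {
        for (int bit = 0; bit < 8; ++bit) {
            if (flip(rng)) {
                byte ^= static_cast<uint8_t>(1u << bit);  // 1 -> 0, 0 -> 1
            }
        }
    }
}
```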

Why?

Mutation is important because it prevents the algorithm from getting stuck. If the initial population of artists did not have a particular sequence present, then without mutation that sequence could never be created. Just like in nature, mutation is usually harmful, but seemingly necessary for advancement.

Fitness

The most important part of a genetic algorithm is how “fitness” is measured and how reproduction is handled. This makes sense: natural selection is purely concerned with “reproductive fitness,” which has both of those words in it.

Fitness is a measure of how close a solution is to ideal. This means you have to know what a good, or perfect solution, looks like before you start. This is tricky, because if you knew what a good solution was, you wouldn’t need to find one. In this case we need another function, one that takes the bit string and maps it back to the original domain. This domain lends itself to evaluation in a way the genome doesn’t.

Expressing Yourself

In biology, the genotype is the DNA: the sequence of nucleotides that makes up a gene. The phenotype is the expression of that DNA, or gene. In this algorithm, the bit strings, or the list of triangles, are like the genotype. If we take this information and convert it into an image, that is like the phenotype.

In this case, the background color for the image is determined by the RGBA background color value encoded in the dominant genome. Then, each triangle is examined in order from both the dominant and recessive genomes. If the triangle from the dominant genome is “visible” (the visible bit is set to 1) then it is drawn. If the dominant triangle is not visible and the recessive triangle is visible then the recessive triangle is drawn. If neither are visible, no triangle is drawn. Repeat until we run out of triangles. This gives us our artist’s image.
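In rough C++ terms, expression is a loop like this (the Triangle and Image types and the drawTriangle helper are placeholders, not the real rasterizer):

```cpp
#include <cstddef>
#include <vector>

struct Triangle { bool visible; /* points, color, alpha omitted */ };
struct Image { /* pixel buffer omitted */ };

// Placeholder for the actual rasterizer.
void drawTriangle(Image& img, const Triangle& t) { (void)img; (void)t; }

// Express a phenotype: for each triangle slot, draw the dominant triangle
// if it is visible, otherwise fall back to the recessive one, otherwise
// draw nothing. The canvas is assumed to be pre-filled with the dominant
// genome's background color.
Image express(const std::vector<Triangle>& dominant,
              const std::vector<Triangle>& recessive, Image canvas) {
    for (std::size_t i = 0; i < dominant.size(); ++i) {
        if (dominant[i].visible) {
            drawTriangle(canvas, dominant[i]);
        } else if (recessive[i].visible) {
            drawTriangle(canvas, recessive[i]);
        }
    }
    return canvas;
}
```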

Calculating Image Distance

We can compare this image to the source image to see how close it is. The closer it is to the source image, the better. In this program I settled on using the per-pixel root mean squared error for my fitness function. The larger the error, the more different the images, and the lower the fitness. If you’re interested in learning more, I have a previous post you might want to check out.

I also tried adding a custom image hash and an image hash based off of a discrete cosine transform – but neither seemed to improve the fitness function so I dropped them as they are expensive to calculate.

Reproduction

Once all the artists have been assigned a fitness, it is time for them to reproduce. Again, to harness natural selection’s power we must allow more fit artists to reproduce with greater frequency. The question is, how much more often?

If we allow high fitness artists to reproduce too prolifically, we may converge too quickly as they take over the gene pool in rapid succession. If we don’t allow them any opportunity to reproduce more often then the algorithm will make progress only by blind luck.

The method I chose was sigma scaling. The standard deviation of fitness is calculated, and artists are allowed to reproduce in proportion to their z-score. This has two benefits: when the variance is very high (in the beginning) a lucky artist won’t prematurely dominate the gene pool. In later generations, when the program has converged, the variance will be low, but there will still be artists that are several standard deviations from the mean and so selective pressure will remain.

$$ExpectedReproduction(a, t)= 1 + \frac{f(a)-\bar f(t)}{2\sigma(t)}$$

Where \(a\) is the artist, \(f(a)\) is the fitness of the artist, \(\bar f(t)\) is the average fitness of the population at time t, \(\sigma (t)\) is the standard deviation of the fitness of the population at time t. If \(\sigma (t) =0\) then ExpectedReproduction is 1. If ExpectedReproduction is less than 0.1 it is set to 0.1 out of kindness.

Using a stochastic process, artists are selected to mate such that they will reproduce at least \( \lfloor ExpRepr(a,t) \rfloor \) times and at most \( \lceil ExpRepr(a,t)\rceil\) times.
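In code, the expected reproduction count is straightforward (a sketch, following the special cases described above):

```cpp
#include <algorithm>

// Sigma scaling: expected number of offspring for an artist, given its
// fitness and the population's mean and standard deviation of fitness.
double expectedReproduction(double fitness, double meanFitness, double stddev) {
    if (stddev == 0.0) return 1.0;                  // no variance: everyone is average
    double expected = 1.0 + (fitness - meanFitness) / (2.0 * stddev);
    return std::max(expected, 0.1);                 // the kindness floor
}
```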

Mating

Artists are only allowed to mate with artists adjacent to them, and these are chosen at random. Their offspring inherit the dominant genome from the mating parent, and the recessive genome from the selected mate.

There is also an elitism parameter that can be set. If set, the top \(X\) members of each generation are cloned into the next generation, bypassing mating. This can help the program converge faster.

Finally

All the offspring of the current generation are distributed onto the map, roughly where their parents were; the current generation is deleted and the process starts over.

What’s Next?

I’ve been running the program on images and I’m not super impressed with the results. I plan on tinkering with it to see if there are any obvious improvements I can make (that are still in the spirit of genetic algorithms). Most important is to do some actual research on which hyper-parameters perform best. I’ve never machine-learned parameters before, so wish me luck!

  1. Test which hyper-parameters are the best
    1. This is the science part where I run the program many times to determine which hyper parameters produce the best looking image
  2. Switch to OpenCV so I can use CUDA to compare images and rasterize them
    1. This should make the code much faster – probably requires rewriting nearly all of it though.
  3. Hook it up to a webcam
    1. Have the “environment” change each frame so that the “artists” are constantly evolving.

Genetic Algorithms

I’m re-re-re-re-starting trying to build a program that uses a genetic algorithm to draw images using triangles. Like my blog that I keep “restarting.” Like everything in my pathetic life.

This sushi is like your life, many things started and nothing finished

Last week I read An Introduction to Genetic Algorithms by Melanie Mitchell as a primer. It’s a good book that talks about the history, various techniques and mathematics behind understanding GAs. That said, I could never shake the feeling that the central theme of the book was “They’re an interesting way to program things, but we’re not sure what they’re good for and are most likely just a bad form of hill climbing.” I wasn’t too bothered by this, because my goal with this program is to learn more about GAs and not to build scalable, efficient machine learning algorithms.

I had an old version of this program that made pretty good images (see the panda on this earlier post, or this one). I then remade a shitty version of the program, like 2 years ago, that I managed to get working again as well. It takes longer and produces worse results, more proof that I am declining in my old age.

Crystal maiden image compared to the same image rendered only in triangles

I sketched out a rough plan of my revised algorithm over the weekend, and now it’s finally time to implement. Too bad it’s been ages since I’ve written any code let alone ceepeepee. Wish me luck.

Going the Distance

Holy shit – I thought it was lost forever. While Googling my dead name out of morbid curiosity I found a copy of one of my old Triangles™ related blag posts on Medium. The images I originally made for the project are on a white background which doesn’t really jive with my hip new dark themed blag. Please forgive me for any eyestrain encountered.


Going the Distance

The goal of this project is to be able to create a *good* representation of an image using only N triangles.

What is Good?

Defining what good and good enough means is important.

If we don’t know what a good image is:

We can’t tell the difference between a bad representation and a good representation. We wouldn’t be able to search in the correct direction, or search at all.

If we don’t know what good enough is:

We have to enumerate every possible representation and take the best one. While possible, this is intractable for almost all image sizes and numbers of triangles.

Good is also relative to the context of the goal. After all, a list of triangles that is a good representation of an image of a fruit bowl is not a good representation of an image of the Eiffel Tower.

So the first step is to convert our list of triangles into an image through a process called rasterization. How I do this in my project will be the subject of a future blog post. Now we have two images, the original and an image created from our representation of it.

Distance

Before we talk about the ‘goodness‘ of an image, let’s talk about the distance between two images. A good distance function should take two images as inputs, and output a number such that the less similar the images, the higher the number. If the images are identical, the distance should be 0 as there is no way to make them ‘closer’.

An image can be thought of as a 2-dimensional grid, with each ‘square’ on the grid representing a pixel. Each pixel can be thought of as 3 colors, Red, Green and Blue that are mixed in different proportions to create the appearance of a single color to your brain.

For example, here is a simple 3 pixel by 3 pixel image of a green and red checkerboard pattern.

3×3 Red-Green Image

There are many different ways to represent the color of a pixel, however in this project I will be using RGB (Red, Green, Blue). Each color channel can take on a value from 0 to 255 for a total of \(256^3 = 16{,}777{,}216\) possible colors per pixel.

Simple Distance Function

A simple way to compute the distance between two images is to compute the sum of the distances between each pixel. Define pixel distance as the sum of the absolute value of the differences of two pixels’ color channels.

\begin{align} imageDistance(img1, img2) & = \sum_{i=0}^\text{pixels} pixelDistance(img1[i], img2[i]) \\\\ pixelDistance(p1, p2) & = \sum_{color}^\text{r,g,b} |p1.color - p2.color|\end{align}

Using these formulas we can calculate the distance between these two images.

To start we compare each corresponding pixel of the images. We’ll go left-right, top-bottom and start with the upper left pixel.

 

The distance between the first pixel of img1.png and img2.png:

\begin{align} pixelDistance(img1[1], img2[1]) & = | 214 - 182 | + | 84 - 42 | + | 84 - 88| \\\\ & = 32 + 42 + 4 \\\\ & = 78 \end{align}

The distance between the second pixel of img1.png and img2.png:

\begin{align} pixelDistance(img1[2], img2[2]) & = | 160 - 160 | + | 204 - 204 | + | 88 - 88| \\\\ & = 0 + 0 + 0 \\\\ & = 0 \end{align}

Since every other pixel comparison repeats one of these two cases, the final distance looks like this:

\begin{align} imageDistance(img1, img2) & = (5 \times 78) + (4 \times 0) \\\\ & = 390 \end{align}
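Here’s the simple distance written out in code (a sketch; images stored as flat arrays of RGB pixels of equal size):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Pixel { uint8_t r, g, b; };

// Sum of the absolute channel differences between two pixels.
long pixelDistance(const Pixel& p1, const Pixel& p2) {
    return std::abs(p1.r - p2.r) + std::abs(p1.g - p2.g) + std::abs(p1.b - p2.b);
}

// Sum of pixel distances over the whole image.
long imageDistance(const std::vector<Pixel>& img1, const std::vector<Pixel>& img2) {
    long total = 0;
    for (std::size_t i = 0; i < img1.size(); ++i) {
        total += pixelDistance(img1[i], img2[i]);
    }
    return total;
}
```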

Mean Squared Error Distance

The simple distance we used above is a good start but we can make an improvement by switching our pixelDistance function from using the sum of the differences to using the sum of the differences squared.

\[pixelDistance(p1, p2) = \sum_{color}^\text{r,g,b} (p1.color - p2.color)^2 \]

The new distance between the first pixels:

\begin{align} pixelDistance(img1[1], img2[1]) & = (214 - 182)^2 + (84 - 42)^2 + (84 - 88)^2 \\\\ & = 1{,}024 + 1{,}764 + 16 \\\\ & = 2{,}804 \end{align}

The new distance between the second pixels:

\begin{align} pixelDistance(img1[2], img2[2]) & = (160 - 160)^2 + (204 - 204)^2 + (88 - 88)^2 \\\\ & = 0 + 0 + 0 \\\\ & = 0 \end{align}

\begin{align} imageDistance(img1, img2) & = (5 \times 2{,}804) + (4 \times 0) \\\\ & = 14{,}020 \end{align}
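Only pixelDistance changes in code (reusing the Pixel struct from the earlier sketch):

```cpp
// Squared-error variant: a large difference concentrated in one channel
// is penalized far more than the same total spread across all three.
long pixelDistanceSquared(const Pixel& p1, const Pixel& p2) {
    long dr = p1.r - p2.r;
    long dg = p1.g - p2.g;
    long db = p1.b - p2.b;
    return dr * dr + dg * dg + db * db;
}
```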

Why is this better?

First, it’s important to note that although it appears the distance has increased from 390 to 14,020, you cannot directly compare distances derived from two different distance functions.

Second, take a look at the color swatches below. You’ll notice that although both swatches on the right differ from the base color by 48, the center swatch distributes it evenly amongst all 3 channels, while the far right swatch has all the difference in the red channel.

The simple pixel distance function would assign both pixels a distance of 48. The mean squared error pixel distance function would assign the center swatch a distance of 768 and the right swatch a distance of 2,304. The mean squared error distance function favors spreading out the difference among all the color channels. This is preferable because if the difference is spread out amongst all three channels the image appears to be the same, only lightened or darkened. If the distance is all in one channel, the hue shifts and the colors are off. I would favor a dimmer/brighter version of an image over a hue-shifted version, so this is an improvement.

Here’s our red panda friend again. In the first image, every pixel had its red channel lowered by 75. In the second, every pixel had all channels lowered by 25. The third picture is the original for comparison.

Red Channel -75
All Channels -25
Original

Colors and Shapes

There is still a weakness with our image distance functions, which is that they only take into account the distance between colors, and not the distance between the overall forms of the images.

To illustrate, the below image is the red panda image where every color channel has been darkened by 35.

Distance: 894,618,661

Whereas this plain brown image, where every pixel has been set to the average RGB color of the original, scores about 8% better.

Distance: 821,531,904

I hope you would agree with me when I assert that the first image is more similar to the original than the latter. The latter could be almost anything, whereas the first just needs to be lightened a bit to be an exact match.

Grayscale

One attempt would be to convert the image to grayscale, removing color data and leaving only brightness.

The 3 color channels can be converted into a single brightness value using the following formula.

\[brightness = 0.299r + 0.587g + 0.114b\]

The reason each channel isn’t represented in equal proportion is that the human eye doesn’t see each color as equally bright.
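In code this is a one-line weighted sum (a sketch):

```cpp
#include <cstdint>

// Perceptual brightness of an RGB pixel (ITU-R BT.601 luma weights).
double brightness(uint8_t r, uint8_t g, uint8_t b) {
    return 0.299 * r + 0.587 * g + 0.114 * b;
}
```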

However, our darkened grayscale image still loses out to the grayscale average color:

Distance: 305,179,915
Distance: 263,832,741

In fact, the distance has widened in the average grayscale image’s favor, beating the darkened grayscale image by 13%.

Compression

In a similar vein, we could try compressing the image. In this case we will transform the image into an 8×8 square of pixels. Each pixel in the 8×8 image is the average color of the pixels it subsumed.

Original — 8×8
Darkened: 783,056
Average: 688,697

Unsurprisingly, this also fails to rank the darkened image as closer to the original.

Hamming Distance and Image Hashes

Even though the two approaches above fail to properly consider the form of the image there is a way to combine them to produce an approach that consistently ranks the darkened image as closer to the original than the average color image.

Image Hash Algorithm

A hash is a function that takes a higher-dimensional input and maps it to a lower-dimensional output. In this case, our dimensions are the individual color channels of the image. The original image is 320 pixels wide by 240 pixels tall, has 3 color channels per pixel, and 256 possible values per color channel. This means we start with an input feature space of \((256^3)^{320\times240}\) possible images. This is a mind-bogglingly large number. To make it more manageable we are going to map all potential images down to only \(2^{64}\) possible hashes using the algorithm below:

  1. Compress each image into an 8×8 pixel image.
  2. Convert each 8×8 image into grayscale.
  3. Calculate the average brightness of the image.
  4. For each pixel in the image, if the pixel is darker than average set it to white, if it is brighter than average, set it to black.
Original
Darkened
Average Color

The 64 pixels of each of these black-and-white images can be thought of as a string of 1’s and 0’s where each black pixel represents a 1 and each white pixel represents a 0. Going left-right, top-down we can construct a string that represents each image. The original image’s hash has been converted below as an example.

\[100000010010000000001111000111001111111000101110000101000000000\]

Then we can compare each string of 1’s and 0’s to the original, and count the number of times they differ. The number of places two strings differ is known as the Hamming Distance.
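As a sketch (assuming the image has already been shrunk to 8×8 and converted to grayscale), building the hash and comparing two hashes looks like this:

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Build a 64-bit hash from an 8x8 grayscale image: pixels brighter than
// the average become 1 bits, pixels at or below the average become 0 bits.
uint64_t averageHash(const std::array<double, 64>& gray) {
    double mean = 0.0;
    for (double v : gray) mean += v;
    mean /= 64.0;

    uint64_t hash = 0;
    for (int i = 0; i < 64; ++i) {
        if (gray[i] > mean) hash |= (1ULL << i);
    }
    return hash;
}

// Hamming distance: the number of bit positions where the hashes differ.
int hammingDistance(uint64_t a, uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}
```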

The Hamming Distance between the original and darkened image is 3, while the Hamming Distance between the original and average color is 24 or 40, depending on whether you say pixels equal to the average are black or white. In either case, it’s clear that this approach suggests that the darkened image is closer to the original than a flat brown.


Author’s Note

I picked 8×8 pixels because its 64 bits fit nicely in my blog’s column width, and 64 bits fits nicely into most computer architectures’ word size. If you need a more fine-grained approach you can always bump up the compressed images’ resolution. For instance, here is a 1,024 bit hash (32×32) representation of our red panda friend.

32×32 Red Panda Hash

Discrete Cosine Transformation

I have also considered another approach, similar to the one above, but instead of using the mean brightness, I would compute the Discrete Cosine Transformation. The DCT translates an image into a matrix of coefficients for a series of horizontal and vertical frequencies that, when summed, reproduce an approximation of the image. Similar images will have similar coefficients in their DCT matrices, and so we can compute a distance between them to tell how similar they are. You can compute the DCT matrix using the algorithm below:

DCT Image Hash Algorithm
  1. Compress each image into a 32×32 pixel image.
  2. Convert each 32×32 image into grayscale.
  3. Compute the Discrete Cosine Transform for each 32×32 image. Round very small coefficients to 0.
  4. Save the first 64 coefficients (top-left 8×8 sub-matrix) of each image. These represent the lowest frequencies and are the most important to the image’s structure.
  5. Calculate the average coefficient of the 8×8 matrix, but don’t include the first term as it contributes no detail to the image.
  6. Then compare each coefficient in the DCT Matrix against the mean coefficient value and assign it a 1 if it was above the mean and a 0 if it was below.
DCT Image Hashes

Again you can see that this approach fares quite well. The darkened image’s hash appears quite similar to the original’s, and the average color’s hash appears quite different. The darkened panda’s image has a Hamming Distance of 6 from the original, and the average color’s Hamming Distance is 32. I will be comparing both approaches over multiple images during real trials to get a handle on which one is most effective.

Potential Issue

There is still room to improve this fitness function. For instance, it scores this image very poorly, despite it looking a great deal closer to the red panda image than the average color image.

The furthest possible image according to the MSE

This is not helped by taking the image’s DCT hash. Take a look at the image hashes produced below (32×32).

Although they appear to be very similar, their bits are inverted! This gives the max-distance image a Hamming Distance of 59, which is very large despite the similarity in form. To counter this corner case, take the minimum of the Hamming Distance and (64 - Hamming Distance). The idea is that if almost all your bits are different, then your arrangement must actually be very similar.

Computing DCT

I glossed over step #3 above. Given its eponymous nature, it is a rather important step, so here is how you compute the Discrete Cosine Transform:

\[G_{u,v}=\frac 14 \alpha(u)\alpha(v) \sum_{x=0}^{31} \sum_{y=0}^{31} g_{x,y} \cos\left[ \frac {(2x + 1) u \pi}{64}\right] \cos\left[ \frac {(2y + 1) v \pi}{64}\right]\]

Where \(G\) is the DCT matrix, \(G_{u,v}\) is the coefficient at location \((u,v)\) in \(G\), and \(g_{x,y}\) is the grayscale brightness of the pixel at \((x,y)\). \(\alpha()\) is a function described thusly:

\[ \alpha(u) = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases} \]
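For reference, a direct (and deliberately unoptimized) implementation of that transform on a 32×32 block of grayscale values might look like this (a sketch; the 1/4 normalization follows the formula above):

```cpp
#include <array>
#include <cmath>

constexpr int N = 32;
using Block = std::array<std::array<double, N>, N>;

// Direct 2D DCT-II of an N x N block of grayscale brightness values.
Block dct2d(const Block& g) {
    const double pi = std::acos(-1.0);
    Block G{};
    for (int u = 0; u < N; ++u) {
        for (int v = 0; v < N; ++v) {
            const double alphaU = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            const double alphaV = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < N; ++x) {
                for (int y = 0; y < N; ++y) {
                    sum += g[x][y] *
                           std::cos((2 * x + 1) * u * pi / (2.0 * N)) *
                           std::cos((2 * y + 1) * v * pi / (2.0 * N));
                }
            }
            G[u][v] = 0.25 * alphaU * alphaV * sum;
        }
    }
    return G;
}
```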

Fitness

There is a fundamental issue with using distance as the sole measure of a representation’s “goodness.” Although all images have a minimum distance of 0, their maximum distance depends on their size and coloring. A distance of 123,456 might be relatively close for a large high contrast image, but relatively far for a small image. We need a way to compare the closeness of representations among images of different sizes and palettes.

Enter the fitness function. The fitness function normalizes the distance to a value between 0 and 1. This is important because the program will be used on images of many different dimensions and colors and we need a way to rate the success of the output that is consistent among all of them.

Here’s my current fitness function:

\[fitness = \frac{1}{2}\left(\frac{Max\,Distance - Distance}{Max\,Distance}\right) + \frac{1}{2}\left(\frac{\vert 32- Hamming\,Distance\vert}{32}\right)\]


Author’s Note

Revisiting this several years later – I’m pretty sure there are better ways to normalize a function than a simple average, like adapting a Logistic function.


Max distance represents the furthest image (according to the mean squared error color distance formula) possible from the original. For our red panda friend, that’s this frightful image below.

Distance: 8,122,111,813

The distance is almost 10x greater than that of the average color or darkened images. To create this image, every color channel was set to the furthest possible value from the source, meaning every channel ends up at either 0 or 255, making this essentially an 8-color image.

The left term is the difference between the worst possible distance and the representation’s distance, divided by the worst possible distance. If the distance is 0, the numerator equals the max distance and the term evaluates to 1, the highest possible fitness.

The right term represents the Hamming Distance between the two image hashes, whether using the mean brightness method, or the discrete cosine transformation method. If the hashes are identical the highest possible fitness is assigned.

Both terms are multiplied by one-half to give them equal weight, although proper weighting would need to be determined by experimentation.
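Putting the two terms together in code (a sketch, using the equal weights from the formula above):

```cpp
#include <cmath>

// Combine the normalized color-distance term and the hash term into a
// fitness value in [0, 1]. maxDistance is the distance of the worst
// possible image; hammingDistance is between the two 64-bit image hashes.
double fitness(double distance, double maxDistance, int hammingDistance) {
    double colorTerm = (maxDistance - distance) / maxDistance;
    double hashTerm  = std::abs(32.0 - hammingDistance) / 32.0;
    return 0.5 * colorTerm + 0.5 * hashTerm;
}
```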

Good Enough

So now that we have a way to judge the goodness, or fitness of an image’s representation, how do we know when the fitness is good enough?

Honestly it’s a subjective/arbitrary threshold. For this project, I will qualify ‘good enough’, i.e. the minimum acceptable fitness, as 0.985 or 98.5% similar to the original image.

If you remember the red pandas from the previous blog post you may have noticed a small black box with white numbers in the lower left hand corner. That number was the representation’s fitness score. The third red panda, which used 500 triangles, had a fitness of just over 98.5% using only the mean squared error distance function as a measure of fitness. I feel that that representation is sufficiently detailed that you would recognize it instantly as a representation of the original image, even among other similar images.


That’s all for now! I haven’t worked on the Triangles project in years, but finding this post has reinvigorated my interest. Look for more Triangle related posts in the future.

Triangles: Finally

A long time ago, I built a rudimentary program that would take an image, take a number of triangles, and do its best to recreate that image using the triangles. I also created a blog to talk about the program and the algorithmic choices I made while building it.

Today I’m pleased to announce that the project is restarted!
Screen Shot of Generated Triangles

In my infinite wisdom, I took down the blog content from my server, and the code from my GitHub and saved them locally onto my laptop.* While I was moving to San Jose my car was robbed u__u;; and I lost all my work.
My Ford Escape with two windows smashed

I did get a half pack of cigarettes, a crack pipe and a rock in exchange; so it wasn’t a complete disaster.

The good news is that as of last weekend, the project is back on. I spent a long night trying to remember C++ well enough to hack together a shadow of what the program was before, with the hope that it will one day outshine its predecessor.

Check out this example using a red panda who has become the “Lenna” of this project.

Red Panda staring into the camera
Same image as above but created with translucent triangles

So with that in mind I’ll be posting periodic updates on my Twitter and larger updates and lessons learned on this blag.

You can laugh at my pitiful attempt to re-enter coding here.

Update: I used to have a website up where it was drawing stuff but that fell apart too.

*Why did you not have a backup?

1) Because I am not a clever man.
2) Because I was undergoing a name change, and was changing hosts as part of hosting them under a new account.