Building upon the idea
of representation, we will
discuss how images are represented in digital form. We'll work up to
it, first starting with how color is represented (which is based on the
physiology of the human eye), then looking at images as rectangular
arrangements of spots of pure color. Finally, we'll calculate the file
size of an image and discuss one way of
*compressing* the file so that it is smaller and therefore faster
to download. This compression is, in fact, a different
*representation* of the information.

Modern browsers support 140 color names. This means that we can use color names such as `black`, `aqua`, or `chocolate` as values for the CSS properties that accept a color value, such as `color`, `background-color`, etc. Many years ago, browsers supported only 17 color names, known as standard colors: *aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow*. Later, that list was expanded with 123 more colors. W3Schools maintains a complete list of the 140 recognized color names. While you can achieve a lot by using only these named colors, very often you want something more specific from the color spectrum. It turns out that we can use numerical codes to refer to colors, because inside the computer, colors are represented by numbers. How? For that, we need to understand additive colors and color vision.

Human retinas happen to have rod-shaped cells that are sensitive to all light, and cone-shaped cells that come in three kinds: red-sensitive, green-sensitive, and blue-sensitive. Therefore, there are three (additive) primary colors: Red, Green and Blue or RGB. All visible colors are seen by exciting these three types of cells in various degrees. (For more information, consult these Wikipedia articles on additive color and color vision.)

Color monitors and TV sets use RGB to display all their colors, including yellow, chartreuse, you name it. So, every color is some amount of Red, some amount of Green, and some amount of Blue.

On computers, each RGB color component is typically given on a scale from 0 to 255, which fits in 8 bits, or 1 byte.

Play with the Color slider page to get a feel for this.

Here is a list of examples:

- Cornflower = 100 149 237
- Forest Green = 34 139 34
- Gold = 255 215 0
- DodgerBlue = 30 144 255
- Sienna = 160 82 45
- HotPink = 255 105 180

We can use this knowledge about colors being represented as a mix of red, green, and blue when specifying color values in CSS. There are three ways to do this:

```css
color: rgb(64, 224, 208);   /* three RGB numbers in the range 0-255 */
color: rgb(25%, 88%, 82%);  /* three RGB percentages */
color: #40E0D0;             /* three RGB numbers expressed as a hexadecimal triple */
```

The first two ways are self-explanatory, since they use decimal numbers and percentage values with which you are familiar. Below, we explain the meaning of hexadecimal color codes such as `#40E0D0`. The `#` sign simply indicates that the sequence of digits and letters that follows is in hexadecimal.

People use decimal (base 10), computers use binary (base 2), but programmers often use hexadecimal (base 16) for convenience.

Binary numerals get long very fast. It is not easy to remember 24 binary digits, but you can more easily remember 6 hexadecimal digits. Each hexadecimal digit represents exactly four binary digits (bits). (This is because $2^{4} = 16$.)

One way to understand hexadecimal is by analogy with decimal, but we're all so familiar with decimal numerals that our reflexes get in the way. (In fact, humans throughout history have used many different numeral systems; decimal is not sacrosanct.) So, we first need to break down decimal notation so that you can see the analogy with hexadecimal. For now, we'll stick with two-digit numerals, but the same ideas extend to any larger numbers.

Decimal notation works by organizing things into groups of ten, then counting the groups and the leftovers: Suppose you had a bunch of sticks on the ground and you bundled them all into groups of 10 with some left over (fewer than 10). Now, use a symbol to denote the number of bundles and another symbol to denote the number of sticks left over. You've just invented two-digit numbers in base 10.

Hexadecimal: Do the same thing with bundles of 16, and you've invented two-digit numbers in base 16. For example, if you had thirty-five sticks, they could be bundled into two groups of sixteen and three left over, so the hexadecimal notation is 23. Careful! That numeral isn't the decimal number twenty-three! It's still thirty-five sticks, but we write it down in hexadecimal as 23.

To distinguish a decimal numeral from a hexadecimal numeral, we use subscripts. So, to say that thirty-five sticks is written 23 in hexadecimal, we can write:

$$35_{10} = 23_{16}$$

Both decimal and hexadecimal notations are based on *place value*. We say that $23_{16}$ means $35_{10}$ because it's a "2" in the *sixteens* place and "3" in the *ones* place, just like $35_{10}$ has a "3" in the *tens* place and a "5" in the *ones* place.

Let's take another example. Suppose we have $26_{10}$ sticks. That's one group of 16 and 10 left over. How do we write that number in hexadecimal? Is it $110_{16}$? That is, a "1" in the *sixteens* place followed by a "10" in the *ones* place? No; that would be confusing, since it would look like a three-digit numeral. We need a symbol that means ten. We can't use "10," since that's not a single symbol. Instead, we use "A"; that is, $\text{A}_{16} = 10_{10}$. Similarly, "B" means 11, "C" means 12, "D" means 13, "E" means 14, and "F" means 15. We don't need any more symbols, because we can't have 16 things left over, since that would make another group of 16. The following table summarizes these correspondences and what we've done so far.

Decimal | 0 | 1 | ... | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | ... | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Hexadecimal | 0 | 1 | ... | 9 | A | B | C | D | E | F | 10 | 11 | 12 | ... | 1C | 1D | 1E | 1F | 20 | 21 | 22 | 23 | 24 |

To convert a big decimal number to hexadecimal, just divide. For example, $230_{10}$ divided by 16 is $14_{10}$ with a remainder of $6_{10}$. Thus, the hexadecimal numeral is $\text{E6}_{16}$. To convert a hexadecimal number to decimal, just multiply:

$$\text{E6}_{16} = \text{E} \times 16 + 6 = 14 \times 16 + 6 = 230_{10}$$
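The divide-and-multiply procedure above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function names `decimal_to_hex` and `hex_to_decimal` are our own, and the loop simply keeps dividing by 16, so it also handles numbers that need more than two hex digits. (Python's built-in `format(n, "X")` and `int(s, 16)` perform the same conversions.)

```python
HEX_DIGITS = "0123456789ABCDEF"

def decimal_to_hex(n: int) -> str:
    """Convert a non-negative integer to a hex numeral by repeated division by 16."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 16)  # quotient becomes the new n; remainder is the next digit
        digits.append(HEX_DIGITS[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))

def hex_to_decimal(numeral: str) -> int:
    """Convert a hex numeral to an integer using place value."""
    value = 0
    for ch in numeral.upper():
        value = value * 16 + HEX_DIGITS.index(ch)  # shift one hex place left, add the new digit
    return value

print(decimal_to_hex(230))   # E6
print(hex_to_decimal("E6"))  # 230
print(decimal_to_hex(35))    # 23
```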

Try the following conversions as an in-class exercise. You can use a calculator, you can ask your neighbors, anything you like.

Dec | Hex | Dec | Hex |
---|---|---|---|
7 | | 22 | |
26 | | 100 | |
127 | | 149 | |
240 | | 255 | |


Now that we know both hexadecimal and binary, you can convert binary to
hexadecimal (and vice versa). However, you would probably do so by
converting the binary number to decimal and then the decimal number to
hexadecimal. There's a better way, involving almost no arithmetic (or,
rather, all the arithmetic is with one-digit numbers you can add in your
head). Indeed, this technique is the *reason* that computer
scientists like using hexadecimal. (Well, this and the fun of getting to
spell words like ACE and DEADBEEF with hex digits.)

Let's start with an example. Suppose you need to convert the following from binary to hexadecimal:

$$01010100 = \text{??}_{16}$$

What we're going to do is to take the bits in chunks of four bits, so to mark the chunks we'll insert a period in the middle of the number:

$$0101.0100 = \text{??}_{16}$$

Now, we just convert each chunk directly into hex. The first chunk, 0101, is just the number 5. The second chunk, 0100, is just the number 4. Those are already in hex, so we are done:

$$0101.0100 = 54_{16}$$

(Try doing it via decimal, to check. The decimal value corresponding to both of these is 80+4=84.)

Let's do another one, this time with slightly larger values:

$$10101100 = \text{??}_{16}$$

Again, take the bits in chunks of four bits:

$$1010.1100 = \text{??}_{16}$$

Now, we just convert each chunk directly into hex. The first chunk, 1010, is 8+2 or $10_{10}$, which is the digit A in hex. The second chunk, 1100, is 8+4 or $12_{10}$, which is the digit C in hex. So we are now done:

$$1010.1100 = \text{AC}_{16}$$

(Again, check our work by doing it via decimal. The decimal value corresponding to both of these is 160+12=172.)

Why does this work? Suppose we needed to convert 172 from decimal to hex: our first step would be to divide the number by 16. In binary, moving the binary point to the left by one place is equivalent to dividing by two, so moving the binary point four places is equivalent to dividing by 16. So when we put a period in the middle of the 8-bit binary number, it is exactly the same as dividing by 16. We then have the quotient to the left of the binary point, and the remainder to the right of the binary point. Just convert each to hex, and we are done.

Notice that the only arithmetic we have to do is converting each chunk of four bits to the equivalent hex digit. The mental arithmetic involved is limited: we know that (1) we are adding one-digit numbers, (2) at most four of them, and (3) the sum will always be less than 16.
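Here is a sketch of the four-bits-at-a-time method in Python, assuming the input is a string of `0`s and `1`s. The name `binary_to_hex` is hypothetical, chosen for illustration. As described above, the only arithmetic is adding up to four place values (8, 4, 2, 1) per chunk.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits: str) -> str:
    """Convert a binary numeral to hex by reading it four bits at a time."""
    # Pad on the left with zeros so the length is a multiple of 4.
    width = -(-len(bits) // 4) * 4
    bits = bits.zfill(width)
    hex_digits = []
    for i in range(0, len(bits), 4):
        chunk = bits[i:i + 4]
        # Each chunk is at most 8 + 4 + 2 + 1 = 15, so it always fits in one hex digit.
        value = sum(int(b) * weight for b, weight in zip(chunk, (8, 4, 2, 1)))
        hex_digits.append(HEX_DIGITS[value])
    return "".join(hex_digits)

print(binary_to_hex("01010100"))  # 54
print(binary_to_hex("10101100"))  # AC
```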

Watch the rest of the video by Prof. Kurmas from Grand Valley State University on binary and hex numbers. This is a version he edited for us. You watched the first 5 minutes for last class; watch the rest for today.

Even better, here's a video with Tom Lehrer singing New Math. It's about 4 minutes long; you'll enjoy it.

We already know that every color in a computer is a combination of some amount of each of the three primary colors: red, green and blue. The amounts are *always* given in the same order: red, green, blue. The amounts are numbers from 0 to $255_{10}$, or, in hexadecimal, 00 to $\text{FF}_{16}$. Each primary is expressed as a two-digit numeral in hexadecimal, using a leading zero if necessary so that the numeral is always two digits. Three pairs of hexadecimal digits completely specify a color. Finally, the notation for a color always starts with a pound sign (#). For example, a color like (35, 230, 10) would be written #23E60A.
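The `#RRGGBB` rule can be expressed as a pair of small Python functions. This is an illustrative sketch: the names `rgb_to_hex` and `hex_to_rgb` are our own, and `hex_to_rgb` assumes a six-digit code. The `02X` format specifier produces exactly the two-digit, leading-zero, uppercase hex pairs described above.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Write an (R, G, B) color as #RRGGBB, two hex digits per primary."""
    for amount in (red, green, blue):
        if not 0 <= amount <= 255:
            raise ValueError(f"color amount {amount} is outside 0-255")
    # "02X": uppercase hex, at least two digits, padded with a leading zero.
    return f"#{red:02X}{green:02X}{blue:02X}"

def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Split #RRGGBB into its three hex pairs and convert each to decimal."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {code!r}")
    # Each pair of hex digits is one primary: RR, GG, BB.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue

print(rgb_to_hex(35, 230, 10))  # #23E60A
print(hex_to_rgb("#40E0D0"))    # (64, 224, 208)
```

For instance, `rgb_to_hex(100, 149, 237)` turns the Cornflower values from the list above into `#6495ED`.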


Now that we know how to represent a color, we can represent *images*. You can think of an image as a rectangular 2D grid of spots of pure color, each represented as RRGGBB. A spot of pure color is called a **pixel**, short for picture element, the atom of a picture. Pixels are easier to see if you blow up an image several times; here are some examples.

Every image on the computer monitor is represented with pixels. The images on a web page are saved in files that, in addition to the image data, contain information on the size of the image, the set of colors used, the origin of the image, etc. The different ways of organizing and saving this information are called **image formats**. GIF, JPEG, PNG, and BMP are some of the well-known image formats. We will talk more about image formats below. For now, we will focus on the number of pixels and the representation of each pixel, and consequently, the file size of the image.

We said above that the amount of each primary color is a number from 0 to $255_{10}$ or 00 to $\text{FF}_{16}$. It is no coincidence that this is exactly one byte (8 bits). A byte is a convenient chunk of computer memory, so one byte was devoted to representing the amount of a single primary color. Thus, it takes 3 bytes (24 bits) to represent a single spot of pure color.

With 256 values for each primary, we have 256 x 256 x 256 = 16,777,216 colors. Humans can distinguish over 10 million colors, so 24-bit color is sufficient to represent more colors than humans can distinguish. All modern monitors use this so-called 24-bit color. Some old monitors used 16-bit or 8-bit color, which were relatively impoverished, being only able to represent 65,536 colors (for a 16-bit monitor) or 256 colors (for an 8-bit monitor). Of course, a black-and-white monitor can only represent two colors, which could be called 1-bit color. An example is the Scottish terrier picture, above.
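All of the color counts above follow from one rule: a pixel with $n$ bits can take $2^n$ distinct values. A quick check in Python (the helper name `colors_for_bits` is just for illustration):

```python
def colors_for_bits(bits_per_pixel: int) -> int:
    """Number of distinct colors a pixel with the given number of bits can take."""
    return 2 ** bits_per_pixel

# 24-bit color: one byte (256 values) for each of red, green, and blue.
assert colors_for_bits(24) == 256 * 256 * 256

for bits in (24, 16, 8, 1):
    print(f"{bits:>2}-bit color: {colors_for_bits(bits):,} colors")
```

This prints 16,777,216, 65,536, 256, and 2 colors, matching the figures in the paragraph above.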

In an uncompressed file format, every pixel needs 24 bits (3 bytes) to be stored. Let's suppose you are going to take pictures of all 30 of your classmates for a class website, using your iPhone 4 camera. According to the phone specifications, its camera takes photos of 2592 x 1936 pixels, which amounts to about 5 million pixels, or 5MP (megapixels). Thus, if every pixel takes 3 bytes and a photo from your camera has 5MP, you need 15MB (megabytes) to store each image. For all your classmates' photos, you will need 30 x 15MB = 450MB.

Imagine now that you put all these photos online on your website, in one single page (using the attributes width and height to make them fit in one screen), and then you send the link to this page to your parents. They might have an average Internet connection (e.g., Verizon offers 1-3 Mbps (megabits per second) to non-FiOS subscribers).

The amount of time it will take to load a page with all these pictures on your parents' computer can be calculated as follows:

content size (450 MB) x 8 bits/byte / 1 Mbps = 3600 seconds, or 1 hour.

If each of your photos had been only around 100KB instead (as we require in some of your homework assignments), loading all of them on the page would have taken 24 seconds.
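The two time estimates above can be reproduced with a short Python sketch. It assumes, as the text does, that 1 MB = 1,000,000 bytes and that 1 Mbps means 1,000,000 bits per second, and it ignores network overhead; the function names are hypothetical.

```python
BITS_PER_BYTE = 8

def uncompressed_size_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of an uncompressed 24-bit image: every pixel stores 3 bytes (R, G, B)."""
    return width * height * bytes_per_pixel

def download_seconds(total_bytes: float, megabits_per_second: float) -> float:
    """Transfer time, assuming 1 megabit = 1,000,000 bits and no protocol overhead."""
    return total_bytes * BITS_PER_BYTE / (megabits_per_second * 1_000_000)

one_photo = uncompressed_size_bytes(2592, 1936)  # 15,054,336 bytes, roughly 15 MB
print(download_seconds(30 * 15_000_000, 1))      # rounded 450 MB at 1 Mbps: 3600.0
print(download_seconds(30 * 100_000, 1))         # thirty 100 kB photos at 1 Mbps: 24.0
```

Using the exact pixel count instead of the rounded 15 MB per photo gives 30 x 15,054,336 bytes, a little over 3,613 seconds at 1 Mbps, so the one-hour estimate holds.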

So, how do we get our images to be so small in size? There are two ways: *resizing* (decreasing the number of pixels in the image, for example by judicious cropping) and *compressing* (decreasing the number of bits needed per pixel). We will discuss compression in the next section.

Short of making our images smaller (fewer pixels), what can we do
to speed up the downloads? We can *compress* the files.

There are two classes of compression techniques:

- lossless compression, where clever encoding allows the number of bytes to be reduced but where the original image can be perfectly reconstructed from the compressed form, and
- lossy compression, where we discard less-important information in order to reduce the amount of information to be stored or transmitted.

We will look in detail at one kind of lossless compression, which
is indexed color (GIF encoding), because it gives us a window into the
kinds of ideas and techniques that matter in designing
*representations* of information.

The idea behind indexed color is that if a particular color is used many times in an image, we can create a "shorthand" for it. In fact, if we limit the number of colors, each one can be assigned a shorthand. What will be confusing is that the colors are, of course, represented as numerals and so are the shorthands! For example, instead of saying (for the umpteenth time), color #D619E0, we'll just say, for example, color number 5. This will only work, however, if the shorthands really are shorter. They are, and we'll see exactly how much.

One way to think about indexed color is that we are *creating* a "paint-by-numbers" picture. We choose:

- the numbered list of colors
- what color (number) each pixel is

Consider a 300x500 image that uses only two colors, red and yellow.

- What is the numbered list of colors? There are just two:

  index | color |
  ---|---|
  0 | #FF0000 |
  1 | #FFFF00 |

  We then paint the picture using just two numbers, 0 and 1. A zero means a pixel is red, and a one means the pixel is yellow.
- How many bits does it take to represent this image? Well, there are 300x500 or 150,000 pixels, but each one is just 1 bit, so it takes 150,000 bits, or 150,000/8 = 18,750 bytes, about 18.75 kB. Compare that with the 450 kB (300 x 500 x 3 bytes/pixel) of the uncompressed representation, and you can see this is *much* smaller. In fact, it's 1/24th the size, since each pixel takes 1 bit to represent rather than 24. It'll be 24 times faster to download.
- What about that table of colors? That's called the *color palette*, by analogy with an artist's palette. It has to be represented too; otherwise, the browser would know there were only two colors in the picture, but wouldn't know what colors they are. There are two entries in this palette, each of which is 3 bytes (24 bits), so add at least 6 more bytes to the representation.

You can see the general scheme at work: we create a table of all
the colors used in the picture. The shorthand for a color is simply
its index in the table. We will limit the table so that the
shorthands will be at most 8 bits. Since the shorthands are all
replacing 24-bit color specifications, the shorthand is *at
most* one-third the size. In the example above, the shorthand is
1/24th the size.
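To make the scheme concrete, here is a minimal Python sketch of indexed-color encoding. It is an illustration, not how GIF is actually implemented: pixels are given as `#RRGGBB` strings in a flat list, the palette is built in order of first appearance (an arbitrary choice), and images with more than 256 colors are simply rejected rather than reduced. The names `to_indexed` and `from_indexed` are hypothetical.

```python
def to_indexed(pixels: list[str]) -> tuple[list[str], list[int]]:
    """Split #RRGGBB pixels into (palette, indices).

    Each distinct color appears once in the palette; each pixel is replaced
    by the index (row number) of its color in that palette.
    """
    palette: list[str] = []
    row_of: dict[str, int] = {}
    indices: list[int] = []
    for color in pixels:
        if color not in row_of:
            row_of[color] = len(palette)  # next free row in the table
            palette.append(color)
        indices.append(row_of[color])
    if len(palette) > 256:
        raise ValueError("more than 256 colors: shorthands would not fit in 8 bits")
    return palette, indices

def from_indexed(palette: list[str], indices: list[int]) -> list[str]:
    """Rebuild the original pixels by looking each index up in the palette."""
    return [palette[i] for i in indices]

# A tiny 2 x 3 image in red and yellow, flattened row by row.
image = ["#FF0000", "#FF0000", "#FFFF00",
         "#FFFF00", "#FF0000", "#FFFF00"]
palette, indices = to_indexed(image)
print(palette)  # ['#FF0000', '#FFFF00']
print(indices)  # [0, 0, 1, 1, 0, 1]
```

Because `from_indexed(palette, indices)` gives back exactly the original list, this encoding is lossless.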

Let's continue with the example. What is the file size if the image uses 4 colors, say red, yellow, blue and lime? In that case, the table looks like this:

index | color |
---|---|
00 | #FF0000 |
01 | #FFFF00 |
10 | #0000FF |
11 | #00FF00 |

As you can see, the shorthand is now two bits instead of one. Therefore, the 150,000 pixels require 300,000 bits or 300,000/8=37,500 bytes or about 37.5kB. Obviously, this is about twice the size of the previous example, since each shorthand is now twice as big. Nevertheless, it's still much smaller than the 450 kB uncompressed file.

What about the size of the palette? That's now twice as big, too. Four entries at 3 bytes each add 12 bytes to the file size, which is a negligible increase to the 37.5 kB.

What's the pattern here? The number of colors in the original image determines the size of the palette, which determines the number of bits in each shorthand, which then determines the size of the file as a whole. The shorthand for a color is simply the binary numeral for the color's row in the table. For example, the color red in the last example was in row zero (00 in binary) and the color lime was in row 3 (11 in binary).

You can see that the number of bits required for each pixel is the key quantity. This quantity is called bits per pixel, or "bpp." It's also often called "bit depth," so that the **file size of an image** (in bits) is just `width x height x bit depth`, almost as if it were a physical 3D box.

Finally, we can state the rule:

The bit depth of an image must be large enough that the table has a row for every color. If the bit depth is $d$, the number of rows in the table is $2^{d}$.

Here's the exact relationship, along with the size of a 300x500 image:

bit-depth | max colors | approx. file size of 300x500 image (1 kB = 1024 bytes) |
---|---|---|
1 | 2 | 18kB |
2 | 4 | 37kB |
3 | 8 | 55kB |
4 | 16 | 73kB |
5 | 32 | 91kB |
6 | 64 | 110kB |
7 | 128 | 128kB |
8 | 256 | 147kB |
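The table can be regenerated with a few lines of Python. This sketch assumes the sizes count only the pixel data. It finds the smallest bit depth for a given number of colors by asking how many bits it takes to write the largest index, `num_colors - 1`, in binary (Python's `int.bit_length()`), and it prints each size divided both by 1024 and by 1000, since the two conventions give slightly different numbers. The names `bit_depth_for` and `pixel_bytes` are our own.

```python
def bit_depth_for(num_colors: int) -> int:
    """Smallest bit depth d such that 2**d >= num_colors (always at least 1 bit)."""
    # The largest index is num_colors - 1; bit_length() counts the bits needed to write it.
    return max(1, (num_colors - 1).bit_length())

def pixel_bytes(width: int, height: int, bit_depth: int) -> float:
    """Bytes needed for the pixel data alone (no palette): bits divided by 8."""
    return width * height * bit_depth / 8

print("bit-depth | max colors | size/1024 | size/1000")
for d in range(1, 9):
    size = pixel_bytes(300, 500, d)
    print(f"{d} | {2 ** d} | {size / 1024:.1f} | {size / 1000:.2f}")

print(bit_depth_for(4))  # 2, as in the red/yellow/blue/lime example
```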

Consider an image that is 80 x 100 (pixels).

- How many bytes are needed to represent this image if it's black and white? Don't forget to represent the color table.
- How many bytes if the image uses 4 colors?
- How many bytes if the image uses 16 colors?
- How many bytes if the image uses 17 colors?

In summary, you can reduce your image file size by using fewer colors. Of course, this may reduce the quality of your image. It's a tradeoff.

We've learned how indexed color works and how it affects file size. This is important not only for the theoretical understanding of why representations matter, but also for the practical usefulness of understanding how to reduce the sizes of your images. In this section, we'll review how to compute the approximate size of an indexed-color image. (Indexed color is one of the tricks used in GIF files, though GIF files use other tricks as well.) Why do we do this? Because it combines all the conceptual issues into one small calculation.

A key concept in the computation is the *bit-depth* of the image. Read the definition of bit-depth on page 19. It's the number of bits necessary to represent the desired number of colors. Remember that the number of colors is at most $2^{d}$, where $d$ is the bit-depth.

Recall that the indexed-color representation comes in two parts:

- the pixels (represented by their shorthand values, each of size equal to the bit-depth), and
- the palette (in which the full-color definition is given for each shorthand).

Thus, our computation breaks down into two parts.

- The size for just the pixels is essentially the number of pixels multiplied by the size of each. Since bit-depth is in *bits* but file sizes are expressed in *bytes*, and a byte equals 8 bits, we divide by 8: `width * height * bit_depth / 8`
- The size for the palette is the number of colors multiplied by the 3 bytes needed to represent each color: `num_colors * 3`

To find the rough size of an image, we first determine the bit-depth, then we compute the file size using the two formulas above. (This is the rough size because, remember, we are omitting some fixed overhead and further compression techniques.) You can combine them into one formula:

`(width * height * bit_depth) / 8 + (num_colors * 3)`

Finally, because the file size will usually be large (thousands or millions of bytes), we divide by 1000 or 1,000,000 to convert to kilobytes or megabytes, as appropriate.
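Putting the pieces together, the rough formula translates directly into a Python function. As in the text, this is only an estimate: it ignores the fixed file overhead and any further compression, and it assumes the palette has exactly `num_colors` entries. The function name `approx_indexed_size_bytes` is hypothetical.

```python
def approx_indexed_size_bytes(width: int, height: int, num_colors: int) -> float:
    """Rough size of an indexed-color image: pixel data plus palette, in bytes."""
    bit_depth = max(1, (num_colors - 1).bit_length())  # smallest d with 2**d >= num_colors
    pixels = width * height * bit_depth / 8             # bits -> bytes
    palette = num_colors * 3                            # 3 bytes (R, G, B) per palette entry
    return pixels + palette

size = approx_indexed_size_bytes(300, 500, 4)
print(size)         # 37512.0
print(size / 1000)  # 37.512 (kilobytes)
```

The result, 37,500 bytes of pixels plus 12 bytes of palette, matches the four-color example worked out earlier.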

We will continue to discuss file size calculations in lab and homework assignments.