13 December, 2007

How to create a tag cloud? (With formula and sample calculation)

I googled how to create a tag cloud. I found a few write-ups, but I didn't like their approach because I think they did it the wrong way. That's why I wrote this post; it's my turn to share something educational.

But before anything else, what is a tag cloud? Let me define it in my own words. Visually, it is a group of terms displayed in varying font sizes and packed together so that it resembles a cumulus cloud. It is usually arranged alphabetically and center-aligned. Some tag clouds also vary the colors. In HTML, each tag is usually a hyperlink. Conceptually, a tag isn't just a term; each tag in a tag cloud represents an idea, a concept, or something else that can be weighted, so a bigger tag means a greater value or more interest. (For example, the Flickr tag cloud: http://www.flickr.com/photos/tags/ )

Now the question is how. How do you make the sizes of the tags vary? Simple. In HTML, just use the CSS font-size property.

Example:
<a href="http://www.blogger.com/mylink" style="font-size: 20px"> tag item </a>

Look at the example above. If that looks strange to you, then stop reading right now and go away because you're not my target reader.

If you're still reading, then you know that that's an HTML tag for a link.

To have a tag cloud, you need many tags with different font sizes. That's easy, isn't it? The hard part is generating those tags dynamically and computing the right size for each tag.

What you need is a database of tags. Query it to get the list of tags and the number of occurrences of each. See the following table for an example.

tag          | occurrences
--------------------------
birthday     | 144
christmas    | 108
valentines   | 211
thanksgiving | 168
liberation   | 88
halloween    | 114
new year     | 140

The table above is our sample data. Each tag represents a customer's favorite holiday. How can you present these tags as a tag cloud, with valentines as the biggest tag (50px font-size) and liberation as the smallest (12px font-size)?

We will use the following variables, namely:
a = the smallest count (or occurrence).
b = the count of the tag being computed.
c = the largest count.
w = the smallest font-size.
x = the font-size for the tag. It is the unknown.
y = the largest font-size.
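
The two extremes, a and c, come straight from the data. Here is a minimal sketch in Java, assuming the counts are already loaded into a map. The class and method names are made up for illustration, and the map simply stands in for the result of your database query:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class TagCounts {
    // Sample data from the table: tag -> number of occurrences.
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("birthday", 144);
        counts.put("christmas", 108);
        counts.put("valentines", 211);
        counts.put("thanksgiving", 168);
        counts.put("liberation", 88);
        counts.put("halloween", 114);
        counts.put("new year", 140);
        return counts;
    }

    // a = the smallest count in the data.
    static int smallestCount(Map<String, Integer> counts) {
        return Collections.min(counts.values());
    }

    // c = the largest count in the data.
    static int largestCount(Map<String, Integer> counts) {
        return Collections.max(counts.values());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sampleCounts();
        System.out.println("a = " + smallestCount(counts)); // a = 88
        System.out.println("c = " + largestCount(counts));  // c = 211
    }
}
```

The font sizes w and y are not computed from anything; they are whatever you decide looks good on your page.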


Now let's substitute the given values into their respective variables, assuming that we are solving for the font-size of the "thanksgiving" tag.
a = 88
b = 168
c = 211
w = 12
x = ?
y = 50

And here's the formula:

    (b-a) (y-w)
x = ----------- + w
       (c-a)

Or, as a one-liner (using C-like syntax):

x = ( ((b-a) * (y-w)) / (c-a) ) + w

And that's it. That's the formula. You might be wondering where I got it. It's hard to explain in words, but let me try. It comes from "ratio and proportion" in mathematics: the ratio of the distance from a to b to the distance from a to c equals the ratio of the distance from w to x to the distance from w to y. In other words, a tag's count sits at the same relative position between the smallest and largest counts as its font-size does between the smallest and largest font-sizes.

Or to make it simple,

b-a   x-w
--- = ---
c-a   y-w
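
Solving that proportion for x gives back the formula. Here are the steps written out in LaTeX notation, assuming c is not equal to a (otherwise the left side would divide by zero):

```latex
\begin{align*}
\frac{b-a}{c-a} &= \frac{x-w}{y-w} \\
x - w &= \frac{(b-a)(y-w)}{c-a} \\
x &= \frac{(b-a)(y-w)}{c-a} + w
\end{align*}
```

The second line multiplies both sides by (y-w), and the third adds w to both sides.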

Let's now continue computing the font-size for thanksgiving. Substituting the values into the formula, we get:

x = ( ((168-88) * (50-12)) / (211-88) ) + 12
x = 36.7154...
x = 37 (rounded to the nearest whole pixel)
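
The same arithmetic can be checked with a small Java method. This is only a sketch: the class and method names are my own for illustration, and the guard for c == a is an assumption I added, since the formula itself would divide by zero when every tag has the same count.

```java
class TagFontSize {
    /**
     * Linear scaling from a count to a font size.
     * a = smallest count, b = count of this tag, c = largest count,
     * w = smallest font size, y = largest font size.
     */
    static double fontSize(double a, double b, double c, double w, double y) {
        if (c == a) {
            // Assumption: all tags have the same count, so use the smallest size.
            return w;
        }
        return ((b - a) * (y - w)) / (c - a) + w;
    }

    public static void main(String[] args) {
        double x = fontSize(88, 168, 211, 12, 50);
        System.out.println(x);             // about 36.7154
        System.out.println(Math.round(x)); // 37
    }
}
```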

The thanksgiving tag should have a 37px font-size in the tag cloud. Try computing the sizes for the rest of the tags. You will get:
birthday = 29px
christmas = 18px
valentines = 50px
thanksgiving = 37px
liberation = 12px
halloween = 20px
new year = 28px
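
To turn those numbers into an actual cloud, each tag becomes a link with its own font-size. Below is a rough sketch in Java of how that could look. The /tags/ URL pattern, the slug rule (spaces replaced by dashes), and all the names are hypothetical, and a real page should also escape the tag text before printing it as HTML.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class TagCloudHtml {
    // Sample data from the post; a TreeMap keeps the tags in alphabetical order.
    static Map<String, Integer> sample() {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("birthday", 144);
        counts.put("christmas", 108);
        counts.put("valentines", 211);
        counts.put("thanksgiving", 168);
        counts.put("liberation", 88);
        counts.put("halloween", 114);
        counts.put("new year", 140);
        return counts;
    }

    // The formula from the post, computed in double and rounded to whole pixels.
    static long fontSize(int a, int b, int c, int w, int y) {
        if (c == a) {
            return w; // assumption: all counts equal, so use the smallest size
        }
        return Math.round(((double) (b - a) * (y - w)) / (c - a) + w);
    }

    // Builds one <a> element per tag, sorted alphabetically by tag name.
    static String render(Map<String, Integer> counts, int w, int y) {
        int a = Collections.min(counts.values());
        int c = Collections.max(counts.values());
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            String tag = e.getKey();
            long px = fontSize(a, e.getValue(), c, w, y);
            html.append("<a href=\"/tags/").append(tag.replace(" ", "-")) // hypothetical URL
                .append("\" style=\"font-size: ").append(px).append("px\">")
                .append(tag).append("</a>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sample(), 12, 50));
    }
}
```

Wrap the output in a center-aligned block and you have the cloud. The sizes are rounded only at the very end, so they match the list above.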

--End

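Tip: When using Java, do the arithmetic with the float data type, not int. Integer division drops the fractional part, so the result gets truncated instead of rounded.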
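
To see why the data type matters, here is a small sketch that runs the thanksgiving calculation both ways. The class and method names are made up for the demonstration.

```java
class IntegerDivisionDemo {
    // Integer arithmetic: the division drops the fractional part.
    static int withInts(int a, int b, int c, int w, int y) {
        return ((b - a) * (y - w)) / (c - a) + w;
    }

    // Floating-point arithmetic, rounded to the nearest whole pixel at the end.
    static int withFloats(float a, float b, float c, float w, float y) {
        return Math.round(((b - a) * (y - w)) / (c - a) + w);
    }

    public static void main(String[] args) {
        System.out.println(withInts(88, 168, 211, 12, 50));   // 36
        System.out.println(withFloats(88, 168, 211, 12, 50)); // 37
    }
}
```

With int, 3040 / 123 becomes 24 instead of 24.715..., so the tag ends up one pixel smaller than it should be.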