Please note that this is a draft. It’s probably a little rough around the edges. If it is a bit judgemental about development practices in specific examples, I apologize for that. I think it is important to work iteratively, but also make concrete suggestions. Hopefully this very early draft will do that. Thanks
john
Tagclouds on the web today
Recently, Cameron Adams, who’s building a really interesting, soon-to-be-released webapp associated with our conference, Web Directions, got in touch to brainstorm how best to mark up “tagclouds”. He pointed me to this mailing list post by Russ Weakley, of listtutorial, WSG, WE04 and WE05 and much other fame, suggesting the idea of a tagcloud microformat.
Interestingly, only a few weeks back, when setting up a new BBPress-based forum, the issue of how best to style tagclouds cropped up, as in BBPress (and, as I soon discovered, commonly elsewhere) the markup looks like this:
<a href="..." style="font-size: 20px">
In fairness, in BBPress, you can easily set up what font sizes are used - setting a lower and upper limit in number values, as well as a unit. But it’s still very far from ideal, as this mixes presentation right into both the logic of the app and the HTML.
I got into a bit of a discussion with one of the BBPress developers, Michael Adams, suggesting it would perhaps be better for BBPress to use class values in place of the inline CSS. When Cam and I were chatting, I thought, you know, it’s probably worth considering Russ’s suggestion of a microformat.
Now, microformats.org spells out the process by which a new microformat should come into being, so I thought I’d follow that process, and document it in practice in this and subsequent posts.
First up, we need to ask “Why?” There must be a problem to be solved. No problem, no microformat.
Well, what I think Cam, Russ and I observed is that there is already a solution, a “design pattern” commonly used. Does it actually solve a problem? That’s perhaps a thorny issue with some, but on the whole, the tagcloud’s widespread adoption at large and small sites, in Wordpress plugins, and elsewhere, does suggest it’s a solid pattern.
Next we want to Document Current Behavior. So, I went and took a look at a number of popular tag clouds, and their implementation. One issue which came up in my brainstorm with Cam was, “what exactly do tagclouds represent?” Do they simply represent the popularity of tags overall, or do different tagclouds communicate different kinds of popularity - for example recently popular tags, historically popular tags, most recent tags, and so on?
So, here are the results of that research.
Site Name: flickr
url
http://flickr.com/photos/tags/
Tag cloud models
Flickr has both their main tagcloud, which models the relative popularity of tags historically, that is, how often a tag has been used in total, as well as smaller “hot tags” tag clouds, which reflect the most popular tags in a shorter period of time, a day and week respectively
Screenshots



Code conventions
The hot tags code looks like this
<table id="Recently">
<tr>
<td>
<p><b>In the last 24 hours</b><br />
<b><a href="/photos/tags/pics2006/">pics2006</a>, </b>
<b><a href="/photos/tags/ubicomp2006/">ubicomp2006</a>, </b>
...
</p>
</td>
<td>
<p><b>Over the last week</b><br />
<b><a href="/photos/tags/itunes7/">itunes7</a>, </b>
<b><a href="/photos/tags/futureofwebappssf06/">futureofwebappssf06</a>, </b>
Flickr’s main tagcloud looks like this in HTML:
<p id="TagCloud">
<a href="/photos/tags/06/" style="font-size: 12px;">06</a>
<a href="/photos/tags/amsterdam/" style="font-size: 15px;">amsterdam</a>
Notes
- The hot tags are simply cells in a table with the id “Recently”
- The tagcloud proper is a p with the id of “TagCloud” - which would allow only a single tagcloud per page
- The links are simply links, not list items - despite this being a list of words, ordered alphabetically
- The rank or weight or popularity of the tag is conveyed visually, using inline style and font-size
- There are at least a dozen different levels of popularity. The exact number is difficult to determine without tedious work, as the levels are only reflected in font-sizes in pixels.
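Counting those levels by hand is the tedious part. As a rough sketch (in Python, which is purely my choice here and nothing to do with how Flickr works), a regular expression can pull the inline font-size values out of a saved copy of the markup and count the distinct ones. The sample string below only repeats the two links quoted above, so it finds just two levels; the whole page would need to be fed in to get the real count.

```python
import re
from collections import Counter

# Sample markup in the style of the Flickr main tagcloud quoted above.
# Only two links are included, so only two levels will be found.
SAMPLE = '''
<p id="TagCloud">
<a href="/photos/tags/06/" style="font-size: 12px;">06</a>
<a href="/photos/tags/amsterdam/" style="font-size: 15px;">amsterdam</a>
</p>
'''

# Matches an inline font-size declaration and captures the number and unit.
FONT_SIZE = re.compile(r'font-size:\s*([0-9]*\.?[0-9]+)\s*(px|em|%)')


def font_size_levels(html):
    """Return a Counter mapping each distinct (size, unit) pair to how often it occurs."""
    return Counter((float(number), unit) for number, unit in FONT_SIZE.findall(html))


levels = font_size_levels(SAMPLE)
print(len(levels), "distinct font sizes:", sorted(levels))
```

The same function should work unchanged on the em based values used by BBPress and Squidoo, since the pattern accepts px, em and % units.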
Site Name: Technorati
url
Tag cloud models
This page shows two tag clouds, called “heat maps” - one for the most popular tags in the last hour, and one for the 100 historically most popular tags.
Screenshots


Code conventions
<ul class="heatmap" id="smallheatmap">
<li><em><em><em><em><em><a href="/tag/Advertising">Advertising</a>
</em></em></em></em></em></li> <li><em><em>
<em><a href="/tag/Blogroll">Blogroll</a></em></em></em></li>
<ul class="heatmap" id="bigheatmap">
<li><em><em><em><a href="/tag/Allgemein">Allgemein</a>
</em></em></em></li>
Notes
- each tagcloud is of class “heatmap”, and the two are differentiated by the ids “smallheatmap” (hot tags of the last hour) and “bigheatmap” (historically popular tags)
- tags are an unordered list
- Innovatively, and very cleverly, nested em elements provide the “weight” or “importance” of a tag.
- It’s not easy to determine how many levels of tag weight there are, as I’m too lazy to count nested em elements
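For anyone less lazy than me, the counting can be automated. This is a minimal sketch in Python, using the standard library’s html.parser, that records how many em elements enclose each link. The class name EmDepthParser and the sample string are my own inventions for illustration; the sample is a tidied version of the Technorati excerpt above, with the closing tags added.

```python
from html.parser import HTMLParser


class EmDepthParser(HTMLParser):
    """Record how many em elements enclose each link in a tagcloud."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of em elements currently open
        self.in_link = False  # True while inside an a element
        self.link_depth = 0   # em depth at the point the link opened
        self.weights = {}     # tag text -> em nesting depth

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.depth += 1
        elif tag == "a":
            self.in_link = True
            self.link_depth = self.depth

    def handle_endtag(self, tag):
        if tag == "em" and self.depth > 0:
            self.depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_link and text:
            self.weights[text] = self.link_depth


# Hypothetical sample, based on the excerpt above with closing tags added.
SAMPLE = (
    '<ul class="heatmap" id="smallheatmap">'
    '<li><em><em><em><em><em><a href="/tag/Advertising">Advertising</a>'
    '</em></em></em></em></em></li>'
    '<li><em><em><em><a href="/tag/Blogroll">Blogroll</a></em></em></em></li>'
    '</ul>'
)

parser = EmDepthParser()
parser.feed(SAMPLE)
parser.close()
em_weights = parser.weights
print(em_weights)
```

Run over a full page, the largest value in em_weights would give the number of weight levels actually in use.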
Site Name: BBPress
url
This is an example standard BBPress install. The PHP has been edited to make the font-sizing em based, and to constrain the font sizing to an upper and lower limit. This is done in the PHP.
Tag cloud models
As far as I can determine, the tagcloud shows the historically most popular tags, but given its labeling as “Hot tags” it may reflect the recent popularity of the tags.
Screenshots

Code conventions
<p class="frontpageheatmap">
<a href='http://support.westciv.com/tags.php?tag=bbedit' title='2 topics'
style='font-size: 1.2em;'>BBEdit</a>
<a href='http://support.westciv.com/tags.php?tag=case_sensitive' title='1 topics'
style='font-size: 0.9em;'>case_sensitive</a>
Notes
- the cloud is a paragraph of class “frontpageheatmap”
- tags are just links, not in a list
- the relative rank of a tag is shown by inline style, but also indicated by the title attribute on the link element. e.g. title=”1 topics”, title=”3 topics”
- It’s unclear from the example how many levels of tag weight or rank there are
Site Name: del.icio.us
url
Tag cloud models
The tag cloud shows a list of the most popular tags, presumably over time.
Screenshots

Code conventions
<div class="alphacloud">
<a href="/tag/.net" class="lr s2;">.net</a>
<a href="/tag/advertising" class="lb s1">advertising</a>
<a href="/tag/ajax" class=" s5">ajax</a>
When the tags are chosen to be shown as a list in order of frequency, rather than alphabetically, the class name is “freqcloud”
Notes
- There are two kinds of cloud - an alphacloud, which orders tags alphabetically, and a freqcloud, which orders them by popularity
- The links are not in a list
- There is a root element with a class value, variously “alphacloud” and “freqcloud”
- The weight of a tag is given by a class value - s1 through s5, giving 5 levels of weight
Site Name: Zoomclouds
url
http://zoomclouds.com/cloud/ZC_Blom/
Tag cloud models
Like most tag clouds, zoomclouds models the historical popularity of a tag.
Screenshots
Zoomclouds, more than most, utilizes CSS to style its clouds.

Code conventions
<div class="zoomclouds">
<span class="tag2"><a href="http://www.zoomclouds.com/tag/ZC_Blom/asian+stocks"
onmouseout="zoomclouds_cs()" style="font-size: 12px;">asian stocks</a></span>
<span class="zoomcloudswg">(39)</span>
<span class="tag1"><a href="http://www.zoomclouds.com/tag/ZC_Blom/analysts"
style="font-size: 9px;">analysts</a></span>
<span class="zoomcloudswg">(23)</span>
Notes
- the root element is a div with a class value of “zoomclouds”
- the count of the tag is added as content after the tag name, with a class of zoomcloudswg.
- the weight of the tag is given by a font-size value, but also with a class value of tagN, where N is an integer of 1-4 (at least in the example cited)
Site Name: Squidoo
url
http://www.squidoo.com/browse/tag_cloud
Tag cloud models
Squidoo has three kinds of tag cloud - all ranking popularity. Hot tags are those which are popular in the last 24 hours, recent tags are those popular in the last week, and all time tags are the most popular tags over the life of Squidoo.
Screenshots



Code conventions
<div id="tagcloud24Display" class="tagcloud">
<a href="http://www.squidoo.com/tags/advertising" style="font-size:1em">advertising</a>
<a href="http://www.squidoo.com/tags/art" style="font-size:1.8em">art</a>
<div id="tagcloudWeekDisplay" class="tagcloud">
<a href="http://www.squidoo.com/tags/advertising" style="font-size:.8em">advertising</a>
<a href="http://www.squidoo.com/tags/art" style="font-size:2em">art</a>
<div id="tagcloudDisplay" class="tagcloud">
<a href="http://www.squidoo.com/tags/advertising" style="font-size:1.5em">advertising</a>
<a href="http://www.squidoo.com/tags/art" style="font-size:2.2em">art</a>
Notes
- the tag clouds are all divs with a class value of tagcloud
- the three are differentiated by id values, “tagcloud24Display”, “tagcloudWeekDisplay”, “tagcloudDisplay”
- weight is indicated by inline CSS “font-size” values
- there appear to be quite a significant number of different weights, as indicated by the different font-size values
Site Name: Web Connections
url
http://connections.webdirections.org - live from 13 September 2006
Tag cloud models
The tag cloud represents the popularity of tags historically
Screenshots

Code conventions
<div class="hTagcloud">
<ul class="popularity">
<li class="weight1"><a href="/tags/Access+Testing">Access Testing</a></li>
<li class="weight1"><a href="/tags/McFarlane+Prize">McFarlane Prize</a></li>
Notes
- The root element is a div of class “hTagcloud”. Web connections front end developer Cameron Adams and I came up with this as a very early implementation of a potential proposal for a tagcloud microformat, and chose the name by analogy with hCard, etc.
- this contains an unordered list of class “popularity”. This allows for the extension to other class values for different kinds of cloud
- each tag is a list item, with a class value of “weightN” where N is an integer value of 1-5
First pass analysis
From a visual and logical perspective, tagclouds have a reasonably small number of common components, and largely all focus on the same problem. They are typically
- an alphabetically ordered list of links to a tag space - occasionally the order is by popularity.
- the links are usually single words
While it is possible to imagine using a tag cloud to represent tags in other ways, such as most recent, the examples considered all show popularity, albeit over different time scales. Typically the time scales are
- most commonly, all time popularity
- less frequently, popularity over the last week and
- popularity over the last 24 hours
On the ground, things become more complicated.
The root elements
Typically, but not always, there is a root element, with a class or id value.
Root elements include the following elements
- p
- div
- ul
- td
And are given the following class and/or id values
- class=”heatmap” id=”smallheatmap”
- class=”heatmap” id=”bigheatmap”
- id=”TagCloud”
- id=”Recently”
- class=”frontpageheatmap”
- class=”alphacloud”
- class=”freqcloud”
- class=”zoomclouds”
- id=”tagcloud24Display” class=”tagcloud”
- id=”tagcloudWeekDisplay” class=”tagcloud”
- id=”tagcloudDisplay” class=”tagcloud”
- class=”hTagcloud”
Clearly there is some consensus. Cloud is the most common part of the identifying values, with tagcloud in whole or part reasonably common. But, do heatmaps constitute a specific subset of tagclouds? Do we really have two kinds of seemingly similar entities - tagclouds and heatmaps? Or are all tagclouds heatmaps?
A class or id name for the root element of a tagcloud is required. We are proposing hTagcloud, as the general term for these entities is “tagcloud” and by analogy with hCard etc. Class rather than id would seem to make the most sense, as often more than one tagcloud appears on a page.
The tags themselves
Some of the clouds are marked up as lists:
- technorati
- Web Connections
Some of them are marked up as links without any other intervening markup:
- Flickr’s main tagcloud (but not its hot tags!)
- BBPress
- del.icio.us
- squidoo
Zoomclouds wrap the links in a span with a class value.
Marking up “weight”
Probably the trickiest issue is how popularity “weight” is marked up.
Several sites use inline CSS with font-size values, including
- flickr
- BBPress
- squidoo
This is hardly to be considered semantic markup.
Other sites use class values
- del.icio.us
- zoomclouds
- web connections
This is probably the most immediately obvious mechanism for marking up tag weights. Class is used in ways similar to this all the time. But we should perhaps be less hasty than this. What exactly is class for? Our old friend the HTML 4.01 spec says of class
The class attribute has several roles in HTML: … For general purpose processing by user agents.
Does giving a tag a class value to represent its popularity constitute using class for “general purpose processing”? The definition is sufficiently vague as to seemingly preclude nothing that would loosely be associated with data processing. But it might be suggested that class is for element identification (the class attribute definition is actually found in the specification subsection titled “Element identifiers”), not for containing actual data, which arguably relative popularity is. Perhaps one way of addressing this is by semantically naming the class values in a relative way, for example “popular”, “v-popular”, “vv-popular”. In this context, tags would “belong to a class” based on popularity. But, it’s not the tag which belongs to a class, rather the element which has the tag as its value which is assigned a class in this way.
You might be able to see why I suggested that the obvious use of class might be a bit hasty.
Two unique examples stand out. In addition to using inline CSS, BBPress also adds a title value relative to weight. The more popular a tag, the higher the number in the title value. While this may seem perverse, it’s at least arguably correct. The HTML 4.01 spec says of title: “This attribute offers advisory information about the element for which it is set.”
Whether the popularity of a tag constitutes “advisory information” is a matter for discussion.
Technorati uses nested em elements to indicate weight. This is very clever, IMO, but I suspect, and a quick straw poll with some pretty savvy developers suggests, that it is perhaps not overly human friendly, at least from a publisher’s point of view.
Precisely how best to mark up the weight of a tag would seem to me to be the outstanding issue to resolve in developing a tagcloud microformat.
Next steps
Ok, looking over our microformats process checklist: we’ve seen there is a problem to solve, and we’ve done some research into the current ways in which the problem is being solved. What’s next? Microformats.org has this to say about creating a new microformat
There are other things to try before developing a microformat. First, ask yourself these questions:
- Is there a standard element in XHTML that would work?
- Is there a compound of XHTML elements that would work?
- Ok, if the answer to the above two is ‘no,’ we can talk about a microformat.
So
1. Is there a standard XHTML element which would work?
I really don’t think so.
2. Is there a compound of XHTML elements that would work?
I think that we can get a fair way to solving this problem by using a number of standard HTML components. In essence, a tagcloud is just a list of links. It’s ordered usually alphabetically, but in the case of del.icio.us, by frequency as well. Often too, the list is labeled with a heading, and some kind of explanation.
Issues to resolve
A number of issues emerge even before we get to the more thorny ones outlined above.
Should a tagcloud microformat mandate the use of lists, or any particular type of element? Typically, microformats focus on the use of class values, and other attribute values. However, some focus on element types too - for example XOXO.
Should the issue of different types of tagcloud (all time popularity, time scoped popularity) as found in these real world examples be accommodated in an hTagcloud?
The web connections tagcloud markup deals with this issue like this:
<div class="hTagcloud">
<ul class="popularity">
<li class="weight1"><a href="/tags/Access+Testing">Access Testing</a></li>
<li class="weight1"><a href="/tags/McFarlane+Prize">McFarlane Prize</a></li>
This would conform with microformats patterns, where the root element has a single identifying class or id value.
Squidoo on the other hand, marks this up by using both an id and class value on its “root” tagcloud element
<div id="tagcloud24Display" class="tagcloud">
So, again, we can find in these real world examples that there is a need to differentiate different kinds of tagcloud.
The web connections approach uses the semantic appropriateness of the list, coupled with a containing root div element to provide a mechanism for doing this conformant with current microformats patterns.
The really significant issue, as outlined above, is just how to correctly use the mechanisms of HTML to mark up the weight of tags.
Toward a proposal for hTagcloud
OK, I’ve followed the process for considering whether a tagcloud microformat makes sense. I think given the widespread use of this pattern, at some significant sites and in some significant applications, that at least the proposal of this microformat makes sense. Where do we go now?
“[A]re there any well established, interoperable implemented standards we can look at which address this problem?” To the best of my knowledge, no.
Next the proposal procedure asks us to ensure that it is “designed for humans first and machines second”. Let’s keep that in mind with the following .01 draft proposal.
In conjunction with this, the process asks
- If I looked at this microformat in a browser that didn’t support CSS or had CSS turned off, would it still be human-readable?
- Are this format’s elements stylable with CSS?
We’ll address these in a moment as well.
.01 hTagcloud proposal
Based on the above discussion, here is a very first stab at an hTagcloud microformat proposal. Some of it is contingent on the resolution of the issues outlined above and summarized below.
- hTagclouds have a root element with a class value of hTagcloud
- this root element contains a list element, with an optional class value which identifies the nature of the tagcloud: is it historical popularity, popularity within the last 24 hours, or popularity within the last 7 days? These are the three common kinds of popularity in the real world examples shown. Should other reasonably common kinds of cloud be found in the wild, these can be added to the list
- This list may also optionally have a class value to indicate the order - alphabetical or by frequency (is this really required?)
- tags are link elements, with an href value of the tagspace which the tagcloud represents
- popularity or “weight” is conveyed with class values. There are 5 class values, ranging from “popular” to “vvvv-popular”. Some tag clouds have many more levels than this, but around 5 is a common number of weights. Many more than 5 becomes difficult to convey meaningfully via style. “Popular” is the “lowest” value, because in any non trivial tagging system, the number of tags in the system vastly exceeds the tags displayed in the tagcloud. All tags in a tagcloud are at least popular. That’s why the values don’t start with vv-unpopular, and range to vv-popular, by analogy with CSS named font sizes. As per the discussion on class values for marking up tag weight above, popularity has been chosen above terms like “weight” so that the tags belong to a class based on popularity, rather than using class to carry with it more data about the content of the element. The use of class in this way is familiar to a significant number of developers, and makes for easily stylable tagclouds.
- The use of the “title” has been excluded for the following reasons
- it’s an atypical use of title
- elements marked up with title could only be styled using CSS with attribute selectors, which are both largely unused by developers, and not supported in the majority of browsers people use (in the sense that the majority of web users are using a browser which does not support this selector, therefore in effect these elements aren’t stylable at present with CSS, and so developers would be unlikely to adopt this practice for this practical reason)
- The use of nested ems has not been adopted for this draft because of its novelty, and because the fussiness of coding it could quite possibly preclude the adoption of hTagcloud. This is based on a straw poll of some very proficient developers. While it is a very clever use of HTML, it might also be argued that the existence of the em element for “Indicat[ing] emphasis” and additionally the strong element for “Indicat[ing] stronger emphasis” suggests that the use of multiply nested em elements does not indicate greater emphasis than a single unnested em element. Otherwise strong would be redundant, as equivalent to <em><em>
Of course, this is a .01 specification, and has been explicitly drafted to put the issues on the table for discussion at this early stage, while also moving the proposal forward by making concrete suggestions.
An example hTagcloud
<div class="hTagcloud">
<ul class="popularity">
<li class="vvvv-popular"><a href="/tags/Web+Standards+Group">Web Standards Group</a></li>
<li class="vvv-popular"><a href="/tags/accessibility">accessibility</a></li>
<li class="popular"><a href="/tags/beta+tester">beta tester</a></li>
<li class="vvv-popular"><a href="/tags/css">css</a></li>
<li class="v-popular"><a href="/tags/ex-coder">ex-coder</a></li>
<li class="vv-popular"><a href="/tags/usability">usability</a></li>
<li class="vvvv-popular"><a href="/tags/wsg">wsg</a></li>
</ul>
</div>
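To give a feel for how a publisher might generate markup like this from raw tag counts, here is a minimal sketch in Python. It is only an illustration, and rests on assumptions of mine: the function names, the sample counts, the /tags/ URL prefix and the linear scaling of counts into five buckets are all invented for the example. The draft itself says nothing yet about how counts should map onto the five class values.

```python
from html import escape
from urllib.parse import quote_plus

# The five class values from the draft, least to most popular.
POPULARITY_CLASSES = ["popular", "v-popular", "vv-popular", "vvv-popular", "vvvv-popular"]


def popularity_class(count, lowest, highest):
    """Map a count onto one of the five classes by linear scaling (my assumption)."""
    if highest == lowest:
        return POPULARITY_CLASSES[0]
    fraction = (count - lowest) / (highest - lowest)
    # The most popular tag has fraction 1.0, so clamp it into the top bucket.
    index = min(int(fraction * len(POPULARITY_CLASSES)), len(POPULARITY_CLASSES) - 1)
    return POPULARITY_CLASSES[index]


def render_htagcloud(counts, base="/tags/"):
    """Render a non-empty {tag: count} dict as draft hTagcloud markup, ordered alphabetically."""
    lowest, highest = min(counts.values()), max(counts.values())
    items = []
    for tag in sorted(counts, key=str.lower):
        cls = popularity_class(counts[tag], lowest, highest)
        href = escape(base + quote_plus(tag))
        items.append(f'<li class="{cls}"><a href="{href}">{escape(tag)}</a></li>')
    return ('<div class="hTagcloud">\n<ul class="popularity">\n'
            + "\n".join(items)
            + "\n</ul>\n</div>")


# Hypothetical counts, for illustration only.
counts = {"css": 40, "usability": 25, "beta tester": 3, "wsg": 52}
markup = render_htagcloud(counts)
print(markup)
```

Because all of the presentation hangs off the five class values, the font sizes themselves can then live entirely in the stylesheet, which is the point of moving away from inline font-size values.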

