
Inside Technique : W3C DOM Table of Contents : Finding all the headers

Our first task is to extract all the header (H1...H6) elements in the document. The Internet Explorer model makes this extremely simple. In Internet Explorer the document can be represented as a tree or as a flattened collection of elements. The flattened collection gives easy access to every element through the all collection, and from this collection we can easily extract all the header elements.

The W3C recommendation exposes the document primarily as a tree. In addition, a convenience method is exposed, getElementsByTagName, that can retrieve all the elements of a particular type in the document or all elements using a special wildcard identifier ("*"). Unfortunately, while IE5 supports this method, it does not support the wildcard value for returning all elements.

At this point, we can either ignore IE's incomplete support and recursively navigate the tree of elements to find all the header elements, or override IE's getElementsByTagName with a fixed version written in JavaScript. (For more about recursion, see Rajeev's article on building a maze recursively.)

If we don't want to include any browser detection code, we can write our own function for locating the headers. This script is not simple and requires understanding recursion. Below is a basic function that visits each element node in the document. On the last page we include an enhanced version of this function that locates just the header elements and builds the TOC on the fly.

// Walk all elements - Recursive Standards-based
function getElements(obj) {
 for (var i=0;i < obj.childNodes.length;i++)
  if (obj.childNodes[i].nodeType==1) // Elements only
    getElements(obj.childNodes[i])
}

// Start at the root element; document.childNodes[0] may be a doctype or comment node
getElements(document.documentElement)
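
The walker above visits every element but does not keep anything. As a rough sketch of how the same recursion could gather the headers, the function below collects them into an array. The name collectHeaders and its found parameter are hypothetical, and we assume the walk starts at document.documentElement; this is an illustration, not the enhanced version from the last page.

```javascript
// Hypothetical helper: recursively walk the tree below `node` and
// collect every H1...H6 element in document order.
function collectHeaders(node, found) {
  found = found || [];
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType == 1) {              // Elements only
      if (/^H[1-6]$/i.test(child.tagName))  // H1 through H6, any case
        found.push(child);
      collectHeaders(child, found);         // descend into this element
    }
  }
  return found;
}

// In a browser (assumption: start at the root element):
if (typeof document != "undefined") {
  var headers = collectHeaders(document.documentElement);
}
```

Because the array is passed down through every call, the headers come back in the same order they appear in the document, which is the order a table of contents needs.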

Rather than deal with the complexity of this function, we can override IE5's incomplete support for getElementsByTagName with a very simple script. A positive side effect of this fix is that it also adds full support for this method to Internet Explorer 4.0. With this small script, IE5's implementation becomes compatible with Netscape's, and the script that visits every element becomes simpler. When examining the getElements() function below, notice that it no longer needs to call itself recursively:

function ie_getElementsByTagName(str) {
 // Map to the all collections
 if (str=="*")
  return document.all
 else
  return document.all.tags(str)
}

if (document.all)
 document.getElementsByTagName = ie_getElementsByTagName

function getElements() {
 var obj = document.getElementsByTagName("*")
 for (var i=0;i < obj.length;i++)
  var el = obj[i]	// get the element
}

getElements()
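
The override above is installed whenever document.all exists. As a hedged alternative, the sketch below installs the fallback only when the wildcard lookup comes back empty. The name patchWildcard and the doc parameter are our own, introduced for illustration, and we assume that a browser without wildcard support returns an empty collection for "*" rather than raising an error.

```javascript
// Hypothetical helper: patch doc.getElementsByTagName only if the
// wildcard lookup does not work. Returns true when a patch was installed.
function patchWildcard(doc) {
  var works = false;
  if (doc.getElementsByTagName) {
    // Assumption: an unsupported wildcard yields an empty result
    var probe = doc.getElementsByTagName("*");
    works = !!(probe && probe.length > 0);
  }
  if (!works && doc.all) {
    // Map to the all collection, as in ie_getElementsByTagName
    doc.getElementsByTagName = function (str) {
      return (str == "*") ? doc.all : doc.all.tags(str);
    };
    return true;
  }
  return false;
}

// In a browser:
if (typeof document != "undefined") {
  patchWildcard(document);
}
```

Passing the document in as a parameter keeps the check in one place and lets us try the function against a stand-in object before relying on it in a page.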

The script for accessing all the elements is almost the same as the script we would write using the original Internet Explorer model. The only difference is we use the getElementsByTagName() method instead of the all collection. The next step is to write the script so only the header elements are extracted.

We are going to continue with the simpler, non-recursive solution. The source code for both solutions is provided at the end of this article. Extracting the headers with getElementsByTagName() is simple. We just examine all the elements in the document and check whether each one is a header element:

function getHeaders() {
 var obj = document.getElementsByTagName("*")
 var tagList = "H1;H2;H3;H4;H5;H6;"
 for (var i=0;i < obj.length;i++)
  if (tagList.indexOf(obj[i].tagName+";")>=0) {
    // Got One
  }
}
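
To make the "Got One" step concrete, here is a minimal sketch that returns the headers together with their level instead of only visiting them. The name listHeaders, the doc parameter, and the element/level record are hypothetical choices of ours; the tag test is the same string lookup used in getHeaders() above, with the tag name upper-cased first in case a browser reports it in lower case.

```javascript
// Hypothetical variant of getHeaders(): returns an array of
// { element, level } records for every H1...H6 found in `doc`.
function listHeaders(doc) {
  var obj = doc.getElementsByTagName("*");
  var tagList = "H1;H2;H3;H4;H5;H6;";
  var headers = [];
  for (var i = 0; i < obj.length; i++) {
    var tag = String(obj[i].tagName).toUpperCase();
    if (tagList.indexOf(tag + ";") >= 0) {
      // Got One: the digit after the "H" is the heading level
      headers.push({ element: obj[i], level: parseInt(tag.charAt(1), 10) });
    }
  }
  return headers;
}

// In a browser:
if (typeof document != "undefined") {
  var found = listHeaders(document);
}
```

Keeping the level next to each element means a later step can decide how far to indent each entry without inspecting the tag name again.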

Next we are going to process each header element when found and iteratively build the table of contents.