Inside Technique: Do your documents want to go to the web?
Legacy Data and the Web Series
We said we would go back to the routes after last time's detour through some of the font issues. Instead of rehashing the nuts and bolts of what kind of data we are dealing with (which you can find in the archive, or drop us a note if you need a review), this time we want to look at some of the decisions that go into the legacy-to-web transition. As always, if you have questions, comments, or criticisms, let us know. We'll take questions, too! Address them to email@example.com.
If the legacy data you are working with is flat line data generated by an in-house or commercial system, it is most likely formatted by an external device. In this case we don't mean a physical device, but a virtual device or filter that conditions the data for the target printing device. The big question as you approach the web with your data is what you want it to look like when it appears on the screen. Should it be a mirror of the document as it prints, or should it be more web-friendly?
This is the first question that should be answered, but it is often the last one that is. It is important because it should direct how you approach the data and applications you are using today. If your goal is to use the web-enabled version of your documents as little more than an archive, by all means make the documents look identical to the paper documents. Here are some sample scenarios where this is your best option:
These are not all of the possibilities, but you should see the pattern. When there are multiple target users of a document, separated by time and space, who must potentially confer about the information contained in the document, preserving the fidelity of the original document is a pretty good idea. There is a caveat in this. If the original documents are formatted edge-to-edge in small fonts with little white space they are not a candidate for moving to the web in the current format. They are candidates for re-design before the adventure of moving to the web even starts!
On the opposite side of the fence are those documents which would benefit from serious reformatting to make them usable online. Documents that meet the criteria we identified above are just the beginning. A wide variety of fonts and font sizes, tabular material formatted for landscape presentation, and highly blocked formatting all render documents almost unusable if they are moved online without serious re-design. Think of some of the common business documents, such as applications, audit forms, and surveys, that are built in two-column format. Displaying such texts on the screen in two-column format makes them hard to read and hard to use.
So, we have set up a conundrum. We want to keep documents that have multiple users who must confer about them in as close to the original format as we can, but we do not want to mimic documents on the web in such a way that they are hard to use or worse, unreadable. This takes a bit of planning, which should start with a task force to identify the documents that are targets for a migration to web delivery. You'll want to separate your documents into two basic categories:
For the moment we are going to concentrate on the documents that must remain identical.
Once you have your categories it remains to look critically at your documents and assign them to a bucket. For those documents that must be identical in all presentations, look for potential problem areas such as custom fonts, multi-column presentation, tabular data, and overly large or small fonts. Look carefully at the footers and document identification information, which is often produced in 4 or 5 point type. Remember that when you move to the web you are moving from an 8.5 x 11 (US Letter) or 210mm x 297mm (A4) environment to a presentation environment that you have little control over. For some users the screen is in 640 x 480 pixel mode while for others it will be in 1024 x 768. There are higher and lower resolutions, and even the video card in the individual machine can change what a viewer sees on the screen. Colors will vary by screen and video driver, and fonts available for display will vary with the individual installation unless you take some drastic action to ensure font availability for your documents.
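To make the mismatch concrete, here is a rough sketch of the arithmetic. It assumes a screen that renders at 96 pixels per inch, which is a common default and not something the documents themselves dictate, and the function names are ours, chosen for illustration.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def page_size_in_pixels(width_in, height_in, ppi=96):
    """Pixels needed to show a page at full size (ppi is an assumption)."""
    return round(width_in * ppi), round(height_in * ppi)


def type_height_in_pixels(point_size, ppi=96):
    """Approximate height in pixels of type set at the given point size."""
    return point_size / POINTS_PER_INCH * ppi


letter = page_size_in_pixels(8.5, 11)
print("US Letter at 96 ppi:", letter)

for screen in [(640, 480), (1024, 768)]:
    fits = letter[0] <= screen[0] and letter[1] <= screen[1]
    print(screen, "shows the full page without scrolling:", fits)

print("4 point type is about", round(type_height_in_pixels(4), 1), "pixels tall")
```

Under these assumptions a full-size US Letter page does not fit on either screen without scrolling, and a 4 point footer gets only about five pixels of height, which leaves very little room to draw each character.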
It is as big a job as it sounds. You do have several paths to that mirror image. You can create an HTML/DHTML/XML version that formats the data, you can convert to PDF, or you can convert to image. The easiest is often to convert to image, since in most environments this takes little more than putting a fax driver into the output environment and then wrapping the bitmap as a GIF or PNG file. The downside is that this tends to produce very large files which are not searchable and take forever to load to the screen. It's fast to develop and very slow to use. But it can be a short-term solution. To be usable it needs a file indexing methodology and an architecture to provide a path to the required documents in some logical manner, but it can be done.
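As one illustration of what a file indexing methodology might look like for page images, here is a minimal sketch. The file naming convention (account, statement date, page number) and the function name build_index are hypothetical, invented for this example; a real system would take its keys from whatever identifies your documents.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <account>_<yyyymmdd>_p<page>.png (or .gif)
NAME_PATTERN = re.compile(
    r"^(?P<account>[A-Za-z0-9]+)_(?P<date>\d{8})_p(?P<page>\d+)\.(?:png|gif)$"
)


def build_index(filenames):
    """Group page images by (account, date), with pages in page order."""
    index = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        key = (match["account"], match["date"])
        index[key].append((int(match["page"]), name))
    # Sort each document's pages numerically and keep only the file names.
    return {key: [name for _, name in sorted(pages)]
            for key, pages in index.items()}


files = [
    "A1001_20000131_p2.png",
    "A1001_20000131_p1.png",
    "A2002_20000131_p1.gif",
    "readme.txt",
]
index = build_index(files)
print(index[("A1001", "20000131")])
```

The point of the sketch is only that the images themselves carry no searchable text, so the path to a document has to come from metadata kept alongside them.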
If this isn't for you, consider PDF. Adobe's Portable Document Format is viewable through both Netscape and Internet Explorer as long as you are up-to-date on your installation. Creating PDF from existing applications can be reasonably easy to do. Check out PDFzone World Update at http://www.pdfzone.com for a truckload of utilities for turning the formats you have into PDF. The downside of PDF is that it is larger than flatter formats like HTML, but the upside is that you get in the PDF view what you saw on paper. You can scale the view on the screen and you can even establish bookmarks and views in the PDF document during its creation to make navigation easier. If you have ever seen a demonstration of the Merrill Lynch customer statement environment you have seen how PDF can be generated automatically by a legacy system to produce an incredibly navigable user experience. We talked about some of the vendors who support this type of venture in parts 5 and 6.
But if only the flattest format will do, then you are down the road to getting your data tagged in HTML or XML, with the supporting formatting issues. This can be a difficult task since the fonts available for print are often quite different from those available on the web. How white space is allocated and even how line breaks are calculated are quite different as well. When you add the variations in screen displays you can tell you are in for a challenge. However, if you generate your HTML or XML so that you force the margins and line lengths as well as the page depth (often by using tables and treating each cell as a "page"), you can make it work. There are not too many good vendor tools for getting to tagged mark-up while preserving the look and feel of a print version, but interrogate all of your vendors. Quite a few have built transforms or work with transform vendors to provide a solution when this is the route you want to take. Remember to ask questions about how they will guarantee the formatting, and be prepared to make some decisions about handling custom fonts and proprietary fonts which may not be available any place on the web.
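The table-cell-as-page idea can be sketched in a few lines. This is an illustration under our own assumptions, not any vendor's transform: we assume a form feed character marks a page break in the line data, that pages longer than lines_per_page are cut at that depth, and that a fixed-pitch <pre> block is an acceptable way to hold line lengths. The function name and its parameters are hypothetical.

```python
import html


def lines_to_html_pages(text, lines_per_page=60, columns=80):
    """Wrap flat line data in an HTML table, one table cell per printed page."""
    pages = []
    # Assumption: a form feed ("\f") in the line data marks a page break.
    for chunk in text.rstrip("\f").split("\f"):
        lines = chunk.splitlines()
        for start in range(0, max(len(lines), 1), lines_per_page):
            pages.append(lines[start:start + lines_per_page])

    cells = []
    for page in pages:
        # Pad every line to the same width and every page to the same depth
        # so each cell occupies the same space on screen.
        padded = [line.ljust(columns) for line in page]
        padded += [" " * columns] * (lines_per_page - len(padded))
        body = html.escape("\n".join(padded))
        cells.append(f"<tr><td><pre>{body}</pre></td></tr>")

    return ('<table border="1" cellpadding="12">\n'
            + "\n".join(cells)
            + "\n</table>")


sample = "INVOICE 001\nTotal: <$10>\fPage two"
print(lines_to_html_pages(sample, lines_per_page=3, columns=20))
```

This only fixes line length and page depth. It does nothing about custom or proprietary fonts, which still have to be decided separately.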
That's it for this time. Next time we'll delve back into ideas for handling data that just will not go to the web without re-formatting. Let us know if you have questions. As always, we are at firstname.lastname@example.org.
Copyright 2000 McGrew + McDaniel Group, Inc.
© 1997-2000 InsideDHTML.com, LLC. All rights reserved.