 "TPT User's Guide" 

Chapter 10 voyant_indexer.pl

The voyant_indexer.pl Perl program reads the temporary index files and generates a series of HTML files that together form the master index for the whole system.

It can also perform word-chunking, which expands the number of index entries and turns the index into more of a concordance, a useful feature when documenting programs.

The program assumes that _index_file files were created for all sub-projects by voyant_nav.pl, which itself is usually called from the 56_nav_index.b shell script. Using all available and uniquely named _index_file files, it creates a series of master m_idx*.script files.

Overview

This program issues a system call to list the candidate input index_ files in the path directory. It then steps through each of those files and concatenates their contents into master_raw.
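The gathering step can be sketched as follows. This is a minimal Python illustration, not the Perl code in voyant_indexer.pl: it uses glob instead of a system call, and the function name build_master_raw and the "index_*" file pattern are assumptions made for the example.

```python
import glob
import os


def build_master_raw(path_dir, pattern="index_*"):
    """Concatenate the non-blank lines of every candidate index file in path_dir.

    The "index_*" pattern is a hypothetical default; pass the real naming
    scheme of your index files.
    """
    master_raw = []
    # sorted() keeps the order of the files stable from run to run
    for name in sorted(glob.glob(os.path.join(path_dir, pattern))):
        with open(name, encoding="utf-8") as handle:
            master_raw.extend(
                line.rstrip("\n") for line in handle if line.strip()
            )
    return master_raw
```

Calling build_master_raw on the directory that holds the sub-project index files returns one flat list of lines, which plays the role of master_raw in the next step.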

master_raw is turned into a hash table master_index. The key into the hash table is the index entry. What gets stored is both the title and the URL.
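A rough Python sketch of that conversion is shown here. The layout of a raw line is not documented in this chapter, so the tab-separated "entry, title, URL" format is an assumption, as is the function name. The sketch also stores multiple references in a list, where the Perl program joins them with a separator.

```python
def build_master_index(master_raw, sep="\t"):
    """Turn raw lines into {index entry: [(title, url), ...]}.

    Assumes each line is "entry<sep>title<sep>url"; this layout is
    hypothetical and only serves the illustration.
    """
    master_index = {}
    for line in master_raw:
        parts = line.split(sep)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        entry, title, url = parts
        refs = master_index.setdefault(entry, [])
        if (title, url) not in refs:  # keep each reference only once
            refs.append((title, url))
    return master_index
```

Keeping a list per key lets one index entry point at several URLs, which matters later when the output is written.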

Word-chunking, when turned on, is performed on each element in the master_index. Natural boundaries (spaces, dashes, underscores, changes in case in the middle of a word) are used to create additional two-level index entries that contain the word-chunk followed by where it came from.
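The boundary rules can be illustrated with a short Python sketch. It is one possible reading of the rules above, not the program's actual Perl code; the function name word_chunks and the choice to lowercase every chunk are assumptions.

```python
import re


def word_chunks(entry):
    """Split an index entry at spaces, dashes, underscores, and case changes."""
    chunks = []
    for piece in re.split(r"[ \-_]+", entry):
        # Put a space wherever a lowercase letter or digit is followed
        # by an uppercase letter, then split on that space.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", piece)
        chunks.extend(spaced.split())
    # Lowercasing is an assumption, so "Get" and "get" file together
    return [chunk.lower() for chunk in chunks]


print(word_chunks("api_GetMovie-list"))  # ['api', 'get', 'movie', 'list']
```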

The ignore_terms_file is used to eliminate useless word-chunked entries (such as “the”, “a”, “to”, etc.). The additional useful entries are appended to the contents of the hash master_raw using the division_mult_entry separator, but only if the new entry is not a duplicate.
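The filtering and duplicate check might look like the following Python sketch. Everything named here is hypothetical: the function add_chunk_entries, the chunker argument (any function that splits an entry into chunks), and the ": " text that joins a chunk to its source entry. The sketch keeps references in a list per key instead of joining them with the division_mult_entry separator.

```python
def add_chunk_entries(master_index, chunker, ignore_terms, level_sep=": "):
    """Add two-level "chunk: original entry" keys for every useful chunk."""
    ignore = {term.lower() for term in ignore_terms}
    # Snapshot the items first so newly added keys are not chunked again
    for entry, refs in list(master_index.items()):
        for chunk in chunker(entry):
            if chunk.lower() in ignore or chunk.lower() == entry.lower():
                continue  # useless term, or the chunk is the whole entry
            new_key = f"{chunk}{level_sep}{entry}"
            target = master_index.setdefault(new_key, [])
            for ref in refs:
                if ref not in target:  # append only when not a duplicate
                    target.append(ref)
    return master_index
```

With an ignore list containing "the", the entry "get the list" gains the extra keys "get: get the list" and "list: get the list", but nothing under "the".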

Word-chunking is particularly useful for API documentation so that the reader does not have to remember the exact name of a code item in order to find it. An initial index token of “api_GetMovie-list” could be found not just under its name in the “A's”, but under “get”, “movie”, and “list”.

The expanded list is sorted.

The sorted list is output to a series of m_idx_ files. New m_idx_ files are created whenever a new character starts a word in the list.
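The sort and the split into per-character groups can be sketched in Python as below. The case-insensitive ordering and the function name group_by_initial are assumptions; the sketch only shows the grouping, and each resulting group would then be written to its own m_idx_ file.

```python
from itertools import groupby


def group_by_initial(entries):
    """Sort entries and group them by their first character.

    Returns {initial: [entries]}; a new group starts whenever the
    first character of an entry changes.
    """
    ordered = sorted((e for e in entries if e), key=str.lower)
    return {
        initial: list(group)
        for initial, group in groupby(ordered, key=lambda e: e[0].lower())
    }
```

Because groupby only merges neighboring items, the list must be sorted first; that is why the sort comes before the split into files.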

When an index entry is referenced by multiple URLs, the first reference appears in the output as plain text, and the additional references appear as small document icons next to it.
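One way to picture that output is the Python sketch below. The HTML markup, the function name render_entry, and the icon file name doc_icon.gif are all invented for the illustration; the real program's output format is not shown in this chapter.

```python
from html import escape


def render_entry(entry, refs, icon="doc_icon.gif"):
    """Render one index entry whose refs are [(title, url), ...].

    The first reference becomes a text link; every further reference
    becomes a small icon link. The icon name is hypothetical.
    """
    (first_title, first_url), extras = refs[0], refs[1:]
    parts = [
        f'<a href="{escape(first_url, quote=True)}" '
        f'title="{escape(first_title, quote=True)}">{escape(entry)}</a>'
    ]
    for title, url in extras:
        parts.append(
            f'<a href="{escape(url, quote=True)}" '
            f'title="{escape(title, quote=True)}">'
            f'<img src="{icon}" alt="{escape(title, quote=True)}"></a>'
        )
    return " ".join(parts)
```

The title of each extra reference is carried in the title and alt attributes, so the reader can still tell the icon links apart.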

The Beginnings

Originally, I had mini-HTML systems that came from FrameMaker through Mif2Go and from source code through Doxygen. I was in charge of the system, so I could manipulate the source documents to create links between mini-HTML systems.

The problems with this technique were:

• Such manual hyperlinks are inflexible to changes in the directory structure.

• Limitations in certain tools required more hard-coded paths than I wanted to have in my hyperlinks.

• Doxygen in particular did not like fully qualified hyperlinks and tended to break them.

• The mini-HTML systems could not be used out-of-context in other situations where they might be shared (e.g., another documentation suite), because there were too many interdependencies between systems.

• The inter-system dependencies were hard to maintain and keep working, particularly when the overall structure of the system was not yet set in stone.

The Extensions

How do printed manuals allow you to cross-reference between them? What if the manual name changes? What if the cross-reference target changes?

It seems to me that technical writers often purposely do not add cross-references to other manuals. One reason is the problem of maintaining the references. However, another reason is that you don’t have to cross-reference everything.

I almost developed an extension tool that created a database of topic references (such as titles, code item names, etc.) and the HTML files where they were located. This tool would then go through the entire system and automatically add hyperlinks.

I stopped myself from wasting time developing this because:

• It is difficult to tell a tool when enough is enough. Every page would be full of hyperlinks.

• This would be hard to read and follow.

• This would distract the reader and lead them astray when really the writer intended for them to stay on that subject before moving on.

• It is easy to get one-to-one hyperlinks, but more of a challenge to get one-to-many hyperlinks, which is what the case would be if you wanted to web all occurrences of some topic or keyword together.

• It occurred to me that when something is of interest in a printed manual, the technical writer makes an index entry.

• For multi-book documentation suites, the technical writer may have a master index covering all manuals.

• Even if there is no master index, each manual has an index. As a result, the technical writer can rely on the reader’s intelligence to go looking in the indices of the various manuals (if the technical writer didn’t already point them to the right manual) to find more information.

As such, I curbed my desire to program more by fleshing out the master index, which in my case is modular and always covers the suite of documentation in the project.

Because the index files are created individually for the sub-projects, those sub-projects can be shared in other documentation suites and not mess up the master index.

True, it would be nice on occasion if I could put in links between mini-HTML systems. However, if the mini-systems are indexed properly, the reader can get where they need to be.





Open-Source tools compliments of Voyant Technologies, Inc. and Glenn C. Maxey.
01/13/2003

TP Tools v2-00-0a

# tpt-hug-02