 "Perl Program Reference" 

voyant_indexer.pl File Reference


Detailed Description

Creates a comprehensive index from the temporary index files generated by another program (voyant_nav.pl).

Parameters:
globe::path  Location of the index files. The name must end with a trailing path separator (\). Although the directory name is the first command-line parameter, the true inputs are the index files contained in that directory. Index file names must begin with "index_".
globe::master_tree_file  [optional path and] filename for the HTML file to use as a template for all index files to be generated. Ideally, this should contain navigation tools to get between the generated index files [a-z] and other parts of the system, such as the table of contents. This file has several specially flagged HTML comment sections that are required.
globe::ignore_terms_file  [optional path and] filename for a text file that contains words to ignore in the word-chunking process.
Returns:
This creates a series of HTML files that begin with "m_idx_". Generally, what follows in the name is the first character of the first word within the file. All index entries beginning with that character are in that file. These files are created in $globe::path.
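The naming rule for the output files can be illustrated with a minimal sketch. The helper name output_file_for, the sample entries, and the "0" bucket for entries that do not start with a letter or digit are assumptions, not taken from the program. The "m_idx_" prefix follows the description above; note that the out_file define listed later on this page builds the name from "m_idx" without the trailing underscore.

```perl
use strict;
use warnings;

# Hypothetical helper: map an index entry to the output file that would hold it.
# Assumption: entries that do not start with a letter or digit go to bucket "0".
sub output_file_for {
    my ($path, $entry) = @_;
    my $first = lc(substr($entry, 0, 1));
    $first = '0' unless $first =~ /^[a-z0-9]$/;
    return $path . 'm_idx_' . $first . '.html';
}

# Group a sorted list of sample entries by the file each one belongs to.
my %files;
for my $entry (sort { lc($a) cmp lc($b) } ('movie', 'api_GetMovie-list', 'Get', '_private')) {
    push @{ $files{ output_file_for('docs/', $entry) } }, $entry;
}
print "$_: @{ $files{$_} }\n" for sort keys %files;
```

Because the path is concatenated directly with the file name, the sketch also shows why the directory name has to end with a separator.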
Processing steps:
  1. The $globe::master_tree_file is checked first to make sure that it contains all required information.

  2. This program issues a system call to create a list of candidate input index_ files in the $globe::path directory.

  3. Then it steps through each of those files and concatenates their contents into $globe::master_raw.

  4. $globe::master_raw is turned into a hash table, idx_struct. The key into the hash table is the index entry. What gets stored is both the display text and an array of URLs.

  5. Word-chunking is performed on each element in the $globe::master_index. Natural boundaries (spaces, dashes, underscores, changes of case in the middle of a word) are used to create additional two-level index entries that contain the word-chunk followed by where it came from. The $globe::ignore_terms_file is used to eliminate unhelpful word-chunked entries (such as "the", "a", and "to"). The additional useful entries are appended to the contents of the hash $globe::master_raw using the $globe::division_mult_entry separator, but only if the new entry is not a duplicate. Word-chunking is particularly useful for API documentation because the reader does not have to remember the exact name of a code item in order to find it. An initial index token of "api_GetMovie-list" can be found not just under its name in the "A's", but also under "get", "movie", and "list".

  6. The expanded list is sorted.

  7. Exact duplicates in terms of index entry and URL are removed from the $globe::master_raw hash.

  8. The sorted list is output to a series of m_idx_ files. New m_idx_ files are created whenever a new character starts a word in the list.

  9. When an index entry is referenced by multiple URLs, the additional references appear in the output as small document icons next to the first reference in plain text.
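The word-chunking in step 5 can be sketched roughly as follows. This is not the program's own code: the function name word_chunks, the inline ignore list, and the exact splitting rules (lower-case or digit followed by upper-case counts as a case change) are assumptions chosen to reproduce the "api_GetMovie-list" example.

```perl
use strict;
use warnings;

# Hypothetical ignore list; the real terms come from $globe::ignore_terms_file.
my %ignore = map { $_ => 1 } qw(the a to);

# Value documented on this page for $globe::word_c_boundary.
my $word_c_boundary = ':;:';

# Split an entry on spaces, dashes, underscores, and case changes,
# then drop ignored terms and repeated chunks.
sub word_chunks {
    my ($entry) = @_;
    (my $spaced = $entry) =~ s/([a-z0-9])([A-Z])/$1 $2/g;   # mark case changes
    my (%seen, @chunks);
    for my $chunk (map { lc($_) } grep { length($_) } split /[\s_-]+/, $spaced) {
        next if $ignore{$chunk} || $seen{$chunk}++;
        push @chunks, $chunk;
    }
    return @chunks;
}

# Build two-level entries: the chunk first, then the entry it came from.
my $entry = 'api_GetMovie-list';
my @extra = map { $_ . $word_c_boundary . $entry } word_chunks($entry);
print "$_\n" for @extra;
```

In this sketch the entry also yields the chunk "api", in addition to "get", "movie", and "list".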
More information about the true input "index_" files. These files are an unsorted running list of index tokens that were extracted from the HTML files in a directory. Each token has two parts, the index entry and its URL, separated by :,: ($globe::word_url_boundary). Additionally, the index entry can have two levels. In such cases, the separator between the levels is :;: ($globe::word_c_boundary). Finally, a given index entry can represent multiple references or URLs. In such cases, the multiple entries are separated by :;;;: ($globe::division_mult_entry).

If the separators are changed in the generator program (voyant_nav.pl), they need to be changed here, too. The variable names in both programs are the same. The separators themselves were chosen because they were deemed never to occur in an index entry or URL and aren't Perl special characters.
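As an illustration of the token format, the following sketch parses a run of tokens into a hash keyed by index entry. It assumes that :;;;: separates complete tokens, which is one reading of the description above; the function name parse_tokens, the hash layout, and the sample URLs are hypothetical. The separators are wrapped in \Q...\E so that they are always matched literally.

```perl
use strict;
use warnings;

# Separator values as documented on this page.
my $word_url_boundary   = ':,:';
my $word_c_boundary     = ':;:';
my $division_mult_entry = ':;;;:';

# Parse a run of tokens into { entry => { levels => [...], urls => [...] } }.
sub parse_tokens {
    my ($raw) = @_;
    my %index;
    for my $token (split /\Q$division_mult_entry\E/, $raw) {
        my ($entry, $url) = split /\Q$word_url_boundary\E/, $token, 2;
        next unless defined($url) && length($entry);   # skip malformed tokens
        $index{$entry}{levels} ||= [ split /\Q$word_c_boundary\E/, $entry ];
        push @{ $index{$entry}{urls} }, $url;
    }
    return \%index;
}

my $raw = join $division_mult_entry,
    'api_GetMovie-list:,:movie.htm',
    'get:;:api_GetMovie-list:,:movie.htm',
    'api_GetMovie-list:,:list.htm';
my $index = parse_tokens($raw);
for my $entry (sort keys %$index) {
    print join(' > ', @{ $index->{$entry}{levels} }), ': ',
          join(', ', @{ $index->{$entry}{urls} }), "\n";
}
```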

More information about the master_file. Aside from serving as a template for all generated index files, this file chunks information using specially tagged HTML comments in order to simplify locating where generated information is to be placed. In addition, some tags contain information critical to the proper operation of the indexer.

The critical tags given by Perl variable and HTML syntax are:

The HTML syntax can be changed. However, the voy order sections are identical for various programs and their master_files, which facilitates propagating information.
Limitations and Caveats:
This does not support index entries that might be or have Perl special characters. These are often eliminated early in the process.
The names of the input index_ files cannot contain ".htm". The indexer assumes that every input token has the proper format: an index entry, $globe::word_url_boundary, and a URL. A token in any input index_ file that does not follow this format can cause problems.
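The file-name rules can be expressed as a simple filter. The program itself issues a system call to list candidate files; this sketch instead filters a list of names in Perl. The helper name candidate_index_files and the case-insensitive ".htm" test are assumptions.

```perl
use strict;
use warnings;

# Keep only names that start with "index_" and do not contain ".htm".
sub candidate_index_files {
    my @names = @_;
    my @ok = grep { /^index_/ && !/\.htm/i } @names;
    return sort @ok;
}

my @found = candidate_index_files(
    'index_intro', 'index_api', 'index_old.htm', 'm_idx_a.html', 'notes.txt',
);
print "$_\n" for @found;
```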

Rather than passing in variables which can create copies in memory, many items use global variables that are defined in globe.pm. When a variable is known to be global, its name begins with "$globe::". The intent is to facilitate maintenance by having all user-defined tags in one place outside of the program.
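The shared-globals approach might look like the following sketch. The contents of globe.pm are not shown on this page, so the layout here is an assumption; only the variable names and separator values come from the text above, and the 'docs/' path is a placeholder. In a real module the package would live in its own file, globe.pm, ending with "1;".

```perl
use strict;
use warnings;

{
    package globe;
    # Separator values as documented on this page.
    our $word_url_boundary   = ':,:';
    our $word_c_boundary     = ':;:';
    our $division_mult_entry = ':;;;:';
    # Placeholder value for illustration only.
    our $path = 'docs/';
}

# Reads the shared setting directly; nothing is passed in or copied.
sub out_file_prefix {
    return $globe::path . 'm_idx';
}

print out_file_prefix(), "\n";
```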

Many debug statements are left in the code, although they are commented out or disabled with if (0){...}. On occasion, a statement was copied, the original commented out, and the copy modified in order to keep old techniques around while verifying new ones. Old techniques were not always purged once the new one worked.

Author:
Glenn C. Maxey

Definition in file voyant_indexer.pl.


Define Documentation

#define assoc_t_data   $_[1]
 

#define c_word
 

#define capital   1
 

#define in_file
 

#define ind_file   $_[0]
 

#define out_file   $globe::path . "m_idx"
 

#define proc_title   &trash_special_characters($unproc_title)
 

#define remember_letter   "0"
 

#define remember_level   ""
 

#define term
 

#define unproc_title   $_[0]
 

#define very_critical   0
 

#define w_cnt   0
 


Function Documentation

int BEGIN  
 

Definition at line 209 of file voyant_indexer.pl.

if   nested_scripts = ~ /$remember_letter/i
 

Definition at line 1173 of file voyant_indexer.pl.

if $first_letter!~//   [a-zA-Z0-9]
 

Definition at line 1167 of file voyant_indexer.pl.

if  
 

Definition at line 602 of file voyant_indexer.pl.

if   ARGV,
  _arg_inc
 

Definition at line 274 of file voyant_indexer.pl.

unless open(OUT_INDEX,">$out_file$remember_letter.html")   
 

Definition at line 1152 of file voyant_indexer.pl.


Variable Documentation

PURGE_ENTRY __pad2__
 

Definition at line 1160 of file voyant_indexer.pl.

_arg_inc
 

Definition at line 302 of file voyant_indexer.pl.

_cnt = 0
 

Definition at line 1157 of file voyant_indexer.pl.

first_letter = substr($entry, 0, 1)
 

Definition at line 1166 of file voyant_indexer.pl.





Open-Source tools compliments of Voyant Technologies, Inc. and Glenn C. Maxey.
01/13/2003

TP Tools v2-00-0a

# tpt-perl-hcr-02