
voyant_indexer.pl

00001 // This file has been modified on-the-fly with an input filter
00002 // to change it from Perl syntax to C++ strictly for the purposes
00003 // of faking out Doxygen. Modifications include:
00004 
00005 // - changing local() definitions to C++ #define statements.
00006 // - commenting out undef statements.
00007 // - changing $globe'... variable names to $globe_...
00008 // - changing sub statements to look like C++ functions.
00009 // - changing # comments to C++ comments.
00010 // - ...
00011 
00012 // If you see other strangeness in the HTML version of the Perl file,
00013 // it comes from getting it to look more C++ like.
00014 
00015 
00016 // #!/usr/local/bin/perl
00017 //#############################################################################
00018 //#
00019 //#    $Id: voyant_indexer.pl,v 1.29 2002/12/24 15:05:54 gmaxe Exp $
00020 //#
00021 /** @file
00022  ** @brief Creates a comprehensive index from temporary index files that
00023  ** were generated by another program.
00024  **
00025  ** @param $globe::path location to find the index files. 
00026  ** The name should be terminated with a forward slash (/). 
00027  ** Although the directory name is the first command line parameter,
00028  ** the true input is the set of index files contained within that directory.
00029  ** Index files must begin with "index_". 
00030  **
00031  ** @param $globe::master_tree_file [optional path and] filename for the HTML file
00032  ** to use as a template for all index files to be generated.
00033  ** Ideally, this should contain navigation tools to get between the
00034  ** generated index files [a-z] and other parts of the system, such as
00035  ** the table of contents. This file has several specially flagged
00036  ** HTML comment sections that are required.
00037  **
00038  ** @param $globe::ignore_terms_file [optional path and] filename for a text file that
00039  ** contains words to ignore in the word-chunking process.
00040  **
00041  ** @return This creates a series of HTML files that begin with "m_idx_".
00042  ** Generally, what follows in the name is the first character of the
00043  ** first word within the file. All index entries beginning with that character
00044  ** are in that file. These files are created in $globe::path.
00045  ** 
00046  ** <ol><li>The $globe::master_tree_file is viewed first to make sure that it has all
00047  ** required information.</li>
00048  ** 
00049  ** <li>This program issues a system call to create a list of candidate input
00050  ** index_ files in the $globe::path directory.</li>
00051  ** 
00052  ** <li> Then it steps through each of
00053  ** those files and concatenates their contents into $globe::master_raw.</li>
00054  **
00055  ** <li> $globe::master_raw is turned into a hash table $idx_struct. The
00056  ** key into the hash table is the index entry. What gets stored is both the
00057  ** display text and an array of URLs.</li><ul>
00058  **
00059  ** <li>$entry = compacted, clean, lower-case key from display text for sorting. </li>
00060  ** <li>$subentry = compacted, clean, lower-case key from display text for sorting.</li>
00061  ** <li>$idx_struct{$entry}{display} = display text.</li>
00062  ** <li>$idx_struct{$entry}{url}[] = array of URL's for the $entry.</li>
00063  ** <li>$idx_struct{$entry}{sub}{$subentry}{display} = display text
00064  **              for the $entry's $subentry.</li>
00065  ** <li>$idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
00066  **              for the $entry's $subentry.</li></ul>
00067  **  
00068  ** 
00069  ** <li>Word-chunking is performed on each element in the $globe::master_index. 
00070  ** Natural boundaries (spaces, dashes, underscores, changes in case in the middle of a word) 
00071  ** are used to create additional two-level index entries that contain the word-chunk 
00072  ** followed by where it came from. The $globe::ignore_terms_file is used to
00073  ** eliminate unhelpful word-chunked entries (such as "the", "a", and "to").
00074  ** The additional useful entries are appended to
00075  ** the contents of the hash $globe::master_raw using the $globe::division_mult_entry
00076  ** separator only if the new entry is not a duplicate.
00077  ** Word-chunking is particularly useful for API documentation so that the reader
00078  ** does not have to remember the exact name of a code item in order to find
00079  ** it. An initial index token of "api_GetMovie-list" could be found not just under
00080  ** its name in the "A's", but under "get", "movie", and "list".</li>
00081  ** 
00082  ** <li>The expanded list is sorted.</li>
00083  ** 
00084  ** <li>Exact duplicates in terms of index entry and URL are removed from the
00085  ** $globe::master_raw hash.</li>
00086  ** 
00087  ** <li>The sorted list is output to a series of m_idx_ files.
00088  ** New m_idx_ files are created whenever a new character starts a word in the list.</li>
00089  ** 
00090  ** <li>When an index entry is referenced by multiple URLs, the additional references
00091  ** appear in the output as small document icons next to the first reference in 
00092  ** plain text.</li>
00093  ** </ol> 
00094  **
00095  ** More information about the true input "index_" files.
00096  ** These files are an unsorted running list of index tokens that were 
00097  ** extracted from the HTML files in a directory. The tokens have two 
00098  ** parts: the index entry and its URL. The separator is :,: ($globe::word_url_boundary).
00099  ** Additionally, the index entry can have two levels. In such
00100  ** cases, the separator is :;: ($globe::word_c_boundary).
00101  ** Finally, a given index entry can represent multiple references or URLs.
00102  ** In such cases, the multiple entries are separated by :;;;: ($globe::division_mult_entry).
00103  ** 
00104  ** If the separators are changed in the generator program (voyant_nav.pl),
00105  ** they need to be changed here, too. The variable names in both programs are the same.
00106  ** The separators themselves were chosen because they were deemed never to occur 
00107  ** in an index entry or URL and aren't Perl special characters.
00108  ** 
00109  ** More information about the master_file. Aside from serving as a template for
00110  ** all generated index files, this file chunks information using specially tagged
00111  ** HTML comments in order to simplify locating where generated information is to
00112  ** be placed. In addition, some tags contain information critical to the proper operation
00113  ** of the indexer.
00114  ** 
00115  ** The critical tags given by Perl variable and HTML syntax are:
00116  ** <ul><li>($globe::m_define{order}[0]) = "< !-- begin voy_order --"</li>
00117  ** <li>($globe::m_define{order}[1]) = "!-- end voy_order -->"</li>
00118  ** <li>($globe::m_define{structure}[0]) = "< !-- begin voy_structure -->"</li>
00119  ** <li>($globe::m_define{structure}[1]) = "< !-- end voy_structure -->"</li>
00120  ** <li>($globe::m_define{topic_list}[0]) = "< !-- begin voy_topic_list -->"</li>
00121  ** <li>($globe::m_define{topic_list}[1]) = "< !-- end voy_topic_list -->"</li></ul>
00122  ** 
00123  ** The HTML syntax can be changed. However, the voy order sections are identical
00124  ** for various programs and their master_files, which facilitates propagating information.
00125  ** 
00126  ** @lim This does not support index entries that are or contain Perl special
00127  ** characters. These are often eliminated early in the process.
00128  **
00129  ** The input index_ files cannot have ".htm" as part of the name.
00130  ** This assumes that input information was of the proper format with an
00131  ** index entry, $globe::word_url_boundary, and URL. If any of the input index_
00132  ** files did not have this, this can cause problems.
00133  **
00134  ** Rather than passing in variables which can create copies in memory, many items
00135  ** use global variables that are defined in globe.pm. When a
00136  ** variable is known to be global, its name begins with "$globe::". The intent is
00137  ** to facilitate maintenance by having all user-defined tags in one place outside 
00138  ** of the program.
00139  **
00140  ** Many debug statements are left in the code, although commented out or programmed
00141  ** out with if (0){...}. On occasion, a statement is copied, commented out, and then
00142  ** the copy modified in order to keep old techniques around while verifying new techniques.
00143  ** Old techniques were not always purged once the new one worked.
00144  **
00145  ** @ingroup tp_tools tp_idx
00146  **
00147  ** @author Glenn C. Maxey
00148  **/
00149 // #
00150 //# 2002 Created by Voyant Technologies, Inc., Westminster, Colorado, USA.
00151 //#
00152 //# Permission to use, copy, modify, and distribute this software and its 
00153 //# documentation under the terms of the GNU General Public License is hereby 
00154 //# granted. No representations are made about the suitability of this software 
00155 //# for any purpose. It is provided "as is" without express or implied warranty. 
00156 //# See the GNU General Public License (http://www.gnu.org/copyleft/gpl.html) 
00157 //# for more details.
00158 //# 
00159 //# Documents produced by this script are derivative works derived from the 
00160 //# input used in their production; they are not affected by this license.
00161 //#
00162 //#    Revision Information:
00163 //#
00164 //#       $Log: voyant_indexer.pl,v $
00165 //#       Revision 1.29  2002/12/24 15:05:54  gmaxe
00166 //#       New name and tweaks to add full paths.
00167 //#
00168 //#       Revision 1.28  2002/11/20 15:23:05  gmaxe
00169 //#       Added exit codes so that wrapper scripts can catch errors properly.
00170 //#
00171 //#       Revision 1.27  2002/07/26 18:56:11  gmaxe
00172 //#       Got rid of old definitions and migrated new structures into all;
00173 //#       enhanced xhelp output file names and format; now everything is
00174 //#       alphabetized.
00175 //#
00176 //#       Revision 1.26  2002/07/22 21:51:31  gmaxe
00177 //#       Created separate configuration file; allow for two inputs to
00178 //#       the voyant_latex.pl. commented out old variables in globe.pm.
00179 //#
00180 //#       Revision 1.25  2002/07/09 21:37:27  gmaxe
00181 //#       Changed the data structure to make maintenance, input, and output easier.
00182 //#       Improved the efficiency of the word chunking, html output, script output,
00183 //#       and duplicate entries. It handles multiple URLs for a given index entry
00184 //#       better by putting it into an array inside the hash. However, the
00185 //#       elimination of ignore terms is still expensive.
00186 //#
00187 //#       Revision 1.24  2002/07/03 15:53:58  gmaxe
00188 //#       Commented out ignore_terms loops within expensive word-chunking loops.
00189 //#       Ignore terms are dealt with later. Reduces time by over 50%.
00190 //#
00191 //#
00192 //#############################################################################
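
The word-chunking step described in the header can be sketched as follows. This is a minimal illustration and not the script's own routine: the helper name `word_chunks` and the hash-based ignore list are assumptions made for the example, and the chunk boundaries (spaces, dashes, underscores, lower-to-upper case changes) are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of voyant_indexer.pl): split an index token
# into lower-case word chunks at spaces, dashes, underscores, and case changes.
sub word_chunks {
    my ($token, $ignore) = @_;

    # Insert a space at lower-to-upper case changes: "GetMovie" -> "Get Movie".
    (my $spaced = $token) =~ s/([a-z0-9])([A-Z])/$1 $2/g;

    my (%seen, @chunks);
    for my $chunk (split /[\s_-]+/, $spaced) {
        my $key = lc $chunk;
        next if $key eq '';          # skip empty pieces from leading separators
        next if $ignore->{$key};     # skip terms listed in the ignore file
        next if $seen{$key}++;       # skip chunks already produced for this token
        push @chunks, $key;
    }
    return @chunks;
}

my %ignore = map { $_ => 1 } qw(the a to);
my @chunks = word_chunks('api_GetMovie-list', \%ignore);
print join(', ', @chunks), "\n";     # prints "api, get, movie, list"
```

Each chunk would then become the first level of a two-level entry whose second level is the original token, which is how "api_GetMovie-list" ends up listed under "get", "movie", and "list".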
00193 
00194 
00195 //#############################################################################
00196 /** @fn int BEGIN
00197  ** @brief Code to execute when first entered.
00198  **
00199  ** @param None. 
00200  **
00201  ** @return None.
00202  **
00203  ** @lim None
00204  ** @ingroup tp_nav
00205  **/
00206 // #############################################################################
00207 int BEGIN  ( ) {
00208 //    print "\n============  Starting voyant_indexer.pl ==================================\n";
00209 
00210    $_index_file_list = "_index_file_list";
00211    $in_file = "";
00212    $f_type = "index_";
00213    $_arg_inc = 0;
00214 
00215    $no_scope_file = 0;
00216    $scope_pm = "globe.pm";  //  first time through; other scope stuff passed in.
00217 
00218 
00219    chomp ($_cwd = `pwd`); push (@INC, $_cwd);  //  pwd output ends with a newline
00220    push (@INC, '/rtfm/techpubs/perl');
00221    if (0){
00222 //       print (@INC, "\n");
00223    }
00224    // ####
00225    //  All global variables are defined in the following file
00226    // ####
00227    unless (open ( IN_LIST, $scope_pm)) {
00228       $scope_pm = "/rtfm/techpubs/perl/$scope_pm";
00229       unless (open ( IN_LIST, $scope_pm)) {
00230          push (@file_errors, "Cannot open file \"$scope_pm\"");
00231          $no_scope_file++;
00232       }
00233    }
00234 //    close (IN_LIST);
00235    push (@INC, $scope_pm);
00236    if (!@file_errors) {
00237       // ####
00238       //  All global variables are defined in the following file
00239       // ####
00240       require $scope_pm;
00241 
00242       if (&globe::declare_variables()) {
00243 //          print "Variables initialized from $scope_pm.\n";
00244       } else {
00245          push (@file_errors, "Could not initialize variables from $scope_pm.\n");
00246       }
00247    } //  if not @file_errors
00248 
00249 
00250    //  Get path if there is one.
00251    if (@ARGV > $_arg_inc) {
00252       $globe::path = $ARGV[$_arg_inc];
00253       if ($globe::path =~ /\/$/) {
00254 //          print "The path specified is $globe::path\n";
00255       
00256          @path_chunks = split ( /\//, $globe::path);
00257          if (@path_chunks < 2) {
00258             $globe::rel_path_to_start_point = "./";
00259             $globe::pdf_name_from_dir = "book_cit_dtoss.pdf";
00260          } else {
00261             $globe::rel_path_to_start_point = "../";
00262             $globe::pdf_name_from_dir = "$path_chunks[@path_chunks - 1].pdf";
00263          }
00264       } else {
00265          push (@file_errors, "The input argument \"$globe::path\" requires a forward slash (\/) at the end.\n");
00266       }
00267    } else {
00268       push (@file_errors, "ERROR: root path is required.");
00269    } //  if 1 or more arguments
00270    $_arg_inc++;
00271    
00272    if (@ARGV > $_arg_inc) {
00273       $globe::master_tree_file = $ARGV[$_arg_inc];
00274 //       print "The master file is $globe::master_tree_file\n";
00275       unless (open ( IN_MASTER, "$globe::master_tree_file")) {
00276          push (@file_errors, "Cannot open file \"$globe::master_tree_file\".");
00277       }
00278    } //  if 2 or more arguments
00279    $_arg_inc++;
00280 
00281    if (@ARGV > $_arg_inc) {
00282       $globe::ignore_terms_file = $ARGV[$_arg_inc];
00283 //       print "The ignore terms file is $globe::ignore_terms_file\n";
00284       if (open ( IN_IGNORE, "$globe::ignore_terms_file")) {
00285          while (<IN_IGNORE>){
00286             $temp = $_;
00287             $temp =~ s/\n//;
00288             push (@globe::ignore_terms, $temp);
00289          }
00290          if (0) {
00291 //             print "Terms to ignore\n";
00292             foreach $term (@globe::ignore_terms){
00293 //                print "$term\n";
00294             }
00295          }
00296       } else {
00297          push (@file_errors, "Cannot open file \"$globe::ignore_terms_file\".");
00298       }
00299    } //  if 3 or more arguments
00300    $_arg_inc++;
00301 
00302    if (@ARGV > $_arg_inc) {
00303       $globe::word_chunk = 1;
00304       $globe::word_chunk = $ARGV[$_arg_inc];
00305       if ($globe::word_chunk =~ /^no/i){
00306          $globe::word_chunk = 0;
00307       } else {
00308          $globe::word_chunk = 1;
00309       }
00310    } else {
00311       $globe::word_chunk = 1;
00312    } //  if 4 or more arguments
00313 //    print "The flag for word-chunking is $globe::word_chunk; 1, do; 0, don't.\n";
00314    $_arg_inc++;
00315 
00316 
00317 }
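
Reading the ignore terms could be sketched with a lexical filehandle, as below. This is an illustration under assumptions, not the code used by BEGIN: the helper name `read_ignore_terms` is hypothetical, and skipping blank lines is a choice made here that the original loop does not make.

```perl
use strict;
use warnings;

# Hypothetical helper: read one ignore term per line from an open filehandle.
sub read_ignore_terms {
    my ($fh) = @_;
    my @terms;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;      # drop the line ending (LF or CRLF)
        next if $line eq '';       # assumption: blank lines are not terms
        push @terms, $line;
    }
    return @terms;
}

# Demonstration with an in-memory file; a real run would open the file named
# by the third command line argument instead.
my $sample = "the\na\nto\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my @terms = read_ignore_terms($fh);
close($fh);
print scalar(@terms), " terms loaded\n";
```

Passing the filehandle in keeps the helper independent of where the terms come from, so the same routine works for a file on disk or for test data.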
00318 
00319 //#############################################################################
00320 /** @fn int main
00321  ** @brief The main program.
00322  **
00323  ** @param None. 
00324  **
00325  ** @return None.
00326  **
00327  ** @lim None
00328  ** @ingroup tp_nav
00329  **/
00330 // #############################################################################
00331 // sub main {
00332 {
00333    // #############################################################################
00334    // # Program start
00335    // #############################################################################
00336 
00337    if (0){
00338 //       print "=== Definitions 3 \n";
00339 //       exit(1);
00340    }
00341 
00342    if (@file_errors) {
00343       //  Makes no sense to go on if input parameters are off.
00344 //       print "\n============  Summary of errors =================================\n";
00345       for ($i=0; $i<@file_errors; $i++){
00346 //          print "$i = $file_errors[$i]\n";
00347       }
00348       &using_indexer();
00349 //       exit(1);
00350    }
00351 
00352 
00353    //  Get the master definitions
00354    while (<IN_MASTER>){    //  entire master file into memory.
00355       $globe::master_index_html .= $_;
00356    }
00357    
00358    
00359    // ####
00360    //  IMPORTANT!!!
00361    //  Master ordering not used
00362    // ####
00363    &globe::get_master_nav_info();
00364 
00365    $_index_file_list = "$globe::path$_index_file_list";
00366 
00367    if (system ("ls $globe::path$f_type* > $_index_file_list")) {
00368 //       print "There is no $globe::path directory. Nothing done.\n";
00369       return(1);
00370    }
00371    
00372    // #########
00373    //  Create a master index list
00374    // #########
00375    &process_index_files();
00376 //    print "Total entries incoming = $globe::total_entry_in\n";
00377    &sort_raw_master();
00378    &create_index_files();
00379    &create_index_script();
00380 
00381    if (@file_errors) {
00382 //       print "\n============  Summary of errors ==================================\n";
00383       for ($i=0; $i<@file_errors; $i++){
00384 //          print "$i = $file_errors[$i]\n";
00385       }
00386    }
00387    
00388    
00389    // #############################################################################
00390    // # End of all the work
00391    // #############################################################################
00392 //    exit(0);
00393 }
00394 
00395 
00396 //#############################################################################
00397 /** @fn process_index_files
00398  ** @brief Gets a list of index_ files and hands them off one-at-a-time to
00399  ** &get_raw_master.
00400  **
00401  ** @param $_index_file_list The name of the temporary file that contains
00402  ** a list of potential index_ files.
00403  **
00404  ** @return Nothing.
00405  **
00406  ** A system call was made prior to running this routine which generated
00407  ** the list of files in $_index_file_list. This routine 
00408  **
00409  ** <ol><li>Opens that temporary file.</li>
00410  ** <li> Picks off each file from the list.</li>
00411  ** <li>Passes the index_ file to get_raw_master.</li></ol>
00412  ** 
00413  ** @lim If the index_ file from the list has ".htm" as part of its name,
00414  ** it is ignored.
00415  **
00416  ** @ingroup tp_idx
00417  **/
00418 // #############################################################################
00419 int process_index_files  ( ) {
00420    #define $in_file
00421    // undef (@path_chunk);
00422    
00423 //    print "Entering process_index_files\n";
00424 
00425    unless (open ( IN_LIST, "$_index_file_list")) {
00426 //       print "Cannot open file \"$_index_file_list\"\n";
00427 //       exit(1);
00428    }
00429    TOKEN_FILE: while (<IN_LIST>){    //  read a line from file into $_
00430       $in_file = $_;
00431       $in_file =~ s/\n//;
00432       
00433       if ($in_file =~ /\.htm/){
00434 //          //  print "gcm1 $in_file\n";
00435          next TOKEN_FILE;
00436       } else {
00437 //          print "$in_file\n";
00438          &get_raw_master($in_file);
00439       }
00440       $globe::relative_path = $in_file;
00441       $globe::relative_path =~ s/index_//;
00442       $globe::relative_path =~ s/\.html//;
00443       $globe::relative_path =~ s/\.htm//;
00444       @path_chunk = split ( /\//, $globe::relative_path, 2);
00445       if (@path_chunk == 2) {
00446          $globe::relative_path = "$path_chunk[1]/";
00447       } else {
00448          $globe::relative_path .= "\./";
00449       }
00450 //       //  print "relative path = $globe::relative_path\n";
00451    } //  while IN_LIST
00452 //    close (IN_LIST);
00453 
00454 //    //  print "Master list:\n$globe::master_raw\n";
00455 } // process_index_files
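
The file selection that the `ls` system call and this routine perform together could be approximated without a temporary list file. The sketch below is an assumption-laden alternative, not the script's method: `candidate_index_files` is a hypothetical helper, and it tests the file name only, whereas the routine above tests the whole path for ".htm".

```perl
use strict;
use warnings;

# Hypothetical helper: keep only candidate input files from a list of paths.
# A candidate has a file name that begins with "index_" and does not
# contain ".htm".
sub candidate_index_files {
    my (@paths) = @_;
    return grep {
        my ($name) = m{([^/]*)\z};          # file name without directories
        $name =~ /^index_/ && $name !~ /\.htm/;
    } @paths;
}

# In the program this list would come from something like
#   glob("${path}index_*")
# where $path ends with a forward slash.
my @found = candidate_index_files(
    'docs/index_api', 'docs/index_api.html', 'docs/m_idx_a.html',
);
print "$_\n" for @found;                    # prints "docs/index_api"
```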
00456 
00457 
00458 
00459 //#############################################################################
00460 /** @fn get_raw_master
00461  ** @brief Opens the input file and appends its information to $globe::master_raw
00462  ** before closing and returning.
00463  **
00464  ** @param $ind_file is the $in_file from process_index_files.
00465  ** @return $globe::master_raw has information from the input file appended to
00466  ** it.
00467  ** 
00468  ** @lim Does no checking on the information from within the input file
00469  ** to verify that it is correct. Garbage in and this may bomb completely.
00470  ** @ingroup tp_idx
00471  **/
00472 // #############################################################################
00473 int get_raw_master  ( ) {
00474    #define $ind_file  $_[0]
00475 //    //  print "Entering get_raw_master\n";
00476 
00477    unless (open ( IND_FILE, "$ind_file")) {
00478 //       print "Cannot open file \"$ind_file\"\n";
00479 //       exit(1);
00480    }
00481    while (<IND_FILE>){    //  read a line from file into $_
00482       $globe::master_raw .= $_;
00483       $globe::total_entry_in++;
00484    } //  while IND_FILE
00485 //    close (IND_FILE);
00486 } // get_raw_master
00487 
00488 
00489 //#############################################################################
00490 /** @fn sort_raw_master
00491  ** @brief Creates a $idx_struct hash table from the raw index material.
00492  **
00493  ** @param $globe::master_raw is used to get its information, although not
00494  ** explicitly passed in. To free up memory, this gets undefined.
00495  **
00496  ** @retval $idx_struct is a hash that uses the index entry as the key
00497  ** and whose contents are the title plus URL.
00498  ** @retval $globe::master_raw is cleared after its information is put into the
00499  ** hash.
00500  ** 
00501  ** Parses the raw index information into a key (the index entry) and its
00502  ** associated title plus URL. The title is the same as the key. It is retained
00503  ** in the contents of the hash to help distinguish between entries and to
00504  ** help eliminate duplicates.
00505  **
00506  ** This routine calls the word_chunking routine to create additional entries from
00507  ** the items.
00508  ** 
00509  ** Uses @globe::ignore_terms to eliminate entries.
00510  **
00511  ** When finished, $idx_struct is a big list of index entries.
00512  **
00513  ** @lim Assumes that input information was of the proper format with an
00514  ** index entry, $globe::word_url_boundary, and URL. If any of the input index_
00515  ** files did not have this, this can cause problems.
00516  **
00517  ** Eliminates special characters and can purge entries that have Perl 
00518  ** special characters.
00519  **
00520  ** Word-chunking is controlled by $globe::word_chunk, which BEGIN sets from the
00521  ** optional fourth command line argument; a value beginning with "no" turns it off.
00522  **
00523  ** $entry = compacted and clean display text for sorting.
00524  ** $subentry = compacted and clean display text for sorting.
00525  ** $idx_struct{$entry}{display} = display text
00526  ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
00527  ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
00528  **              for the $entry's $subentry.
00529  ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
00530  **              for the $entry's $subentry.
00531  **  
00532  ** @ingroup tp_idx
00533  **/
00534 // #############################################################################
00535 int sort_raw_master  ( ) {
00536 //    print "Entering sort_raw_master\n";
00537 
00538    @unchunked_list = split (/\n/, $globe::master_raw);
00539    TOKEN: for ($i=0; $i<=$#unchunked_list; $i++){
00540       if ($unchunked_list[$i] =~ /$globe::word_url_boundary/){
00541          $unchunked_list[$i] =~ s/  / /g;  //  some title have 2 spaces
00542          $unchunked_list[$i] =~ s/\r//g;
00543          $unchunked_list[$i] =~ s/\n//g;
00544          @entry_chunk = split (/$globe::word_url_boundary/, $unchunked_list[$i], 2);
00545          $entry_chunk[0] =~ s/\,[\s]*/$globe::word_c_boundary/;  
00546          if ($entry_chunk[0] =~ $globe::word_c_boundary){
00547             // ####
00548             //  Indicates that FM had multi-level index token
00549             //  This will not have any word chunking done
00550             // ####
00551             @title_chunk = split (/$globe::word_c_boundary/, $entry_chunk[0], 2);
00552             $entry1 = &trash_special_characters ($title_chunk[0]);
00553             $entry1 =~ s/ //g; //  get rid of all spaces
00554             $entry1 =~ tr/A-Z/a-z/;
00555             if ((0) && (&ignore_item($entry_chunk[0]))) {
00556                // ####
00557                //  This is turned off because it is assumed that if it came
00558                //  from FM, then the token is valid
00559                // ####
00560                next TOKEN;
00561             }
00562             &add_to_index_struct ($entry1, $title_chunk[0], $entry_chunk[1]);
00563             
00564             $entry2 = &trash_special_characters ($title_chunk[1]);
00565             $entry2 =~ s/ //g; //  get rid of all spaces 
00566             $entry2 =~ tr/A-Z/a-z/;
00567             if (!(&add_to_lev2_index_struct ($entry1, $entry2, $title_chunk[1], $entry_chunk[1]))) {
00568 //                print "Tried adding a second level without the primary level.\n";
00569             }
00570          } else { //  ($entry_chunk[0] =~ $globe::word_c_boundary)
00571             // ####
00572             //  Single-level token; word chunking is done below
00573             // ####
00574             $entry1 = &trash_special_characters ($entry_chunk[0]); 
00575             $entry1 =~ s/ //g; //  get rid of all spaces 
00576             $entry1 =~ tr/A-Z/a-z/;
00577             if ((0) && (&ignore_item($entry_chunk[0]))) {
00578                // ####
00579                //  This means that the entry is not one of interest
00580                // ####
00581                next TOKEN;
00582             }
00583             &add_to_index_struct ($entry1, $entry_chunk[0], $entry_chunk[1]);
00584             // ####
00585             //  the following does word chunking
00586             // ####
00587             if ((1) && ($globe::word_chunk)) {
00588                &word_chunking( $entry_chunk[0], $entry_chunk[1]);
00589             } //  globe::word_chunk
00590          } //  ($entry_chunk[0] =~ $globe::word_c_boundary)
00591       } else { //  TOKEN
00592          //  token does not have URL; don't process
00593          next TOKEN;
00594       } //  TOKEN
00595    
00596    } //  $i unchunked_list
00597    
00598    $_cnt=0;
00599    if (0){ //  debug loop
00600       foreach $entry (sort keys %idx_struct){
00601          $_cnt++;
00602 //          print "$_cnt \"$entry\" = \"$idx_struct{$entry}{display}\"\n";
00603          if (($_cnt < 12) && (1)){
00604             foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
00605 //                print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\" \n";
00606             }
00607          }
00608          if ((0)){
00609             foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
00610 //                print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\" \n";
00611             }
00612          }
00613          if (($_cnt > 12) && (1)){
00614 //             exit(1);
00615          }
00616       }
00617 //       print "Stopping...\n";
00618 //       exit(1);
00619    } //  debug loop
00620    
00621 } //  sort_raw_master
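
The splitting that sort_raw_master performs on each raw line can be sketched in isolation. This is a simplified illustration, not the routine itself: `parse_token` and `make_key` are hypothetical helpers, the separator values are the ones documented in the file header, and the calls to trash_special_characters, the comma-to-separator conversion, and the double-space cleanup are left out.

```perl
use strict;
use warnings;

# Separator values as documented in the file header; in the program they
# live in $globe::word_url_boundary and $globe::word_c_boundary.
my $WORD_URL_BOUNDARY = ':,:';
my $WORD_C_BOUNDARY   = ':;:';

# Hypothetical helper: build a compact, lower-case sort key.
sub make_key {
    my ($text) = @_;
    (my $key = lc $text) =~ s/\s+//g;
    return $key;
}

# Hypothetical helper: split one raw line into its parts.
# Returns nothing when the line has no URL separator or no entry text.
sub parse_token {
    my ($line) = @_;
    $line =~ s/[\r\n]+//g;
    my ($text, $url) = split /\Q$WORD_URL_BOUNDARY\E/, $line, 2;
    return unless defined $url && length $text;
    my ($title, $subtitle) = split /\Q$WORD_C_BOUNDARY\E/, $text, 2;
    return {
        entry    => make_key($title),
        title    => $title,
        subentry => defined $subtitle ? make_key($subtitle) : undef,
        subtitle => $subtitle,
        url      => $url,
    };
}

my $t = parse_token("Conference Calls:;:Adding a party:,:calls.html#add\n");
print "$t->{entry} / $t->{subentry} -> $t->{url}\n";
# prints "conferencecalls / addingaparty -> calls.html#add"
```

Quoting the separators with `\Q...\E` keeps the split literal even if a separator is later changed to something containing regular expression metacharacters.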
00622 
00623 
00624 
00625 
00626 //#############################################################################
00627 /** @fn ignore_item
00628  ** @brief Compares the input term to a list of ignore terms.
00629  **
00630  ** @param $term_to_test Term to test.
00631  **
00632  ** @return Returns 1 if the term has matched an ignore term;
00633  ** Otherwise it returns 0.
00634  **
00635  ** It tests in both directions (the term against the ignore term and the
00636  ** ignore term against the term), so that "get" doesn't match on "together".
00637  **
00638  ** @lim If the term has a perl special character in it, it is
00639  ** sent back immediately and left alone.
00640  **  
00641  ** @ingroup tp_idx
00642  **/
00643 // #############################################################################
00644 int ignore_item  ( ) {
00645    $term_to_test = $_[0];
00646 
00647    // ####
00648    //  Don't mess with entries that have perl special characters, like +, [
00649    //  Otherwise, it will fail in the testing below; not important enough to
00650    //  purge from the output; probably won't be in the ignore file anyway.
00651    // ####
00652    if (($term_to_test =~ /\?/)
00653          || ($term_to_test =~ /\+/)
00654          || ($term_to_test =~ /\*/)
00655          || ($term_to_test =~ /\/\./)
00656          || ($term_to_test =~ /\(/)
00657          || ($term_to_test =~ /\)/)
00658          || ($term_to_test =~ /\[/)
00659          || ($term_to_test =~ /\$/)
00660          || ($term_to_test =~ /\#/))
00661    {
00662       return (0);
00663    }
00664 
00665    // ####
00666    //  Remove index items that are in the ignore file
00667    // ####
00668    foreach $r_term (@globe::ignore_terms){
00669       $r2_term = $r_term;
00670       $r2_term =~ s/ //g;
00671       // ####
00672       if (0){ // debug
00673          $stop = 0;
00674          if ($term_to_test =~ /\+/){
00675 //             print "gcm0 $term_to_test\n";
00676             $stop = 1;
00677          }
00678          if ($r_term =~ /\+/){
00679 //             print "gcm1 $r_term\n";
00680             $stop = 1;
00681          }
00682          if ($r2_term =~ /\+/){
00683 //              print "gcm2 $r2_term\n";
00684              $stop = 1;
00685          }
00686          if ($stop){
00687 //              exit(1);
00688          }
00689       } // debug
00690       // ####
00691       // ####
00692       //  Need to make sure that "get" doesn't match on "together".
00693       // ####
00694       if ((($term_to_test =~ /$r_term/i) || ($term_to_test =~ /$r2_term/i)) 
00695            && (($r_term =~ /$term_to_test/i) || ($r2_term =~ /$term_to_test/i))) {
00696          if (0){
00697 //             print "r_term = $r_term; entry = $term_to_test\n";
00698          }
00699          return (1);
00700       }
00701    }
00702    //  default fall through; 0 means good.
00703    return (0);
00704 } //  ignore_item
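
The two-way match above amounts to a case-insensitive equality test that also disregards spaces. A minimal sketch of that idea with plain string comparison follows; `is_ignored` is a hypothetical name, and this approximates rather than reproduces ignore_item. Because no term is interpolated into a regular expression, entries containing characters such as + or [ need no special early exit here.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $term equals an ignore term, comparing
# case-insensitively and disregarding spaces.
sub is_ignored {
    my ($term, @ignore_terms) = @_;
    (my $wanted = lc $term) =~ s/ //g;
    for my $ignore (@ignore_terms) {
        (my $candidate = lc $ignore) =~ s/ //g;
        return 1 if $candidate eq $wanted;
    }
    return 0;
}

my @ignore = ('get', 'the', 'a');
# Reports 1 for "Get" and 0 for "together" and "C++".
printf "%-8s %d\n", $_, is_ignored($_, @ignore) for qw(Get together C++);
```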
00705 
00706 
00707 //#############################################################################
00708 /** @fn add_to_index_struct
00709  ** @brief Adds an element to the complicated hash table.
00710  **
00711  ** @param $in_entry the compacted entry into the hash
00712  ** @param $in_title the display text to show for the item
00713  ** @param $in_url the anchor to add to the URL array.
00714  ** If you send in $globe::word_c_boundary for the $in_url,
00715  ** then it won't add the URL to the list.
00716  **
00717  ** @retval Always returns  1.
00718  **
00719  **
00720  ** $entry = compacted and clean display text for sorting.
00721  ** $subentry = compacted and clean display text for sorting.
00722  ** $idx_struct{$entry}{display} = display text
00723  ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
00724  ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
00725  **              for the $entry's $subentry.
00726  ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
00727  **              for the $entry's $subentry.
00728  **  
00729  ** @ingroup tp_idx
00730  **/
00731 // #############################################################################
00732 int add_to_index_struct  ( ) {
00733    $in_entry = $_[0];
00734    $in_title = $_[1];
00735    $in_url   = $_[2];
00736    
00737    $donothave = 1;
00738    if ($in_url =~ /$globe::word_c_boundary/) {
00739       $donothave = 0;
00740    }
00741    if (exists ($idx_struct{$in_entry})) {
00742       //  We already have display text;
00743       //  need URL.
00744       //  weed out duplicates from the beginning.
00745       for ($j = 0; $j <= $#{$idx_struct{$in_entry}{url}}; $j++) {
00746         if ($idx_struct{$in_entry}{url}[$j] =~ /$in_url/) {
00747            $donothave = 0;
00748         }
00749       } //  for $j
00750       if ($donothave) {
00751          push (@{$idx_struct{$in_entry}{url}}, $in_url);
00752       } 
00753    } else {
00754       //  we don't have anything; need to add it.
00755       $idx_struct{$in_entry}{display} = $in_title;
00756       if ($donothave) {
00757          push (@{$idx_struct{$in_entry}{url}}, $in_url);
00758       } 
00759    }
00760    return (1);
00761 } //  add_to_index_struct
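
The first-level bookkeeping described above can be sketched in plain Perl (before the Doxygen filter) roughly as follows. This is a sketch under assumptions: %idx_struct is a local stand-in for the script's global, the sentinel value '__word_chunk__' is invented because the real value of $globe::word_c_boundary is not shown, and duplicates are detected with exact string comparison (eq) rather than the regex match used in the listing, so URL characters such as "?" and "." are not treated as patterns.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the script's globals.
my %idx_struct;
my $WORD_C_BOUNDARY = '__word_chunk__';   # assumed value; the real one is not shown

# Add a first-level entry. The URL is appended only if it is not the
# sentinel and is not already listed for this entry.
sub add_entry {
    my ($entry, $title, $url) = @_;
    $idx_struct{$entry}{display} //= $title;        # first display text wins
    my $urls = $idx_struct{$entry}{url} //= [];     # always keep an array ref
    return 1 if $url eq $WORD_C_BOUNDARY;           # sentinel: record no URL
    push @$urls, $url unless grep { $_ eq $url } @$urls;
    return 1;
}

add_entry('getmovie', 'GetMovie', '<a href="api.html#GetMovie">');
add_entry('getmovie', 'GetMovie', '<a href="api.html#GetMovie">');   # duplicate, ignored
add_entry('getmovie', 'GetMovie', '<a href="ref.html#GetMovie">');
add_entry('movie',    'movie',    $WORD_C_BOUNDARY);                 # heading only

print scalar(@{ $idx_struct{getmovie}{url} }), " ",
      scalar(@{ $idx_struct{movie}{url} }), "\n";
```

The URL strings here are opening anchor tags, matching the way create_index_files later appends the display text and a closing tag.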
00762 
00763 
00764 //#############################################################################
00765 /** @fn add_to_lev2_index_struct
00766  ** @brief Adds an element to the second level of the complicated hash table.
00767  **
00768  ** @param $in_entry the compacted entry into the hash.
00769  ** @param $in_sub the compacted subentry into the hash.
00770  ** @param $in_title the display text to show for the item.
00771  ** @param $in_url the anchor to add to the URL array.
00772  **
00773  ** @retval Always returns  1.
00774  **  
00775  **
00776  ** $entry = compacted and clean display text for sorting.
00777  ** $subentry = compacted and clean display text for sorting.
00778  ** $idx_struct{$entry}{display} = display text
00779  ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
00780  ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
00781  **              for the $entry's $subentry.
00782  ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
00783  **              for the $entry's $subentry.
00784  **
00785  ** @ingroup tp_idx
00786  **/
00787 // #############################################################################
00788 int add_to_lev2_index_struct  ( ) {
00789    //  assumes that $in_entry already exists
00790    $in_entry = $_[0];
00791    $in_sub   = $_[1];
00792    $in_title = $_[2];
00793    $in_url   = $_[3];
00794    
00795    if (!(exists ($idx_struct{$in_entry}))) {
00796       return (0);
00797    }
00798    if (exists ($idx_struct{$in_entry}{sub}{$in_sub})) {
00799       //  We already have display text;
00800       //  need URL.
00801       //  weed out duplicates from the beginning.
00802       $donothave = 1;
00803       for ($j = 0; $j <= $#{$idx_struct{$in_entry}{sub}{$in_sub}{url}}; $j++) {
00804         if ($idx_struct{$in_entry}{sub}{$in_sub}{url}[$j] =~ /$in_url/) {
00805            $donothave = 0;
00806         }
00807       } //  for $j
00808       if ($donothave) {
00809          push (@{$idx_struct{$in_entry}{sub}{$in_sub}{url}}, $in_url);
00810       } 
00811    } else {
00812       //  we don't have anything; need to add it.
00813       $idx_struct{$in_entry}{sub}{$in_sub}{display} = $in_title;
00814       push (@{$idx_struct{$in_entry}{sub}{$in_sub}{url}}, $in_url);
00815    }
00816    return (1);
00817 } //  add_to_lev2_index_struct
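
A compact plain-Perl sketch of the second-level insert, including the early return when the first-level entry is missing. The names add_subentry and the local %idx_struct are hypothetical, and exact string comparison replaces the regex duplicate test of the listing.

```perl
use strict;
use warnings;

my %idx_struct;   # hypothetical stand-in for the script's global hash

# Add a sub-entry under an existing first-level entry.
# Returns 0 (and stores nothing) when the parent entry is missing.
sub add_subentry {
    my ($entry, $sub, $title, $url) = @_;
    return 0 unless exists $idx_struct{$entry};
    my $node = $idx_struct{$entry}{sub}{$sub} //= { display => $title, url => [] };
    push @{ $node->{url} }, $url unless grep { $_ eq $url } @{ $node->{url} };
    return 1;
}

$idx_struct{movie}{display} = 'movie';
print add_subentry('movie', 'apigetmovielist', 'api_GetMovie-list', '<a href="api.html#a1">'), "\n";
print add_subentry('list',  'apigetmovielist', 'api_GetMovie-list', '<a href="api.html#a1">'), "\n";
```

The second call returns 0 because no first-level "list" entry exists yet, which is the case word_chunking guards against by calling add_to_index_struct first.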
00818 
00819 //#############################################################################
00820 /** @fn word_chunking
00821  ** @brief Performs word-chunking on the passed-in entries that were extracted
00822  ** from $globe::master_raw.
00823  **
00824  ** @param $unproc_title the unprocessed title passed in $entry_chunk[0]. 
00825  ** 
00826  ** @param $assoc_t_data the associated title and data given by $entry_chunk[1].
00827  **
00828  ** @return Updated entries in the hash $globe::master_index. If a word-chunk
00829  ** already is available as a key into the hash, then this appends its information
00830  ** to the contents of the key using $globe::division_mult_entry.
00831  **
00832  ** Word-chunking is performed on the $unproc_title. 
00833  ** Natural boundaries (spaces, dashes, underscores, changes in case in the middle of a word) 
00834  ** are used to create additional two-level index entries that contain the word-chunk 
00835  ** followed by where it came from. 
00836  **
00837  ** The $globe::ignore_terms_file is used to
00838  ** eliminate unhelpful word-chunked entries (such as "the", "a", "to", etc.).
00839  **
00840  ** The additional useful entries are appended to
00841  ** the contents of the hash $globe::master_raw using the $globe::division_mult_entry
00842  ** separator only if the new entry is not a duplicate.
00843  **
00844  ** Word-chunking is particularly useful for API documentation so that the reader
00845  ** does not have to remember the exact name of a code item in order to find
00846  ** it. An initial index token of "api_GetMovie-list" could be found not just under
00847  ** its name in the "A's", but under "get", "movie", and "list".
00848  **
00849  **
00850  ** $entry = compacted and clean display text for sorting.
00851  ** $subentry = compacted and clean display text for sorting.
00852  ** $idx_struct{$entry}{display} = display text
00853  ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
00854  ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
00855  **              for the $entry's $subentry.
00856  ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
00857  **              for the $entry's $subentry.
00858  **  
00859  ** @lim None.
00860  ** @ingroup tp_idx
00861  **/
00862 // #############################################################################
00863 int word_chunking  ( ) {
00864    #define $unproc_title  $_[0]
00865    #define $assoc_t_data  $_[1]
00866    #define $capital  1
00867    #define $w_cnt  0
00868    // undef (@w_chunks);
00869    // undef (@word);
00870    #define $c_word
00871    #define $term
00872 
00873    #define $proc_title  &trash_special_characters($unproc_title)
00874    $proc_title =~ s/ //g;
00875    $proc_title =~ tr/A-Z/a-z/;
00876    
00877    if (0) {
00878 //       print "$unproc_title $assoc_t_data\n";
00879    }
00880    
00881    // ####
00882    //  Return quickly if it is an additional "used by" cross-reference.
00883    //  It'll already have too many word-chunk references. No problem.
00884    // ####
00885    if ($unproc_title =~ $globe::ack_used_by){
00886       return (1);
00887    }
00888    
00889    // ####
00890    //  Split into word chunks on: whitespace, underscores, dashes,
00891    //  em-dash entities, slashes, backslashes, and periods.
00892    // ####
00893    @w_chunks = split ( /\s|_|-|\&\#8212\;|\&\#151\;|\/|\\|\./, $unproc_title);
00894 
00895    // ####
00896    //  Additional word chunking based on capital letters
00897    //  Create even more word chunks based on whether an individual word
00898    //  has smaller sections in it based on capitalization.
00899    //  For example, "fileIsDirectory" has "Is" and
00900    //  "Directory" in it.
00901    // ####
00902    if ($capital) {   //  on/off for capital word-chunking
00903       NEXT_CC_WORD: foreach $c_word (@w_chunks){
00904          // ####
00905          //  Get rid of elements in word chunk that we don't want hanging around
00906          //  More special characters introduced for operator++
00907          //  Don't bother word chunking them
00908          // ####
00909          $c_word = &trash_special_characters($c_word);
00910          // ####
00911          
00912          // ####
00913          //  Check for one or more capital letters within existing word chunk
00914          // ####
00915          if ($c_word =~ /[A-Z]+/) {
00916 //             //  print "==== $c_word  ===\n";
00917             // ####
00918             //  Split the word chunk into capital letter chunks if it is in word chunk
00919             // ####
00920             @cap_chunk = split (/([A-Z]+)/, $c_word);
00921             if (@cap_chunk > 1) {
00922                // ####
00923                //  Go through all cap chunks; ignore the first element
00924                //  because this will already be handled as part of a word.
00925                // ####
00926                for ($k=1; $k < @cap_chunk; $k++) {  //  ignore 0 element
00927 //                   //  print "$k sub_cap = $cap_chunk[$k]\n";
00928                   if (($cap_chunk[$k] =~ /[A-Z]/) && ($k+1 < @cap_chunk)) {
00929                      // ####
00930                      //  Handles when a cap chunk is followed by a small chunk.
00931 //                      //  print "Create chunk = $cap_chunk[$k] with $cap_chunk[$k+1]\n";
00932                      // ####
00933                      $temp = "$cap_chunk[$k]$cap_chunk[$k+1]";
00934                      $temp =~ tr/A-Z/a-z/;
00935                      push (@word, $temp);
00936                      // ####
00937                      //  jump past next k
00938                      // ####
00939                      $k++;
00940                   } elsif (($cap_chunk[$k] =~ /[A-Z]/) && ($k+1 == @cap_chunk)) {
00941                      // ####
00942                      //  Handles situation where last chunk is all caps.
00943 //                      //  print "Create chunk = $cap_chunk[$k]\n";
00944                      // ####
00945                      $temp = "$cap_chunk[$k]";
00946                      $temp =~ tr/A-Z/a-z/;
00947                      push (@word, $temp);
00948                      // ####
00949                      //  jump past next k cap chunk
00950                      // ####
00951                      $k++;
00952                   } else {
00953                      // ####
00954                      //  should never really get to this point
00955                      //  chunk is in small caps somewhere probably at beginning
00956                      // ####
00957 //                      print "$k of $#cap_chunk; \"$cap_chunk[$k]\" of \"$c_word\" from \"$unproc_title\"; probably k=0 or 1\n";
00958                   } 
00959                } //  for the number of cap chunks
00960             } //  if more than one cap chunk
00961          } //  if there are capital letters in word chunk
00962       } //  for each word chunk
00963       
00964       // ####
00965       //  Need to add the words to the list.
00966       // ####
00967       foreach $sw_w (@word) {
00968          push (@w_chunks, $sw_w);
00969          if (0) {
00970 //             print "fractions: $sw_w\n";
00971          }
00972       }
00973 
00974       if ((0) && ($unproc_title =~ /BapiVersion/i)) { // debug
00975          $_ccc= 0;
00976 //          print "==== more chunk \"$unproc_title\"\n";
00977          foreach $c_word (@w_chunks){
00978             $_ccc++;
00979 //             print "1 $_ccc $c_word\n";      
00980          }
00981          $_ccc= 0;
00982          foreach $c_word (@word){
00983             $_ccc++;
00984 //             print "2 $_ccc $c_word\n";      
00985          }
00986 //          exit(1);
00987       } //  debug
00988    } //  # on/off for capital word-chunking
00989 
00990 
00991    // ####
00992    //  Do something with all of the word chunks
00993    // ####
00994    { //  bracket level
00995 //       //  print "======= $unproc_title \n";
00996       NEXT_C_WORD: foreach $c_word (@w_chunks){
00997          if ($unproc_title =~ /^$c_word/i) {
00998             // ####
00999             //  Don't index a word chunk that matches the
01000             //  beginning of the original name
01001             //  (as in "log" for "logFuncDebugGet").
01002             //  We purposely skip over it.
01003             // ####
01004             next NEXT_C_WORD;
01005          } 
01006          $c_word =~ tr/A-Z/a-z/;
01007          $c_word = &trash_special_characters($c_word);
01008          
01009          if ((0) && (&ignore_item($c_word))) {
01010             // ####
01011             //  This means that the entry is not one of interest
01012             // ####
01013             next NEXT_C_WORD;
01014          }
01015          // ####
01016          //  We don't want extra hyperlinks for the word-chunk fragment itself,
01017          //  so call it specifically with $globe::word_c_boundary as the URL.
01018          // ####
01019          &add_to_index_struct ($c_word, $c_word, $globe::word_c_boundary);
01020 
01021          if (!(&add_to_lev2_index_struct ($c_word, $proc_title, $unproc_title, $assoc_t_data))) {
01022 //             print "Tried adding a second level without the primary level.\n";
01023          }
01024          
01025       } //  foreach $c_word
01026    } //  bracket level
01027 } //  word_chunking
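
The chunking rules documented above (split on natural boundaries, then split again on runs of capital letters) can be illustrated with a short plain-Perl sketch. It is simplified on purpose: the function name word_chunks is hypothetical, the em-dash entities and trash_special_characters call are left out, and the leading chunk is not skipped the way the NEXT_C_WORD loop skips it.

```perl
use strict;
use warnings;

# Split a title into lowercase word chunks: first on separators, then on
# runs of capital letters inside each piece.
sub word_chunks {
    my ($title) = @_;
    my @chunks = grep { length } split /[\s_\-\/\\.]/, $title;
    my @extra;
    for my $word (@chunks) {
        # The capturing group keeps the capital runs in the result list,
        # so capitals sit at odd indexes and the text after them at even ones.
        my @parts = split /([A-Z]+)/, $word;
        for (my $k = 1; $k < @parts; $k += 2) {
            my $tail = $k + 1 < @parts ? $parts[$k + 1] : '';
            push @extra, lc "$parts[$k]$tail";
        }
    }
    return map { lc $_ } (@chunks, @extra);
}

print join(', ', word_chunks('api_GetMovie-list')), "\n";
# prints: api, getmovie, list, get, movie
```

In the listing, the "api" chunk would additionally be dropped because the title begins with it, leaving the entry findable under its own name plus "get", "movie", and "list".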
01028 
01029 
01030 //#############################################################################
01031 /** @fn trash_special_characters
01032  ** @brief Removes all special characters that we don't want in the index as
01033  ** word chunks.
01034  **
01035  ** @param $in_word The word that might have special characters.
01036  ** @return The word without special characters.
01037  ** 
01038  ** @lim Debug statements are left in.
01039  ** @ingroup tp_idx
01040  **/
01041 // #############################################################################
01042 int trash_special_characters  ( ) {
01043    $in_word = $_[0];
01044    
01045    // ####
01046    $in_word =~ s/\&\#151\;//;
01047    $in_word =~ s/\&\#8212\;//;
01048    $in_word =~ s/\s//g;
01049    $in_word =~ s/\:\://g;
01050    $in_word =~ s/\(//g;
01051    $in_word =~ s/\)//g;
01052    $in_word =~ s/\,//g;
01053    //  $in_word =~ s/\.//g; # 07/08/2002 removed
01054    $in_word =~ s/\-//g;
01055    $in_word =~ s/\_//g;
01056    $in_word =~ s/\://g;
01057    $in_word =~ s/\[//g;
01058    $in_word =~ s/\]//g;
01059    $in_word =~ s/\+//g;
01060    $in_word =~ s/\=//g;
01061    $in_word =~ s/\*//g;
01062    $in_word =~ s/\?//g;
01063    $in_word =~ s/\&//g;
01064    $in_word =~ s/\%//g;
01065    $in_word =~ s/\$//g;
01066    $in_word =~ s/\#//g;
01067    $in_word =~ s/\@//g;
01068    $in_word =~ s/\!//g;
01069    $in_word =~ s/\|//g;
01070    $in_word =~ s/\\//g;
01071    $in_word =~ s/\///g;
01072    $in_word =~ s/\<//g;
01073    $in_word =~ s/\>//g;
01074    $in_word =~ s/\"//g;
01075    $in_word =~ s/\'//g;
01076    $in_word =~ s/\~//g;
01077    $in_word =~ s/\`//g;
01078    // ####
01079    return ($in_word);
01080 } //  trash_special_characters
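
For comparison, the same clean-up can be written with one character class instead of one substitution per character. This is a sketch, not a drop-in replacement: strip_special is a hypothetical name, and it removes every em-dash entity (the listing removes only the first occurrence of each). The period is kept, matching the 07/08/2002 change noted above.

```perl
use strict;
use warnings;

# Remove em-dash entities first, then every character the listing strips
# one substitution at a time. Entities must go first; otherwise removing
# "&" and "#" would leave "151;" or "8212;" behind.
sub strip_special {
    my ($word) = @_;
    $word =~ s/&#(?:151|8212);//g;
    $word =~ s/[\s:(),\-_\[\]+=*?&%\$#\@!|\\\/<>"'~`]//g;
    return $word;
}

print strip_special('operator++ (int)'), "\n";      # prints: operatorint
print strip_special('std::vector<T>.size'), "\n";   # prints: stdvectorT.size
```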
01081 
01082 
01083 
01084 //#############################################################################
01085 /** @fn create_index_files
01086  ** @brief Outputs m_idx_ HTML files from the contents of $idx_struct.
01087  **
01088  ** @param $idx_struct is the sorted list.
01089  ** @return A series of HTML files that make up the index. The files begin
01090  ** with "m_idx_", are followed by a character, and end with ".html".
01091  ** 
01092  ** This uses the $globe::master_index_html and swaps out the information
01093  ** delineated by $globe::m_define{structure}[0] and $globe::m_define{structure}[1].
01094  **
01095  ** The $idx_struct is sorted. Every time the first letter of
01096  ** the key changes, a new m_idx_ file is created.
01097  **
01098  ** When creating the output, it formats it using valid HTML with an anchor href
01099  ** containing the valid URL and the text. It eliminates duplicates (e.g., URL
01100  ** is identical). 
01101  ** 
01102  ** It handles the levels, as in second-level index entries.
01103  **
01104  ** $entry = compacted and clean display text for sorting.
01105  ** $subentry = compacted and clean display text for sorting.
01106  ** $idx_struct{$entry}{display} = display text
01107  ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
01108  ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
01109  **              for the $entry's $subentry.
01110  ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
01111  **              for the $entry's $subentry.
01112  ** 
01113  ** @lim Debug statements are left in.
01114  ** @ingroup tp_idx
01115  **/
01116 // #############################################################################
01117 int create_index_files  ( ) {
01118    #define $inner_index  "This Letter has no entries."
01119    #define $remember_letter  "0"
01120    #define $remember_level  ""
01121    #define $out_file  $globe::path . "m_idx_" //  will have letter and .html appended
01122    
01123 //    print "Entering create_index_files\n";
01124    $_cnt=0;
01125    if (0){ //  debug loop
01126       foreach $entry (sort keys %idx_struct){
01127          $_cnt++;
01128 //          print "$_cnt \"$entry\" = \"$idx_struct{$entry}{display}\"\n";
01129          if (($_cnt < 25) && (0)){
01130             foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
01131 //                print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
01132             }
01133          }
01134          if ((1)){
01135             foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
01136 //                print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
01137             }
01138          }
01139          if (($_cnt > 12) && (0)){
01140 //             exit(1);
01141          }
01142       }
01143 //       exit(1);
01144    } //  debug loop
01145    
01146    unless (open ( OUT_INDEX, ">$out_file$remember_letter.html")) {
01147       push (@file_errors, "Cannot open file \"$out_file$remember_letter.html\"\n");
01148 //       print "Cannot open file \"$out_file$remember_letter.html\"\n";
01149    }
01150 
01151    $_cnt=0;
01152    PURGE_ENTRY: foreach $entry (sort keys %idx_struct){
01153       if ((1) && (&ignore_item ($idx_struct{$entry}{display}))) {
01154          next PURGE_ENTRY;
01155       }
01156       
01157       // ####
01158       //  Take care of writing out stored index info for a given letter
01159       // ####
01160       $first_letter = substr($entry, 0, 1);
01161       if ($first_letter !~ /[a-zA-Z0-9]/) {
01162          $first_letter = "\-";
01163       }
01164       if (0) {
01165 //          print "===letter \"$first_letter\" ===\n$inner_index\n";
01166       }
01167       if ($first_letter =~ /$remember_letter/i){
01168          if (0) {
01169 //             print "Current letter ($remember_letter)\n";
01170          }
01171       } else {
01172          // ####
01173          //  write to the previous index file
01174          // ####
01175          @chunks = split ( /$globe::m_define{structure}[0]|$globe::m_define{structure}[1]/, $globe::master_index_html, 3);
01176          $chunks[1] = $inner_index;
01177          $globe::master_index_html = join ("", $chunks[0], $globe::m_define{structure}[0], $chunks[1], $globe::m_define{structure}[1], $chunks[2]);
01178 //          print (OUT_INDEX $globe::master_index_html);
01179          if (0) {
01180 //             print "===index entries \"$first_letter\" ===\n$inner_index\n";
01181          }
01182 //          //  close the previous index file
01183 //          close (OUT_INDEX); 
01184          //  Change the letter and open the next index file for writing
01185          $remember_letter = $first_letter;
01186          $temp_letter = $remember_letter;
01187          $temp_letter =~ tr/a-z/A-Z/; 
01188          unless (open ( OUT_INDEX, ">$out_file$remember_letter.html")) {
01189             push (@file_errors, "Cannot open file \"$out_file$remember_letter.html\"\n");
01190 //             print "Cannot open file \"$out_file$remember_letter.html\"\n";
01191          }
01192          //  reset the index information
01193          $inner_index = "<p class=\"GroupTitlesIX\"><center><b>-$temp_letter-</b></center></p>\n";
01194       }
01195       // ####
01196       
01197       // ####
01198       //  Take care of writing out stored index info for a given letter
01199       //  Ignore duplicates.
01200       // ####
01201       
01202       $inner_index .= "<pre class=\"Level1IX\">";
01203       if ($#{$idx_struct{$entry}{url}} > 1) {
01204          // ####
01205          //  Has multiple destinations to worry about
01206          // ####
01207          $inner_index .= "$idx_struct{$entry}{display}";
01208          $inner_index .= "<br>";
01209          for ($i=0; $i <= $#{$idx_struct{$entry}{url}}; $i++) {
01210             $inner_index .= "$idx_struct{$entry}{url}[$i]";
01211             $inner_index .= "<img src=\"nav_doc.gif\" border=\"0\"></a>";
01212             $_cnt++;
01213          }
01214          $inner_index .= "</pre>\n";
01215       } else {
01216          // ####
01217          //  Only has one destination to worry about
01218          // ####
01219          $inner_index .= "$idx_struct{$entry}{url}[0]";
01220          $inner_index .= "$idx_struct{$entry}{display}";
01221          $inner_index .= "</a></pre>\n";
01222          $_cnt++;
01223       }
01224       
01225       PURGE_SUBENTRY: foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}){
01226          if ((0) && (&ignore_item ($idx_struct{$entry}{sub}{$subentry}{display}))) {
01227             next PURGE_SUBENTRY;
01228          }
01229          $inner_index .= "<pre class=\"Level2IX\">";
01230          if ($#{$idx_struct{$entry}{sub}{$subentry}{url}} > 1) {
01231             // ####
01232             //  Has multiple destinations to worry about
01233             // ####
01234             $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{display}";
01235             $inner_index .= "<br>";
01236             for ($i=0; $i <= $#{$idx_struct{$entry}{sub}{$subentry}{url}}; $i++) {
01237                $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{url}[$i]";
01238                $inner_index .= "<img src=\"nav_doc.gif\" border=\"0\"></a>";
01239                $_cnt++;
01240             }
01241             $inner_index .= "</pre>\n";
01242          } else {
01243             // ####
01244             //  Only has one destination to worry about
01245             // ####
01246             $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{url}[0]";
01247             $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{display}";
01248             $inner_index .= "</a></pre>\n";
01249             $_cnt++;
01250          }
01251       } //  PURGE_SUBENTRY
01252    } //  PURGE_ENTRY for each entry
01253       
01254    
01255 //    print "Duplicates and ignore terms removed.\n";
01256 //    print "Total Individual Hyperlinks: $_cnt\n";
01257 
01258    
01259    // ####
01260    //  Clean-up for last letter/file.
01261    // ####
01262    //  write to the previous index file
01263    @chunks = split ( /$globe::m_define{structure}[0]|$globe::m_define{structure}[1]/, $globe::master_index_html, 3);
01264    $chunks[1] = $inner_index;
01265    $globe::master_index_html = join ("", $chunks[0], $globe::m_define{structure}[0], $chunks[1], $globe::m_define{structure}[1], $chunks[2]);
01266 //    print (OUT_INDEX $globe::master_index_html);
01267    if (0) {
01268 //       print "===index entries \"$first_letter\" ===\n$inner_index\n";
01269    }
01270 //    //  close the previous index file
01271 //    close (OUT_INDEX); 
01272    // ####
01273 
01274 } //  create_index_files
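
Two steps of create_index_files lend themselves to a small plain-Perl sketch: swapping the text between the two template markers, and choosing the per-letter file for an entry. The marker strings and the names splice_template and index_letter are hypothetical; the real markers live in $globe::m_define{structure}[0] and [1]. Lowercasing the letter is also an assumption, since the listing compares letters case-insensitively instead.

```perl
use strict;
use warnings;

# Hypothetical markers standing in for $globe::m_define{structure}[0] and [1].
my ($BEGIN_MARK, $END_MARK) = ('<!-- IDX_BEGIN -->', '<!-- IDX_END -->');

# Replace whatever sits between the two markers with $inner, keeping the markers.
sub splice_template {
    my ($template, $inner) = @_;
    my ($head, undef, $tail) = split /\Q$BEGIN_MARK\E|\Q$END_MARK\E/, $template, 3;
    return join '', $head, $BEGIN_MARK, $inner, $END_MARK, $tail // '';
}

# File-name letter for an entry: anything non-alphanumeric is grouped under "-".
sub index_letter {
    my ($entry) = @_;
    my $first = lc substr($entry, 0, 1);
    return $first =~ /[a-z0-9]/ ? $first : '-';
}

my $page = splice_template('<body><!-- IDX_BEGIN -->old<!-- IDX_END --></body>', '<p>new</p>');
print "$page\n";
print index_letter('getmovie'), ' ', index_letter('_private'), "\n";
```

Because the markers are written back around the new content, the same template string can be spliced again for the next letter, which is how the listing reuses $globe::master_index_html.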
01275 
01276 //#############################################################################
01277 /** @fn create_index_script
01278  ** @brief Outputs index script file from the contents of $idx_struct.
01279  **
01280  ** @param $idx_struct is the sorted list.
01281  ** @return A single script file that makes up the index. 
01282  ** 
01283  ** When creating the output, it formats it using valid script with an anchor href
01284  ** containing the valid URL and the text. It eliminates duplicates (e.g., URL
01285  ** is identical). 
01286  ** 
01287  ** It handles the levels, as in second-level index entries.
01288  **
01289  ** $entry = compacted and clean display text for sorting.
01290  ** $subentry = compacted and clean display text for sorting.
01291  ** $idx_struct{$entry}{display} = display text
01292  ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
01293  ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
01294  **              for the $entry's $subentry.
01295  ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's 
01296  **              for the $entry's $subentry.
01297  ** 
01298  ** @lim Debug statements are left in.
01299  ** @ingroup tp_idx
01300  **/
01301 // #############################################################################
01302 int create_index_script  ( ) {
01303    #define $remember_letter  "0"
01304    #define $remember_level  ""
01305    #define $out_file  $globe::path . "m_idx" //  will have .script appended
01306    #define $very_critical  0
01307    
01308 //    print "Entering create_index_script\n";
01309    $_cnt=0;
01310    if (0){ //  debug loop
01311       foreach $entry (sort keys %idx_struct){
01312          $_cnt++;
01313 //          print "$_cnt \"$entry\" = \"$idx_struct{$entry}{display}\"\n";
01314          if (($_cnt < 25) && (0)){
01315             foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
01316 //                print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
01317             }
01318          }
01319          if ((1)){
01320             foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
01321 //                print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
01322             }
01323          }
01324          if (($_cnt > 12) && (0)){
01325 //             exit(1);
01326          }
01327       }
01328 //       exit(1);
01329    } //  debug loop
01330    
01331 
01332 
01333    // ####
01334    //  Handle the script file implementation
01335    // ####
01336       
01337    unless (open ( OUT_SCRIPT, ">$out_file.script")) {
01338       push (@file_errors, "Cannot open file \"$out_file.script\"\n");
01339 //       print "Cannot open file \"$out_file.script\"\n";
01340    }
01341 //    print "Preparing to output \"$out_file.script\"\n";
01342 //    //  $inner_script = "Item level=1 image=nav_folderclosed.gif text=Master Index\r\n";
01343    $inner_script = "";
01344    
01345    PURGE_ENTRY_SCRIPT: foreach $entry (sort keys %idx_struct){
01346       if ((0) && (&ignore_item ($idx_struct{$entry}{display}))) {
01347          next PURGE_ENTRY_SCRIPT;
01348       }
01349       
01350       // ####
01351       //  Take care of index info for a given letter
01352       // ####
01353       $first_letter = substr($entry, 0, 1);
01354       if ($first_letter !~ /[a-zA-Z]/) {
01355          $first_letter = "\-";
01356          next PURGE_ENTRY_SCRIPT;
01357       }
01358       if (0) {
01359 //          print "===letter \"$first_letter\" \n";
01360       }
01361       if ($first_letter =~ /$remember_letter/i){
01362          if (0) {
01363 //             print "Current letter ($remember_letter)\n";
01364          }
01365       } else {
01366          // ####
01367          //  write to the previous index file
01368          // ####
01369          //  Change the letter and open the next index file for writing
01370          $remember_letter = $first_letter;
01371          $temp_letter = $remember_letter;
01372          $temp_letter =~ tr/a-z/A-Z/; 
01373 //          //  $inner_script .= "Item level=2 image=nav_folderclosed.gif text=-$temp_letter-\r\n";
01374 //          $inner_script .= "Item level=1 image=nav_folderclosed.gif text=-$temp_letter-\r\n";
01375       }
01376       // ####
01377       
01378       if ((0) && ($#{$idx_struct{$entry}{url}} > 1)) {
01379          // ####
01380          //  Has multiple destinations to worry about
01381          // ####
01382 //          //  $inner_script .= "Item level=3 image=nav_folderclosed.gif text=";
01383 //          $inner_script .= "Item level=2 image=nav_folderclosed.gif text=";
01384          $inner_script .= "$idx_struct{$entry}{display}";
01385          $inner_script .= "\r\n";
01386          for ($i=0; $i <= $#{$idx_struct{$entry}{url}}; $i++) {
01387             //  $inner_script .= "Item level=4 image=nav_doc.gif url=";
01388             $inner_script .= "Item level=3 image=nav_doc.gif url=";
01389             //  $inner_script .= "$idx_struct{$entry}{url}[$i]";
01390             ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{url}[$i],
01391                                  "href[\s]*\=[\s]*\"", 
01392                                  "\"", 
01393                                  $very_critical);
01394 
01395             if ($piece) {
01396                $inner_script .= "$piece";
01397                $inner_script .= ",basefrm ";
01398             }
01399             $inner_script .= "\r\n";
01400          }
01401       } else {
01402          // ####
01403          //  Only has one destination to worry about
01404          // ####
01405          //  $inner_script .= "Item level=3 image=nav_doc.gif url=";
01406          $inner_script .= "Item level=2 image=nav_doc.gif url=";
01407          //  $inner_script .= "$idx_struct{$entry}{url}[0]";
01408          ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{url}[0],
01409                                  "href[\s]*\=[\s]*\"", 
01410                                  "\"", 
01411                                  $very_critical);
01412          if ($piece) {
01413              $inner_script .= "$piece";
01414              $inner_script .= ",basefrm ";
01415          }
01416          $inner_script .= " text=";
01417          $inner_script .= "$idx_struct{$entry}{display}";
01418          $inner_script .= "\r\n";
01419       }
01420       PURGE_SUBENTRY_SCRIPT: foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}){
01421          if ((0) && (&ignore_item ($idx_struct{$entry}{sub}{$subentry}{display}))) {
01422             next PURGE_SUBENTRY_SCRIPT;
01423          }
01424          if ((0) && ($#{$idx_struct{$entry}{sub}{$subentry}{url}} > 1)) {
01425             // ####
01426             //  Has multiple destinations to worry about
01427             // ####
01428 //             //  $inner_script .= "Item level=4 image=nav_folderclosed.gif text=";
01429 //             $inner_script .= "Item level=3 image=nav_folderclosed.gif text=";
01430             $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{display}";
01431             $inner_script .= "\r\n";
01432             for ($i=0; $i <= $#{$idx_struct{$entry}{sub}{$subentry}{url}}; $i++) {
01433                //  $inner_script .= "Item level=5 image=nav_doc.gif url=";
01434                $inner_script .= "Item level=4 image=nav_doc.gif url=";
01435                //  $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{url}[$i]";
01436                ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{sub}{$subentry}{url}[$i],
01437                                  "href[\s]*\=[\s]*\"", 
01438                                  "\"", 
01439                                  $very_critical);
01440 
01441                if ($piece) {
01442                   $inner_script .= "$piece";
01443                   $inner_script .= ",basefrm ";
01444                }
01445                $inner_script .= "\r\n";
01446             }
01447          } else {
01448             // ####
01449             //  Only has one destination to worry about
01450             // ####
01451             //  $inner_script .= "Item level=4 image=nav_doc.gif url=";
01452             $inner_script .= "Item level=3 image=nav_doc.gif url=";
01453             //  $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{url}[0]";
01454             ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{sub}{$subentry}{url}[0],
01455                                  'href[\s]*\=[\s]*"', 
01456                                  "\"", 
01457                                  $very_critical);
01458 
01459             if ($piece) {
01460                $inner_script .= "$piece";
01461                $inner_script .= ",basefrm ";
01462             }
01463              
01464             $inner_script .= " text=";
01465             $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{display}";
01466             $inner_script .= "\r\n";
01467          }
01468       } //  PURGE_SUBENTRY_SCRIPT:
01469    } //  PURGE_ENTRY_SCRIPT for each entry
01470 
01471 //    print (OUT_SCRIPT $inner_script);
01472 //    close (OUT_SCRIPT); 
01473 
01474 } //  create_index_script
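
The single-destination branch above can be sketched in isolation. This is a minimal illustration only: `split_on_tags` is a hypothetical stand-in for `globe::get_tag_chunk` (whose real behavior is not shown in this listing), `item_line` is an invented helper name, and the sketch assumes each `url` element holds an anchor tag with an `href` attribute.

```perl
use strict;
use warnings;

# Hypothetical stand-in for globe::get_tag_chunk: returns the text before the
# start pattern, the text between the start and end patterns, and the text
# after the end pattern. Returns three empty strings when nothing matches.
sub split_on_tags {
    my ($text, $start_re, $end_re) = @_;
    if ($text =~ /\A(.*?)$start_re(.*?)$end_re(.*)\z/s) {
        return ($1, $2, $3);
    }
    return ('', '', '');
}

# Build one script line for an entry with a single destination, in the same
# order as the listing: url (plus target frame), then display text.
sub item_line {
    my ($anchor, $display) = @_;
    my $line = "Item level=3 image=nav_doc.gif url=";
    my (undef, $piece, undef) = split_on_tags($anchor, qr/href\s*=\s*"/, qr/"/);
    $line .= "$piece,basefrm " if $piece;
    $line .= " text=$display\r\n";
    return $line;
}

print item_line('<a href="page.html#idx1">Widgets</a>', 'Widgets');
```

Passing the patterns as `qr//` objects keeps the `\s` escapes intact, which a double-quoted string would not.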
01475 
01476 
01477 //#############################################################################
01478 /** @fn using_indexer
01479  ** @brief What to do when no arguments are given.
01480  ** @param None
01481  ** @return None
01482  ** 
01483  ** @lim None
01484  ** @ingroup tp_idx
01485  **/
01486 // #############################################################################
01487 int using_indexer  ( ) {
01488 //    print "\nvoyant_indexer.pl creates index information to be displayed in the navigation\n";
01489 //    print "pane. It assumes all input index files have been copied into the\n";
01490 //    print "input directory and are named uniquely. The index files\n";
01491 //    print "were generated by voyant_nav.pl when processing Doxygen or Mif2Go output.\n";
01492 //    print "The index files are temporary files.\n";
01493 //    print "\n-- All index files must begin with \"index_\".\n";
01494 //    print "-- All files must reside in the input directory.\n\n";
01495 //    print "This takes three arguments, plus an optional fourth:\n";
01496 //    print "[1] The directory (with trailing slash \\) containing the raw index files.\n";
01497 //    print "[2] The path & name of the HTML file to use as the master template for the individual index files.\n";
01498 //    print "   It must have sections for voyant_header, voyant_structure,\n";
01499 //    print "   and voyant_footer.\n";
01500 //    print "   A section has a begin and end, such as:\n";
01501 //    print "     <!-- begin voy_structure -->\n";
01502 //    print "     <!-- end voy_structure -->\n";
01503 //    print "[3] The path & name of the file containing words to ignore in word-chunking.\n";
01504 //    print "[4] (Optional) If you don't want word-chunking, enter \"no_chunk\".\n";
01505 //    print "\nThe output files begin with \"m_idx_\".\n";
01506 //    print "The m_idx output is displayed in a treefrm while the content it controls\ndisplays in the basefrm.\n";
01507 //    print "\nTerminating voyant_indexer.pl without doing anything.\n";
01508    return;
01509 } //  using_indexer
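
The begin/end comment markers described in the usage text could be located along these lines. This is a sketch under stated assumptions, not the indexer's actual parsing code: `template_section` is a hypothetical helper, and the template below is invented for illustration. Only the `voy_structure` marker name comes from the usage text.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the begin and end comment
# markers of a named section, or undef when either marker is missing.
sub template_section {
    my ($html, $name) = @_;
    if ($html =~ /<!--\s*begin\s+\Q$name\E\s*-->(.*?)<!--\s*end\s+\Q$name\E\s*-->/s) {
        return $1;
    }
    return;
}

# Invented template, for illustration only.
my $template = <<'HTML';
<html><body>
<!-- begin voy_structure -->
<ul><li>placeholder</li></ul>
<!-- end voy_structure -->
</body></html>
HTML

my $section = template_section($template, 'voy_structure');
print defined $section ? $section : "section not found\n";
```

Returning undef for a missing section lets the caller tell an absent marker pair apart from an empty section.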
01510 
01511 
01512 
01513 
01514 //#############################################################################
01515 /** @fn int END
01516  ** @brief Code to execute when the program exits.
01517  **
01518  ** @param None. 
01519  **
01520  ** @return None.
01521  **
01522  ** @lim None
01523  ** @ingroup tp_nav
01524  **/
01525 // #############################################################################
01526 int END  ( ) {
01527    // undef ($_index_file_list);     //  "_index_file_list";
01528    // undef ($in_file);     //  "";
01529    // undef ($f_type);     //  "index_";
01530 
01531 
01532    
01533    // #############################################################################
01534    // # Memory clean-up.
01535    // #############################################################################
01536    &globe::memory_clean_up();
01537 
01538 //    print "\n============  Finished voyant_indexer.pl ==================================\n";
01539 } //  END
01540 
01541 
01542 
01543 
01544 
01545


Open-Source tools compliments of Voyant Technologies, Inc. and Glenn C. Maxey.
01/13/2003

TP Tools v2-00-0a

# tpt-perl-hcr-02