// This file has been modified on-the-fly with an input filter
// to change it from Perl syntax to C++ strictly for the purposes
// of faking out Doxygen. Modifications include:

// - changing local() definitions to C++ #define statements.
// - commenting out undef statements.
// - changing $globe'... variable names to $globe_...
// - changing sub statements to look like C++ functions.
// - changing # comments to C++ comments.
// - ...

// If you see other strangeness in the HTML version of the Perl file,
// it comes from getting it to look more C++ like.


// #!/usr/local/bin/perl
//#############################################################################
//#
//# $Id: voyant_indexer.pl,v 1.29 2002/12/24 15:05:54 gmaxe Exp $
//#
/** @file
 ** @brief Creates a comprehensive index from temporary index files that
 **        were generated by another program.
 **
 ** @param $globe::path location to find the index files.
 **        The name should be terminated with a forward slash (/).
 **        Although the directory name is the first command line parameter,
 **        the true input is the set of index files contained within that directory.
 **        Index files must begin with "index_".
 **
 ** @param $globe::master_tree_file [optional path and] filename for the HTML file
 **        to use as a template for all index files to be generated.
 **        Ideally, this should contain navigation tools to get between the
 **        generated index files [a-z] and other parts of the system, such as
 **        the table of contents. This file has several specially flagged
 **        HTML comment sections that are required.
 **
 ** @param $globe::ignore_terms_file [optional path and] filename for a text file that
 **        contains words to ignore in the word-chunking process.
 **
 ** @return This creates a series of HTML files that begin with "m_idx_".
 **      Generally, what follows in the name is the first character of the
 **      first word within the file. All index entries beginning with that character
 **      are in that file. These files are created in $globe::path.
 **
 ** <ol><li>The $globe::master_tree_file is checked first to make sure that it has all
 **      required information.</li>
 **
 ** <li>This program issues a system call to create a list of candidate input
 **      index_ files in the $globe::path directory.</li>
 **
 ** <li> Then it steps through each of
 **      those files and concatenates their input into $globe::master_raw.</li>
 **
 ** <li> $globe::master_raw is turned into a hash table $idx_struct. The
 **      key into the hash table is the index entry. What gets stored is both the
 **      display text and an array of URLs.</li><ul>
 **
 **      <li>$entry = compacted, clean, lower-case key from display text for sorting. </li>
 **      <li>$subentry = compacted, clean, lower-case key from display text for sorting.</li>
 **      <li>$idx_struct{$entry}{display} = display text.</li>
 **      <li>$idx_struct{$entry}{url}[] = array of URLs for the $entry.</li>
 **      <li>$idx_struct{$entry}{sub}{$subentry}{display} = display text
 **          for the $entry's $subentry.</li>
 **      <li>$idx_struct{$entry}{sub}{$subentry}{url}[] = array of URLs
 **          for the $entry's $subentry.</li></ul>
 **
 **
 ** <li>Word-chunking is performed on each element in the $globe::master_index.
 **      Natural boundaries (spaces, dashes, underscores, changes in case in the middle of a word)
 **      are used to create additional two-level index entries that contain the word-chunk
 **      followed by where it came from. The $globe::ignore_terms_file is used to
 **      eliminate useless word-chunked entries (such as "the", "a", "to", etc.)
 **      The additional useful entries are appended to
 **      the contents of the hash $globe::master_raw using the $globe::division_mult_entry
 **      separator only if the new entry is not a duplicate.
 **      Word-chunking is particularly useful for API documentation so that the reader
 **      does not have to remember the exact name of a code item in order to find
 **      it. An initial index token of "api_GetMovie-list" could be found not just under
 **      its name in the "A's", but under "get", "movie", and "list".</li>
 **
 ** <li>The expanded list is sorted.</li>
 **
 ** <li>Exact duplicates in terms of index entry and URL are removed from the
 **      $globe::master_raw hash.</li>
 **
 ** <li>The sorted list is output to a series of m_idx_ files.
 **      New m_idx_ files are created whenever a new character starts a word in the list.</li>
 **
 ** <li>When an index entry is referenced by multiple URLs, the additional references
 **      appear in the output as small document icons next to the first reference in
 **      plain text.</li>
 ** </ol>
 **
 ** More information about the true input "index_" files.
 ** These files are an unsorted running list of index tokens that were
 ** extracted from the HTML files in a directory. The tokens have two
 ** parts: the index entry and its URL. The separator is :,: ($globe::word_url_boundary).
 ** Additionally, the index entry can have two levels. In such
 ** cases, the separator is :;: ($globe::word_c_boundary).
 ** Finally, a given index entry can represent multiple references or URLs.
 ** In such cases, the multiple entries are separated by :;;;: ($globe::division_mult_entry).
 **
 ** If the separators are changed in the generator program (voyant_nav.pl),
 ** they need to be changed here, too. The variable names in both programs are the same.
 ** The separators themselves were chosen because they were deemed never to occur
 ** in an index entry or URL and aren't Perl special characters.
 **
 ** More information about the master_file. Aside from serving as a template for
 ** all generated index files, this file chunks information using specially tagged
 ** HTML comments in order to simplify locating where generated information is to
 ** be placed. In addition, some tags contain information critical to the proper operation
 ** of the indexer.
 **
 ** The critical tags given by Perl variable and HTML syntax are:
 ** <ul><li>($globe::m_define{order}[0]) = "< !-- begin voy_order --"</li>
 ** <li>($globe::m_define{order}[1]) = "!-- end voy_order -->"</li>
 ** <li>($globe::m_define{structure}[0]) = "< !-- begin voy_structure -->"</li>
 ** <li>($globe::m_define{structure}[1]) = "< !-- end voy_structure -->"</li>
 ** <li>($globe::m_define{topic_list}[0]) = "< !-- begin voy_topic_list -->"</li>
 ** <li>($globe::m_define{topic_list}[1]) = "< !-- end voy_topic_list -->"</li></ul>
 **
 ** The HTML syntax can be changed. However, the voy_order sections are identical
 ** for various programs and their master_files, which facilitates propagating information.
 **
 ** @lim This does not support index entries that are or contain Perl special
 **      characters. These are often eliminated early in the process.
 **
 **      The input index_ files cannot have ".htm" as part of the name.
 **      This assumes that input information was of the proper format with an
 **      index entry, $globe::word_url_boundary, and URL. If any of the input index_
 **      files did not have this, this can cause problems.
 **
 ** Rather than passing in variables, which can create copies in memory, many items
 ** use global variables that are defined in globe.pm.
 ** When a variable is known to be global, its name begins with "$globe::". The intent is
 ** to facilitate maintenance by having all user-defined tags in one place outside
 ** of the program.
 **
 ** Many debug statements are left in the code, although commented out or disabled
 ** with if (0){...}. On occasion, a statement is copied, commented out, and then
 ** the copy modified in order to keep old techniques around while verifying new techniques.
 ** Old techniques were not always purged once the new one worked.
 **
 ** @ingroup tp_tools tp_idx
 **
 ** @author Glenn C. Maxey
 **/
// #
//# 2002 Created by Voyant Technologies, Inc., Westminster, Colorado, USA.
//#
//# Permission to use, copy, modify, and distribute this software and its
//# documentation under the terms of the GNU General Public License is hereby
//# granted. No representations are made about the suitability of this software
//# for any purpose. It is provided "as is" without express or implied warranty.
//# See the GNU General Public License (http://www.gnu.org/copyleft/gpl.html)
//# for more details.
//#
//# Documents produced by this script are derivative works derived from the
//# input used in their production; they are not affected by this license.
//#
//# Revision Information:
//#
//# $Log: voyant_indexer.pl,v $
//# Revision 1.29 2002/12/24 15:05:54 gmaxe
//# New name and tweaks to add full paths.
//#
//# Revision 1.28 2002/11/20 15:23:05 gmaxe
//# Added exit codes so that wrapper scripts can catch errors properly.
//#
//# Revision 1.27 2002/07/26 18:56:11 gmaxe
//# Got rid of old definitions and migrated new structures into all;
//# enhanced xhelp output file names and format; now everything is
//# alphabetized.
//#
//# Revision 1.26 2002/07/22 21:51:31 gmaxe
//# Created separate configuration file; allow for two inputs to
//# the voyant_latex.pl. commented out old variables in globe.pm.
//#
//# Revision 1.25 2002/07/09 21:37:27 gmaxe
//# Changed the data structure to make maintenance, input, and output easier.
//# Improved the efficiency of the word chunking, html output, script output,
//# and duplicate entries. It handles multiple URLs for a given index entry
//# better by putting it into an array inside the hash. However, the
//# elimination of ignore terms is still expensive.
//#
//# Revision 1.24 2002/07/03 15:53:58 gmaxe
//# Commented out ignore_terms loops within expensive word-chunking loops.
//# Ignore terms are dealt with later. Reduces time by over 50%.
//#
//#
//#############################################################################


//#############################################################################
/** @fn int BEGIN
 ** @brief Code to execute when first entered.
 **
 ** @param None.
 **
 ** @return None.
 **
 ** @lim None
 ** @ingroup tp_nav
 **/
// #############################################################################
int BEGIN ( ) {
    // print "\n============ Starting voyant_indexer.pl ==================================\n";

    $_index_file_list = "_index_file_list";
    $in_file = "";
    $f_type = "index_";
    $_arg_inc = 0;

    $no_scope_file = 0;
    $scope_pm = "globe.pm"; // first time through; other scope stuff passed in.


    push (@INC, `pwd`);
    push (@INC, '/rtfm/techpubs/perl');
    if (0){
        // print (@INC, "\n");
    }
    // ####
    // All global variables are defined in the following file
    // ####
    unless (open ( IN_LIST, $scope_pm)) {
        $scope_pm = "/rtfm/techpubs/perl/$scope_pm";
        unless (open ( IN_LIST, $scope_pm)) {
            push (@file_errors, "Cannot open file \"$scope_pm\"");
            $no_scope_file++;
        }
    }
    // close (IN_LIST);
    push (@INC, $scope_pm);
    if (!@file_errors) {
        // ####
        // All global variables are defined in the following file
        // ####
        require $scope_pm;

        if (&globe::declare_variables()) {
            // print "Variables initialized from $scope_pm.\n";
        } else {
            push (@file_errors, "Could not initialize variables from $scope_pm.\n");
        }
    } // if not @file_errors


    // Get path if there is one.
    if (@ARGV > $_arg_inc) {
        $globe::path = $ARGV[$_arg_inc]; // scalar element, not a one-element slice
        if ($globe::path =~ /\//) {
            // print "The path specified is $globe::path\n";

            @path_chunks = split ( /\//, $globe::path);
            if (@path_chunks < 2) {
                $globe::rel_path_to_start_point = "./";
                $globe::pdf_name_from_dir = "book_cit_dtoss.pdf";
            } else {
                $globe::rel_path_to_start_point = "../";
                $globe::pdf_name_from_dir = "$path_chunks[@path_chunks - 1].pdf";
            }
        } else {
            push (@file_errors, "The input argument \"$globe::path\" requires a forward slash (\/) at the end.\n");
        }
    } else {
        push (@file_errors, "ERROR: root path is required.");
    } // if 1 or more arguments
    $_arg_inc++;

    if (@ARGV > $_arg_inc) {
        $globe::master_tree_file = $ARGV[$_arg_inc];
        // print "The master file is $globe::master_tree_file\n";
        unless (open ( IN_MASTER, "$globe::master_tree_file")) {
            push (@file_errors, "Cannot open file \"$globe::master_tree_file\".");
        }
    } // if 2 or more arguments
    $_arg_inc++;

    if (@ARGV > $_arg_inc) {
        $globe::ignore_terms_file = $ARGV[$_arg_inc];
        // print "The ignore terms file is $globe::ignore_terms_file\n";
        if (open ( IN_IGNORE, "$globe::ignore_terms_file")) {
            while (<IN_IGNORE>){
                $temp = $_;
                $temp =~ s/\n//;
                push (@globe::ignore_terms, $temp);
            }
            if (0) {
                // print "Terms to ignore\n";
                foreach $term (@globe::ignore_terms){
                    // print "$term\n";
                }
            }
        } else {
            // Report the file that actually failed to open.
            push (@file_errors, "Cannot open file \"$globe::ignore_terms_file\".");
        }
    } // if 3 or more arguments
    $_arg_inc++;

    if (@ARGV > $_arg_inc) {
        $globe::word_chunk = 1;
        $globe::word_chunk = $ARGV[$_arg_inc];
        if ($globe::word_chunk =~ /^no/i){
            $globe::word_chunk = 0;
        } else {
            $globe::word_chunk = 1;
        }
    } else {
        $globe::word_chunk = 1;
    } // if 4 or more arguments
    // print "The flag for word-chunking is $globe::word_chunk; 1, do; 0, don't.\n";
    $_arg_inc++;


}

//#############################################################################
/** @fn int main
 ** @brief The main program.
 **
 ** @param None.
 **
 ** @return None.
 **
 ** @lim None
 ** @ingroup tp_nav
 **/
// #############################################################################
// sub main {
{
    // #############################################################################
    // # Program start
    // #############################################################################

    if (0){
        // print "=== Definitions 3 \n";
        // exit(1);
    }

    if (@file_errors) {
        // Makes no sense to go on if input parameters are off.
        // print "\n============ Summary of errors =================================a\n";
        for ($i=0; $i<@file_errors; $i++){
            // print "$i = $file_errors[$i]\n";
        }
        &using_indexer();
        // exit(1);
    }


    // Get the master definitions
    while (<IN_MASTER>){ // entire master file into memory.
        $globe::master_index_html .= $_;
    }


    // ####
    // IMPORTANT!!!
    // Master ordering not used
    // ####
    &globe::get_master_nav_info();

    $_index_file_list = "$globe::path$_index_file_list";

    if (system ("ls $globe::path$f_type* > $_index_file_list")) {
        // print "There is no $globe::path directory. Nothing done.\n";
        return(1);
    }

    // #########
    // Create a master index list
    // #########
    &process_index_files();
    // print "Total entries incoming = $globe::total_entry_in\n";
    &sort_raw_master();
    &create_index_files();
    &create_index_script();

    if (@file_errors) {
        // print "\n============ Summary of errors ==================================\n";
        for ($i=0; $i<@file_errors; $i++){
            // print "$i = $file_errors[$i]\n";
        }
    }


    // #############################################################################
    // # End of all the work
    // #############################################################################
    // exit(0);
}


//#############################################################################
/** @fn process_index_files
 ** @brief Gets a list of index_ files and hands them off one-at-a-time to
 **        &get_raw_master.
 **
 ** @param $_index_file_list The name of the temporary file that contains
 **        a list of potential index_ files.
 **
 ** @return Nothing.
 **
 ** A system call was made prior to running this routine which generated
 ** the list of files in $_index_file_list. This routine:
 **
 ** <ol><li>Opens that temporary file.</li>
 ** <li> Picks off each file from the list.</li>
 ** <li>Passes the index_ file to get_raw_master.</li></ol>
 **
 ** @lim If the index_ file from the list has ".htm" as part of its name,
 **      it is ignored.
 **
 ** @ingroup tp_idx
 **/
// #############################################################################
int process_index_files ( ) {
    #define $in_file
    // undef (@path_chunk);

    // print "Entering process_index_files\n";

    unless (open ( IN_LIST, "$_index_file_list")) {
        // print "Cannot open file \"$_index_file_list\"\n";
        // exit(1);
    }
    TOKEN_FILE: while (<IN_LIST>){ // read a line from file into $_
        $in_file = $_;
        $in_file =~ s/\n//;

        if ($in_file =~ /\.htm/){
            // // print "gcm1 $in_file\n";
            next TOKEN_FILE;
        } else {
            // print "$in_file\n";
            &get_raw_master($in_file);
        }
        $globe::relative_path = $in_file;
        $globe::relative_path =~ s/index_//;
        $globe::relative_path =~ s/\.html//;
        $globe::relative_path =~ s/\.htm//;
        @path_chunk = split ( /\//, $globe::relative_path, 2);
        if (@path_chunk == 2) { // compare the element count numerically
            $globe::relative_path = "$path_chunk[1]/";
        } else {
            $globe::relative_path .= "\./";
        }
        // // print "relative path = $globe::relative_path\n";
    } // while IN_LIST
    // close (IN_LIST);

    // // print "Master list:\n$globe::master_raw\n";
} // process_index_files



//#############################################################################
/** @fn get_raw_master
 ** @brief Opens the input file and appends its information to $globe::master_raw
 **        before closing and returning.
 **
 ** @param $ind_file is the $in_file from process_index_files.
 ** @return $globe::master_raw has information from the input file appended to
 **      it.
 **
 ** @lim Does no checking on the information from within the input file
 **      to verify that it is correct. Garbage in and this may bomb completely.
 ** @ingroup tp_idx
 **/
// #############################################################################
int get_raw_master ( ) {
    #define $ind_file $_[0]
    // // print "Entering get_raw_master\n";

    unless (open ( IND_FILE, "$ind_file")) {
        // print "Cannot open file \"$ind_file\"\n";
        // exit(1);
    }
    while (<IND_FILE>){ // read a line from file into $_
        $globe::master_raw .= $_;
        $globe::total_entry_in++;
    } // while IND_FILE
    // close (IND_FILE);
} // get_raw_master


//#############################################################################
/** @fn sort_raw_master
 ** @brief Creates a $idx_struct hash table from the raw index material.
 **
 ** @param $globe::master_raw is used to get its information, although not
 **      explicitly passed in. To free up memory, this gets undefined.
 **
 ** @retval $idx_struct is a hash that uses the index entry as the key
 **      and whose contents are the title plus URL.
 ** @retval $globe::master_raw is cleared after its information is put into the
 **      hash.
 **
 ** Parses the raw index information into a key (the index entry) and its
 ** associated title plus URL. The title is the same as the key. It is retained
 ** in the contents of the hash to help distinguish between entries and to
 ** help eliminate duplicates.
 **
 ** This routine calls the word_chunking routine to create additional entries from
 ** the items.
 **
 ** Uses @globe::ignore_terms to eliminate entries.
 **
 ** When finished, $idx_struct is a big list of index entries.
 **
 ** @lim Assumes that input information was of the proper format with an
 **      index entry, $globe::word_url_boundary, and URL. If any of the input index_
 **      files did not have this, this can cause problems.
 **
 ** Eliminates special characters and can purge entries that have Perl
 ** special characters.
 **
 ** Word-chunking should have been made an input option. To turn it off,
 ** the code has to be modified.
 **
 ** $entry = compacted and clean display text for sorting.
 ** $subentry = compacted and clean display text for sorting.
 ** $idx_struct{$entry}{display} = display text
 ** $idx_struct{$entry}{url}[] = array of URLs for the $entry.
 ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
 **      for the $entry's $subentry.
 ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URLs
 **      for the $entry's $subentry.
 **
 ** @ingroup tp_idx
 **/
// #############################################################################
int sort_raw_master ( ) {
    // print "Entering sort_raw_master\n";

    @unchunked_list = split (/\n/, $globe::master_raw);
    TOKEN: for ($i=0; $i<=$#unchunked_list; $i++){
        if ($unchunked_list[$i] =~ /$globe::word_url_boundary/){
            $unchunked_list[$i] =~ s/  / /g; // some titles have 2 spaces
            $unchunked_list[$i] =~ s/\r//g;
            $unchunked_list[$i] =~ s/\n//g;
            @entry_chunk = split (/$globe::word_url_boundary/, $unchunked_list[$i], 2);
            $entry_chunk[0] =~ s/\,[\s]*/$globe::word_c_boundary/;
            if ($entry_chunk[0] =~ $globe::word_c_boundary){
                // ####
                // Indicates that FM had multi-level index token
                // This will not have any word chunking done
                // ####
                @title_chunk = split (/$globe::word_c_boundary/, $entry_chunk[0], 2);
                $entry1 = &trash_special_characters ($title_chunk[0]);
                $entry1 =~ s/ //g; // get rid of all spaces
                $entry1 =~ tr/A-Z/a-z/;
                if ((0) && (&ignore_item($entry_chunk[0]))) {
                    // ####
                    // This is turned off because it is assumed that if it came
                    // from FM, then the token is valid
                    // ####
                    next TOKEN;
                }
                &add_to_index_struct ($entry1, $title_chunk[0], $entry_chunk[1]);

                $entry2 = &trash_special_characters ($title_chunk[1]);
                $entry2 =~ s/ //g; // get rid of all spaces
                $entry2 =~ tr/A-Z/a-z/;
                if (!(&add_to_lev2_index_struct ($entry1, $entry2, $title_chunk[1], $entry_chunk[1]))) {
                    // print "Tried adding a second level without the primary level.\n";
                }
            } else { // ($entry_chunk[0] =~ $globe::word_c_boundary)
                // ####
                // This will not have any word chunking done
                // ####
                $entry1 = &trash_special_characters ($entry_chunk[0]);
                $entry1 =~ s/ //g; // get rid of all spaces
                $entry1 =~ tr/A-Z/a-z/;
                if ((0) &&
                    (&ignore_item($entry_chunk[0]))) {
                    // ####
                    // This means that the entry is not one of interest
                    // ####
                    next TOKEN;
                }
                &add_to_index_struct ($entry1, $entry_chunk[0], $entry_chunk[1]);
                // ####
                // the following does word chunking
                // ####
                if ((1) && ($globe::word_chunk)) {
                    &word_chunking( $entry_chunk[0], $entry_chunk[1]);
                } // globe::word_chunk
            } // ($entry_chunk[0] =~ $globe::word_c_boundary)
        } else { // TOKEN
            // token does not have URL; don't process
            next TOKEN;
        } // TOKEN

    } // $i unchunked_list

    $_cnt=0;
    if (0){ // debug loop
        foreach $entry (sort keys %idx_struct){
            $_cnt++;
            // print "$_cnt \"$entry\" = \"$idx_struct{$entry}{display}\"\n";
            if (($_cnt < 12) && (1)){
                foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
                    // print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\" \n";
                }
            }
            if ((0)){
                foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
                    // print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\" \n";
                }
            }
            if (($_cnt > 12) && (1)){
                // exit(1);
            }
        }
        // print "Stopping...\n";
        // exit(1);
    } // debug loop

} // sort_raw_master




//#############################################################################
/** @fn ignore_item
 ** @brief Compares the input term to a list of ignore terms.
 **
 ** @param $term_to_test Term to test.
 **
 ** @return Returns 1 if the term has matched an ignore term;
 **      otherwise it returns 0.
 **
 ** It tests for fragments twice, so that "get" doesn't match on
 ** "together".
 **
 ** @lim If the term has a Perl special character in it, it is
 **      sent back immediately and left alone.
 **
 ** @ingroup tp_idx
 **/
// #############################################################################
int ignore_item ( ) {
    $term_to_test = $_[0];

    // ####
    // Don't mess with entries that have perl special characters, like +, [
    // Otherwise, it will fail in the testing below; not important enough to
    // purge from the output; probably won't be in the ignore file anyway.
    // ####
    if (($term_to_test =~ /\?/)
        || ($term_to_test =~ /\+/)
        || ($term_to_test =~ /\*/)
        || ($term_to_test =~ /\/\./)
        || ($term_to_test =~ /\(/)
        || ($term_to_test =~ /\)/)
        || ($term_to_test =~ /\[/)
        || ($term_to_test =~ /\$/)
        || ($term_to_test =~ /\#/))
    {
        return (0);
    }

    // ####
    // Remove index items that are in the ignore file
    // ####
    foreach $r_term (@globe::ignore_terms){
        $r2_term = $r_term;
        $r2_term =~ s/ //g;
        // ####
        if (0){ // debug
            $stop = 0;
            if ($term_to_test =~ /\+/){
                // print "gcm0 $term_to_test\n";
                $stop = 1;
            }
            if ($r_term =~ /\+/){
                // print "gcm1 $r_term\n";
                $stop = 1;
            }
            if ($r2_term =~ /\+/){
                // print "gcm2 $r2_term\n";
                $stop = 1;
            }
            if ($stop){
                // exit(1);
            }
        } // debug
        // ####
        // ####
        // Need to make sure that "get" doesn't match on "together".
        // Match in both directions: the term contains the ignore term
        // and the ignore term contains the term.
        // ####
        if ((($term_to_test =~ /$r_term/i) || ($term_to_test =~ /$r2_term/i))
            && (($r_term =~ /$term_to_test/i) || ($r2_term =~ /$term_to_test/i))) {
            if (0){
                // print "r_term = $r_term; entry = $term_to_test\n";
            }
            return (1);
        }
    }
    // default fall through; 0 means good.
    return (0);
} // ignore_item


//#############################################################################
/** @fn add_to_index_struct
 ** @brief Adds an element to the complicated hash table.
 **
 ** @param $in_entry the compacted entry into the hash
 ** @param $in_title the display text to show for the item
 ** @param $in_url the anchor to add to the URL array.
 **      If you send in $globe::word_c_boundary for the $in_url,
 **      then it won't add the URL to the list.
 **
 ** @retval Always returns 1.
 **
 **
 ** $entry = compacted and clean display text for sorting.
 ** $subentry = compacted and clean display text for sorting.
 ** $idx_struct{$entry}{display} = display text
 ** $idx_struct{$entry}{url}[] = array of URLs for the $entry.
 ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
 **      for the $entry's $subentry.
 ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URLs
 **      for the $entry's $subentry.
 **
 ** @ingroup tp_idx
 **/
// #############################################################################
int add_to_index_struct ( ) {
    $in_entry = $_[0];
    $in_title = $_[1];
    $in_url = $_[2];

    $donothave = 1;
    if ($in_url =~ /$globe::word_c_boundary/) {
        $donothave = 0;
    }
    if (exists ($idx_struct{$in_entry})) {
        // We already have display text;
        // need URL.
        // weed out duplicates from the beginning.
        for ($j = 0; $j <= $#{$idx_struct{$in_entry}{url}}; $j++) {
            if ($idx_struct{$in_entry}{url}[$j] =~ /$in_url/) {
                $donothave = 0;
            }
        } // for $j
        if ($donothave) {
            push (@{$idx_struct{$in_entry}{url}}, $in_url);
        }
    } else {
        // we don't have anything; need to add it.
        $idx_struct{$in_entry}{display} = $in_title;
        if ($donothave) {
            push (@{$idx_struct{$in_entry}{url}}, $in_url);
        }
    }
    return (1);
} // add_to_index_struct


//#############################################################################
/** @fn add_to_lev2_index_struct
 ** @brief Adds an element to the second level of the complicated hash table.
 **
 ** @param $in_entry the compacted entry into the hash.
 ** @param $in_sub the compacted subentry into the hash.
 ** @param $in_title the display text to show for the item.
 ** @param $in_url the anchor to add to the URL array.
 **
 ** @retval Returns 1 on success; returns 0 if $in_entry does not
 **      already exist in the hash.
 **
 **
 ** $entry = compacted and clean display text for sorting.
 ** $subentry = compacted and clean display text for sorting.
 ** $idx_struct{$entry}{display} = display text
 ** $idx_struct{$entry}{url}[] = array of URLs for the $entry.
 ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
 **      for the $entry's $subentry.
 ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URLs
 **      for the $entry's $subentry.
 **
 ** @ingroup tp_idx
 **/
// #############################################################################
int add_to_lev2_index_struct ( ) {
    // assumes that $in_entry already exists
    $in_entry = $_[0];
    $in_sub = $_[1];
    $in_title = $_[2];
    $in_url = $_[3];

    if (!(exists ($idx_struct{$in_entry}))) {
        return (0);
    }
    if (exists ($idx_struct{$in_entry}{sub}{$in_sub})) {
        // We already have display text;
        // need URL.
        // weed out duplicates from the beginning.
        $donothave = 1;
        for ($j = 0; $j <= $#{$idx_struct{$in_entry}{sub}{$in_sub}{url}}; $j++) {
            if ($idx_struct{$in_entry}{sub}{$in_sub}{url}[$j] =~ /$in_url/) {
                $donothave = 0;
            }
        } // for $j
        if ($donothave) {
            push (@{$idx_struct{$in_entry}{sub}{$in_sub}{url}}, $in_url);
        }
    } else {
        // we don't have anything; need to add it.
        $idx_struct{$in_entry}{sub}{$in_sub}{display} = $in_title;
        push (@{$idx_struct{$in_entry}{sub}{$in_sub}{url}}, $in_url);
    }
    return (1);
} // add_to_lev2_index_struct

//#############################################################################
/** @fn word_chunking
 ** @brief Performs word-chunking on the passed-in entry that was extracted
 **        from the $globe::master_raw.
 **
 ** @param $unproc_title the unprocessed title passed in $entry_chunk[0].
 **
 ** @param $assoc_t_data the associated title and data given by $entry_chunk[1].
 **
 ** @return Updated entries in the hash $globe::master_index. If a word-chunk
 **      already is available as a key into the hash, then this appends its information
 **      to the contents of the key using $globe::division_mult_entry.
 **
 ** Word-chunking is performed on the $unproc_title.
 ** Natural boundaries (spaces, dashes, underscores, changes in case in the middle of a word)
 ** are used to create additional two-level index entries that contain the word-chunk
 ** followed by where it came from.
 **
 ** The $globe::ignore_terms_file is used to
 ** eliminate useless word-chunked entries (such as "the", "a", "to", etc.)
 **
 ** The additional useful entries are appended to
 ** the contents of the hash $globe::master_raw using the $globe::division_mult_entry
 ** separator only if the new entry is not a duplicate.
 **
 ** Word-chunking is particularly useful for API documentation so that the reader
 ** does not have to remember the exact name of a code item in order to find
 ** it. An initial index token of "api_GetMovie-list" could be found not just under
 ** its name in the "A's", but under "get", "movie", and "list".
 **
 **
 ** $entry = compacted and clean display text for sorting.
 ** $subentry = compacted and clean display text for sorting.
 ** $idx_struct{$entry}{display} = display text
 ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
 ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
 **     for the $entry's $subentry.
 ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's
 **     for the $entry's $subentry.
 **
 ** @lim None.
 ** @ingroup tp_idx
 **/
// #############################################################################
int word_chunking ( ) {
    #define $unproc_title $_[0]
    #define $assoc_t_data $_[1]
    #define $capital 1
    #define $w_cnt 0
    // undef (@w_chunks);
    // undef (@word);
    #define $c_word
    #define $term

    #define $proc_title &trash_special_characters($unproc_title)
    $proc_title =~ s/ //g;
    $proc_title =~ tr/A-Z/a-z/;

    if (0) {
        // print "$unproc_title $assoc_t_data\n";
    }

    // ####
    // Return quickly if it is an additional "used by" cross-reference.
    // It'll already have too many word-chunk references. No problem.
    // ####
    if ($unproc_title =~ $globe::ack_used_by){
        return (1);
    }

    // ####
    // Use for word sorting: spaces, underscores, dashes, em-dashes,
    // slashes
    // ####
    @w_chunks = split ( /\s|_|-|\&\#8212\;|\&\#151\;|\/|\\|\./, $unproc_title);

    // ####
    // Additional word chunking based on capital letters
    // Create even more word chunks based on whether an individual word
    // has smaller sections in it based on capitalization.
    // For example, "fileIsDirectory" has "Is" and
    // "Directory" in it.
    // ####
    if ($capital) { // on/off for capital word-chunking
        NEXT_CC_WORD: foreach $c_word (@w_chunks){
            // ####
            // Get rid of elements in word chunk that we don't want hanging around
            // More special characters introduced for operator++
            // Don't bother word chunking them
            // ####
            $c_word = &trash_special_characters($c_word);
            // ####

            // ####
            // Check for one or more capital letters within existing word chunk
            // ####
            if ($c_word =~ /[A-Z]+/) {
                // // print "==== $c_word ===\n";
                // ####
                // Split the word chunk into capital letter chunks if it is in word chunk
                // ####
                @cap_chunk = split (/([A-Z]+)/, $c_word);
                if (@cap_chunk > 1) {
                    // ####
                    // Go through all cap chunks; ignore the first element
                    // because this will already be handled as part of a word.
                    // ####
                    for ($k=1; $k < @cap_chunk; $k++) { // ignore 0 element
                        // // print "$k sub_cap = $cap_chunk[$k]\n";
                        if (($cap_chunk[$k] =~ /[A-Z]/) && ($k+1 < @cap_chunk)) {
                            // ####
                            // Handles when a cap chunk is followed by a small chunk.
                            // // print "Create chunk = $cap_chunk[$k] with $cap_chunk[$k+1]\n";
                            // ####
                            $temp = "$cap_chunk[$k]$cap_chunk[$k+1]";
                            $temp =~ tr/A-Z/a-z/;
                            push (@word, $temp);
                            // ####
                            // jump past next k
                            // ####
                            $k++;
                        } elsif (($cap_chunk[$k] =~ /[A-Z]/) && ($k+1 == @cap_chunk)) {
                            // ####
                            // Handles situation where last chunk is all caps.
                            // // print "Create chunk = $cap_chunk[$k]\n";
                            // ####
                            $temp = "$cap_chunk[$k]";
                            $temp =~ tr/A-Z/a-z/;
                            push (@word, $temp);
                            // ####
                            // jump past next k cap chunk
                            // ####
                            $k++;
                        } else {
                            // ####
                            // should never really get to this point
                            // chunk is in small caps somewhere probably at beginning
                            // ####
                            // print "$k of $#cap_chunk; \"$cap_chunk[$k]\" of \"$c_word\" from \"$unproc_title\"; probably k=0 or 1\n";
                        }
                    } // for the number of cap chunks
                } // if more than one cap chunk
            } // if there are capital letters in word chunk
        } // for each word chunk

        // ####
        // Need to add the words to the list.
        // ####
        foreach $sw_w (@word) {
            push (@w_chunks, $sw_w);
            if (0) {
                // print "fractions: $sw_w\n";
            }
        }

        if ((0) && ($unproc_title =~ /BapiVersion/i)) { // debug
            $_ccc= 0;
            // print "==== more chunk \"$unproc_title\"\n";
            foreach $c_word (@w_chunks){
                $_ccc++;
                // print "1 $_ccc $c_word\n";
            }
            $_ccc= 0;
            foreach $c_word (@word){
                $_ccc++;
                // print "2 $_ccc $c_word\n";
            }
            // exit(1);
        } // debug
    } // # on/off for capital word-chunking


    // ####
    // Do something with all of the word chunks
    // ####
    { // bracket level
        // // print "======= $unproc_title \n";
        NEXT_C_WORD: foreach $c_word (@w_chunks){
            if ($unproc_title =~ /^$c_word/i) {
                // ####
                // Don't index a word chunk that comes
                // close to matching the original name
                // (as in "log" for "logFuncDebugGet"). Hence,
                // we'll purposely skip over it.
                // ####
                next NEXT_C_WORD;
            }
            $c_word =~ tr/A-Z/a-z/;
            $c_word = &trash_special_characters($c_word);

            if ((0) && (&ignore_item($c_word))) {
                // ####
                // This means that the entry is not one of interest
                // ####
                next NEXT_C_WORD;
            }
            // ####
            // We don't want extra hyperlinks to word-chunk fragments.
            // Calling it specifically with $globe::word_c_boundary as url
            // ####
            &add_to_index_struct ($c_word, $c_word, $globe::word_c_boundary);

            if (!(&add_to_lev2_index_struct ($c_word, $proc_title, $unproc_title, $assoc_t_data))) {
                // print "Tried adding a second level without the primary level.\n";
            }

        } // foreach $c_word
    } // bracket level
} // word_chunking


//#############################################################################
/** @fn trash_special_characters
 ** @brief Removes all special characters that we don't want in index as
 ** word chunks.
 **
 ** @param $in_word The word that might have special characters.
 ** @return The word without special characters.
 **
 ** @lim Debug statements are left in.
 ** @ingroup tp_idx
 **/
// #############################################################################
int trash_special_characters ( ) {
    $in_word = $_[0];

    // ####
    $in_word =~ s/\&\#151\;//;
    $in_word =~ s/\&\#8212\;//;
    $in_word =~ s/\s//g;
    $in_word =~ s/\:\://g;
    $in_word =~ s/\(//g;
    $in_word =~ s/\)//g;
    $in_word =~ s/\,//g;
    // $in_word =~ s/\.//g; # 07/08/2002 removed
    $in_word =~ s/\-//g;
    $in_word =~ s/\_//g;
    $in_word =~ s/\://g;
    $in_word =~ s/\[//g;
    $in_word =~ s/\]//g;
    $in_word =~ s/\+//g;
    $in_word =~ s/\=//g;
    $in_word =~ s/\*//g;
    $in_word =~ s/\?//g;
    $in_word =~ s/\&//g;
    $in_word =~ s/\%//g;
    $in_word =~ s/\$//g;
    $in_word =~ s/\#//g;
    $in_word =~ s/\@//g;
    $in_word =~ s/\!//g;
    $in_word =~ s/\|//g;
    $in_word =~ s/\\//g;
    $in_word =~ s/\///g;
    $in_word =~ s/\<//g;
    $in_word =~ s/\>//g;
    $in_word =~ s/\"//g;
    $in_word =~ s/\'//g;
    $in_word =~ s/\~//g;
    $in_word =~ s/\`//g;
    // ####
    return ($in_word);
} // trash_special_characters



//#############################################################################
/** @fn create_index_files
 ** @brief Outputs m_idx_ HTML files from the contents of $idx_struct.
 **
 ** @param $idx_struct is the sorted list.
 ** @return A series of HTML files that make up the index. The files begin
 ** with "m_idx_", are followed by a character, and end with ".html".
 **
 ** This uses the $globe::master_index_html and swaps out the information
 ** delineated by $globe::m_define{structure}[0] and $globe::m_define{structure}[1].
 **
 ** The $idx_struct is sorted. Every time the first letter of
 ** the key changes, a new m_idx_ file is created.
 **
 ** When creating the output, it formats it using valid HTML with an anchor href
 ** containing the valid URL and the text. It eliminates duplicates (e.g., URL
 ** is identical).
 **
 ** It handles the levels, as in second-level index entries.
 **
 ** $entry = compacted and clean display text for sorting.
 ** $subentry = compacted and clean display text for sorting.
 ** $idx_struct{$entry}{display} = display text
 ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
 ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
 **     for the $entry's $subentry.
 ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's
 **     for the $entry's $subentry.
 **
 ** @lim Debug statements are left in.
 ** @ingroup tp_idx
 **/
// #############################################################################
int create_index_files ( ) {
    #define $inner_index "This Letter has no entries."
    #define $remember_letter "0"
    #define $remember_level ""
    #define $out_file $globe::path . "m_idx_" // will have letter and .html appended

    // print "Entering create_index_files\n";
    $_cnt=0;
    if (0){ // debug loop
        foreach $entry (sort keys %idx_struct){
            $_cnt++;
            // print "$_cnt \"$entry\" = \"$idx_struct{$entry}{display}\"\n";
            if (($_cnt < 25) && (0)){
                foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
                    // print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
                }
            }
            if ((1)){
                foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
                    // print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
                }
            }
            if (($_cnt > 12) && (0)){
                // exit(1);
            }
        }
        // exit(1);
    } // debug loop

    unless (open ( OUT_INDEX, ">$out_file$remember_letter.html")) {
        push (@file_errors, "Cannot open file \"$out_file$remember_letter.html\"\n");
        // print "Cannot open file \"$out_file$remember_letter.html\"\n";
    }

    $_cnt=0;
    PURGE_ENTRY: foreach $entry (sort keys %idx_struct){
        if ((1) && (&ignore_item ($idx_struct{$entry}{display}))) {
            next PURGE_ENTRY;
        }

        // ####
        // Take care of writing out stored index info for a given letter
        // ####
        $first_letter = substr($entry, 0, 1);
        if ($first_letter !~ /[a-zA-Z0-9]/) {
            $first_letter = "\-";
        }
        if (0) {
            // print "===letter \"$first_letter\" ===\n$inner_index\n";
        }
        if ($first_letter =~ /$remember_letter/i){
            if (0) {
                // print "Current letter ($remember_letter)\n";
            }
        } else {
            // ####
            // write to the previous index file
            // ####
            @chunks = split ( /$globe::m_define{structure}[0]|$globe::m_define{structure}[1]/, $globe::master_index_html, 3);
            $chunks[1] = $inner_index;
            $globe::master_index_html = join ("", $chunks[0], $globe::m_define{structure}[0], $chunks[1],
                    $globe::m_define{structure}[1], $chunks[2]);
            // print (OUT_INDEX $globe::master_index_html);
            if (0) {
                // print "===index entries \"$first_letter\" ===\n$inner_index\n";
            }
            // close the previous index file
            // close (OUT_INDEX);
            // Change the letter and open the next index file for writing
            $remember_letter = $first_letter;
            $temp_letter = $remember_letter;
            $temp_letter =~ tr/a-z/A-Z/;
            unless (open ( OUT_INDEX, ">$out_file$remember_letter.html")) {
                push (@file_errors, "Cannot open file \"$out_file$remember_letter.html\"\n");
                // print "Cannot open file \"$out_file$remember_letter.html\"\n";
            }
            // reset the index information
            $inner_index = "<p class=\"GroupTitlesIX\"><center><b>-$temp_letter-</b></center></p>\n";
        }
        // ####

        // ####
        // Take care of writing out stored index info for a given letter
        // Ignore duplicates.
        // ####

        $inner_index .= "<pre class=\"Level1IX\">";
        if ($#{$idx_struct{$entry}{url}} > 1) {
            // ####
            // Has multiple destinations to worry about
            // ####
            $inner_index .= "$idx_struct{$entry}{display}";
            $inner_index .= "<br>";
            for ($i=0; $i <= $#{$idx_struct{$entry}{url}}; $i++) {
                $inner_index .= "$idx_struct{$entry}{url}[$i]";
                $inner_index .= "<img src=\"nav_doc.gif\" border=\"0\"></a>";
                $_cnt++;
            }
            $inner_index .= "</pre>\n";
        } else {
            // ####
            // Only has one destination to worry about
            // ####
            $inner_index .= "$idx_struct{$entry}{url}[0]";
            $inner_index .= "$idx_struct{$entry}{display}";
            $inner_index .= "</a></pre>\n";
            $_cnt++;
        }

        PURGE_SUBENTRY: foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}){
            if ((0) && (&ignore_item ($idx_struct{$entry}{sub}{$subentry}{display}))) {
                next PURGE_SUBENTRY;
            }
            $inner_index .= "<pre class=\"Level2IX\">";
            if
            ($#{$idx_struct{$entry}{sub}{$subentry}{url}} > 1) {
                // ####
                // Has multiple destinations to worry about
                // ####
                $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{display}";
                $inner_index .= "<br>";
                for ($i=0; $i <= $#{$idx_struct{$entry}{sub}{$subentry}{url}}; $i++) {
                    $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{url}[$i]";
                    $inner_index .= "<img src=\"nav_doc.gif\" border=\"0\"></a>";
                    $_cnt++;
                }
                $inner_index .= "</pre>\n";
            } else {
                // ####
                // Only has one destination to worry about
                // ####
                $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{url}[0]";
                $inner_index .= "$idx_struct{$entry}{sub}{$subentry}{display}";
                $inner_index .= "</a></pre>\n";
                $_cnt++;
            }
        } // PURGE_SUBENTRY
    } // PURGE_ENTRY for each entry


    // print "Duplicates and ignore terms removed.\n";
    // print "Total Individual Hyperlinks: $_cnt\n";


    // ####
    // Clean-up for last letter/file.
    // ####
    // write to the previous index file
    @chunks = split ( /$globe::m_define{structure}[0]|$globe::m_define{structure}[1]/, $globe::master_index_html, 3);
    $chunks[1] = $inner_index;
    $globe::master_index_html = join ("", $chunks[0], $globe::m_define{structure}[0], $chunks[1], $globe::m_define{structure}[1], $chunks[2]);
    // print (OUT_INDEX $globe::master_index_html);
    if (0) {
        // print "===index entries \"$first_letter\" ===\n$inner_index\n";
    }
    // close the previous index file
    // close (OUT_INDEX);
    // ####

} // create_index_files

//#############################################################################
/** @fn create_index_script
 ** @brief Outputs index script file from the contents of $idx_struct.
 **
 ** @param $idx_struct is the sorted list.
 ** @return A single script file that makes up the index.
 **
 ** When creating the output, it formats it using valid script with an anchor href
 ** containing the valid URL and the text. It eliminates duplicates (e.g., URL
 ** is identical).
 **
 ** It handles the levels, as in second-level index entries.
 **
 ** $entry = compacted and clean display text for sorting.
 ** $subentry = compacted and clean display text for sorting.
 ** $idx_struct{$entry}{display} = display text
 ** $idx_struct{$entry}{url}[] = array of URL's for the $entry.
 ** $idx_struct{$entry}{sub}{$subentry}{display} = display text
 **     for the $entry's $subentry.
 ** $idx_struct{$entry}{sub}{$subentry}{url}[] = array of URL's
 **     for the $entry's $subentry.
 **
 ** @lim Debug statements are left in.
 ** @ingroup tp_idx
 **/
// #############################################################################
int create_index_script ( ) {
    #define $remember_letter "0"
    #define $remember_level ""
    #define $out_file $globe::path . "m_idx" // will have letter and .script appended
    #define $very_critical 0

    // print "Entering create_index_script\n";
    $_cnt=0;
    if (0){ // debug loop
        foreach $entry (sort keys %idx_struct){
            $_cnt++;
            // print "$_cnt \"$entry\" = \"$idx_struct{$entry}{display}\"\n";
            if (($_cnt < 25) && (0)){
                foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
                    // print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
                }
            }
            if ((1)){
                foreach $subentry (sort keys %{$idx_struct{$entry}{sub}}) {
                    // print "== $_cnt \"$subentry\" = \"$idx_struct{$entry}{sub}{$subentry}{display}\"\n";
                }
            }
            if (($_cnt > 12) && (0)){
                // exit(1);
            }
        }
        // exit(1);
    } // debug loop



    // ####
    // Handle the script file implementation
    // ####

    unless (open ( OUT_SCRIPT, ">$out_file.script")) {
        push (@file_errors, "Cannot open file \"$out_file.script\"\n");
        // print "Cannot open file \"$out_file.script\"\n";
    }
    // print "Preparing to output \"$out_file.script\"\n";
    // // $inner_script = "Item level=1 image=nav_folderclosed.gif text=Master Index\r\n";
    $inner_script = "";

    PURGE_ENTRY_SCRIPT: foreach $entry (sort keys %idx_struct){
        if ((0) && (&ignore_item ($idx_struct{$entry}{display}))) {
            next PURGE_ENTRY_SCRIPT;
        }

        // ####
        // Take care of index info for a given letter
        // ####
        $first_letter = substr($entry, 0, 1);
        if ($first_letter !~ /[a-zA-Z]/) {
            $first_letter = "\-";
            next PURGE_ENTRY_SCRIPT;
        }
        if (0) {
            // print "===letter \"$first_letter\" \n";
        }
        if ($first_letter =~ /$remember_letter/i){
            if (0) {
                // print "Current letter ($remember_letter)\n";
            }
        } else {
            // ####
            // write to the previous index file
            // ####
            //
            // Change the letter and open the next index file for writing
            $remember_letter = $first_letter;
            $temp_letter = $remember_letter;
            $temp_letter =~ tr/a-z/A-Z/;
            // // $inner_script .= "Item level=2 image=nav_folderclosed.gif text=-$temp_letter-\r\n";
            // $inner_script .= "Item level=1 image=nav_folderclosed.gif text=-$temp_letter-\r\n";
        }
        // ####

        if ((0) && ($#{$idx_struct{$entry}{url}} > 1)) {
            // ####
            // Has multiple destinations to worry about
            // ####
            // // $inner_script .= "Item level=3 image=nav_folderclosed.gif text=";
            // $inner_script .= "Item level=2 image=nav_folderclosed.gif text=";
            $inner_script .= "$idx_struct{$entry}{display}";
            $inner_script .= "\r\n";
            for ($i=0; $i <= $#{$idx_struct{$entry}{url}}; $i++) {
                // $inner_script .= "Item level=4 image=nav_doc.gif url=";
                $inner_script .= "Item level=3 image=nav_doc.gif url=";
                // $inner_script .= "$idx_struct{$entry}{url}[$i]";
                ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{url}[$i],
                        "href[\s]*\=[\s]*\"",
                        "\"",
                        $very_critical);

                if ($piece) {
                    $inner_script .= "$piece";
                    $inner_script .= ",basefrm ";
                }
                $inner_script .= "\r\n";
            }
        } else {
            // ####
            // Only has one destination to worry about
            // ####
            // $inner_script .= "Item level=3 image=nav_doc.gif url=";
            $inner_script .= "Item level=2 image=nav_doc.gif url=";
            // $inner_script .= "$idx_struct{$entry}{url}[0]";
            ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{url}[0],
                    "href[\s]*\=[\s]*\"",
                    "\"",
                    $very_critical);
            if ($piece) {
                $inner_script .= "$piece";
                $inner_script .= ",basefrm ";
            }
            $inner_script .= " text=";
            $inner_script .= "$idx_struct{$entry}{display}";
            $inner_script .= "\r\n";
        }
        PURGE_SUBENTRY_SCRIPT: foreach $subentry
        (sort keys %{$idx_struct{$entry}{sub}}){
            if ((0) && (&ignore_item ($idx_struct{$entry}{sub}{$subentry}{display}))) {
                next PURGE_SUBENTRY_SCRIPT;
            }
            if ((0) && ($#{$idx_struct{$entry}{sub}{$subentry}{url}} > 1)) {
                // ####
                // Has multiple destinations to worry about
                // ####
                // // $inner_script .= "Item level=4 image=nav_folderclosed.gif text=";
                // $inner_script .= "Item level=3 image=nav_folderclosed.gif text=";
                $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{display}";
                $inner_script .= "\r\n";
                for ($i=0; $i <= $#{$idx_struct{$entry}{sub}{$subentry}{url}}; $i++) {
                    // $inner_script .= "Item level=5 image=nav_doc.gif url=";
                    $inner_script .= "Item level=4 image=nav_doc.gif url=";
                    // $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{url}[$i]";
                    ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{sub}{$subentry}{url}[$i],
                            "href[\s]*\=[\s]*\"",
                            "\"",
                            $very_critical);

                    if ($piece) {
                        $inner_script .= "$piece";
                        $inner_script .= ",basefrm ";
                    }
                    $inner_script .= "\r\n";
                }
            } else {
                // ####
                // Only has one destination to worry about
                // ####
                // $inner_script .= "Item level=4 image=nav_doc.gif url=";
                $inner_script .= "Item level=3 image=nav_doc.gif url=";
                // $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{url}[0]";
                ($before, $piece, $after) = &globe::get_tag_chunk( $idx_struct{$entry}{sub}{$subentry}{url}[0],
                        "href[\s]*\=[\s]*\"",
                        "\"",
                        $very_critical);

                if ($piece) {
                    $inner_script .= "$piece";
                    $inner_script .= ",basefrm ";
                }

                $inner_script .= " text=";
                $inner_script .= "$idx_struct{$entry}{sub}{$subentry}{display}";
                $inner_script .= "\r\n";
            }
        } // PURGE_SUBENTRY_SCRIPT:
    } // PURGE_ENTRY_SCRIPT for each entry

    // print (OUT_SCRIPT $inner_script);
    // close (OUT_SCRIPT);

} // create_index_script


//#############################################################################
/** @fn using_indexer
 ** @brief What to do when no arguments are given.
 ** @param None
 ** @return None
 **
 ** @lim None
 ** @ingroup tp_idx
 **/
// #############################################################################
int using_indexer ( ) {
    // print "\nvoyant_indexer.pl creates index information to be displayed in the navigation\n";
    // print "pane. It assumes all input index files have been copied into the\n";
    // print "input directory and are named uniquely. The index files\n";
    // print "were generated by voyant_nav.pl when processing Doxygen or Mif2Go output.\n";
    // print "The index files are temporary files.\n";
    // print "\n-- All index files must begin with \"index_\".\n";
    // print "-- All files must reside in the input directory.\n\n";
    // print "This takes three arguments:\n";
    // print "[1] The directory (with slash \) of where to find the raw index files.\n";
    // print "[2] The path & name of the HTML file to use as the master template for the individual index files.\n";
    // print " It must have sections for voyant_header, voyant_structure,\n";
    // print " and voyant_footer.\n";
    // print " A section has a begin and end, such as:\n";
    // print " <!-- begin voy_structure -->\n";
    // print " <!-- end voy_structure -->\n";
    // print "[3] The path & name of the file containing words to ignore in word-chunking.\n";
    // print "\nThe output files begin with \"m_idx_\".\n";
    // print "[4] If you don't want word-chunking, enter \"no_chunk\".\n";
    // print "The m_idx output is displayed in a treefrm while the content it controls\ndisplays in the basefrm.\n";
    // print "\nTerminating voyant_indexer.pl without doing anything.\n";
    return;
} // using_indexer




//#############################################################################
/** @fn int END
 ** @brief Code to execute when the program exits.
 **
 ** @param None.
 **
 ** @return None.
 **
 ** @lim None
 ** @ingroup tp_nav
 **/
// #############################################################################
int END ( ) {
    // undef ($_index_file_list); // "_index_file_list";
    // undef ($in_file); // "";
    // undef ($f_type); // "index_";



    // #############################################################################
    // # Memory clean-up.
    // #############################################################################
    &globe::memory_clean_up();

    // print "\n============ Finished voyant_indexer.pl ==================================\n";
} // END
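
The word-chunking and second-level indexing described in the comments above can be sketched in plain Perl. This is a simplified illustration, not the code from voyant_indexer.pl: chunk_title, add_sub_url, %idx, and the URL "movies.html#GetMovie" are hypothetical names invented for the example. It assumes boundaries are whitespace, underscores, dashes, slashes, and periods, that camelCase pieces split before each capital followed by a lower-case letter, and that duplicates are detected by exact string comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: split a title on natural boundaries and return
# lower-case word chunks, skipping any chunk that the title starts with.
sub chunk_title {
    my ($title) = @_;
    my @chunks;
    # Boundaries: whitespace, underscore, dash, slash, backslash, period.
    for my $piece (grep { length $_ } split /[\s_\-\/\\.]+/, $title) {
        # Break camelCase before a capital followed by a lower-case letter.
        push @chunks, map { lc $_ } grep { length $_ } split /(?=[A-Z][a-z])/, $piece;
    }
    return grep { $title !~ /^\Q$_\E/i } @chunks;
}

# Hypothetical helper: store a URL under $idx->{$entry}{sub}{$sub},
# keeping the URL array free of exact duplicates.
sub add_sub_url {
    my ($idx, $entry, $sub, $display, $url) = @_;
    $idx->{$entry}{display} = $entry unless exists $idx->{$entry}{display};
    my $node = $idx->{$entry}{sub}{$sub} ||= { display => $display, url => [] };
    push @{ $node->{url} }, $url unless grep { $_ eq $url } @{ $node->{url} };
    return scalar @{ $node->{url} };
}

my %idx;
my $title = "api_GetMovie-list";
for my $chunk (chunk_title($title)) {
    add_sub_url(\%idx, $chunk, lc($title), $title, "movies.html#GetMovie");
    add_sub_url(\%idx, $chunk, lc($title), $title, "movies.html#GetMovie");  # duplicate, ignored
}
print join(", ", sort keys %idx), "\n";    # prints: get, list, movie
```

The leading chunk "api" is dropped because the title already sorts under it, which mirrors the skip in word_chunking for a chunk that matches the start of the original name.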
Open-Source tools compliments of Voyant Technologies, Inc. and Glenn C. Maxey.
01/13/2003
TP Tools v2-00-0a
# tpt-perl-hcr-02