
4.4.1 Default actions of "stock" summarizers

   

The following table provides a brief reference for how documents are summarized depending on their type. These actions can be customized, as discussed in Section 4.4.4. Some summarizers are implemented as UNIX programs while others are expressed as regular expressions; see Section 4.4.4 or Appendix C.4 for more information about how to write a summarizer.

                                                           

Type            Summarizer Function
--------------------------------------------------------------------
Audio           Extract file name
Bibliographic   Extract author and titles
Binary          Extract meaningful strings and manual page summary
C, CHeader      Extract procedure names, included file names, and comments
Dvi             Invoke the Text summarizer on extracted ASCII text
FAQ, FullText, README     
                Extract all words in file
Framemaker      Up-convert to SGML and pass through SGML summarizer
Font            Extract comments
HTML            Extract anchors, hypertext links, and selected fields (see SGML)
LaTex           Parse selected LaTeX fields (author, title, etc.)
Mail            Extract certain header fields
Makefile        Extract comments and target names
ManPage         Extract synopsis, author, title, etc., based on "-man" macros
News            Extract certain header fields
Object          Extract symbol table
Patch           Extract patched file names
Perl            Extract procedure names and comments
PostScript      Extract text in a word-processor-specific fashion and pass
                it through the Text summarizer
RCS, SCCS       Extract revision control summary
RTF             Up-convert to SGML and pass through SGML summarizer
SGML            Extract fields named in extraction table (see Section 4.4.2)
ShellScript     Extract comments
SourceDistribution 
                Extract full text of README file and comments from Makefile
                and source code files, and summarize any manual pages
SymbolicLink    Extract file name, owner, and date created
Tex             Invoke the Text summarizer on extracted ASCII text
Text            Extract first 100 lines plus first sentence of each 
                remaining paragraph
Troff           Extract author, title, etc., based on the "-man", "-ms", or
                "-me" macro packages, or extract section headers and
                topic sentences
Unrecognized    Extract file name, owner, and date created
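
The rule in the Text row (first 100 lines plus the first sentence of each remaining paragraph) can be illustrated with a short sketch. This is not the stock summarizer's code; the function name summarize_text, the head_lines parameter, the blank-line paragraph split, and the naive sentence boundary are all assumptions made for illustration.

```python
import re


def summarize_text(text, head_lines=100):
    """Hypothetical sketch: keep the first `head_lines` lines verbatim,
    then only the first sentence of each paragraph that follows."""
    lines = text.splitlines()
    head = lines[:head_lines]
    rest = "\n".join(lines[head_lines:])

    # Assumption: paragraphs are separated by one or more blank lines.
    # A paragraph cut in two by the head boundary counts as a new paragraph.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", rest) if p.strip()]

    first_sentences = []
    for para in paragraphs:
        flat = " ".join(para.split())  # join wrapped lines into one line
        # Assumption: a sentence ends at the first '.', '!' or '?' that is
        # followed by whitespace or the end of the paragraph.
        match = re.match(r".+?[.!?](?=\s|$)", flat)
        first_sentences.append(match.group(0) if match else flat)

    return "\n".join(head + first_sentences)
```

Calling summarize_text(open(path).read()) on a long plain-text file would return its opening lines followed by one line per later paragraph. The sentence boundary used here will split on abbreviations such as "etc.", so a real summarizer would need a more careful rule.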





Darren Hardy
Mon Apr 3 15:22:37 MDT 1995