
WWW::Search and AutoSearch and WebSearch
========================================


WHAT IS NEW IN WWW::Search 2.04?  (1999-09-30)
----------------------------------------------
overview:
 * Bug fixes for several backends (as usual)
 * NEW backends: GoTo, Deja, NetFind, Lycos::Sites, Lycos::Pages
 * WebSearch has new/changed options: --count, --debug, --lwpdebug

See the file ChangeLog for details.


WHAT IS WWW::Search?
--------------------

WWW::Search is a collection of Perl modules which provide an API to
WWW (and similar) search engines.  Currently WWW::Search includes
back-ends for variations of AltaVista, Dejanews, Excite, HotBot,
Infoseek, Lycos, Magellan, WebCrawler, and Yahoo, among others.  We
include two applications built from this library: AutoSearch, a
program to automate tracking of search results over time, and
WebSearch, a small demonstration program that drives the library.

WWW::Search does NOT try to emulate the search that you would get with
each search engine's GUI.  WWW::Search performs the search in a way
that is efficient and convenient for text processing.  This might
include getting "text-only" pages, making sure descriptions are turned
on, and increasing the number of hits per page, among other tricks.
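As a rough illustration of the API, the sketch below shows how a script
might drive the library directly: create a search object for one
back-end, pass it an escaped query with native_query(), and pull results
one at a time with next_result().  It assumes WWW::Search is installed
and that the AltaVista back-end is working; the subroutine name
run_search and the 10-result cap are illustrative choices, not part of
the library.

```perl
#!/usr/bin/perl -w
# Minimal sketch of calling WWW::Search directly.
# Assumptions: WWW::Search is installed and the chosen back-end works;
# run_search and the result cap are illustrative, not library API.
use strict;

# Run one query against one back-end and return up to $max result URLs.
sub run_search {
    my ($engine, $query, $max) = @_;
    require WWW::Search;                       # loaded only when called
    my $search = WWW::Search->new($engine);    # e.g. 'AltaVista'
    # Back-ends expect the query string to be URL-escaped first.
    $search->native_query(WWW::Search::escape_query($query));
    my @urls;
    while (my $result = $search->next_result()) {
        push @urls, $result->url;
        last if @urls >= $max;                 # stop after $max hits
    }
    return @urls;
}

# Only touch the network when a query is given on the command line.
if (@ARGV) {
    print "$_\n" for run_search('AltaVista', join(' ', @ARGV), 10);
}
```

Run it with a query as its arguments (for example,
perl search.pl '"Your Name Here"', where search.pl is whatever you named
the file); with no arguments it only defines run_search and makes no
requests.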

Because WWW::Search depends on parsing the HTML output of web search
engines, it will fail if the search engine operators change their
format (an unfortunately frequent occurrence).  WWW::Search includes a
test suite for most back-ends which verifies that each one is
functioning correctly.  As of the release date, the back-end status
is:

AltaVista		working
AltaVista::AdvancedNews	working
AltaVista::AdvancedWeb	working
AltaVista::News		working
AltaVista::Web		working
AltaVista::Intranet	working
Crawler			partially working?
Deja/Dejanews		working
Excite			working
Excite::News		working
ExciteForWebServers	not working
Fireball		not working?
FolioViews		working
Google  		working
Gopher			not working? (not in test suite)
GoTo                    working
HotBot			working
HotFiles		working
Infoseek		working
Infoseek::Companies	working
Infoseek::Email		not working
Infoseek::News		working
Infoseek::Web		working
Livelink		not working? (not in test suite)
LookSmart		working? (not in test suite)
Lycos			working
Lycos::Pages		working
Lycos::Sites		working
Magellan		working
Metacrawler             not working
Metapedia		working? (not in test suite)
MSIndexServer		not working?
NetFind                 working
NorthernLight		working
Null			working
OpenDirectory           working
PLweb			not working
Profusion		working
Search97		not working
SFgate			working
Simple			not working? (not in test suite)
Snap			working
Verity			not working (not in test suite)
WebCrawler		working
Yahoo			working
ZDNet			working

"Partially working" indicates that some tests passed and some failed.


WHAT IS AutoSearch?
-------------------

WWW::Search's primary client is AutoSearch.  AutoSearch performs a
web-based search and puts the result set in a web page.  It
periodically updates this web page, indicating how the search changes
over time.  Sample output from AutoSearch can be found at
<http://www.isi.edu/lsam/tools/autosearch/>.  Output format is
configurable.
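AutoSearch's own code is not reproduced here.  As a hypothetical sketch
of the underlying idea only, tracking how a search changes over time
amounts to comparing the URLs returned by two runs of the same query;
the subroutine name diff_results and the sample URLs below are made up
for illustration.

```perl
#!/usr/bin/perl -w
# Hypothetical sketch: compare the URLs from two runs of one search.
# This is NOT AutoSearch's actual implementation.
use strict;

# Returns two array refs: URLs only in the new run ("added") and
# URLs only in the old run ("removed"), each in its original order.
sub diff_results {
    my ($old, $new) = @_;
    my (%in_old, %in_new);
    $in_old{$_} = 1 for @$old;
    $in_new{$_} = 1 for @$new;
    my @added   = grep { !$in_old{$_} } @$new;
    my @removed = grep { !$in_new{$_} } @$old;
    return (\@added, \@removed);
}

my @last_week = ('http://a.example/', 'http://b.example/');
my @this_week = ('http://b.example/', 'http://c.example/');
my ($added, $removed) = diff_results(\@last_week, \@this_week);
print "new:  @$added\n";     # new:  http://c.example/
print "gone: @$removed\n";   # gone: http://a.example/
```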

See the AutoSearch man page for details, or the DEMONSTRATION section
below for quick-start instructions.


REQUIREMENTS
------------

WWW::Search requires Perl5, the libwww-perl module suite, the URI
module, and the HTML::Parser module.  Some of the "not working"
modules require the HTML::TreeBuilder module (so you can ignore
warnings about TreeBuilder during the build).  For information on
Perl5, see <http://www.perl.com>.  For all the modules, see
<http://www.perl.com/CPAN/> to find a CPAN site near you.

At the time of this release, WWW::Search is primarily developed and
tested under perl version 5.005_03 on Sun UltraSparc Solaris 7, and
under ActiveState perl build 519 on Windows NT 4.0 with Service Pack 5.

WWW::Search has also been built and tested successfully on Win98J
(that's Japanese) with ActiveState perl build 517.

If you have successfully built and tested WWW::Search on any other
(obscure) platform / version combination, please let me know at
<MartinThurn@iname.com>.


AVAILABILITY
------------

The latest version of WWW::Search should always be available on CPAN.
Here is the best URL for finding it:
http://www.perl.com/CPAN-local/modules/by-module/WWW


INSTALLATION
------------

In order to use this package you will need Perl version 5.002 or
better.  

It is highly recommended that you use CPAN.pm to install WWW::Search.
It will automatically install all the prerequisite modules and put
everything in the right places.  On a Unix system, just type

    perl -MCPAN -e 'install WWW::Search'

Otherwise, you can install WWW::Search as you would any perl module
library, by running these commands in the WWW-Search-2.04 directory
after unpacking the archive (and after installing all the prerequisite
modules):

    perl Makefile.PL
    make
    make test
    make install

On Win32, maintenance and testing are done with Microsoft's nmake.exe;
use 'nmake' instead of 'make' in the above sequence of commands.

See "TESTING" below for a description of what "make test" does.

If you want to install a private copy of WWW::Search in your home
directory, then you should produce the initial Makefile with something
like this command:

    perl Makefile.PL PREFIX=/my/perl/lib

Don't forget to add /my/perl/lib to your PERL5LIB environment variable
(or 'use lib' it, or unshift it onto @INC)!


TESTING
-------

The "make test" command compares expected output from WWW::Search with
actual output.  It detects two kinds of errors:

- internal parsing:
	First it checks to make sure that your system computes
	the same results as my system based on some saved
	Web queries.  This test should always pass for working
        backends; if it doesn't, send me mail.

- external queries:
	Second, it makes real queries against the search engines
	and compares them with some saved results.

External queries can fail for several reasons:

- new pages have been added which match the test queries, or matching
  pages have been deleted, causing the page count to drift too far
  from the expected number (not necessarily a bad thing)

- changes in the web search engine output which break WWW::Search's
  parsers, usually resulting in no URLs being returned (a bad thing)
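The sketch below, which is hypothetical and not taken from the
WWW::Search test suite, shows one way to tell the two failure modes
above apart using a test's expected and actual result counts.  The
subroutine name classify_result and the 20% tolerance are assumptions
chosen for illustration.

```perl
#!/usr/bin/perl -w
# Hypothetical sketch of the two external-query failure modes; the
# real test suite's logic and thresholds may differ.
use strict;

# $tolerance is an assumed fraction of the expected count (e.g. 0.2).
sub classify_result {
    my ($expected, $actual, $tolerance) = @_;
    # Zero hits where some were expected usually means a broken parser.
    return 'parser broken?' if $expected > 0 && $actual == 0;
    # Otherwise, a large gap usually means the web itself changed.
    return 'count drift' if abs($actual - $expected) > $expected * $tolerance;
    return 'ok';
}

print classify_result(100, 110, 0.2), "\n";   # ok
print classify_result(100, 150, 0.2), "\n";   # count drift
print classify_result(100,   0, 0.2), "\n";   # parser broken?
```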

If the external tests fail, please either investigate the error or
send a description of the problem, your operating system and perl
version numbers, and the output of "make test" to the maintainer of
the back-end for the search engine that fails.


DISCUSSION, BUG REPORTS, AND IMPROVEMENTS
-----------------------------------------

Feedback about WWW::Search is encouraged.  If you're using it for a
neat application, please let us know.  If you would like to implement
(or have implemented) a new back-end for WWW::Search, let us know so
we don't duplicate work.  <MartinThurn@iname.com>

Feedback, bug reports, fixes, and new back-ends should be sent to
Martin Thurn <MartinThurn@iname.com>.  When sending e-mail, please
put [WWW::Search] at the beginning of the subject line (or risk
me losing the message in the pile).

There is a mailing list for WWW::Search discussion.  To subscribe,
send "subscribe info-www-search" as the body of a message to
<info-www-search-request@isi.edu>.

Back-end-related bug reports ("search engine ABC doesn't work") should
be sent to the author of the back-end (back-end authors are identified
in the corresponding man page and in the output of "make test").
General bugs should be reported to <MartinThurn@iname.com>.

When submitting a bug report, please remember to include:
        - your operating system name and version
	- your version of perl
	- your version of WWW::Search
	- your version of the backend
        - the code you ran to produce the error (PLEASE cut-and-paste!)
	- sample output showing the error (PLEASE cut-and-paste!)


DEMONSTRATION
-------------

After installing the distribution, try:

	WebSearch '"Your Name Here"'

or, if you are on Win32:

        WebSearch "\"Your Name Here\""

to see who's talking about you on the web.  Then (in your web page
directory), try:

        cd /path/to/your/web/pages
	AutoSearch -n me_on_the_web -s '"Your Name Here"' me

or, if you are on Win32:

        cd /path/to/your/web/pages
	AutoSearch -n me_on_the_web -s "\"Your Name Here\"" me

and the web page /path/to/your/web/pages/me/index.html will be created
summarizing this information.  If you are on UNIX you can add

	0 3 * * 1 AutoSearch /path/to/your/web/pages/me

to your crontab to update this search every week at 3:00 Monday
morning, for example.


DOCUMENTATION
-------------

See `perldoc WWW::Search` after installation for an overview of the
library.  POD-style documentation is also included in all modules and
scripts, so you can do `perldoc WebSearch` and `perldoc AutoSearch`
and `perldoc WWW::Search::HotBot` after installation.


FUTURE PLANS
------------

Some ideas:

  - a global option that will force WWW::Search to perform the same
search as the engine's web GUI (I'm looking for contributions of the
precise arguments that will produce such a search for each engine;
i.e. the hash that should be passed as the second argument to
native_query).  Contact <MartinThurn@iname.com>

  - application-level proxy support (I'm looking for a contribution
here from someone who uses/needs proxy support and can test it).
Contact <MartinThurn@iname.com>

  - use LWP::ParallelUA to speed up multiple backend search requests
(I'm trying to decide what the API will look like; please send
suggestions).  Contact <MartinThurn@iname.com>

  - more widespread use of new result tags across all back-ends
(e.g. description, date, size, etc.)

  - a freeze/restore interface to suspend and resume in-progress queries

  - more back-ends

Contributions are always welcome.  Send me e-mail if you plan a new
back-end or want to discuss architectural changes (to avoid
duplicating work).  Contact <MartinThurn@iname.com>


SUPPORT AND CREDITS
-------------------

The WWW::Search architecture is by John Heidemann with feedback from
the other contributors.  NOTE: This list is no longer updated; consult
the on-line documentation and/or the output of `make test` to find out
who is currently maintaining each component.

PLATFORM SUPPORT:
	Unix			John Heidemann <johnh@isi.edu>
	Windows			Jim Smyser <jsmyser@bigfoot.com>
                		(see <http://members.xoom.com/WWW_Search>)

APPLICATIONS:
	WebSearch		John Heidemann
	AutoSearch 		William Scheding <wls@isi.edu>

BACK-ENDS:
	AltaVista		John Heidemann
	Dejanews		Cesare Feroldi de Rosa <C.Feroldi@it.net>
				and Martin Thurn <MartinThurn@iname.com>
	Crawler			Andreas Borchert
        Excite                  Glen Pringle <pringle@cs.monash.edu.au>
				and Martin Thurn
	ExciteForWebServers	Paul Lindner <lindner@reliefweb.int>
	Fireball		Andreas Borchert
	FolioViews		Paul Lindner
	Gopher			Paul Lindner
	HotBot			William Scheding and Martin Thurn
	HotFiles		Jim Smyser
	Infoseek		Cesare Feroldi de Rosa and Martin Thurn
	Livelink		Paul Lindner
        Lycos                   William Scheding, John Heidemann,
                                and Martin Thurn
	Magellan		Martin Thurn
	MSIndexServer		Paul Lindner
	NorthernLight		Jim Smyser
	Null			Paul Lindner
	OpenDirectory		Jim Smyser
	PLWeb			Paul Lindner
	Profusion		Jim Smyser
	Search97		Paul Lindner
	SFgate			Paul Lindner
	Simple			Paul Lindner
	Snap			Jim Smyser
	Verity			Paul Lindner
	WebCrawler		Martin Thurn
	Yahoo			William Scheding and Martin Thurn
	ZDNet			Jim Smyser

AutoSearch is based on an earlier implementation by Kedar Jog
<jog@isi.edu> with advice from Joe Touch <touch@isi.edu>.

Bugs and extensions (to the software and documentation) have been
identified by William Scheding <wls@isi.edu>, T. V. Raman
<raman@adobe.com> (proxy support), C. Feroldi <C.Feroldi@it.net>,
Larry Virden <lvirden@cas.org>, Paul Lindner <paul.lindner@itu.int>,
Guy Decoux <decoux@moulon.inra.fr>, R Chandrasekar (Mickey)
<mickeyc@linc.cis.upenn.edu>, Martin Thurn <MartinThurn@iname.com>,
Chris Nandor <pudge@pobox.com>, Martin Valldeby
<martin.valldeby@pakom.se>, Jim Smyser <jsmyser@bigfoot.com>, Darren
Stalder <darren@u.washington.edu>, Neil Bowers
<neilb@cre.canon.co.uk>, Ave Wrigley <wrigley@cre.canon.co.uk>, and
Andreas Borchert <borchert@mathematik.uni-ulm.de>.

Bugs have been reported by Joseph McDonald <joe@smartlink.net>, Juan
Jose Amor <jjamor@infor.es>, Bowen Dwelle <bowen@hotwired.com>,
Vassilis Papadimos <vpapad@dblab.ece.ntua.gr>, Vidyut Luther
<vluther@hpctc.org>, and Chris P. Acantilado <cacantil@spawar.navy.mil>.


COPYRIGHT
---------

Copyright (c) 1996 University of Southern California.
All rights reserved.

Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation, advertising
materials, and other materials related to such distribution and use
acknowledge that the software was developed by the University of
Southern California, Information Sciences Institute.  The name of the
University may not be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.


Portions of this README are derived from the README for libwww-perl.

