#!/usr/local/bin/perl
use warnings;
use strict;

our $VERSION = "0.3";

use App::SimpleScan;

my $app = new App::SimpleScan;
exit $app->go;

__END__

=head1 NAME

simple_scan - scan a set of Web pages for strings present/absent

=head1 ABSTRACT

App::SimpleScan - Mini-language for website testing

=head1 SYNOPSIS

  simple_scan [--generate] [--run] {file file file ...}

=head1 USAGE

  # Run the tests in the files supplied on the command line.
  # --run (or -run; we're flexible) is assumed if you give no switches.
  % simple_scan file1 file2 file3

  # Generate a set of tests and save them, then run them.
  % <complex pipe> | simple_scan --generate > pipe_scan.t

  # Run one simple test
  % echo "http://yahoo.com yahoo Y Look for yahoo.com"  | simple_scan -run

=head1 DESCRIPTION

C<simple_scan> reads either files supplied on the command line, or standard
input. It creates and runs, or prints, or even both, a L<Test::WWW::Simple>
test for the criteria supplied to it.

C<simple_scan>'s input should be in the following format:

  <URL> <pattern> <Y|N> <comment>

The I<URL> is any URL; I<pattern> is a Perl regular expression, delimited by
slashes; I<Y|N> is C<Y> if the pattern should match, or C<N> if the pattern 
should B<not> match; and I<comment> is any arbitrary text you like (as long 
as it's all on the same line as everything else). 

C<simple_scan>will do its best to try to interpret your pattern; if it can't
parse it as a regular expression, it will assume you meant to match against
a literal character string instead; so a pattern like

   /<b>this</b>/

Would be interpreted as the literal string "<b>this</b>".

=head1 COMMAND-LINE SWITCHES

We use L<Getopt::Long> to get the command-line options, so we're really very
flexible as to how they're entered. You can use either one dash (as in
C<-foo>) or two (as in C<--bar>). You only need to enter the minimum number
or characters to match a given switch.

=over 4

=item C<--run>

C<--run> tells C<simple_scan> to immediately run the tests it's created. Can
be abbreviated to C<-r>.

This option is mosst useful for one-shot tests that you're not planning to
run repeatedly.

=item C<--generate>

C<--generate> tells C<simple_scan> to print the test it's generated on the
standard output.

This option is useful to build up a test suite to be reused later.

=back

Both C<-r> and C<-g> can be specified at the same time to run a test and print 
it simultaneously; this is useful when you want to save a test to be run later 
as well as right now without having to regenerate the test.

=over 4

=item C<--define>

C<--define> allows you to predefine substitutions to be used during a 
C<simple_scan> run. To define a substitution, use this syntax:

  --define foo=bar --define baz="one two three"

The first example defines a single substitution; the second defines a
multiple substitution. In conjunction with C<--override>, C<--define>
can make C<simple_scan> ignore any definitions for variables in the
C<simple_scan> input file. Conversely, if C<--defer> is specified, 
any definitions on the command line will be altered if a definition
for the variable is found in the input file.

Note that C<%%forget> can still make C<simple_scan> forget a definition 
(if C<App::SimpleScan::Plugin::Forget> is installed).

Also note that you define a variable with multiple values like this:

  --define foo="bar baz quux"

butB<not> like this:

  --define foo=bar --define foo=baz --define foo=quux

since multiple definitions of a single substitution use only the
I<last> substitution defined; the example directly above (with
the three "--define" entries) defines "foo" as "quux" and only 
as "quux".

=item C<--override>

Makes any definitions entered on the command line override definitions 
found in the input file.

=item C<--defer>

Makes any definitions entered on the command line defer to defintions
found in the input file - the variables in question will be redefined 
by the command file.

=item C<--debug>

Enables debugging for you C<simple_scan> input file; this outputs a 
lot of extra code which, when executed by C<simple_scan --run>, shows
a lot more information as to what actually happened.

Currently, the only extra debugging information is a list of variables
which were I<not> altered by substitution pragmas when C<--override>
was specified on the command line.

=back

=head1 PRAGMAS

Pragmas are ways to influence what C<simple_scan> does when generating tests.
They don't output anything themselves.

Pragmas are specified with C<%%> in column 1 and the pragma name immediately
following. Any arguments aer supplied after a colon, like this:

   %%foo: bar baz

This invokes the C<foo> pragma with the argument C<bar baz>.

=head2 Substitutions

Any pragma that's otherwise unrecognized by C<simple_scan> is treated as a 
substitution. Substitutions assume that you have a name and a set of strings
following it; these strings wil be substituted into the test specs occuring
between this (set) of substitutions and the next (set).

Here's an example. 

   %% user dconway chromatic petdance
   %% use_perl_id Ovid pemungkah
   http://search.cpan.org/~<user>
   http://use.perl.org/~<use_perl_id>/journal/
   http://search.yahoo.com/
   ...
   
This would fetch the CPAN index page for the users dconway, chromatic, and 
petdance, and the use.perl journals for users Ovid and pemungkah. Finally,
it would (just once) fetch the Yahoo! search page - because there are no
substitutions in that line, it would only be evaluated once.

The substitutions can occur anywhere in the line, including in the comment.

This pragma allows for very simple-minded internationalization. For instance,
let's assume that you want to substitute each of a list of two-character 
country codes into a string (most likely somewhere in the URL, but possibly 
in the comment too). 

C<simple_scan> will do this for you, creating a test for each country code
you specify. For instance:

   %%xx: es au my jp
   http://>xx<.mysite.com/     /blargh/  Y  look for blargh (>xx<)

This would generate 4 tests, for C<es.mysite.com>, C<au.mysite.com>, 
c<my.mysite.com>, and C<jp.mysite.com>, all looking to match C<blargh> 
somewhere on the page.

=head2 agent

The C<agent> pragma allows you to switch user agents during the test. 
C<Test::WWW::Simple>'s default is C<Windows IE 6>, but you can switch it
to any of the other user agents supported by C<WWW::Mechanize>.

   http://gemal.dk/browserspy/basic.html /Explorer/ Y Should be Explorer
   %%agent: Mac Safari
   http://gemal.dk/browserspy/basic.html /Safari/ Y Should be Safari

=head2 cache

The C<cache> pragma turns on URL caching; once enabled, the page returned
on the I<first> access to a URL is returned directly from a memory cache,
without its being reaccessed from the Web.

Using C<cache> can result in major speedups for tests which repeatedly
hit the same page.

=head2 nocache

The C<nocache> pragma turns I<off> URL caching; this is useful if you
have something like a REST interface that may return different values 
from repeated accesses to the same URL.

=head1 AUTHOR

Joe McMahon E<lt>mcmahon@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2005 by Yahoo!

This script is free software; you can redistribute it or modify it under the
same terms as Perl itself, either Perl version 5.6.1 or, at your option, any
later version of Perl 5 you may have available.
