Parse-MediaWikiDump

Parse::MediaWikiDump is a collection of classes for processing various
MediaWiki dump files; the package requires XML::Parser. With
Parse::MediaWikiDump, getting at the information in supported dump files
is nearly trivial.

INSTALLATION

To install this module, run the following commands:

    perl Makefile.PL
    make
    make test
    make install

EXAMPLE

A short program that finds the stub in the main namespace with the most
links to it. It takes two arguments: the pages dump file, then the links
dump file.

#!/usr/bin/perl -w

use strict;
use Parse::MediaWikiDump;

my $pages = Parse::MediaWikiDump::Pages->new(shift(@ARGV));
my $links = Parse::MediaWikiDump::Links->new(shift(@ARGV));
my %stubs;
my $page;
my $link;
my @list;

# Turn on autoflush for STDERR so progress messages appear immediately.
select(STDERR);
$| = 1;
select(STDOUT);

print STDERR "Locating stubs: ";

while(defined($page = $pages->page)) {
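	# The main namespace has an empty name.
	next unless $page->namespace eq '';

	# text returns a reference to the page text.
	my $text = $page->text;

	# Treat any page using a template that ends in "stub}}" as a stub.
	next unless $$text =~ m/stub}}/i;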

	my $title = $page->title;
	my $id = $page->id;

	$stubs{$id} = [$title, 0];
}

print STDERR scalar(keys(%stubs)), " stubs found\n";

print STDERR "Processing links: ";

while(defined($link = $links->link)) {
	my $to = $link->to;

	next unless defined($stubs{$to});

	$stubs{$to}->[1]++;
}

print STDERR "done\n";

# Sort the [title, link count] pairs by link count, highest first.
@list = sort { $b->[1] <=> $a->[1] } values(%stubs);

die "No stubs found\n" unless @list;

my $stub = $list[0]->[0];
my $num_links = $list[0]->[1];

print "Most wanted stub: $stub with $num_links links\n";
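
For a smaller starting point, the page iterator can also be used on its
own. The following is a minimal sketch that relies only on calls already
shown above (new, page, namespace, id and title). It assumes the path to a
pages dump file is passed as the first argument, and it prints the id and
title of every page in the main namespace:

    #!/usr/bin/perl -w

    use strict;
    use Parse::MediaWikiDump;

    # Assumption: the first argument is the path to a pages dump file.
    my $file = shift(@ARGV);
    die "usage: $0 <pages dump file>\n" unless defined($file);

    my $pages = Parse::MediaWikiDump::Pages->new($file);

    while(defined(my $page = $pages->page)) {
        # Skip everything outside the main namespace.
        next unless $page->namespace eq '';

        print $page->id, "\t", $page->title, "\n";
    }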
  
COPYRIGHT AND LICENCE

Copyright (C) 2005 Tyler Riddle

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
