fd_filter & fd_extract

NeTraMet flow data file utility programs
Version  2.2


Nevil Brownlee

Computer Centre
The University of Auckland
Auckland, New Zealand
July 1994

1.	Introduction
fd_extract and fd_filter are programs for processing flow data files.  They are intended 
first as utilities which perform basic operations on these files, and second as examples 
of code for working with them.
1.1.	 Overview
fd_filter reads a flow data file and processes it as requested by a format file so as to 
produce a new flow data file.  Possible processing operations are:
*	Compute flow rates, i.e. compute differences between samples for 
To/From Octet and PDU rates.
*	Change the file format, e.g. rearrange the order of attributes in the flow 
data records.
*	Filter out flows, i.e. select only those of interest.  Identify these with a 
new 'tag' attribute.
*	Filter out NeTraMet statistics records.
Any combination of these operations can be carried out by a single run of fd_filter.
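The rate computation in the first item can be pictured with a short Python sketch.  It assumes each sample carries cumulative counters for one flow; the function name counter_deltas and the dictionary layout are hypothetical and are not taken from the fd_filter source.

```python
def counter_deltas(samples):
    """Return per-sample differences of cumulative counters.

    `samples` is a list of dicts mapping a counter name (for example
    "ToOctets") to its cumulative value at each collection.  This layout
    is an assumption made for the sketch only.
    """
    deltas = []
    previous = None
    for current in samples:
        if previous is not None:
            # Difference between this sample and the one before it.
            deltas.append(
                {name: current[name] - previous[name] for name in current}
            )
        previous = current
    return deltas


samples = [
    {"ToOctets": 1000, "FromOctets": 400},
    {"ToOctets": 1600, "FromOctets": 900},
    {"ToOctets": 1900, "FromOctets": 950},
]
print(counter_deltas(samples))
```

The first sample has no predecessor, so this sketch emits nothing for it; how fd_filter itself treats the first sample is not covered here.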
fd_extract reads a flow data file and produces from it a file which is a column list of a 
matrix.  The matrix has one row for each sample in the flow data, and one column for each 
specified set of flows.  Such a file can be used as input to other utility programs; for 
example, it can be plotted using gnuplot.
Both programs are Unix filter programs, i.e. they read from an input file (usually standard 
input) and write to standard output.  Instructions describing the processing operations are 
read from a format file.


1.2.	 Format Files
The syntax for format files is given below in the form of railway diagrams, and detailed 
examples are given in the following sections.  
Each statement in a format file starts at the beginning of a line and ends with a semicolon.
A cross-hatch character (#) marks the end of the significant part of a line; all characters 
following a cross-hatch on a line are ignored by the scanner.
The scanner looks for keywords, numbers and addresses.  Keywords are shown in the 
railway diagrams in upper case, but case is ignored by the scanner.  Keywords, including 
attribute names, must be given in full - abbreviations are not allowed.
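As a rough illustration of these scanning rules, the following Python sketch strips cross-hatch comments, splits the text at semicolons and folds words to upper case.  It is not the scanner used by the programs: the function name split_statements is hypothetical, and the sketch ignores quoted strings and folds every word to upper case, not only keywords.

```python
def split_statements(text):
    """Strip '#' comments and split format-file text into statements."""
    stripped = []
    for line in text.splitlines():
        # Everything from the first '#' to the end of the line is ignored.
        stripped.append(line.split("#", 1)[0])
    body = "\n".join(stripped)

    statements = []
    for chunk in body.split(";"):
        words = chunk.split()
        if words:
            # Case is ignored, so fold each word to upper case.
            statements.append([word.upper() for word in words])
    return statements


text = (
    "time: elapsed minutes;  # time axis\n"
    "column: 2 scale kilo ToOctetRate tag 1;  # IP\n"
)
for statement in split_statements(text):
    print(statement)
```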
2.	fd_filter
2.1.	 fd_filter Format File
 
A filter format file contains one or more of four possible elements, which may appear in 
any order in the file.  The case of characters is not significant within format files.
2.1.1.	RuleSet Statement
 
The RuleSet statement specifies the rule set from which flows are to be drawn.  The 
default is not to filter out any rule sets, i.e. to copy all flows.
2.1.2.	Format Statement
 
The Format statement specifies the format of flow data lines in the file written by fd_filter. 
It starts with the FORMAT keyword, which is followed by a list of flow attributes in the order 
they are to appear in the Flow Data file.  The attributes include all of the flow attributes 
described in the NeTraMet manual, and also the following:
ToOctetRate	Number of octets sent since previous collection
FromOctetRate	Number of octets received since previous collection
ToPDURate	Number of PDUs sent since previous collection
FromPDURate	Number of PDUs received since previous collection
TagNbr	Tag number for this flow (see 'Tag Statement' below)

2.1.3.	Tag Statement
 
A tag statement identifies a flow as having a particular set of attribute values, and provides 
the tag as a convenient way to refer to those flows.  For example, we might use tag 2 to 
mean 'all flows with source address 130.216.3.1'.  Note that there may be many flows 
which meet this criterion.  Using the tag in this way is much more convenient than using the 
{rule set number, flow number, start time} triple which the meter uses to uniquely define 
each flow, and which the user has no control over.
A tag statement starts with the keyword TAG, followed by a list of attribute values.  The 
attributes allowed here are only those which are known to the meter, i.e. those 
documented in the NeTraMet manual.  Attribute values are separated by commas, and the 
list is terminated by a semicolon.
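The idea of tagging can be sketched in Python as a comparison of a flow's attributes against each tag's attribute values.  The names tag_for_flow, tags and flow are hypothetical, and returning the first matching tag is an assumption; what fd_filter does when a flow matches more than one tag is not stated here.

```python
def tag_for_flow(flow, tags):
    """Return the number of the first tag whose values all match, else None.

    `flow` maps attribute names to values; `tags` is a list of
    (tag number, {attribute: value}) pairs.  Both layouts are
    assumptions made for this sketch.
    """
    for number, wanted in tags:
        if all(flow.get(attr) == value for attr, value in wanted.items()):
            return number
    return None


tags = [
    (1, {"SourcePeerType": "IP"}),
    (2, {"SourcePeerType": "Novell"}),
]
flow = {"SourcePeerType": "Novell"}
print(tag_for_flow(flow, tags))
```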
2.1.4.	Statistics Statement
 
The Statistics statement tells fd_filter to copy meter performance statistics from the 
input flow data file.
2.2.	Using fd_filter
To run fd_filter, give the following command:
fd_filter format_file input_flow_file > output_flow_file
This is the full version of the command, requesting fd_filter to read an input flow file, 
process it as specified by a format file and write a new output flow file.  If the input file isn't 
given fd_filter will read from standard input.  fd_filter always writes to standard output, so 
the above command uses Unix output redirection to write to a disk file instead.
2.3.	A sample filter format file
Here is an example format file for fd_filter:
Format:
   TagNbr SourcePeerType "\t"
   ToPduRate FromPduRate "\t" ToOctetRate FromOctetRate;
Tag 1:
   SourcePeerType=IP;
Tag 2:
   SourcePeerType=Novell;
Tag 3:
   SourcePeerType=DECnet;
Tag 4:
   SourcePeerType=EtherTalk;
Tag 5:
   SourcePeerType=CLNS;
Tag 6: 
   SourcePeerType=Other;
This format file was written to read flow data files produced using the meter's default rule 
set, and produce an output file with Octet and PDU rates and tags for each of the network 
PeerTypes.
3.	fd_extract
3.1.	 fd_extract Format File
 
An extract format file contains one or more of three possible elements, which may appear in 
any order in the file.  The case of characters is not significant within format files.
3.1.1.	RuleSet Statement
 
The RuleSet statement specifies the rule set from which flows are to be drawn.  The 
default is not to filter out any rule sets, i.e. to use flows from all rule sets.
3.1.2.	Column Statement
 
The Column statement specifies one column of data to be written into the output file.  It 
starts with the keyword COLUMN, followed by the column number.  Column 1 is used for 
the sample time, so column numbers must start at 2.  Be careful to specify each column 
once only!
The Scale, if given, is a number used to divide the values from the flow data file before 
writing them to the output file.  The default scale value is one.
  
Any integer may be specified for the scale factor.  KILO and MEGA provide a simple way 
to specify factors of 1,000 and 1,000,000 respectively.
 
The Data Spec tells fd_extract what data is to appear in the output column.  It is given as 
a summation over attributes and tag numbers.  For example, we could request 
ToOctetRate + FromOctetRate for the flows with tags 3 + 5 + 8.
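The Scale and Data Spec can be read together as "sum the requested attributes over the requested tags, then divide by the scale".  The Python sketch below follows that reading; the names scale_value and column_value and the record layout are hypothetical, and the sample figures are invented for illustration.

```python
SCALE_WORDS = {"KILO": 1_000, "MEGA": 1_000_000}


def scale_value(token):
    """Turn a scale token (an integer, KILO or MEGA) into a divisor."""
    word = token.upper()
    if word in SCALE_WORDS:
        return SCALE_WORDS[word]
    return int(token)


def column_value(records, attributes, tag_numbers, scale=1):
    """Sum the attributes over records with the given tags, then divide."""
    total = sum(
        record[attr]
        for record in records
        if record["TagNbr"] in tag_numbers
        for attr in attributes
    )
    return total / scale


# One sample's worth of records, as fd_filter might have tagged them.
records = [
    {"TagNbr": 3, "ToOctetRate": 4000, "FromOctetRate": 1000},
    {"TagNbr": 5, "ToOctetRate": 2000, "FromOctetRate": 500},
    {"TagNbr": 7, "ToOctetRate": 9000, "FromOctetRate": 9000},
]
value = column_value(
    records,
    ["ToOctetRate", "FromOctetRate"],
    {3, 5, 8},
    scale_value("kilo"),
)
print(value)
```

Only the records tagged 3 and 5 contribute here; tag 8 matches nothing in this sample and tag 7 was not requested.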
 
The keyword STATISTICS indicates that one of the meter performance statistics is to be used for 
this column instead of flow data.  sss is the three-letter abbreviation for the particular 
statistic; these are listed in the NeTraMet manual.
3.1.3.	Time Statement
 
The TIME statement specifies units for the time axis, which is always written to the 
output file as column 1.  The time scale may be ELAPSED, i.e. the output time 
starts at 0 for the first output row, or CLOCK, in which case the output is as actual 
time of day.  Time units may be SECONDS, MINUTES, HOURS or DAYS.  The 
default time statement values are CLOCK HOURS.
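The effect of the two time scales and four units can be sketched in Python as follows.  The sketch assumes sample times are available as seconds since midnight, which is an assumption about the input and not something this manual specifies; the name time_column is hypothetical.

```python
UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def time_column(sample_seconds, scale="CLOCK", units="HOURS"):
    """Convert sample times to the values written in column 1.

    ELAPSED measures from the first sample; CLOCK keeps the time of day.
    """
    divisor = UNIT_SECONDS[units.upper()]
    origin = 0
    if scale.upper() == "ELAPSED" and sample_seconds:
        origin = sample_seconds[0]
    return [(t - origin) / divisor for t in sample_seconds]


# Samples taken at 09:00, 09:05 and 09:10.
times = [32400, 32700, 33000]
print(time_column(times, "elapsed", "minutes"))
print(time_column(times))
```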
3.2.	Using fd_extract
To run fd_extract, give the following command:
fd_extract format_file input_flow_file > output_column_file
This is the full version of the command, requesting fd_extract to read an input flow file, 
process it as specified by a format file and write an output column-list file.  If the input file 
isn't given fd_extract will read from standard input.  fd_extract always writes to standard output, 
so the above command uses Unix output redirection to write to a disk file instead.
3.3.	A sample extract format file
Here is an example format file for fd_extract:
time: elapsed minutes;

column:  2  scale 2000 ToOctetRate+FromOctetRate tag 1;  # IP
column:  3  scale 2000 ToOctetRate+FromOctetRate tag 2;  # Novell
column:  4  scale 2000 ToOctetRate+FromOctetRate tag 3;  # DECnet
column:  5  scale 2000 ToOctetRate+FromOctetRate tag 4;  # EtherTalk
column:  6  scale 2000 ToOctetRate+FromOctetRate tag 6;  # Other

column:  7  scale 2    ToPDURate+FromPDURate tag 1;  # IP
column:  8  scale 2    ToPDURate+FromPDURate tag 2;  # Novell
column:  9  scale 2    ToPDURate+FromPDURate tag 3;  # DECnet
column: 10  scale 2    ToPDURate+FromPDURate tag 4;  # EtherTalk
column: 11  scale 2    ToPDURate+FromPDURate tag 6;  # Other
This format file was written to read flow data files produced by fd_filter using the example 
format file given above and produce an output plot file with elapsed minutes in column 
one, total octet rates in columns 2 to 6 and total PDU rates in columns 7 to 11.
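A column-list file of this kind can also be loaded by a short script before plotting.  The Python sketch below assumes the columns are separated by white space, which is what gnuplot expects but is not spelled out in this manual; the name read_columns is hypothetical and the two sample lines are invented.

```python
def read_columns(lines):
    """Parse whitespace-separated numeric rows, skipping blank lines."""
    rows = []
    for line in lines:
        # Drop any trailing '#' comment, then surrounding white space.
        line = line.split("#", 1)[0].strip()
        if line:
            rows.append([float(field) for field in line.split()])
    return rows


# Column 1 is the time; the remaining columns are the requested data.
sample_lines = ["0 1.5 0.25", "5 2.0 0.5"]
rows = read_columns(sample_lines)
elapsed_minutes = [row[0] for row in rows]
first_data_column = [row[1] for row in rows]
print(elapsed_minutes, first_data_column)
```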
		