





                                    NNStat:

                    Internet Statistics Collection Package

                          Introduction and User Guide



                               Robert T. Braden
                              Annette L. DeSchon

                     USC / Information Sciences Institute
                          Marina del Rey, California

                                 January 1991

                                  RELEASE 3.0

            This document describes Release 3.0 of NNStat, a         |
            package of programs for the distributed collection of    |
            Internet traffic statistics.                             |



                      SYNOPSIS OF CHANGES IN RELEASE 3.0             |


            *    Support was added for Ultrix and for little-endian  |
                 machine architectures.                              |

            *    The configuration language was extended to include  |
                 all the features originally envisioned in the       |
                 SIGCOMM '88 paper.  In particular, boolean          |
                 expressions, "symmetric if" statements,  and        |
                 "select" statements are now supported.              |

            *    The algorithm used to compile a statspy             |
                 configuration was improved to produce more          |
                 efficient execution.  For details, see ISI Report   |
                 RR-88-207.                                          |

            *    Configuration error messages now include line       |
                 numbers, to pinpoint configuration errors.          |

            *    Two new commands were added to statspy: "include"   |
                 allows more convenient creation of configurations,  |
                 and "file" allows retrospective diversion of        |
                 standard output to a file.                          |



        Braden & DeSchon                                       [Page 1]




        NNStat-Internet Statistics Package                  Release 3.0


            *    The _b_i_n-_p_k_t and _w_o_r_k_i_n_g-_s_e_t object classes have     |
                 been extended in several ways, and two new binary   |
                 classes _b_i_n-_p_k_t_2 and _w_o_r_k_i_n_g-_s_e_t_2 have been added.  |
                 See Appendix A for details.                         |

            *    An old bug in collect that caused occasional polls  |
                 to be missed has been fixed.                        |

            *    A number of other minor errors have been found and  |
                 fixed.  See the CHANGES file in the release for     |
                 more details.                                       |

            *    The Ethernet interface code was reorganized to      |
                 simplify support for alternative system             |
                 interfaces.                                         |

            *    There were some significant internal                |
                 reorganizations performed on statspy.  See the      |
                 CHANGES file.                                       |

            *    Many internal cosmetic changes were made.           |


            See Appendix E for a summary of earlier releases.



        _1.  _I_n_t_r_o_d_u_c_t_i_o_n

        NNStat is a facility for the distributed collection of Internet
        traffic statistics. This facility is designed to support the
        requirements of a network administrator for gathering long-term
        usage statistics simultaneously at many network entry points.
        Although it is primarily intended for collecting long-term
        traffic statistics for administration, management, and topology
        engineering, NNStat is sufficiently general to be useful for
        some operational problem solving.

        Distributed statistics collection has two aspects: (1)
        acquisition of the primary data at one or more locations, and
        (2) collection of all this acquired data at a single location.


        (1)  Distributed Data Acquisition

             The raw data must be acquired at a number of
             network/Internet points simultaneously.  In the NNStat
             model, there will be a statistics acquisition agent (SAA)
             process executing in a computer system attached to each
             network/Internet node for which data is required.  The SAA
             machines could be packet switches, gateways, general-
             purpose hosts, or hosts dedicated to the acquisition
             function.


        (2)  Centralized Data Collection

             Data (or summaries of data) acquired by the SAA processes
             must be transmitted to a central site for analysis,
             reporting, and long-term storage.  This central site, the
             _s_t_a_t_i_s_t_i_c_s _c_o_l_l_e_c_t_i_o_n _h_o_s_t (SCH), will run a data
             collection program to gather the data from the SAA
             processes.  In many cases, a single locus for data
             collection is sufficient; however, it should be possible
             to have multiple SCH's simultaneously gathering data from
             the same set of acquisition agents.  We may think of a
             primary SCH that serves as a central repository for usage
             data by a particular administration, with perhaps
             secondary collection hosts being used intermittently for
             short-term statistical studies.


        The principal components of the NNStat package are an SAA
        program and an SCH program.  The NNStat design is based upon
        the common use of Ethernets for interconnection of networks and
        Internet regions.  NSFnet provides an example:


        o    Each component of NSFnet above the campus level (i.e., the
             NSFnet Backbone and each of the middle-level networks)
             consists of a set of IP gateways connected by serial
             lines.


        o    Each gateway is also connected to an Ethernet that is used
             as the interconnect medium to one or more lower-level
             networks.  We refer to this as an interconnect Ethernet.


        Figure 1 shows a typical configuration at one of the network
        nodes.  The gateway G is a packet switch that forms part of the
        network under consideration.  G1 and G2 are entrance gateways
        to the same or different lower-level networks.


                                Lower-level Network(s)
                                      |        |
                                      |        |
                                      G1       G2
                                      |        |
                         Interconnect |   Ether|net
                        |======.======.========.========.====|
                               |                        |
                               |                      __|__
                               G                     | SAA |
                              / \                    |_____|
                             /   \
                            /     \
                       Serial lines to other
                                 network nodes


                     Figure 1.  Typical Network Node Configuration



        Interconnect Ethernets provide convenient and appropriate
        points for gathering NSFnet statistics.  They are convenient
        because an SAA executing on a host connected to one of these
        Ethernets can monitor the traffic in promiscuous mode (see
        Figure 1).  Thus, we can monitor the entrance and exit traffic
        without changing any gateway code.  The interconnect Ethernets
        are also appropriate points for administrative statistics
        gathering.  Administrators and traffic planners are concerned
        mainly with packets entering and leaving the network; the fact
        that traffic between individual network routers cannot be
        monitored from the Ethernets is not a serious drawback.

        Implementing NNStat in an SAA host rather than in a gateway or
        packet switch had a number of advantages.


        (1)  Timeliness:  The facilities provided by NNStat were needed
             quickly for NSFnet management.  It will be some time
             before equivalent traffic measurement standards are
             developed and implemented by gateway vendors.


        (2)  Generality:  We wanted to incorporate a degree of
             flexibility and generality into NNStat that is not
             currently available in gateways.


        (3)  Performance:  Comprehensive statistics gathering requires a
             non-trivial amount of CPU time and memory space; it is
             very undesirable to burden the current generation of
             gateways with this additional resource drain.


        (4)  Experimentation:  By implementing this function outside
             gateways, we are free to experiment with different
             approaches; eventual incorporation of our results into
             gateways is a reasonable goal.


        (5)  Universality:  There may not be a gateway at the point to
             be monitored; for example, there might be a link-level
             bridge.


        The primary task of the SAA is to count the occurrences of
        packets with "interesting" configurations of values in their
        header fields.  In the NNStat design, what is "interesting" is
        determined by the SAA configurations, which can be set or
        changed dynamically.

        Our model of NNStat operation within a particular network is as
        follows.  The administration will set up the acquisition
        agents, one at each point from which data is desired,
        configured to collect a basic set of statistics.  These
        statistics will be reported to the SCH to be summarized over
        sites, time, and perhaps administrative subsets of the
        networks.  In addition, management and operational personnel
        will dynamically modify the SAA configurations from time to
        time, to answer additional statistical questions about the
        traffic.


        Finally, we should mention some non-goals for the NNStat
        effort.


        o    NNStat does not provide fancy display or analysis programs
             for presenting the statistics.  This is potentially a
             large and complex problem that is outside the scope of the
             NNStat effort.


        o    NNStat cannot gather statistics for traffic on the serial
             lines between IP routers; it can measure only the network
             entry and exit traffic.  NNStat is intended to complement,
             not replace, the statistics gathering facilities built
             into gateways.  For example, gateways typically count line
             errors and dropped packets on each of their physical
             interfaces, to monitor and diagnose line problems.  These
             facilities are vital for operation and maintenance of the
             gateways and lines, forming the "first line of defense"
             for problem diagnosis.  However, NNStat is not generally
             concerned with short-term operational functions.



        _2.  _O_v_e_r_v_i_e_w _o_f _N_N_S_t_a_t


        The NNStat package, which has been implemented for a 4.2/3BSD
        system, includes the following components:


        (A)  SAA Program - statspy

             The statistics acquisition agent program of NNStat is
             named statspy.  Statspy currently supports:                 |

             o    Sun3 and Sun4 workstations under Sun OS releases 3.4,  |
                  3.5, 4.0.3, and 4.1;                                   |

             o    IBM RT processors running 4.3BSD networking code; and  |

             o    little-endian architecture machines running Ultrix.    |

        The code could be ported to any 4.2/3BSD system that provides    |
        an interface for promiscuous access to the Ethernet.

        Each Ethernet packet that statspy observes contains an Ethernet
        header followed by a sequence of one or more other protocol
        headers (e.g., IP, TCP, etc.), which reflect the successive
        encapsulation implied by protocol layering.  Each protocol
        header may be considered to be a string of bits that is
        logically divided into substrings called fields.
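
        As an illustration of a header treated as a bit string divided
        into fields, the following minimal Python sketch unpacks a few
        fields from the fixed 20-byte part of an IPv4 header.  It is not
        taken from the statspy source; the function name and dictionary
        keys are hypothetical and are not the built-in field names.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Split the fixed 20-byte part of an IPv4 header into named fields."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    # Network byte order: version/IHL, TOS, total length, ID,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (ver_ihl, tos, total_len, ident, frag,
     ttl, proto, cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,             # high 4 bits of first byte
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Illustrative header: TCP (protocol 6) from 128.9.1.51 to 10.0.0.1.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("128.9.1.51"),
                     socket.inet_aton("10.0.0.1"))
fields = parse_ipv4_header(sample)
```

        A statistical object could then count packets by any of these
        field values, for example by protocol number or source address.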

        A particular _s_t_a_t_s_p_y process can (and typically will) gather a
        number of different statistical measures of the packet traffic
        simultaneously.  Each of these measures is gathered by a
        separate _s_t_a_t_i_s_t_i_c_a_l _o_b_j_e_c_t, or simply _o_b_j_e_c_t.  The set of
        objects and the selection of protocol fields that they monitor
        is determined by the _c_o_n_f_i_g_u_r_a_t_i_o_n, that can be set or changed
        while _s_t_a_t_s_p_y is executing.

        _S_t_a_t_s_p_y is controlled by a command language that provides
        commands for setting and displaying the configuration and for
        displaying the statistical data gathered by its objects.
        _S_t_a_t_s_p_y commands may be entered from three locations:


             o    From a file, at start-up time.

                  This is the recommended way to set up the
                  configuration for collecting long-term statistics, so
                  _s_t_a_t_s_p_y will be self-configuring if the SAA host
                  crashes and restarts.


             o    Interactively, from the local console controlling
                  statspy.

                  This allows _s_t_a_t_s_p_y to be used as a standalone
                  monitoring tool.


             o    Interactively, from a remote system running the _r_s_p_y
                  program (see below).

             Section 3 describes the command language, including the
             command used to set or modify the configuration. Appendix
             C suggests useful configuration techniques.

             If it is executed in foreground, statspy accepts commands
             and displays statistics locally. Whether in foreground or
             background, it listens for a TCP connection from the
             remote collection machine (SCH) or from a remote rspy
             program, and processes all commands entered over that TCP
             connection.  However, the acquisition of new statistical
             data from the Ethernet takes highest priority.

             Note that statspy is not expected to record its data on a
             local disk; permanent data recording is assumed to take
             place only at the SCH.  This choice was made to minimize
             operational problems at each SAA site.

             For more details on the operation and configuration of
             statspy, see Section 3 below.


        (B)  Remote SAA Control Program - rspy

             The rspy program provides an interactive command interface
             for controlling a remote statspy instance.  Rspy can be
             used to establish, query, or modify the configuration and
             to read and/or clear the statistical objects.  The use of
             rspy is described in Section 3.4.


        (C)  Centralized Collection Program - collect

             Collect is the central data collection program of NNStat;
             it executes on the SCH to collect data from one or more
             statspy instances.  Rspy and collect use the same remote
             network interface to statspy, but they are designed for
             different tasks:  while rspy is intended to be used
             interactively for testing, probing, and running short-term
             statistical studies, collect is intended to be executed as
             a daemon, collecting and recording traffic data over a
             long period of time.

             In normal operation, collect will periodically poll a
             specified set of SAA's for statistical data and write the
             results into cumulative data files.  Note that data is
             delivered to collect only as a result of its polling the
             SAA's.  An alternative design would have the SAA's
             spontaneously report their data periodically to the SCH.
             We chose to use polling for data collection in order to
             ensure (approximate) synchronization in gathering
             statistics from all the SAA's, while avoiding an
             "implosion" of reports at the central site.

             The following basic parameters must be defined to run
             collect:


             *    List of SAA host names or addresses.

             *    TCP port for statspy on each SAA (optional).

             *    The name(s) of objects whose accumulated data are to
                  be retrieved from each statspy instance.

             *    Polling interval Ti.

             *    Checkpoint interval Tc.

             *    Clear ("reset") interval Tr.

             In one data collection cycle, collect will open a TCP
             connection to statspy on each of the listed hosts and
             retrieve data from ("read") the specified objects,
             recording the results in files.  This cycle will be
             repeated every Ti minutes, but collect will save or
             "checkpoint" the data for later analysis only every Tc
             minutes.
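
             How the three intervals in the parameter list above
             interact can be sketched in a few lines of Python.  This
             is only an illustration, not the scheduling code of the
             real program: it assumes that Tc and Tr are multiples of
             Ti and that all intervals are counted in minutes from the
             time the collection program starts.  The function name is
             hypothetical.

```python
def poll_actions(minute: int, ti: int, tc: int, tr: int) -> list:
    """Return the actions taken at a given minute since start-up.

    ti: polling interval, tc: checkpoint interval, tr: clear interval,
    all in minutes (tc and tr are assumed to be multiples of ti).
    """
    actions = []
    if minute % ti != 0:
        return actions              # not a polling instant
    actions.append("read")          # every poll reads the objects
    if minute % tc == 0:
        actions.append("checkpoint")  # save the data for later analysis
    if minute % tr == 0:
        actions.append("clear")     # reset the counters after reading
    return actions

# With Ti = 5, Tc = 60, Tr = 1440: list the actions for the first hour.
first_hour = {m: poll_actions(m, 5, 60, 1440) for m in range(5, 65, 5)}
```

             With these values, eleven of the twelve polls in an hour
             only read the objects, and the poll on the hour also
             checkpoints the data.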

             The totals returned by each poll are cumulative, unless
             the objects are explicitly cleared by command or the SAA
             (crashes and) restarts.  Therefore, if communication
             between the SCH and an SAA is lost temporarily, a later
             successful poll should return complete data. The minimum
             polling interval Ti should be short enough that data lost
             because of an SAA restart will be negligible.  Of course,
             if an SAA is down for an extended period, there is no way
             to capture statistics from that interconnect Ethernet for
             that period.

             _S_t_a_t_s_p_y generally keeps 32-bit counters for counting
             packet events, and 56-bit counters for accumulating byte    |
             totals.  If the average rate were 1000 packets per second,
             some packet counters might overflow once every 4 weeks.
             _S_t_a_t_s_p_y makes no special provision for overflow, but
             instead expects that _c_o_l_l_e_c_t will be set up to
             periodically clear all the counters using the Tc
             parameter.  Every Tc minutes, _c_o_l_l_e_c_t will instruct each
             _s_t_a_t_s_p_y to clear its data counters after the current
             values are retrieved.
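
             The overflow estimate can be checked with a little
             arithmetic.  The sketch below is illustrative only; it
             assumes a constant packet rate, and whether a given
             counter has 31 or 32 usable bits is an assumption made
             for the comparison, not something stated in this guide.

```python
SECONDS_PER_DAY = 86400.0

def days_to_overflow(bits: int, packets_per_second: float) -> float:
    """Days until a counter with the given usable bits wraps around."""
    return (2 ** bits) / packets_per_second / SECONDS_PER_DAY

full_width_days = days_to_overflow(32, 1000)   # all 32 bits usable
signed_days = days_to_overflow(31, 1000)       # one bit lost to the sign
```

             At 1000 packets per second, 31 usable bits last roughly
             three and a half weeks and 32 bits roughly seven weeks,
             so a daily clear leaves a wide margin in either case.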

             Suggested values for the time parameters to collect are:

              Ti = 5 minutes
              Tc = 60 minutes
              Tr = 1440 minutes (24 hours).

             _C_o_l_l_e_c_t will produce a separate data file for each
             (statistical measure, SAA host) pair, for all the
             statistical measures and hosts specified in its
             parameters.  Each of these data files will contain the
             read data for every checkpoint time, plus the last data
             recorded before the _s_t_a_t_s_p_y was restarted or its object(s)
             cleared, and will be cumulative from the time that _c_o_l_l_e_c_t
             program was started.  If the SCH crashes or _c_o_l_l_e_c_t is
             restarted for some reason, a new set of data files will be
             created.

             Section 4 explains how to use _c_o_l_l_e_c_t.



        (D)  Data Reduction Programs

             The NNStat distribution includes some useful programs and
             AWK scripts for processing and summarizing the data files
             created by the collect program.  These will be described
             in Section 4.



        _3.  _S_t_a_t_s_p_y



        _3._1.  _U_s_i_n_g _S_t_a_t_s_p_y


        To execute statspy, issue the following system command:


          statspy [-i interface] [-p port] [-h] [-1] [command-file]


        The parameters are:


        -i   Ethernet interface device name; the default is the (first)  |
             Ethernet interface on the local system.

        -p   TCP port number on which statspy will listen for a
             connection from collect or rspy.  The default is 2222.


        -h   Statspy will write a history of remote commands into the
             standard output.


        -1   ("minus one") Output from read operations will be           |
             displayed in single-column format.                          |


        command-file
             This optional parameter is the name of a file containing
             commands to be executed when statspy starts.  Normally,
             these will be commands to establish the initial
             configuration of objects for gathering data.  If this
             parameter is omitted, statspy will await commands from the
             local console or from rspy (executing either on the SAA
             host or remotely) to establish the configuration.


        When it starts, statspy executes the commands found in
        command-file, if any.  If it has been executed in foreground,
        statspy then enters an interactive command mode in which it
        repeatedly issues a prompt (">") and awaits command input.  If
        statspy is executed in background, its standard output and
        standard error output should be directed to a file to aid
        diagnosis in case a problem occurs.

        Statspy listens on the specified TCP port for a connection from
        a remote collect or rspy program.  It is currently limited to
        one TCP connection at a time, so the TCP connection is opened
        for each sequence of remote commands and closed again when the
        responses have been returned.


        _3._2.  _S_t_a_t_s_p_y _C_o_m_m_a_n_d _L_a_n_g_u_a_g_e


        The operation and configuration of statspy are controlled by a
        simple command language.  Commands to statspy can be entered
        from three sources:


        (1)  the initial command file (see preceding section);


        (2)  interactively from the controlling console (i.e., from
             standard input);


        (3)  remotely from a collect or rspy program.


        Remote command requests have priority over local commands,
        while processing new data from the Ethernet generally preempts
        either local or remote command processing.

        Commands from any source are free-form and may occupy as many
        lines as necessary.  Any text following the "#" character and
        up to the next newline will be ignored, to allow comments in
        the command stream.

        Various _s_t_a_t_s_p_y commands reference objects and protocol fields
        by name.  The field names are built into the program (see
        Figure 2 in Section 3.3), while object names are assigned by
        configuration commands.  There are commands to return a
        complete lists of the names of objects and of fields ("read ?"
        and "show ?", respectively).

        Commands refer to particular objects by their names.  They can
        refer to a set of objects by using a "wildcard" matching
        scheme. An object specification parameter, known as an object
        spec, may contain asterisks as wildcard characters to match any
        number of characters.  For example, the command:

           read *IP*

        will apply the read operation to all objects whose names
        include the string "IP", and

           read *

        will read all objects.  In setting up the configuration, the
        user should choose a consistent scheme for assigning object
        names to increase the usefulness of this wildcard matching.
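
        The wildcard matching described above can be sketched in Python
        as follows.  This is a minimal illustration rather than the
        matching code used by the program: it assumes that matching is
        case-sensitive, that an asterisk may match zero characters, and
        that no other character is special.  The object names in the
        example are hypothetical.

```python
import re

def match_objects(object_spec: str, names: list) -> list:
    """Return the names matched by an object spec containing '*'."""
    # Escape every literal piece and join the pieces with ".*" so that
    # each asterisk matches any run of characters (possibly empty).
    pieces = [re.escape(piece) for piece in object_spec.split("*")]
    pattern = re.compile(".*".join(pieces), re.DOTALL)
    return [name for name in names if pattern.fullmatch(name)]

# Hypothetical object names following a consistent naming scheme.
names = ["IPsrc", "TCPport", "srcIPnet", "EtherType"]
ip_objects = match_objects("*IP*", names)
all_objects = match_objects("*", names)
```

        A naming scheme that puts the protocol name in every object
        name, as in this example, lets one spec select all objects for
        that protocol.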

        As we will see in the next section, some objects do not
        themselves gather data, but instead conditionally select other
        objects that do. Such conditional objects can be left unnamed,
        since they will generally not be referenced by a command after
        they are created. Commands differ in how they treat such
        unnamed objects (see below).

        We now list all the commands recognized by statspy.


        o    read <object spec>

             Displays the data recorded by the object(s) whose names
             match <object spec>. Unnamed objects cannot be the target
             of a read operation.


        o    read ?

             Displays a summary, which includes the names of all
             objects.  Unnamed objects will be included in this
             summary.


        o    clear <object spec>

             Sets all objects whose names match <object spec> to their
             initial states, i.e., clears their statistical
             accumulation. "Clear *" will clear all objects, including
             unnamed objects.


        o    readclear <object spec>

             Executes a read followed by a clear operation, atomically.
             "Readclear *" will clear but not read all unnamed objects.


        o    show *

             Displays a summary of the current configuration.


        o    show ?

             Displays a list of the built-in field names.


        o    restrict readwrite <address> <mask>
             restrict readonly  <address> <mask>

             These commands may be used to restrict remote statspy
             access, by allowing access from only a specified set of
             hosts.  The set is defined by the IP address value
             <address> and the 32-bit address mask <mask>.  Here
             <address> must be expressed in dotted-decimal notation,
             while <mask> may be in dotted-decimal or written as a
             hexadecimal constant.

             The 1 bits in <mask> correspond to significant bits in
             <address>; thus, the bits in <address> that correspond to
             zero bits in <mask> are "wild".  The restrict command can
             be entered either on the local console or from the
             initialization file; a restrict command cannot be
             entered remotely.

             If no _r_e_s_t_r_i_c_t commands have been executed by _s_t_a_t_s_p_y,
             then full remote access is available from any host.  If
             any _r_e_s_t_r_i_c_t command has been executed, however, then
             remote commands will be accepted only from a remote host
             whose IP address matches the (<address>,<mask>) pair of
             one of the _r_e_s_t_r_i_c_t commands.  The commands are examined
             in the order that they have been executed; the first match
             also determines the access mode for the host, either
             read/write or read-only.  In read-only mode, only the _r_e_a_d
             and _s_h_o_w commands are allowed; in read/write mode, all
             remote commands are allowed (except _r_e_s_t_r_i_c_t commands).  A
             mask of 0.0.0.0 will allow all hosts to access with the
             specified mode, regardless of the <address> value.

             Example:
                restrict readwrite 128.9.1.51 0xffffffff
                restrict readonly 128.9.0.0  255.255.0.0

             allows any host on network 128.9 read-only access to
             statspy data, but only host 128.9.1.51 can remotely clear
             the objects or change the configuration.


        o    subnet <network> <mask>

             This command specifies that the network specified by
             <network> is subnetted with address mask <mask>.  Here
             <network> is an IP network address expressed in dotted-
             decimal notation, and <mask> is a 32-bit address mask
             expressed as a hexadecimal constant or in dotted-decimal.

             If the specified <network> was defined in a previous
             subnet command, then the new <mask> replaces the previous
             mask, and the command completes with the user reply
             "Replaced".  Otherwise, the new (network,mask) pair is
             added to the existing list of subnets, and the command
             completes with the user reply "Added".

             For every IP datagram received by statspy, the source and
             destination addresses are compared with each of the
             subnetted networks in the table, to define the value of
             the virtual subnet fields (see below) as appropriate.
             Since these comparisons increase the CPU load, statspy may
             optionally be generated without the comparison code or
             virtual fields for subnets by using: "make statspy
             SUBNET=".  If the subnet command is issued locally or
             remotely to an instance of statspy that has been generated
             in this manner, the subnet command will fail with the
             message: "Subnets not supported."


        o    attach { <configuration program> }

             Augments the current configuration with the additional
             statistical object(s) specified by <configuration
             program>. The curly braces are required.  The
             <configuration program> is written using a set of rules
             that we will refer to as the _c_o_n_f_i_g_u_r_a_t_i_o_n language,
             although it is really a sub-language of the command
             language; see Section 3 for details.

             The _a_t_t_a_c_h command is atomic - if any error is found, the
             current configuration will remain unchanged.


        o    detach <object spec>

             Deletes from the configuration each object whose name
             matches <object spec>. This may implicitly delete other
             objects in order to keep the configuration consistent.
             "detach *" will detach all objects, including those that
             have no names.


        o    ?

             Displays a list of the commands.  This command is only
             available on the local console.


        o    quit

             Exits to the operating system (shell) on the SAA host.
             This command cannot be issued across the network.


        o    enum { <enum parameters> }

             Defines a set of label strings for use in _r_e_a_d command
             displays.  See Section 3.3.4 for more explanation.  The
             curly braces are required.


        o    include <file name>                                         |

             Replaces the include command with the configuration         |
             program text contained in the specified file.               |


        o    list "<file name>"                                          |

             The standard output is diverted to the specified file,      |
             whose name must be enclosed in quotation marks.  An empty   |
             <file name> (list "") will return output to standard out.


        These commands may be entered either remotely from _r_s_p_y or else
        locally.  The _c_o_l_l_e_c_t program effectively issues the _r_e_a_d and
        _r_e_a_d_c_l_e_a_r commands.

        When a "show ?" command is issued to _s_t_a_t_s_p_y, the first line
        displayed summarizes the overall packet processing since
        _s_t_a_t_s_p_y was started.  For example:

            Acquired 56343 packets in 163 secs=>
                                   345(avg) 755(max) 1250(inst)/sec

        This shows the total Ethernet packets acquired, the elapsed
        time since _s_t_a_t_s_p_y started, the average packets per second, the



        Braden & DeSchon                                      [Page 16]




        NNStat-Internet Statistics Package                  Release 3.0


        maximum number of packets processed in one second, and finally
        the maximum "instantaneous" packet rate.  The last is obtained
        by extrapolating to one second the maximum number of packets
        captured in one clock tick (20ms on the Sun workstation).
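
        The arithmetic behind this summary line can be sketched as
        follows.  This is an illustration only, not _s_t_a_t_s_p_y code: the
        function name is hypothetical, truncation of the average to an
        integer is an assumption (it is consistent with the example
        above), and the count of 25 packets in one tick is inferred
        from the 1250/sec figure and a 20ms tick.

```python
def rate_summary(total_packets, elapsed_secs, max_per_tick, tick_ms=20):
    """Return (average, instantaneous) packet rates in packets per second."""
    # Average over the whole run; integer truncation is an assumption.
    average = total_packets // elapsed_secs
    # Extrapolate the busiest clock tick to a full second.
    ticks_per_sec = 1000 // tick_ms
    instantaneous = max_per_tick * ticks_per_sec
    return average, instantaneous


avg, inst = rate_summary(56343, 163, max_per_tick=25)
print(f"{avg}(avg) {inst}(inst)/sec")
```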


        _3._3.  _C_o_n_f_i_g_u_r_i_n_g _S_t_a_t_s_p_y


        We divide the extraction of statistical data from a particular
        Ethernet packet into two phases:


        (1)  Parse the protocol headers to determine the values of the
             various header fields.

             Since efficiency is essential and packet header formats do
             not change very often, the header formats are compiled
             into the _s_t_a_t_s_p_y code.  Each incoming Ethernet packet is
             passed to a subroutine that "knows" how to parse all the
             headers and where to locate the fields.  To add new
             protocols or change header formats, it will be necessary
             to recompile this packet-parsing subroutine of _s_t_a_t_s_p_y.


        (2)  Analyze the parsed field values and gather the desired
             statistics.

             This phase is performed interpretively, using a set of
             rules that comprises the _s_t_a_t_s_p_y configuration.



        _3._3._1.  _F_i_e_l_d_s


        Figure 2 shows a list of the fields that will be extracted by
        _s_t_a_t_s_p_y and made available to the analysis phase.  A particular
        packet will define values for only a subset of the possible
        fields; for example, a TCP packet will define the TCP source
        and destination ports but cannot define UDP ports or an ICMP
        type field.

        As Figure 2 shows, each field is assigned a mnemonic name
        string, a size in bytes, and an intrinsic type.  The type is
        used principally to choose an appropriate format for displaying
        the data values from that field.  Each field is extracted into
        an integral number of 8-bit bytes.  Thus, the IP version number
        (field "IP.version") is actually 4 bits but is extracted
        (right-justified) by the parser into a byte.
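
        A rough sketch of the parsing phase may help to show how a
        packet defines only a subset of the fields.  It is not the
        _s_t_a_t_s_p_y parser: the function name is hypothetical, the byte
        offsets are those of the standard Ethernet and IPv4 headers
        rather than anything taken from the NNStat source, and IP
        options, IP.offset, ICMP, and the network and subnet virtual
        fields are left out to keep it short.

```python
import struct

ETHERTYPE_IP = 0x0800
PORT_FIELDS = {6: "TCP", 17: "UDP"}    # IP protocol number -> field prefix


def parse_fields(frame: bytes) -> dict:
    """Return a dict holding only the fields that this packet defines."""
    fields = {
        "Ether.dst": frame[0:6],
        "Ether.src": frame[6:12],
        "Ether.type": struct.unpack("!H", frame[12:14])[0],
    }
    if fields["Ether.type"] != ETHERTYPE_IP:
        return fields                       # no IP.* fields are defined
    ip = frame[14:]
    version = ip[0] >> 4                    # 4-bit value, right-justified
    if version != 4:
        fields["IP.version"] = version      # defined only when non-standard
        return fields
    header_len = (ip[0] & 0x0F) * 4
    fields["IP.TOS"] = ip[1]
    fields["IP.length"] = struct.unpack("!H", ip[2:4])[0]
    fields["IP.protocol"] = ip[9]
    fields["IP.srchost"] = ip[12:16]
    fields["IP.dsthost"] = ip[16:20]
    flags_and_offset = struct.unpack("!H", ip[6:8])[0]
    if flags_and_offset & 0x1FFF:
        return fields                       # later fragment: no port fields
    prefix = PORT_FIELDS.get(fields["IP.protocol"])
    if prefix:
        src, dst = struct.unpack("!HH", ip[header_len:header_len + 4])
        fields[prefix + ".srcport"] = src
        fields[prefix + ".dstport"] = dst
    return fields


# A hand-built TCP packet: zeroed Ethernet addresses, 20-byte IP header.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 24, 0, 0, 64, 6, 0,
                        bytes([128, 9, 1, 51]), bytes([128, 9, 7, 25]))
frame = (bytes(12) + struct.pack("!H", ETHERTYPE_IP)
         + ip_header + struct.pack("!HH", 1025, 23))
fields = parse_fields(frame)
print(sorted(name for name in fields if name.startswith(("TCP", "UDP"))))
```

        The TCP packet above defines TCP.srcport and TCP.dstport but
        no UDP fields, which is the behavior described in the text.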


             Field Name   Length(bytes)    Type

             Ether.src        6            Ethernet Address
             Ether.dst        6            Ethernet Address
             Ether.type       2            Integer

             IP.version *     1            Integer
             IP.length        2            Integer
             IP.option *      1            Integer
             IP.TOS           1            Bits
             IP.offset *      2            Integer
             IP.protocol      1            Integer

             IP.srchost       4            IP Address
             IP.dsthost       4            IP Address
             IP.srcnet *      4            IP Address
             IP.dstnet *      4            IP Address
             IP.srcsubn *     4            IP Address                    |
             IP.dstsubn *     4            IP Address                    |

             TCP.srcport      4            Integer
             TCP.dstport      4            Integer
             UDP.srcport      4            Integer
             UDP.dstport      4            Integer
             ICMP.type        1            Integer
             ICMP.code        1            Integer                       |

             packet *      Variable        Bits
             length *         4            Integer

                  Figure 2.  Field Definitions in Packet Parser


        This list includes virtual fields whose values are derived from
        those actually appearing in the header; these are marked with
        "*" in Figure 2.  The virtual fields have the following
        meanings:


        (a)  packet

             This virtual field contains the binary value of the header
             sequence.  It is intended for recording particular packet
             headers for diagnostic rather than statistical purposes.


        (b)  length, IP.length

             The "length" field is defined for all packets to be the
             total length in bytes exclusive of the Ethernet header.
             Field "IP.length" is only defined for IP datagrams, but
             when it is defined it has the same value as the field
             "length".


        (c)  IP.version

             This virtual field containing the IP version number is
             extracted by _s_t_a_t_s_p_y for analysis _o_n_l_y for a packet with a
             non-standard IP version number (i.e., not 4). No later
             fields (IP, TCP, or UDP) can or will be extracted from the
             same packet.


        (d)  IP.option

             This is the code byte for each IP option field found in
             the packet, or zero if there are no options.  Note that a
             single packet may contain several options, so this
             virtual field may be defined more than once per packet.


        (e)  IP.offset

             This virtual field is extracted (we say "defined") by
             _s_t_a_t_s_p_y only for a packet that is a fragment of a complete
             IP datagram.  When it is defined, IP.offset is the
             reassembly offset of this fragment in bytes (i.e., 8 times
             the offset field in the IP header).

             Only the first fragment, i.e., the fragment at offset
             zero, can be parsed further for higher-level protocol
             headers (TCP, UDP, or ICMP).  We made the reasonable
             assumption that these headers will always fit within the
             first fragment, i.e., that the first fragment will always
             be larger than about 90 bytes (unless it is also the last
             fragment).

             Note that for a fragmented packet the IP.length,
             IP.option, and IP.TOS fields are defined for each fragment
             separately. Thus, IP.length gives the length of the
             fragment; _s_t_a_t_s_p_y cannot determine the total length the
             reassembled IP datagram.




        Braden & DeSchon                                      [Page 19]




        NNStat-Internet Statistics Package                  Release 3.0


        (f)  IP.srcnet, IP.dstnet

             These are the (Class A, B, C, or D) network numbers         |
             derived from the real IP source and destination address     |
             fields, respectively.  These virtual fields provide a       |
             simple and efficient way to develop statistics based upon   |
             networks rather than hosts.  Note: for a class D address,   |
             the network number and the host number are the same.        |


        (g)  IP.srcsubn, IP.dstsubn*

             These are the source and destination subnet numbers,
             derived using the address masks found in the subnet table
             for the corresponding Class A, B, or C network numbers.
             The subnet table is built using the _s_u_b_n_e_t configuration
             command (see above).  If a particular network does not
             correspond to an entry in the subnet table, then the
             corresponding subnet number virtual field will be the same
             as the network number field.
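
             The derivation of the network and subnet virtual fields
             described in (f) and (g) can be sketched as follows.  This
             is only an illustration under stated assumptions: all of
             the function names are hypothetical, the table is a plain
             dictionary rather than whatever structure _s_t_a_t_s_p_y uses,
             and the classful masks are the conventional ones for
             Class A, B, and C addresses.

```python
import ipaddress

subnet_table = {}           # network number -> mask, both 32-bit integers


def to_int(dotted):
    return int(ipaddress.IPv4Address(dotted))


def to_dotted(value):
    return str(ipaddress.IPv4Address(value))


def subnet_command(network, mask):
    """Update the table and return the user reply of the subnet command."""
    reply = "Replaced" if network in subnet_table else "Added"
    subnet_table[network] = mask
    return reply


def class_mask(addr):
    """Conventional classful mask; a class D address is its own network."""
    top = addr >> 24
    if top < 128:
        return 0xFF000000       # class A
    if top < 192:
        return 0xFFFF0000       # class B
    if top < 224:
        return 0xFFFFFF00       # class C
    return 0xFFFFFFFF           # class D (class E handled the same way here)


def net_and_subnet(host):
    """Return (network number, subnet number) for one host address."""
    net = host & class_mask(host)
    mask = subnet_table.get(net)
    # Without a table entry, the subnet number equals the network number.
    subnet = host & mask if mask is not None else net
    return net, subnet


print(subnet_command(to_int("128.9.0.0"), 0xFFFFFF00))
net, subn = net_and_subnet(to_int("128.9.7.25"))
print(to_dotted(subn), to_dotted(net))
```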

             Example: Suppose the command has been issued:

               subnet 128.9.0.0 0xffffff00

             and a packet is received with source address
             128.9.7.25.  Then:

               IP.srchost = 128.9.7.25
               IP.srcsubn = 128.9.7.0
               IP.srcnet =  128.9.0.0

             As noted earlier, _s_t_a_t_s_p_y may be generated without the
             subnet virtual fields IP.srcsubn and IP.dstsubn by using:
             "make statspy SUBNET=".



        _3._3._2.  _O_b_j_e_c_t_s _a_n_d _I_n_v_o_c_a_t_i_o_n_s


        Statistical analysis of field values is performed by a set of
        _________________________
        *Note: Release 2.4  documentation  used  the  incorrect
        names:  IP.ssubnet,  IP.dsubnet  for the subnet fields.
        We regret the confusion caused by this faulty  documen-
        tation.




        Braden & DeSchon                                      [Page 20]




        NNStat-Internet Statistics Package                  Release 3.0


        _s_t_a_t_s_p_y entities known as (statistical) _o_b_j_e_c_t_s.  NNStat
        implements unary and binary objects, i.e., objects that take
        one and two input values, respectively.  Each object may have a
        unique name that is assigned when the object is defined.

        Objects are logically independent of fields; objects simply
        build and report statistical data structures based on (field)
        values written into them. The analysis phase is essentially a
        series of calls on object _W_r_i_t_e subroutines; in each call, a
        particular field value (or pair of field values) is passed as a
        parameter.  These Write subroutine calls are known as
        _i_n_v_o_c_a_t_i_o_n_s.

        For example, the configuration might specify that an object
        named "Protocol.freq" is to be invoked on the field named
        "IP.protocol"; that is, the value of field "IP.protocol" will
        be written into object "Protocol.freq".  The configuration may
        specify that the same field is to invoke more than one object.
        Conversely, the same object may be invoked on more than one
        field, to build a composite statistic. Fields that invoke the
        same object must be compatible, i.e., they must have the same
        size and type (see Figure 2 for the types).

        Each object is an instance of a particular object class; all
        objects of the same class share the same program modules but
        each has its private data structure.  The _s_t_a_t_s_p_y object
        classes generally fall into two categories:  recorders and
        filters.


        (A)  Recorders

             A data recorder object or _r_e_c_o_r_d_e_r builds some statistical
             data structure (e.g., a frequency distribution table) from
             the field values with which it is invoked.

                  Example:  _f_r_e_q-_a_l_l

                  An object of class _f_r_e_q-_a_l_l builds a frequency
                  distribution table for all distinct values of the
                  field on which it is invoked.

             Figure 3 shows an example of the display output resulting
             from a read operation on a _f_r_e_q-_a_l_l object named "gwys".
             The field values recorded in this object are 48-bit
             Ethernet addresses.


              OBJECT: gwys Class= freq-all [CreationTime: 11:51:25 11-05-87]
                ReadTime: 11:52:18 11-05-87,
                ClearTime: 11:51:25 11-05-87 (@-53 secs)
               Total Count= 492 (+0 orphans)
               #bins= 8
              [2:7:1:0:8:30]= 219     (45%) @-1secs
              [8:0:2:0:49:30]= 127    (26%) @-1secs
              [2:60:8c:ee:2:34]= 52   (11%) @-1secs
              [24:24:80:9:0:6b]= 44   (8.9%) @-1secs
              [8:0:14:10:12:8]= 27    (5.5%) @-1secs
              [8:0:2:0:f7:2b]= 20     (4.1%) @-2secs
              [aa:0:3:1:5:90]= 2      (0.41%) @-13secs
              [2:7:1:0:4:45]= 1       (0.2%) @-32secs


                   Figure 3. Example of Read Output


             Each of the bottom 8 lines displays the contents of one
             bin: the value (6 bytes in hex), the count, the percentage
             count, and the last-update time relative to the current
             time ("ReadTime").
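
             The bookkeeping behind such a display can be sketched as
             follows.  This is a toy model and not the _f_r_e_q-_a_l_l
             implementation: the class and method names are
             hypothetical, sorting by descending count is inferred from
             Figure 3, and "orphans" are not modeled.

```python
class FreqAll:
    """Toy recorder: one bin per distinct value written into it."""

    def __init__(self):
        self.bins = {}          # value -> [count, time of last update]

    def write(self, value, now):
        entry = self.bins.setdefault(value, [0, now])
        entry[0] += 1
        entry[1] = now

    def read(self, read_time):
        """Return (total, rows); each row is (value, count, percent, age)."""
        total = sum(count for count, _ in self.bins.values())
        rows = sorted(self.bins.items(), key=lambda item: -item[1][0])
        return total, [
            (value, count, 100.0 * count / total, last - read_time)
            for value, (count, last) in rows
        ]


gwys = FreqAll()
packets = ["2:7:1:0:8:30", "8:0:2:0:49:30", "2:7:1:0:8:30"]
for second, address in enumerate(packets):
    gwys.write(address, now=second)
total, rows = gwys.read(read_time=3)
for value, count, percent, age in rows:
    print(f"[{value}]= {count} ({percent:.1f}%) @{age}secs")
```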

             Figure 4 shows another display example, the output
             of a read operation on an object named "nets" of class
             _m_a_t_r_i_x-_s_y_m-_a_l_l.  This object builds a table of frequencies
             of pairs of values, in this case the  IP source and
             destination networks.  Objects of this class accumulate
             not only the packet counts but also the total byte counts
             for each bin; the byte count is bracketed in "&...B".  The
             "Total Count" and "Total Bytes" values are the sums across
             all the bins.


             OBJECT: nets  Class= matrix-sym-bytes [Created: 17:00:03 11-29-89]
               ReadTime: 08:21:07 11-30-89,
               ClearTime: 17:00:03 11-29-89 (@-55264sec)
               Total Count= 430374 (+0 orphans)
               Total Bytes= 76886567B  #bins= 7  Maxchain = 1  SortMoves = 34
             [128.9.0.0 : 128.9.0.0]= 400426 &68310714B (93.0%) @-2sec
             [128.9.0.0 : 128.18.0.0]= 11766 &4011645B  ( 2.7%) @-49145sec
             [128.9.0.0 : 192.48.219.0]= 7678 &2098103B ( 1.8%) @-48187sec
             [128.9.0.0 : 128.89.0.0]= 7378 &2332623B   ( 1.7%) @-52265sec
             [128.9.0.0 : 128.125.0.0]= 3052 &128965B   ( 0.7%) @-33sec
             [128.9.0.0 : 192.5.18.0]= 49 &2605B        ( <.1%) @-48315sec
             [128.9.0.0 : 131.179.0.0]= 25 &1912B       ( <.1%) @-48177sec


                   Figure 4. Example of Read Output
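
             A binary recorder of this kind can be sketched in a few
             lines.  The sketch rests on two assumptions that this
             guide does not confirm: that "sym" means the two
             directions of a pair share one bin, and that the byte
             count is supplied with each write.  Appendix A is the
             authority on the real classes; the names below are
             hypothetical.

```python
from collections import defaultdict


class MatrixSym:
    """Toy binary recorder keyed on an unordered pair of values."""

    def __init__(self):
        self.bins = defaultdict(lambda: [0, 0])     # pair -> [packets, bytes]

    def write(self, a, b, nbytes):
        key = (a, b) if a <= b else (b, a)          # fold both directions
        self.bins[key][0] += 1
        self.bins[key][1] += nbytes

    def totals(self):
        """Sum packets and bytes across all bins."""
        packets = sum(p for p, _ in self.bins.values())
        nbytes = sum(b for _, b in self.bins.values())
        return packets, nbytes


nets = MatrixSym()
nets.write("128.9.0.0", "128.18.0.0", 100)
nets.write("128.18.0.0", "128.9.0.0", 40)     # reverse direction, same bin
nets.write("128.9.0.0", "128.9.0.0", 60)
print(len(nets.bins), nets.totals())
```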



        (B)  Filters

             Data filter objects or _f_i_l_t_e_r_s provide conditional
             branches in the configuration.

             When invoked with a field value, a filter tests it against
             some numerical or set-inclusion criterion, to determine a
             Boolean (True/False) value.  This Boolean value is then
             used by the statspy interpreter to select one of two
             alternative sub-sequences of invocations, where either of
             these sub-sequences may be empty.

                  Example 1: _e_q_f

                  An object of class _e_q_f tests a field value for
                  equality to a parameter value that is specified when
                  the object is created.

                  Example 2: _s_e_t_f                                        |

                  An object of class _s_e_t_f tests a field value for        |
                  equality to one of a set of parameter values,          |
                  specified when the object is created.                  |

             Note that _e_q_f is simply a special case of _s_e_t_f, for a set   |
             of one member; _e_q_f is included as a separate class because  |
             it is significantly faster in execution.  The _s_e_t_f class    |
             uses a hash lookup to provide efficient matches against     |
             large numbers of parameters.                                |

             By nesting filter invocations in a configuration, _r_e_c_o_r_d
             invocations can be conditioned upon an arbitrary Boolean
             expression over field values.

             Figure 5 shows an example of (a fragment of) a pseudo-
             program, in flow-chart form. This sequence of invocations
             was designed to answer the question: "what are the
             Ethernet addresses of hosts sending or receiving local
             packets, i.e., [IP] packets that have both source and
             destination IP addresses on the local network?".

                   ____________________________
                  |  Invoke _e_q_f filter object  |
                  |     with parm (128.9.0.0)  |
                  |     on field "IP.srcnet"   |
                  |____________________________|
                       |         |
                       | FALSE   | TRUE
                       V         |
                     (Null)      |
                                 |
                                 V
                    _____________________________
                    |  Invoke _e_q_f filter object  |
                    |     with parm (128.9.0.0)  |
                    |     on field "IP.dstnet"   |
                    |____________________________|
                         |         |
                         | FALSE   | TRUE
                         V         |
                          (Null)   |
                                   |
                                   V
                        __________________________
                       | Invoke _f_r_e_q-_a_l_l recorder |
                       |    object named "gwys"   |
                       |   on field "Ether.src"   |
                       |__________________________|
                                 |
                                 |
                                 V
                        __________________________
                       | Invoke _f_r_e_q-_a_l_l recorder |
                       |    object named "gwys"   |
                       |   on field "Ether.dst"   |
                       |__________________________|

                   Figure 5.  Example Pseudo-program


             Figure 5 includes two invocations of an unnamed _e_q_f filter
             object whose parameter is the value 128.9.0.0 (the IP
             network number of the local Ethernet).  Thus, the first
             invocation shown in Figure 5 will return TRUE if the IP
             source network number in field "IP.srcnet" is 128.9.0.0,
             and FALSE otherwise.
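
             The flow of Figure 5 can be mimicked in a few lines.  This
             is a sketch for illustration, not the _s_t_a_t_s_p_y interpreter:
             the function names are hypothetical, a Counter stands in
             for the _f_r_e_q-_a_l_l recorder, and the parsed fields are
             assumed to arrive as a dictionary that contains only the
             fields defined for the packet.

```python
from collections import Counter

LOCAL_NET = "128.9.0.0"
gwys = Counter()                # stands in for the freq-all recorder "gwys"


def eqf(parameter):
    """Build an equality filter for one parameter value."""
    return lambda value: value == parameter


def analyze(fields):
    """Run the Figure 5 invocations against one packet's parsed fields."""
    is_local = eqf(LOCAL_NET)
    # A FALSE result, or an undefined field, leads to the (Null) branch.
    if "IP.srcnet" not in fields or not is_local(fields["IP.srcnet"]):
        return
    if "IP.dstnet" not in fields or not is_local(fields["IP.dstnet"]):
        return
    gwys[fields["Ether.src"]] += 1
    gwys[fields["Ether.dst"]] += 1


# Local IP packet: both Ethernet addresses are recorded.
analyze({"Ether.src": "2:7:1:0:8:30", "Ether.dst": "8:0:2:0:49:30",
         "IP.srcnet": "128.9.0.0", "IP.dstnet": "128.9.0.0"})
# Destination on another network: nothing is recorded.
analyze({"Ether.src": "2:7:1:0:8:30", "Ether.dst": "aa:0:3:1:5:90",
         "IP.srcnet": "128.9.0.0", "IP.dstnet": "128.18.0.0"})
# Non-IP packet: the IP fields are undefined, so nothing is recorded.
analyze({"Ether.src": "2:7:1:0:4:45", "Ether.dst": "8:0:2:0:49:30"})
print(dict(gwys))
```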


        _3._3._3.  _C_o_n_f_i_g_u_r_a_t_i_o_n _L_a_n_g_u_a_g_e


        We begin with an example of the configuration (sub-)language
        for _s_t_a_t_s_p_y.  The configuration of Figure 5 could be created by
        entering an _a_t_t_a_c_h command with the <configuration program>
        shown in Figure 6.


               if IP.srcnet is eqf(128.9.0.0)  {
                  if IP.dstnet is eqf(128.9.0.0)  {
                     record Ether.src in local freq-all;
                     record Ether.dst in local;
                  }
               }

          Figure 6. Configuration program that compiles into Figure 5.


        Figure 6 illustrates several points about the configuration
        language:


        1.   The language is free-field with newlines having no
             meaning.  Hence, we can indent to illuminate the structure
             of the program.


        2.   The language includes compound statements and if
             statements; the latter correspond to invocations of filter
             objects.


        3.   The first time a named object occurs, its class (and
             parameters, if any) must be specified.  They may be
             omitted in later references to the same object. When class
             and/or parameters are included in a later reference, their
             values must exactly match the corresponding values
             specified in the first occurrence of the same object.


        4.   Parameters, when required, are enclosed in parentheses
             following the class name.  If there is more than one
             parameter, they are listed separated by commas.


        5.   An unnamed object may be created, by giving only its class
             (and parameter, if any).  Filter objects are often left
             unnamed, since there is usually no need to read them.

             Note: class names are reserved, and may not coincide with
             object names.  The valid class names are all listed in
             Appendix A.


        Two more things about the language are not apparent from this
        example:


        6.   The outer set of braces in Figure 6 is unnecessary.  The
             syntax of the configuration language is generally like
             "C".


        7.   Execution of a specific configuration rule is triggered
             only when the fields upon which it depends are defined in
             a packet.

             Thus, in Figure 6 it was not necessary to explicitly test
             that the packet in question was an IP packet; if it were
             not an IP packet, then the IP.srcnet and IP.dstnet fields
             would not be defined.

             Furthermore, _s_t_a_t_s_p_y checks and enforces consistency among
             the fields, so that the configuration cannot include
             invocations that are logically impossible.  An example of
             such an illegal configuration is:

                 if TCP.srcport is eqf(23)
                    if UDP.dstport is eqf(6)
                       record Ether.src in Imposs-obj freq-all;

             This configuration is illegal because TCP.srcport and
             UDP.dstport cannot be defined in the same packet, hence
             the "Imposs-obj" recorder would never be invoked. This      |
             conflict will be detected as an error when the              |
             configuration is compiled; the following error message      |
             results: "Config Error: impossible combo of fields:         |
             TCP.srcport UDP.dstport".
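
             One way to picture this consistency check is sketched
             below.  It is not the _s_t_a_t_s_p_y algorithm: the table of
             protocol requirements and the function name are
             hypothetical, only the TCP/UDP/ICMP exclusion is modeled,
             and the message text is copied from the example above.

```python
# Protocol headers that must be present for each field prefix (assumed).
REQUIRES = {"Ether": set(), "IP": {"IP"}, "TCP": {"IP", "TCP"},
            "UDP": {"IP", "UDP"}, "ICMP": {"IP", "ICMP"}}
TRANSPORTS = {"TCP", "UDP", "ICMP"}     # at most one of these per packet


def check_fields(field_names):
    """Raise ValueError if the fields can never be defined together."""
    needed = set()
    for name in field_names:
        prefix = name.split(".")[0]
        needed |= REQUIRES.get(prefix, set())
    if len(needed & TRANSPORTS) > 1:
        raise ValueError("Config Error: impossible combo of fields: "
                         + " ".join(field_names))


check_fields(["IP.srcnet", "IP.dstnet"])        # compatible: no error
try:
    check_fields(["TCP.srcport", "UDP.dstport"])
except ValueError as err:
    print(err)
```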



        In Release 3.0, boolean expressions were introduced in "if"
        statements.  Boolean expressions are built using "and" and "or"
        operators.  For example, Figure 7 will compile into the same
        configuration as Figure 6.


               if ((IP.srcnet is eqf(128.9.0.0) ) and
                   (IP.dstnet is eqf(128.9.0.0) ))  {
                     record Ether.src in local freq-all;
                     record Ether.dst in local;
               }

               Figure 7. Another Version of Figure 6.

        The parentheses surrounding the parameter, as in: "eqf(...)"
        are necessary, but all the other parentheses in Figure 7 are
        optional.  We included them in Figure 7 to illustrate that
        parentheses can always be used to avoid ambiguity.

        We now list (partial) syntax rules for the configuration
        language; Appendix B specifies the full syntax.


        (1)  RECORDER OBJECT:

             To create a (unary) recorder object that is invoked on a
             specified field, use the following statement:

                 record <field name> in <object name>

                             <class> ( <parameters> ) ;

             For a binary object, two fields are required:

                 record  <field name>, <field name> in <object name>

                             <class> ( <parameters> ) ;


             In either case, <class> must match the name of a valid
             recorder class.* The current set of object classes and
             corresponding <parameters> are defined in Appendix A.

        _________________________
        *This is not strictly true; a "record" statement may specify
        a filter object, although this is rarely useful.


             Normally, every recorder object ought to have a unique
             <object name>, so that it can be read, cleared, and/or
             detached independently of other objects.  However, a
             recorder object may be created with a null <object name>.
             Such an object can be destroyed (detached) or cleared, but
             not read.

             Note the semicolons following record statements; these are
             required.

             The <parameters> string generally specifies a list of one
             or more values separated by commas.  The number and
             meaning of these values depend upon the particular class
             (see Appendix A for details).  The <parameters> string and
             the surrounding parentheses may be omitted if the
             particular class does not require parameters or if the
             specified object has been defined previously with
             parameters.

             If the specified <object name> already exists, a new
             invocation specification refers to the same object.  In
             this case, <class> may be omitted, but if it is
             respecified then it must agree with the class of the
             existing object.  If <class> is respecified, then
             " ( <parameters> )" may also be respecified, but its
             values must agree with the parameters given for the
             existing object.
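
             The rules for repeated references to a named object can be
             sketched as a small registry.  This is an illustration
             only: the function name and the error texts are
             hypothetical, and unnamed objects are not modeled.

```python
objects = {}        # object name -> (class name, parameters)


def refer(name, cls=None, params=None):
    """Resolve one named object reference from a configuration statement."""
    if name not in objects:
        if cls is None:
            raise ValueError(f"{name}: class required on first use")
        objects[name] = (cls, params)
        return objects[name]
    known_cls, known_params = objects[name]
    if cls is not None and cls != known_cls:
        raise ValueError(f"{name}: class does not match earlier definition")
    if params is not None and params != known_params:
        raise ValueError(f"{name}: parameters do not match earlier definition")
    return objects[name]


refer("local", "freq-all")          # first occurrence: class is required
refer("local")                      # later reference: class may be omitted
refer("local", "freq-all")          # respecified class agrees: accepted
try:
    refer("local", "matrix-sym-all")
except ValueError as err:
    print(err)
```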


        (2)  FILTER OBJECTS:

             To create a filter object to be invoked by a specified
             field, use a conditional statement.  The simplest form is
             an <if clause>:

                if <field name> is <object name>
                                          <class> (<params>)

             followed by either:

                 <TRUE invocation>

              or:

                 <TRUE invocation> else <FALSE invocation>


             Here <class> must match the name of a valid filter class.

             The sense of the filter clause may be reversed by
             specifying isnot instead of is.

             <TRUE invocation> and <FALSE invocation> may themselves be
             recorder or filter invocations, or may be arbitrary
             sublists of invocations, grouped together inside braces
             "{ }".  These sublists may themselves include filter
             invocations, and this nesting can go to any depth.

             More general conditional clauses may be constructed by      |
             combining conditions of the form:                           |

             <field name> is/isnot <object name> <class> (<parameters>)  |

             using "and" and "or" operators with optional parentheses.   |
             See Figure 7 for a simple example.                          |

             It should be pointed out that boolean expressions are a     |
             notational convenience rather than a performance            |
             improvement.  In the compilation process, any "and" or      |
             "or" operator is expanded to the equivalent nesting of      |
             simple if statements.  For example, Figure 7 is             |
             effectively converted into Figure 6.                        |
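
             The expansion can be pictured with the following sketch.
             It is not the _s_t_a_t_s_p_y compiler: the function name and the
             tuple representation are hypothetical, and the treatment
             of "or" (repeating the TRUE part in a nested test) is an
             assumption, since only the "and" case is illustrated by
             Figures 6 and 7.

```python
def expand(cond, true_part, false_part=None):
    """Expand a boolean condition into nested simple if statements.

    cond is either a string (one simple test) or a tuple of the form
    ("and" | "or", left, right).  The result is a nested tuple
    ("if", test, true_part, false_part).
    """
    if isinstance(cond, str):
        return ("if", cond, true_part, false_part)
    op, left, right = cond
    if op == "and":
        # Test the right side only when the left side is true.
        return expand(left, expand(right, true_part, false_part), false_part)
    if op == "or":
        # Test the right side only when the left side is false.
        return expand(left, true_part, expand(right, true_part, false_part))
    raise ValueError(f"unknown operator: {op}")


figure7 = ("and", "IP.srcnet is eqf(128.9.0.0)", "IP.dstnet is eqf(128.9.0.0)")
print(expand(figure7, "BODY"))
```

             For the condition of Figure 7 the result is one simple if
             nested inside another, which is the shape of Figure 6.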


        (3)  SYMMETRIC IF STATEMENTS                                     |

             Protocol header fields containing address-like values       |
             frequently occur as (source, destination) pairs.  For       |
             example, there are six such pairs among the statspy fields  |
             listed in Figure 2: Ethernet addresses, IP addresses, IP    |
             networks, IP subnets, TCP ports, and UDP ports.  We refer   |
             to these as "symmetric pairs" of fields.                    |

             In measuring traffic, we sometimes want to create full-     |
             duplex statistics that combine both directions of the       |
             flow.  For example, we want to lump together TCP data       |
             packets and the resulting acknowledgment packets; both are  |
             part of the packet load imposed by those connections.       |
             This leads to configurations of the following form:         |

                  { if Fc ...  {
                          record Fx in...;
                          . . .  }
                     else if Fc' ... {
                           record Fx' in...;
                          . . .  }
                  }

                      Figure 8: Symmetric Configuration


             Here Fc, Fc', Fx, and Fx' all represent field names, and    |
             field Fc' (Fx') is related to Fc (Fx, respectively)         |
             through a field-symmetry mapping.  That is, either (Fc,     |
             Fc') form a symmetric pair, or else they are identical,     |
             and similarly for (Fx, Fx').                                |

             To compactly represent such full-duplex configurations,     |
             statspy implements the "symmetric if" statement.  This is   |
             a variant of an "if" statement, with the keyword "if"       |
             replaced by "symif". The statement:                         |

                symif <condition> <T-Statement> else <F-Statement>       |

             is logically equivalent to the expanded form:               |

                { if <condition>  <T-Statement>                          |
                 else if <condition'>  <T-Statement'>                    |
                 else  <F-Statement>                                     |
                }                                                        |

             where the primes indicate that the field symmetry mapping   |
             has been applied to all fields in the syntactic unit.       |

             Like boolean expressions, the "symmetric if" statement is   |
             a notational convenience rather than a performance          |
             improvement.                                                |
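
             The expansion of a "symif" statement can be sketched in
             Python as follows.  This is only an illustration under
             stated assumptions: the helper names sym() and
             expand_symif() are hypothetical, the symmetry mapping is
             approximated by swapping "src" and "dst" in a field
             name, and TCP.srcport is assumed to be the partner of
             TCP.dstport.

```python
def sym(field):
    """Hypothetical field-symmetry mapping: swap source and destination."""
    if "src" in field:
        return field.replace("src", "dst")
    if "dst" in field:
        return field.replace("dst", "src")
    return field  # a field with no partner maps to itself


def expand_symif(cond_field, record_fields, f_stmt):
    """Build the expanded if / else-if / else chain as nested tuples."""
    primed_cond = sym(cond_field)
    primed_fields = [sym(f) for f in record_fields]
    return ("if", cond_field, ("record", record_fields),
            ("if", primed_cond, ("record", primed_fields), f_stmt))


chain = expand_symif("TCP.dstport", ["IP.srchost", "IP.dsthost"], None)
```

             The second branch tests the mapped field and records the
             mapped fields, so both directions of a connection are
             handled by one statement.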


        (4)  SELECT STATEMENT                                            |

             Suppose that it is desired to record the source and         |
             destination addresses of TCP packets in three classes:      |
             interactive, file transfer, and email.  The configuration   |
             might include a nested conditional statement such as the    |
             following:                                                  |

             if TCP.dstport is  port.telnet  setf(23, 43, 79, 513)
                record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

             else if TCP.dstport is  port.ftp  setf(20, 21, 69)
                record IP.srchost, IP.dsthost in FTP.hosts matrix-sym;

             else if TCP.dstport is  port.mail  setf(25, 103, 104, 119)
                record IP.srchost, IP.dsthost in Mail.hosts matrix-sym;

             The setf objects are filters that have the value "True" if  |
             the field value is equal to one of the listed parameter     |
             values, "False" otherwise.                                  |

             An alternative way to express this problem is with a        |
             select statement, which is generically a form of "case"     |
             statement.  Here is the same example as a "select"          |
             statement:                                                  |

             select  TCP.dstport {
               case (23, 43, 79, 513):
                 record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

               case (20, 21, 69):
                 record IP.srchost, IP.dsthost in FTP.hosts matrix-sym;

               case (25, 103, 104, 119):
                 record IP.srchost, IP.dsthost in Mail.hosts matrix-sym;
             }

             Note that the braces surrounding the list of cases, the     |
             parentheses enclosing lists of values, and the colons are   |
             all required. When one of the values in a particular case   |
             is matched, the statement following the corresponding       |
             colon is executed; this completes execution of the entire   |
             "select".  Thus, control does not "fall through" to the     |
             next case as in the "C" switch statement, and therefore no  |
             "break" statements are needed or allowed in the             |
             configuration language.                                     |

             In addition to the cases, there can be one "default"        |
             alternative of the form:                                    |

                    default:                                             |
                          <Statement>                                    |

             The select statement provides a performance improvement     |
             over the corresponding nested if statements, since select   |
             is implemented by a single hashed lookup to obtain an       |
             index, and this index selects one case from a vector of     |
             cases.                                                      |
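
             The lookup scheme just described can be sketched in
             Python, with a dictionary standing in for the hash
             table.  The function compile_select() is hypothetical,
             and the rule that the first case listing a value wins is
             an assumption, not something stated for statspy.

```python
def compile_select(cases, default=None):
    """Build a lookup function from (values, action) pairs."""
    table = {}    # field value -> index into the vector of cases
    vector = []   # one action per case, in source order
    for values, action in cases:
        index = len(vector)
        vector.append(action)
        for value in values:
            # Assumption: if a value is listed twice, the first case wins.
            table.setdefault(value, index)

    def run(value):
        index = table.get(value)          # the single hashed lookup
        if index is None:
            return default                # the optional "default" alternative
        return vector[index]

    return run


classify = compile_select([
    ((23, 43, 79, 513), "Telnet.hosts"),
    ((20, 21, 69), "FTP.hosts"),
    ((25, 103, 104, 119), "Mail.hosts"),
])
```

             The cost of classify() does not grow with the number of
             cases, whereas the nested if form tests each setf filter
             in turn.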



        _3._3._4.  _E_n_u_m_e_r_a_t_i_o_n_s


        Some packet header fields (e.g., the IP protocol number) may be
        characterized as "enumerations", meaning that there is a
        discrete set of possible values.  It is helpful to humans
        viewing the output of a _r_e_a_d command to have appropriate
        mnemonic labels attached to the values of enumeration fields.
        The _e_n_u_m command may be used to define such mnemonic label
        strings.

        The _e_n_u_m command has only local effect; _e_n_u_m commands taken
        from the _s_t_a_t_s_p_y configuration file or entered locally on the
        _s_t_a_t_s_p_y console control only the formatting of local read
        commands.  Similarly, an _e_n_u_m command entered in _r_s_p_y is used
        locally at _r_s_p_y for formatting read results; it is not
        transmitted across the network to _s_t_a_t_s_p_y.

        The _e_n_u_m command has the form:

             enum { <enum parameters> }

        The general form of <enum parameters> is:


                <object spec> ( <value> <label>, ... , <value> <label> ),

                     ...

                <object spec> ( <value> <label>, ... , <value> <label> )



        That is, it generally specifies a list of object name specs,
        and for each a sub-list of (value, label) pairs.

        Here is a possible enum command parameter that defines label
        strings for objects attached to the IP protocol field:

           *IP.proto* (1 "ICMP", 3 "GGP", 6 "TCP", 8 "EGP",

                   12 "PUP", 17 "UDP",  20 "HMP", 21 "XNS-IDP",

                   27 "RDP", 77 "ND")

        The string "*IP.proto*" is an <object spec>, indicating that
        this list of labels will be used in formatting a read operation
        for any object whose name contains the embedded string
        "IP.proto".

        Note that each sublist is keyed to an <object spec>, not an
        object name or a field name.  When the results of reading a
        specific object are formatted for display, the <object name> of
        that object is matched against each <object spec> that has
        appeared in any enum command; the first match causes the
        corresponding set of (value, label) pairs to be used.  Careful
        choice of object names is necessary to take advantage of this
        wildcard matching mechanism.
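
        The first-match rule can be sketched in Python as follows.
        This is an illustration, not NNStat code: the object name
        "gw.IP.proto.freq" and the output format are hypothetical,
        and Python's fnmatchcase() accepts more wildcard forms ("?"
        and "[...]") than the "*" described here.

```python
from fnmatch import fnmatchcase

# (object spec, {value: label}) in the order the enum commands were given.
enums = [
    ("*IP.proto*", {1: "ICMP", 6: "TCP", 17: "UDP"}),
]


def labels_for(object_name, enums):
    """Return the label table of the first object spec that matches."""
    for spec, labels in enums:
        if fnmatchcase(object_name, spec):
            return labels
    return {}


def format_value(object_name, value, enums):
    """Attach a label to a value if one is defined (hypothetical format)."""
    label = labels_for(object_name, enums).get(value)
    return "%d (%s)" % (value, label) if label else str(value)


shown = format_value("gw.IP.proto.freq", 6, enums)
```

        An object whose name does not contain "IP.proto" matches no
        spec in this list, so its values are displayed without labels.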

        The <label> elements may be surrounded with quotation marks
        (""), and must be if they contain embedded blanks or other
        special characters.  A label surrounded with quotation marks
        may contain any printable characters except: comma, linefeed
        ("\n"), or quotation marks themselves.

        The effect of enum commands is cumulative.  There is no command
        to delete an enumeration; however, new label definitions will
        override previous definitions for the same (object-spec, value)
        pairing.



        _3._4.  _R_e_m_o_t_e _C_o_n_t_r_o_l _o_f _S_t_a_t_s_p_y


        The _r_s_p_y program may be used to enter commands remotely to a
        running _s_t_a_t_s_p_y program.  The command to execute _r_s_p_y is:


          rspy  [-p _p_o_r_t] [-h _h_o_s_t] [-1] [_c_o_m_m_a_n_d-_f_i_l_e]


        Here the parameters are:


        -p   TCP port number on which _s_t_a_t_s_p_y is listening.  The
             default is 2222.



        Braden & DeSchon                                      [Page 33]




        NNStat-Internet Statistics Package                  Release 3.0


        -h   The name or dotted-decimal IP address of the _s_t_a_t_s_p_y host.
             The default is the local host.


        -1   Output from read operations will be displayed in single-    |
             column format.                                              |


        _c_o_m_m_a_n_d-_f_i_l_e
             The optional name of a file containing a script of
             commands to be executed when _r_s_p_y begins.


        _R_s_p_y will then prompt for input ("Rspy>"), accepting new
        commands from standard input and writing any output to standard
        output.

        The commands to _r_s_p_y are those listed in Section 3.2 for
        _s_t_a_t_s_p_y, plus one additional command peculiar to _r_s_p_y:


        o    _h_o_s_t <IP address>

             Overrides the -h parameter to specify the remote host to
             which following commands will be directed.  Here <IP
             address> may be a host domain name or a dotted-decimal IP
             address.  Also note that the _s_t_a_t_s_p_y command _r_e_s_t_r_i_c_t
             cannot be entered remotedly from _r_s_p_y.


        Generally, commands entered to _r_s_p_y are sent to _s_t_a_t_s_p_y on the
        remote host.  However, the ?, _e_n_u_m, _h_o_s_t, and _q_u_i_t commands are
        executed locally by rspy.


















        Braden & DeSchon                                      [Page 34]




        NNStat-Internet Statistics Package                  Release 3.0


        _4.  _D_a_t_a _C_o_l_l_e_c_t_i_o_n



        _4._1.  _U_s_i_n_g _C_o_l_l_e_c_t


        _C_o_l_l_e_c_t is executed as:



          collect [-e _e_n_u_m_f_i_l_e]  [-h _h_o_s_t_1] ... [-h _h_o_s_t_n]   [-p _p_o_r_t]

              [-i _m_i_n]  [-c _m_i_n]  [-r _m_i_n]  [-d|-dl|-dx]  _o_b_j_e_c_t-_s_p_e_c


        These parameters are:


        _o_b_j_e_c_t-_s_p_e_c
             The objects from which data is to be collected.  _o_b_j_e_c_t-
             _s_p_e_c may contain the "wild-card" character "*".  _o_b_j_e_c_t-
             _s_p_e_c is a mandatory parameter, with no default.


        -e   Name of a file containing <enum parameters> (see Section
             3.3.4) for labeling results.  Default is no enum file.


        -p   TCP port to connect to; default is 2222.


        -h   An SAA host from which data is being collected.  A series
             of -h parameters may appear, to define a list of SAA
             hosts.  Parameter may be a dotted decimal IP address or a
             domain name.  Default is the local host.


        -i   Polling interval Ti in minutes. Default is 0, causing
             collect to run once.


        -c   Checkpoint interval Tc in minutes.  Default is 0, causing
             only the latest data to be saved.


        -r   Clear interval Tr in minutes.  This is the interval at
             which collect sends a readclear instead of a read command
             to the hosts.  Default is zero, causing no clearing to take
             place.


        -d   Direct trace & log to stdout.

        -dl  Direct trace to stdout, log to files.

        -dx  Direct trace & hex dump to stdout.

             Default is no trace, log directed to files.


        The time parameters used by the collect program must be entered
        in minutes; this unit was chosen both for convenience and to
        avoid giving the user a false sense of precision.

        An example of typical parameters that one might use to run
        collect from a "C shell" is as follows:

           collect '*' -h 35.1.1.21 -i 5 -c 60 -r 1440 >& errors.log &

        In this example:


        o    The read interval Ti is 5 minutes, the checkpoint interval
             Tc is 60 minutes, and the clear interval Tr is 1440
             minutes (24 hours).


        o    The '*' object-spec tells collect to read and save
             statistics from all objects.


        o    The -h 35.1.1.21 parameter specifies the host on which
             statspy is executing.


        o    The ">& errors.log" redirects any error reports to the
             file "errors.log".

        o    The final "&" starts the collection of statistics in
             background mode, so that it is unaffected by other use of
             the shell and logouts.  It can be stopped via use of the
             "kill" command.


        _4._2.  _C_o_l_l_e_c_t _L_o_g _F_i_l_e_s


        _C_o_l_l_e_c_t saves statistics in files whose names are formed from
        the _s_t_a_t_s_p_y host name, the object name, and the time that
        _c_o_l_l_e_c_t is started.  For example, the statistics file named:

            "35.1.1.21-gwys.1214.1540"

        was created at 1540 (local time for collect) on December 14th,
        and contains statistics from the object named "gwys" on statspy
        host "35.1.1.21".
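
        The naming convention can be sketched in Python as follows.
        The function log_file_name() is hypothetical, and the layout
        host-object.MMDD.HHMM is inferred from the single example
        above rather than from the collect source.

```python
from datetime import datetime


def log_file_name(host, object_name, started):
    """Form a log file name from host, object name, and local start time."""
    # %m%d is month and day; %H%M is the 24-hour time of day.
    return "%s-%s.%s" % (host, object_name, started.strftime("%m%d.%H%M"))


# The year is arbitrary here; it does not appear in the file name.
name = log_file_name("35.1.1.21", "gwys", datetime(1990, 12, 14, 15, 40))
```

        Because the year is not part of the name, logs started at the
        same minute of the same date in different years would collide
        under this inferred layout.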

        Figure 3 shows an example of an individual collect log entry,
        which has the same format as a local display of the read data.
        Each log entry contains three timestamps:

        o    "CreationTime" is the time that the statspy module was
             started.

        o    "ReadTime" is the time associated with the data in the
             current log entry.

        o    "ClearTime" is the last time that this object was cleared.
             If the object has never been explicitly cleared, the
             ClearTime is the same as the CreationTime.


        The _s_t_a_t_s_p_y module sends these timestamps in universal (UNIX)
        time, and _c_o_l_l_e_c_t converts them to its local time when they are
        formatted and written.

        Not all of the data that _c_o_l_l_e_c_t receives from _s_t_a_t_s_p_y are
        saved permanently in log files.  Data must be saved in two
        situations:


        (1)  The time between checkpoints has elapsed.

             The "checkpoint" parameter is typically used to provide a
             statistical breakdown of traffic by time of day, e.g., the
             number of packets received during each hour of the day.


        (2)  The _s_t_a_t_s_p_y object has been cleared.

             The _s_t_a_t_s_p_y object may have been cleared intentionally by
             a clear command or unintentionally by a crash and restart



        Braden & DeSchon                                      [Page 37]




        NNStat-Internet Statistics Package                  Release 3.0


             of the SAA machine.  The _c_o_l_l_e_c_t program polls each
             _s_t_a_t_s_p_y every Ti minutes, which should be short enough to
             minimize the statistical loss due to SAA crashes.


        It is also possible that the SCH running collect will itself
        crash.  Therefore, collect always writes new data into the log
        file, but it may either overwrite the previous log entry or
        append to the end of the file, saving the previous entry.

        In order to decide whether a particular entry should be saved,
        collect keeps track of the last "ClearTime" received and the
        next time that a checkpoint log entry should be saved.
        Specifically, collect will overwrite the previous entry unless:

        a.   The previous entry is the very first entry appearing in
             the log file, or

        b.   The previous entry was received at or after the time that
             a checkpoint was required, or

        c.   The ClearTime on the current entry is different from the
             ClearTime on the previous entry.
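
        The three conditions can be expressed as a single test.  The
        following Python sketch is an illustration only: the Entry
        type and the function keep_previous() are hypothetical, times
        are plain seconds, and the ReadTime of an entry is used as a
        stand-in for the time at which it was received.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    read_time: int           # seconds; stands in for the receive time
    clear_time: int          # ClearTime reported with this entry
    is_first: bool = False   # True for the first entry in the log file


def keep_previous(prev, cur, next_checkpoint):
    """Return True if prev must be saved (cur is appended after it).

    Return False if cur may simply overwrite prev.
    """
    return (prev.is_first                           # condition a
            or prev.read_time >= next_checkpoint    # condition b
            or cur.clear_time != prev.clear_time)   # condition c


# A routine poll between checkpoints: the previous entry is overwritten.
routine = keep_previous(Entry(300, 0), Entry(600, 0), next_checkpoint=3600)
```

        With a 5-minute polling interval and a 60-minute checkpoint
        interval, most polls take the overwrite path, so the log grows
        by roughly one entry per checkpoint plus one per clear.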



        _4._3.  _D_a_t_a _R_e_d_u_c_t_i_o_n _P_r_o_g_r_a_m_s


        The NNStat package includes several programs to process log
        files produced by the _c_o_l_l_e_c_t program. The _l_o_o_k_u_p_n_a_m_e_s program
        scans through log files, outputting the original text of the
        log file with the appropriate domain name added following each
        instance of a host number or a network number.  Shell scripts
        that invoke AWK programs have also been included. These scripts
        may be installed as command aliases named _c_o_u_n_t-_t_o_t_a_l_s and
        _b_i_n-_t_o_t_a_l_s.  These commands are intended to provide part of the
        data reduction capability needed to produce traffic statistics.


        _4._3._1.  _L_o_o_k_u_p_n_a_m_e_s _P_r_o_g_r_a_m


        The _l_o_o_k_u_p_n_a_m_e_s program may be used to scan log files for
        embedded host and network numbers, map these numbers into
        corresponding names, and create a new file with the names
        inserted immediately after the corresponding numbers.




        Braden & DeSchon                                      [Page 38]




        NNStat-Internet Statistics Package                  Release 3.0


        The following is an example of output produced by _l_o_o_k_u_p_n_a_m_e_s:

          [35.0.0.0 MERIT:35.0.0.0 MERIT]= 112665 (93.9%) @-0sec
          [35.0.0.0 MERIT:128.116.0.0 USAN]= 387  ( 0.3%) @-36sec
          [128.116.0.0 USAN:35.0.0.0 MERIT]= 462  ( 0.4%) @-35sec
          [35.0.0.0 MERIT:128.182.0.0 PSCNET]= 388 ( 0.3%) @-46sec
          [128.182.0.0 PSCNET:35.0.0.0 MERIT]= 345 ( 0.3%) @-45sec


        The number-to-name conversion is performed using the Domain
        Name System, or if no matching entry is returned, by a local
        file of network names.  A standard hosts.txt file may also be
        used to supply the network names. If no host/network name is
        found in either database, the string "(UNKNOWN-HOST)" is
        displayed in place of the name.

        The usage is:


          lookupnames [-n filename] [-t seconds] input-file-list


        The input files, concatenated and augmented with the name
        strings, are written to standard output.

        The optional command line flags are as follows:


        -n   The name of a networks file. If this parameter is
             unspecified, the program looks for a file named:
             "networks.txt". This file should contain the networks
             portion of a standard "hosts.txt" file, for example:

                  NET : 128.9.0.0 : ISI-NET :

             Alternatively, the full hosts.txt may be used.  The
             lookupnames program scans the networks list for "NET"
             entries, until the beginning of the first "GATEWAY" entry
             or the end of the file.


        -t   The timeout time for host name lookups; the default is 5
             seconds.  If this timeout expires, the lookupnames program
             checks the networks-file for a matching entry.  If none is
             found, the string "(TIMEOUT)" is printed in place of the
             host/network name.
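
        The scan of the networks file described under -n can be
        sketched in Python as follows.  This is an illustration, not
        the lookupnames source: the function load_networks() is
        hypothetical, and the GATEWAY line in the sample input is
        made up to show where the scan stops.

```python
def load_networks(lines):
    """Collect 'NET : number : name :' entries up to the first GATEWAY."""
    networks = {}
    for line in lines:
        fields = [field.strip() for field in line.split(":")]
        if fields[0] == "GATEWAY":
            break                      # NET entries are not read past here
        if fields[0] == "NET" and len(fields) >= 3:
            networks[fields[1]] = fields[2]
    return networks


sample = [
    "NET : 128.9.0.0 : ISI-NET :",
    "GATEWAY : 10.0.0.1 : EXAMPLE-GW :",   # made-up entry
    "NET : 35.0.0.0 : MERIT :",            # never reached by the scan
]
nets = load_networks(sample)
```

        A network number that is absent from the resulting table, and
        for which the Domain Name System returns no entry, would be
        reported as "(UNKNOWN-HOST)".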





        4.3.2.  Count-totals


        The command count-totals can be used to summarize total packet
        counts logged by the collect program.  Taking into account any
        statspy restarts or statspy totals that were cleared, it
        computes both daily totals and a grand total from a given log
        file.  It may be invoked on a list of log files, in which case
        it summarizes each file independently.

        The command used to invoke count-totals is:

            count-totals [v=1] logfile ...

        The required logfile parameter is a list of one or more log
        files produced by collect (and no others).  The list may
        contain the usual wildcard specification(s).  Output is written
        to a results file named "count-totals.out", as well as to the
        standard output.

        An example is:

            count-totals v=1 *IP*

        The optional v=1 parameter is used to signify that the results
        should be "verbose", which in this case means that a line
        summarizing each update appears in the results file.  If the
        v=1 parameter is omitted, only daily totals and a grand total
        are included.

        The following is an example of the format of a verbose results
        file:


          File: 35.1.1.21-IP.lens.1221.1523

          Log created on Tue Dec 22 08:00:25 1987, for host 35.1.1.21.
            Sample interval = 60 min; checkpoint interval = 60 min.
            Object name = 'IP.lens'.

            Read-Time        Clear-Time       Total-Count  Increment
            ---------        ----------       -----------  ---------

            08:01:10 12-22   07:06:22 12-22         19262          0
            09:00:55 12-22                          52297      33035
            10:01:04 12-22                          81393      29096
            11:00:54 12-22                         119954      38561
           ... (etc)
            23:00:54 12-22                         468588      12147
            00:00:55 12-23                         493492      24904
          Daily total = 474230 (08:01:10 12-22 to 00:00:55 12-23)

            01:00:54 12-23                         515588      22096
            02:00:54 12-23                         542095      26507
            03:00:54 12-23                         566430      24335
            04:00:54 12-23                         586511      20081
           ... (etc)
            22:00:57 12-23                        1043959       7317
            23:00:55 12-23                        1054753      10794
            00:00:58 12-24                        1070114      15361
          Daily total = 576622 (00:00:55 12-23 to 00:00:58 12-24)
           ... (etc)
           ... (etc)
          Daily total = 182408 (00:00:58 12-24 to 14:01:17 12-24)

          Grand Total = 1233260 (08:01:10 12-22 to 14:01:17 12-24)


        The following is an example of the corresponding non-verbose
        format:


          File: 35.1.1.21-IP.lens.1221.1523

          Log created on Tue Dec 22 08:00:25 1987, for host 35.1.1.21.
            Sample interval = 60 min; checkpoint interval = 60 min.
            Object name = 'IP.lens'.

          Daily total = 474230 (08:01:10 12-22 to 00:00:55 12-23)
          Daily total = 576622 (00:00:55 12-23 to 00:00:58 12-24)
          Daily total = 182408 (00:00:58 12-24 to 14:01:17 12-24)

          Grand Total = 1233260 (08:01:10 12-22 to 14:01:17 12-24)



        In the verbose format, column headings have the following
        meanings:


        o    "Read-Time" contains the ReadTime returned from each
             _s_t_a_t_s_p_y response to a query performed by _c_o_l_l_e_c_t.


        o    "Clear-Time" is filled in for each new ClearTime found in
             the current log file being processed.  A blank Clear-Time
             field signifies that the field is unchanged since the
             previous entry.


        o    "Total-Count" corresponds to the "TotalCount" on each
             response.


        o    "Increment" contains the number of packets counted between
             the current response and the previous response.
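
        The relation between "Total-Count" and "Increment" can be
        sketched in Python, using the first rows of the verbose
        example.  The function increments() is hypothetical.  Its
        handling of a changed ClearTime (counting restarts from zero,
        so the whole new total is the increment) is an assumption
        about how clears are taken into account, not a description of
        the actual AWK script.

```python
def increments(entries):
    """entries: (clear_time, total_count) pairs in log order."""
    result = []
    prev_clear = prev_total = None
    for clear, total in entries:
        if prev_total is None:
            inc = 0                     # first entry: nothing to compare with
        elif clear != prev_clear:
            inc = total                 # assumption: counting restarted at zero
        else:
            inc = total - prev_total
        result.append(inc)
        prev_clear, prev_total = clear, total
    return result


rows = [("07:06:22 12-22", 19262), ("07:06:22 12-22", 52297),
        ("07:06:22 12-22", 81393), ("07:06:22 12-22", 119954)]
incs = increments(rows)
```

        Summing the increments, rather than subtracting the first
        Total-Count from the last, is what allows a total to remain
        meaningful across a clear or restart.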



        _4._3._3.  _B_i_n-_t_o_t_a_l_s


        The command _b_i_n-_t_o_t_a_l_s produces a summary showing the total
        number of packets counted in each of the corresponding bins
        appearing in a log file, taking into account any _s_t_a_t_s_p_y
        restarts or object clears.  It can be invoked on a list of log
        files to summarize each independently.

        The command to invoke _b_i_n-_t_o_t_a_l_s is:



        Braden & DeSchon                                      [Page 42]




        NNStat-Internet Statistics Package                  Release 3.0


            bin-totals _l_o_g_f_i_l_e ...

        As before, the parameter is a list of name(s) of one or more
        log files produced by the _c_o_l_l_e_c_t program, or a wildcard file
        specification that matches (only) files produced by the _c_o_l_l_e_c_t
        program.  Output is written to a results file named "bin-
        totals.out," as well as to the standard output.

        The following is an example of the output from the _b_i_n-_t_o_t_a_l_s
        command:


          File: 35.1.1.21-IP.lens.1221.1523

          Log created on Tue Dec 22 08:00:25 1987, for host 35.1.1.21.
            Sample interval = 60 min; checkpoint interval = 60 min.
            Object name = 'IP.lens'.

          Summary Period: 08:01:10 12-22 to 14:01:17 12-24.

            [0-9] total = 0
            [10-19] total = 0
            [20-39] total = 1112
            [40-79] total = 557177
            [80-159] total = 149395
            [160-319] total = 194274
            [320-639] total = 61625
            [640-1279] total = 192769
            [1280-2559] total = 76908
            [2560-5119] total = 0



        _A_p_p_e_n_d_i_x _A - _C_a_t_a_l_o_g _o_f _O_b_j_e_c_t_s


        This Appendix describes the statistical object classes
        currently implemented in statspy.


        _R_e_c_o_r_d_e_r _O_b_j_e_c_t_s


        A recorder object is invoked at its Write() entry point to
        record values of a specific field or set of fields.


        1.   Frequency of all Values

                   freq-all

             An object of class _f_r_e_q-_a_l_l (abbreviated as _F_A) builds a
             frequency distribution table for a single field.  This
             table is built dynamically, with a bin for every distinct
             value that occurs.

             Each time a bin is added or incremented, the current time
             (in seconds since Jan 1, 1970) is recorded in the bin.  A
             read operation returns this last-update time with the
             value and count for each bin.  We expect that these times
             will be useful in analysis of the data; for example, it
             would be possible to extract only recently occurring
             values.

             The list of bins returned by a read operation is sorted
             into order of decreasing counts, and within the same
             count, by last update time.

             The implementation of the freq-all class uses a chained
             hash scheme, dynamically allocating memory for bins in
             "pages" of 2K bytes.  There is a built-in limit of 1024
             bins.  In addition to the hash chain, each bin is chained
             into a doubly-linked sorted list that is used to order the
             read sequence.  Sorting bins into this list is
             accomplished using an incremental algorithm whose CPU time
             is linear in the total count (and is in fact negligible).
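
             The behavior of a freq-all object can be sketched in
             Python as follows.  This is not the statspy
             implementation: a Python dictionary replaces the chained
             hash table, a full sort at read time replaces the
             incremental sorted list, and the class and method names
             are hypothetical.  Two details are assumptions: a write
             that would exceed the bin limit is simply rejected, and
             ties in count are ordered with the most recent update
             first.

```python
import time


class FreqAll:
    MAX_BINS = 1024   # built-in limit stated in the text

    def __init__(self):
        self.bins = {}   # value -> [count, last_update_time]

    def write(self, value, now=None):
        """Count one occurrence of value; return False if no bin is free."""
        now = time.time() if now is None else now
        entry = self.bins.get(value)
        if entry is None:
            if len(self.bins) >= self.MAX_BINS:
                return False          # assumption: overflow is rejected
            entry = self.bins[value] = [0, now]
        entry[0] += 1
        entry[1] = now                # last-update time, seconds since 1970
        return True

    def read(self):
        """Return (value, count, last_update) by decreasing count."""
        rows = [(v, c, t) for v, (c, t) in self.bins.items()]
        # Assumption: within equal counts, the newest update comes first.
        return sorted(rows, key=lambda r: (-r[1], -r[2]))


fa = FreqAll()
for proto, when in [(6, 1), (17, 2), (6, 3)]:
    fa.write(proto, now=when)
table = fa.read()
```

             The last-update time returned with each bin is what would
             let an analysis program keep only recently seen values.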


        2.   Frequency of all Values, with Byte Totals

                   freq-all-bytes

             An object of class freq-all-bytes (abbreviated as FAB)
             builds a frequency distribution table for a single field.
             This table is built dynamically, with a bin for every
             distinct value that occurs.

             The object also accumulates in each bin the total lengths
             of the corresponding packets, in bytes.  In all other
             respects, objects of this class are the same as objects of
             class freq-all.  Note that the packet length is an
             implicit parameter; the object is unary.  The lengths that
             are accumulated are of the entire packet, exclusive of the
             Ethernet header.


        3.   Frequency of Selected Set of Values

                   freq-only ( <value>, ... <value> )

             An object of class _f_r_e_q-_o_n_l_y (abbreviated as _F_O) builds a
             frequency distribution table for only those values that
             are included in the parameter list; values not in the list
             are counted in a single "default" bin.  A read operation
             on the object displays this frequency table and the
             "default" count.  If the given set of values is empty, the
             "default" count equals the total number of invocations.

             The <value> entries may be expressed in a variety of ways.


             o    Decimal integer, limited to 2**31 maximum.


             o    Hex integer, using the "C" notation 0x....


             o    IP address, specified as either a dotted-decimal
                  number or as a domain name.


             o    Ethernet Address, specified in "coloned-hex" format:
                  xx:xx:xx:xx:xx:xx, where each x represents a hex
                  digit.


             o    A  quoted enumeration label: "<label>".  This implies
                  the corresponding value, and provides a way to define
                  field values symbolically.  See Appendix C for
                  examples.
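
             The value forms listed above can be told apart by their
             surface syntax.  The Python sketch below illustrates one
             way to do so; it is not taken from the statspy source.
             The function name classify_value, the regular expressions,
             and the host name gw.example.edu are assumptions made for
             this example only.

```python
import re

# Six colon-separated pairs of hex digits ("coloned-hex").
ETHER_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")
# Four dot-separated decimal numbers (range of each part is not checked here).
DOTTED_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def classify_value(text):
    """Return the kind of <value> that `text` looks like (illustrative only)."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "label"                      # quoted enumeration label
    if text[:2] in ("0x", "0X"):
        int(text[2:], 16)                   # raises ValueError on bad digits
        return "hex"
    if ETHER_RE.match(text):
        return "ethernet"
    if DOTTED_RE.match(text):
        return "ip-dotted"
    if text.isdigit():
        if int(text) > 2**31:               # decimal limit stated in the text
            raise ValueError("decimal value exceeds 2**31")
        return "decimal"
    return "ip-domain"                      # assumed: anything else is a host name


for sample in ("23", "0x1F", "128.1.0.0", "08:00:2b:03:4a:e7",
               '"Telnet"', "gw.example.edu"):
    print(sample, "->", classify_value(sample))
```

             The checks are ordered so that the most distinctive forms
             (quotes, the 0x prefix, colons) are tested before the
             plain decimal and host-name cases.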



        4.   Frequency of Selected Set of Values, with Byte Totals

                   freq-only-bytes ( <value>, ... <value> )

             An object of class _f_r_e_q-_o_n_l_y-_b_y_t_e_s (abbreviated as _F_O_B)
             builds a frequency distribution table for only those
             values that are included in the parameter list; values not
             in the list are counted in a single "default" bin. The
             object also accumulates in each bin the total lengths of
             the corresponding packets, in bytes.  In all other
             respects, objects of this class are the same as objects of



        Braden & DeSchon                                      [Page 46]




        NNStat-Internet Statistics Package                  Release 3.0


             class freq-only.  The lengths that are accumulated are of
             the entire packet, exclusive of the Ethernet header.
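
             As an illustration of the selected-value classes, the
             following Python sketch models a table with one bin per
             listed value plus a single "default" bin, each holding a
             packet count and a byte total.  The class name
             FreqOnlyBytes and its methods are hypothetical, and the
             sketch assumes that the "default" bin accumulates bytes in
             the same way as the other bins.

```python
class FreqOnlyBytes:
    """Toy model of a freq-only-bytes object (not the NNStat implementation)."""

    def __init__(self, *values):
        # One [count, bytes] bin per value in the parameter list.
        self.bins = {v: [0, 0] for v in values}
        # Single bin for every value that is not in the list.
        self.default = [0, 0]

    def invoke(self, value, packet_length):
        """Count one packet carrying `value`, of `packet_length` bytes."""
        target = self.bins.get(value, self.default)
        target[0] += 1
        target[1] += packet_length

    def read(self):
        """Return ({value: (count, bytes)}, (default count, default bytes))."""
        table = {v: tuple(b) for v, b in self.bins.items()}
        return table, tuple(self.default)


fob = FreqOnlyBytes(23, 25)
for port, length in [(23, 60), (80, 1500), (25, 200), (23, 40)]:
    fob.invoke(port, length)
print(fob.read())
```

             With an empty parameter list every invocation lands in the
             "default" bin, which matches the behavior described for
             freq-only.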

        5.   Frequency of all Value Pairs

                    matrix-all

             An object of class _m_a_t_r_i_x-_a_l_l (abbreviated as _M_A) builds a
             table of frequencies of all pairs of values in two fields.
             This table is built dynamically, with a bin for every
             distinct value pair that occurs.  The pair of values (a,b)
             is counted separately from the pair (b,a). The list of
             bins returned by a read operation is sorted into order of
             decreasing counts, and within the same count, by last
             update time.

             The internal structure and implementation of this class is
             the same as the _f_r_e_q-_a_l_l class, described above.

             If the object name matches an enumeration, the
             corresponding labels are used for the first value of each
             pair.

             Note: if a _m_a_t_r_i_x-_a_l_l object is defined with a non-zero
             parameter, it operates as a _m_a_t_r_i_x-_s_y_m object (see the
             following).


        6.   Frequency of all Value Pairs, with Byte Totals

                    matrix-all-bytes

             An object of class _m_a_t_r_i_x-_a_l_l-_b_y_t_e_s (abbreviated as _M_A_B)
             is exactly like a matrix-all object, except that a
             _m_a_t_r_i_x-_a_l_l-_b_y_t_e_s object also accumulates in each bin the
             total lengths of the corresponding packets, in bytes.  The
             lengths are of the entire packet, exclusive of Ethernet
             header.













        7.   Symmetric Frequency of Value Pairs

                    matrix-sym

             An object of class _m_a_t_r_i_x-_s_y_m (abbreviated as _M_S) builds a
             table of frequencies of all pairs of values in two fields.
             This table is built dynamically, with a bin for every
             distinct value pair that occurs.  The list of bins
             returned by a read operation is sorted in order of
             decreasing counts, and within the same count, by last
             update time.

             If the two argument fields have the same length, they are
             treated "symmetrically":  (b,a) and (a,b) are counted in
             the same bin.  If the lengths differ, _m_a_t_r_i_x-_s_y_m operates
             like _m_a_t_r_i_x-_a_l_l.

             The internal structure and implementation of this class is
             the same as the _m_a_t_r_i_x-_a_l_l class, described above.

             If the object name matches an enumeration, the
             corresponding labels are used for the first value of each
             pair.
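
             The difference between matrix-all and matrix-sym counting
             can be sketched in Python as follows.  This is only an
             illustration: the function count_pairs is hypothetical,
             field values are modeled as byte strings, and putting each
             pair into sorted order is an assumed way of making (a,b)
             and (b,a) share a bin.

```python
from collections import Counter


def count_pairs(pairs, symmetric=False):
    """Count (a, b) value pairs; values are byte strings taken from two fields."""
    bins = Counter()
    for a, b in pairs:
        key = (a, b)
        # Symmetric treatment applies only when both fields have equal length.
        if symmetric and len(a) == len(b):
            key = tuple(sorted(key))        # assumed canonical order for the pair
        bins[key] += 1
    return bins


traffic = [(b"\x01", b"\x02"), (b"\x02", b"\x01"), (b"\x01", b"\x02")]
print(count_pairs(traffic))                  # matrix-all style: two bins
print(count_pairs(traffic, symmetric=True))  # matrix-sym style: one bin
```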


        8.   Symmetric Frequency of Value Pairs, with Byte Totals

                    matrix-sym-bytes

             An object of class _m_a_t_r_i_x-_s_y_m-_b_y_t_e_s (abbreviated as _M_S_B)
             is exactly like a matrix-sym object, except that a
             _m_a_t_r_i_x-_s_y_m-_b_y_t_e_s object also accumulates in each bin the
             total lengths of the corresponding packets, in bytes.  The
             lengths are of the entire packet, exclusive of Ethernet
             header.
















        9.   Histogram

                   hist ( <scale factor> [, <max bin>] )

             An object of class _h_i_s_t (abbreviated as _H_I) builds a
             linear histogram of (unsigned) integer field values. Each
             bin of the histogram has the same size, given by the value
             <scale factor>. The optional second parameter specifies
             the ordinal number of the maximum bin that is collected;
             if it is omitted, 1024 is used.

             If <scale factor> is S and <max bin> is M, a Read
             operation on the object defined by hist(S, M) returns the
             counts:

                 Bin 0:  Count( 0 <= X < S )
                 ...
                 Bin j:  Count( j*S <= X < (j+1)*S )
                 ...
                 Bin M:  Count( M*S <= X < (M+1)*S )

             plus a count of values that were off-scale, i.e., >=
             (M+1)*S.  Here "Count( f(X) )" means the number of
             invocations with value X for which f(X) was true.

             The Read operation also reports the average, maximum, and
             minimum values observed.  A hist object is restricted to
             invocation on a field of 4 bytes or less.
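
             The linear binning rule can be sketched in Python as
             follows.  The function names hist_bin and build_hist are
             hypothetical; only the bin boundaries and the default
             maximum bin of 1024 come from the description above.

```python
def hist_bin(x, scale, max_bin=1024):
    """Return the bin index for unsigned value x, or None if x is off-scale."""
    j = x // scale                  # bin j covers j*scale <= x < (j+1)*scale
    return j if j <= max_bin else None


def build_hist(values, scale, max_bin=1024):
    """Return (list of counts for bins 0..max_bin, off-scale count)."""
    counts = [0] * (max_bin + 1)
    off_scale = 0
    for x in values:
        j = hist_bin(x, scale, max_bin)
        if j is None:
            off_scale += 1          # x >= (max_bin + 1) * scale
        else:
            counts[j] += 1
    return counts, off_scale


# Packet lengths binned 64 bytes at a time, keeping bins 0 through 2.
print(build_hist([0, 63, 64, 127, 128, 1500], scale=64, max_bin=2))
```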


        10.  Logarithmic Histogram

                   hist-pwr2 ( <scale factor> )

             An object of class _h_i_s_t-_p_w_r_2 (abbreviated as _P_2) builds a
             logarithmic histogram, i.e, one with intervals increasing
             as powers of 2.  Specifically, a Read() operation on a
             _h_i_s_t-_p_w_r_2 object returns the following counts:

                 Bin 0:  Count(X < S)

                 Bin 1:  Count(S <= X < 2*S)
                 ...
                 Bin j:  Count(S*(2**(j-1)) <= X < S*(2**j) )

             where S is the value of the unsigned integer <scale
             factor>.




             A _h_i_s_t-_p_w_r_2 object also reports the average, maximum, and
             minimum values observed.  A _h_i_s_t object is restricted to
             invocation on a field of 4 bytes or less.
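
             The logarithmic binning can be sketched in Python as
             follows.  The sketch takes Bin 0 (X < S) and Bin 1
             (S <= X < 2*S) as listed above and assumes that each later
             bin doubles the previous interval, so that bin j (j >= 1)
             covers S*2**(j-1) <= X < S*2**j.  The function name
             pwr2_bin is hypothetical.

```python
def pwr2_bin(x, scale):
    """Return the power-of-2 bin index for unsigned value x and scale factor."""
    if x < scale:
        return 0
    # For x >= scale, x // scale lies in [2**(j-1), 2**j - 1] exactly when
    # scale * 2**(j-1) <= x < scale * 2**j, and its bit length is then j.
    return (x // scale).bit_length()


for length in (0, 63, 64, 127, 128, 255, 256, 1500):
    print(length, "-> bin", pwr2_bin(length, 64))
```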


        11.  Measure Temporal Locality of Reference

                    working-set                                          |
                    working-set2                                         |

             An object of class _w_o_r_k_i_n_g-_s_e_t (abbreviated as _W_S)
             measures the degree of temporal clustering of values of a
             given field.  This clustering is known as "locality of
             reference", and in the memory domain leads to the concept
             of a _w_o_r_k_i_n_g _s_e_t.  However, the _w_o_r_k_i_n_g-_s_e_t class is        |
             misnamed; rather than measuring the size of the working     |
             set, it measures LRU ("Least-Recently Used") cache hit      |
             probabilities.                                              |

             Suppose that we maintain a list of the n distinct values    |
             that have occurred most recently in the field.  This list   |
             will change over time, as new values that were not in the   |
             list occur and replace the oldest ("least-recently used")   |
             values in the list.  Let C(n) be the number of values in    |
             the observed sequence that were already in the list when    |
             they occurred.  If there have been a total of N packets in  |
             the sequence, C(n)/N estimates the probability that the     |
             next value is already in the list (cache).  The working-set |
             object measures the values of C(1), C(2), C(4), C(8), ...,  |
             C(4096).                                                    |

             A _w_o_r_k_i_n_g-_s_e_t_2 object takes two fields (i.e, it is a        |
             binary object), concatenating the two fields into a single  |
             value that is used to build an LRU cache just like a        |
             _w_o_r_k_i_n_g-_s_e_t object.                                         |


        12.  Record Sequence of Values in Binary

                     bin-pkt ( <count> [, <max length>] )                |
                     bin-pkt2 ( <count> [, <max length>] )               |

             An object of class _b_i_n-_p_k_t (abbreviated as _B_P) builds a     |
             circular buffer of up to <count> entries, containing the    |
             most recent values in a specified field.  If the second     |
             parameter is specified, only the first <max length> bytes   |
             of each field are saved.  A read operation on the object    |
             displays all values in this buffer, oldest first.           |




             Although this object may be invoked from any field, it is   |
             really intended for recording the complete headers of       |
             packets.  For this purpose, the virtual field "packet" is   |
             defined as the entire set of packet headers captured from   |
             the Ethernet; this is at most the first 108 bytes of the    |
             packet, currently.                                          |

             A bin-pkt2 object takes two fields (i.e., it is a binary    |
             object), concatenating the two fields into a single value   |
             that is truncated if necessary to <max length> bytes and    |
             then recorded.  For example, the following program will     |
             save a circular buffer of 8-byte quantities, containing     |
             source and destination address pairs:                       |

                 record IP.srchost, IP.dsthost in AddrPairs BP2(100);    |

             A read operation on one of these objects will behave in a   |
             special way when the "file" command is in effect: the       |
             buffer contents will be recorded as BINARY data.  The       |
             format of this data is defined in the struct bpe_entry in   |
             sobjbp.c.                                                   |
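
             The circular-buffer behavior of these classes can be
             modeled with a bounded queue, as in the Python sketch
             below.  The class name BinPkt is hypothetical, and the
             sketch stores raw byte strings only; it does not attempt
             to reproduce the bpe_entry record layout.

```python
from collections import deque


class BinPkt:
    """Toy model of bin-pkt / bin-pkt2 (not the NNStat implementation)."""

    def __init__(self, count, max_length=None):
        self.buf = deque(maxlen=count)      # oldest entry is dropped when full
        self.max_length = max_length

    def invoke(self, *fields):
        """Record one field (bin-pkt) or two concatenated fields (bin-pkt2)."""
        value = b"".join(fields)
        if self.max_length is not None:
            value = value[:self.max_length]  # keep only the first bytes
        self.buf.append(value)

    def read(self):
        """Return the buffered values, oldest first."""
        return list(self.buf)


pairs = BinPkt(100)
pairs.invoke(bytes([128, 1, 0, 5]), bytes([128, 2, 0, 9]))
print(pairs.read())                          # one 8-byte source/destination entry
```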






























        13.  Variant of freq-all

                   freq-all2

             An object of class freq-all2 (abbreviated as FA2) performs
             the same function as an object of the freq-all class,
             except that a freq-all2 object does not sort the list of
             bins, but rather displays bins in the order of their first
             occurrence.  As a result, freq-all2 objects may use
             slightly less CPU time (although the difference appears to
             be negligible) and always use less memory for bins (16-20
             bytes per bin, compared to 24-28 bytes for freq-all
             objects).


        14.  Variant of matrix-all

                   matrix-all2

             An object of class matrix-all2 (abbreviated as MA2)
             performs the same function as an object of the matrix-all
             class, except that a matrix-all2 object does not sort the
             list of bins, but rather displays bins in the order of
             their first occurrence.  As a result, matrix-all2 objects
             may use slightly less CPU time (although the difference
             appears to be negligible) and always use less memory for
             bins (16-24 bytes per bin, compared to 24-32 bytes for
             matrix-all objects).























        _F_i_l_t_e_r _O_b_j_e_c_t_s


        A filter object tests given field values against some criterion
        and returns a Boolean value; the interpreter uses this result
        to select one of two alternative sequences of invocations.

        A filter object generally has a read-only data structure, but
        it does keep two statistical counters: the total number of
        invocations, and the number that resulted in a TRUE result.
        These two numbers are returned by a Read operation.


        1.   Filter on range of values

                    rangef ( <Lower>, <Upper> [ ,<mask> ])               |

             A _r_a_n_g_e_f object (abbreviated as _R_F) returns TRUE if the     |
             given value X, after ANDing with <mask> if one is present,  |
             falls inside the specified range:                           |

                    L <= X&M <= U                                        |

             Otherwise, it returns FALSE. Here L and U are the unsigned  |
             integer values corresponding to <Lower> and <Upper>,        |
             respectively, and M is <mask> or all one bits if <mask> is  |
             omitted.  Note that L, U, and <mask> are permitted to be    |
             integers or 6-byte Ethernet addresses.                      |
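
             The range test, together with the two counters that every
             filter object keeps, can be sketched in Python as follows.
             The class name RangeF and its methods are hypothetical,
             and the sketch assumes that Ethernet addresses are handled
             as 48-bit unsigned integers.

```python
class RangeF:
    """Toy model of a rangef filter (not the NNStat implementation)."""

    def __init__(self, lower, upper, mask=None):
        self.lower = lower
        self.upper = upper
        self.mask = mask            # None stands for "all one bits"
        self.invocations = 0        # total number of invocations
        self.true_count = 0         # invocations that returned TRUE

    def test(self, x):
        """Return True if lower <= (x AND mask) <= upper."""
        self.invocations += 1
        masked = x if self.mask is None else x & self.mask
        result = self.lower <= masked <= self.upper
        if result:
            self.true_count += 1
        return result

    def read(self):
        """Return the two counters reported by a Read operation."""
        return self.invocations, self.true_count


rf = RangeF(0x10, 0x20, mask=0xF0)
print(rf.test(0x1F), rf.test(0x35), rf.read())
```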


        2.   Filter on equality

                    eqf( <value> )

             An _e_q_f object (abbreviated as _E_Q) returns TRUE if the
             given field value matches the specified parameter value,
             otherwise it returns false.  Here <value> may take any of
             the forms described earlier for the _f_r_e_q-_o_n_l_y class.


        3.   Filter on selected set of values

                    setf ( <value>, ... <value> )

             A _s_e_t_f object (abbreviated as _S_F) returns TRUE if the
             given field value matches one of the values in the
             parameter list, otherwise it returns FALSE.  Each <value>
             may take any of the forms described earlier for the _f_r_e_q-



        Braden & DeSchon                                      [Page 53]




        NNStat-Internet Statistics Package                  Release 3.0


             _o_n_l_y class.

             Note that _s_e_t_f with a single value is equivalent to _e_q_f,
             but is much less efficient.
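
             The equivalence noted above can be checked with a small
             Python model of the two filters.  The class names EqF and
             SetF are hypothetical, and a Python frozenset merely
             stands in for whatever lookup structure the real setf
             code uses.

```python
class EqF:
    """Toy model of an eqf filter: TRUE when the value equals the parameter."""

    def __init__(self, value):
        self.value = value
        self.invocations = 0
        self.true_count = 0

    def test(self, x):
        self.invocations += 1
        result = (x == self.value)          # a single comparison
        self.true_count += result
        return result

    def read(self):
        return self.invocations, self.true_count


class SetF(EqF):
    """Toy model of a setf filter: TRUE when the value is in the parameter list."""

    def __init__(self, *values):
        super().__init__(None)
        self.values = frozenset(values)

    def test(self, x):
        self.invocations += 1
        result = x in self.values           # set membership lookup
        self.true_count += result
        return result


telnet_like = SetF(23, 43, 79, 513)
print([telnet_like.test(p) for p in (23, 80, 513)], telnet_like.read())
```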



        _L_i_m_i_t_s _o_f _S_t_a_t_i_s_t_i_c_a_l _O_b_j_e_c_t_s


        The objects that have been defined have the following limits:


        (1)  Max Frequency Counts

             All frequency counts, both individual bins and totals
             across bins, are maintained as 32-bit unsigned integers,
             and are therefore limited to about 4*10**9 (2**32 - 1)
             packets.  Removing this restriction would involve major
             changes throughout the code.


        (2)  Maximum Byte Totals

             Byte totals, both in bins and across bins, are maintained
             in a multiple-precision format with a 24-bit low-order
             part and a 32-bit high-order part.  In practice, it should
             never be possible to reach this limit of approximately
             7*10**16 (2**56) bytes.
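
             A two-part byte total of this kind can be sketched in
             Python as follows.  The class name ByteTotal and the
             carry logic are assumptions made for illustration; only
             the 24-bit and 32-bit part widths come from the text.

```python
LOW_BITS = 24
LOW_MASK = (1 << LOW_BITS) - 1      # largest value of the low-order part
HIGH_MAX = (1 << 32) - 1            # largest value of the high-order part


class ByteTotal:
    """Toy model of a byte total kept as a 24-bit low and a 32-bit high part."""

    def __init__(self):
        self.low = 0
        self.high = 0

    def add(self, nbytes):
        """Add a packet length, carrying overflow of the low part into the high."""
        total = self.low + nbytes
        self.low = total & LOW_MASK
        self.high += total >> LOW_BITS
        if self.high > HIGH_MAX:
            raise OverflowError("byte total exceeds 2**56 - 1")

    def value(self):
        """Return the combined total as a single integer."""
        return (self.high << LOW_BITS) | self.low


bt = ByteTotal()
bt.add(LOW_MASK)
bt.add(1)
print(bt.high, bt.low, bt.value())
print("capacity:", (HIGH_MAX << LOW_BITS) | LOW_MASK)
```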


        (3)  Number of Bins

             The object classes that build bins dynamically
             (freq-all..., matrix..., and working-set) all impose a
             limit on the total number of bins.  It is set by a #define
             BIN_MAX in each source file; it is currently 1024
             everywhere.  This limit is only a reasonableness check and
             is arbitrary; to raise it, simply change the source file
             and re-make the program.


        (4)  Field Sizes

             In the current implementation, the frequency distribution
             classes freq-all..., freq-only..., and matrix... can
             handle values of 1 to 8 bytes in length.  This has not
             been a restriction in practice, since the largest field
             that actually occurs (see Figures 1 and 6) is 6 bytes.
             Increasing the maximum field size beyond 8 would require
             significant programming changes.


        (5)  Number of Parameters

             The maximum number of parameters that may be listed when
             an object is first created by an attach command is set by
             a #define MAXPARMS to 256.  This limit is arbitrary and
             can be expanded by recompilation.  The only object classes
             that use extended parameter lists are freq-only... and
             setf.






































        _A_p_p_e_n_d_i_x _B - _S_y_n_t_a_x _o_f _A_t_t_a_c_h _C_o_m_m_a_n_d


        This Appendix contains a BNF specification of the _a_t_t_a_c_h
        command syntax.

        The syntax of Attach parameters has been (deliberately)
        designed to parallel the syntax of "C" statements (that
        correspond to simple invocations) and statement-lists (that
        correspond to lists of invocations).

          <Attach command> ::= attach { <S-list> }

          <S-list> ::= <Statement> | <S-list> <Statement>

          <Statement> ::= record <record-invoke> ; |

                  <if clause> <Statement> else <Statement> |

                  <if clause> <Statement> |

                  <select clause> { <case body> } |

                 { <S-list> }  |  ;

          <record-invoke> ::= <field name> in <object defn>  |

                              <field name>, <field name> in <object defn>   |

                              <field name> <field name> in <object defn>

          <if clause> ::=  if <condition> |  symif <condition>


          <condition> ::=   <C-term> | <condition> or <C-term>

          <C-term> ::=      <C-primary> | <C-term> and <C-primary>

          <C-primary> ::=   <if-invoke> | ( <condition> )

          <if-invoke> ::=   <field name> is <object defn>  |

                            <field name> isnot <object defn>


          <select clause> ::=   select <field name> <object name> |

                                      select <field name>



          <case body> ::=   <case body> <case label> <Statement> | <empty>

          <case label> ::=  case <value> : | case ( <value list> ) : |

                            default <value> : | default ( <value list> ) :


          <object defn> ::=   <object name> <class> <class parm> |

                              <class> <class parm> |

                              <object name>


          <class parm> ::=  <empty> | ( <value list> )

          <value list> ::= <empty> | <value list> <value>  |

                           <value list> , <value>

          <value> ::= <decimal integer> |

                       0x<hex number>  |  0X<hex number>  |

                       <IP address>  |

                       <Ethernet address> |

                       "<label>"

          <IP address> ::=

                       <dotted-decimal number> |

                       <host domain name>

          <Ethernet address> ::=

                        <hex digit>:<hex digit>:<hex digit>:
                             <hex digit>:<hex digit>:<hex digit>

          <hex number> ::= <hex digit>  |  <hex number><hex digit>

          <hex digit> ::= 00 | 01 | ... | fe | ff

          <field name> ::= <identifier>




          <object name> ::= <identifier>

          <identifier> ::= a letter, followed by: any string of
                           letters, digits, or any of the
                           special characters +-&._














































        _A_p_p_e_n_d_i_x _C - _B_u_i_l_d_i_n_g _C_o_n_f_i_g_u_r_a_t_i_o_n _F_i_l_e_s


        This Appendix provides some guidelines, suggestions, and
        examples for building _s_t_a_t_s_p_y configurations.


        _C._1  _E_x_a_m_p_l_e _1


        Traffic flowing to and from a particular gateway can be
        selected with a filter:

        attach {
            if Ether.dst is eqf(08:00:2b:03:4a:e7) {
                  ####going to gateway
               record ... ;
               record ... ;
               ...
               }
            if Ether.src is eqf(08:00:2b:03:4a:e7) {
                  ####coming from gateway
               record ... ;
               record ... ;
               ...
               }
            }

        where 08:00:2b:03:4a:e7 is the Ethernet address of the gateway.
        (Note the convention for Ethernet addresses: "coloned-hex").


        _C._2  _E_x_a_m_p_l_e _2


        Another filtering approach is to select packets by classes of
        IP addresses.  For example, suppose it is known that the list
        128.1.0.0, 128.2.0.0, and 128.3.0.0 includes all local
        networks.  In that case, the following selects "transit"
        packets, i.e., packets whose source and destination are both
        outside the local administrative area:










          if IP.srcnet isnot
                      setf(128.1.0.0, 128.2.0.0, 128.3.0.0)
               if IP.dstnet isnot
                      setf(128.1.0.0, 128.2.0.0, 128.3.0.0) {
                   record ... ;
                   record ... ;
                      ...
                }

        _S_t_a_t_s_p_y has been designed to be efficient even if the list of
        values used as parameters to _s_e_t_f() is very large (say, several
        hundred values) - a _s_e_t_f object uses a hash-table.


        _C._3  _E_x_a_m_p_l_e _3


        Suppose the problem is to collect data on all TCP traffic
        destined for a particular gateway, broken down by source and
        destination IP addresses as well as packet type (Telnet, FTP,
        etc)..

        Here, "packet type" is a little bit hazy, but it is related to
        the occurrence of a well-known port number in the TCP source or
        destination port.

        In principle, there is no reason why statspy could not provide
        for three-way distributions, but it does not.  One of the main
        reasons (besides distaste for the resulting messiness in the
        specifications and code) for not implementing three-way
        distributions is disbelief that administrators will want all
        that data!  Running statspy for 12 hours on a typical large
        Ethernet has found packets to/from 100 networks and 500
        different IP hosts.  The complete three-way matrix asked for
        here may therefore contain 10**6 bins.  It seems unlikely that
        anyone will have use for a million numbers, accumulated over
        days, weeks, and months!

        In fact, it seems doubtful that even the hardiest administrator
        will really want to keep complete statistics by (IP source, IP
        destination) host pairs; after a few months of looking at 10**5
        numbers, he/she will tire of it and begin to collect data only
        on source and destination network, or only for specific subsets
        of networks.

        The _s_t_a_t_s_p_y design includes a number of features to contain the
        amount of data it can generate, for example the inclusion of IP
        network numbers distinct from host numbers, the conditional



        Braden & DeSchon                                      [Page 60]




        NNStat-Internet Statistics Package                  Release 3.0


        (filter) mechanism, and the _s_e_t_f() filter described above.

        The recommended approach to using NNStat is as follows: set up
        some simple, general overall statistical measures, producing a
        volume of data that can reasonably be scanned.  If some
        apparent anomalies are observed - e.g., a particular network
        seems to be producing more packets than expected - then augment
        the configuration for 24 hours with specific objects to analyze
        exactly those anomalous data.

        In any case, the following command sets up a configuration to
        provide counts broken down by source address, destination
        address, and packet type.

         attach {

           if TCP.dstport is  setf(23, 43, 79, 513)
              record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;
           else if TCP.srcport is  setf(23, 43, 79, 513)
              record IP.srchost, IP.dsthost in Telnet.hosts;

           else if TCP.dstport is  setf(20, 21, 69)
              record IP.srchost IP.dsthost in ftp.hosts matrix-sym;
           else if TCP.srcport is  setf(20, 21, 69)
              record IP.srchost IP.dsthost in ftp.hosts;

           else if TCP.dstport is  setf(25, 103, 104, 119)
              record IP.srchost IP.dsthost in mail.hosts matrix-sym;
           else if TCP.srcport is  setf(25, 103, 104, 119)
              record IP.srchost IP.dsthost in mail.hosts;
         }

        We can avoid replicating the parameter lists to the setf
        objects by naming the first occurrence of each case and
        referencing the same object in later occurrences, as shown by
        the following:















         attach {

           if TCP.dstport is  port.telnet  setf(23, 43, 79, 513)
              record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;
           else if TCP.srcport is  port.telnet
              record IP.srchost, IP.dsthost in Telnet.hosts;

           else if TCP.dstport is  port.ftp  setf(20, 21, 69)
              record IP.srchost IP.dsthost in ftp.hosts matrix-sym;
           else if TCP.srcport is  port.ftp
              record IP.srchost IP.dsthost in ftp.hosts;

           else if TCP.dstport is  port.mail  setf(25, 103, 104, 119)
              record IP.srchost IP.dsthost in mail.hosts matrix-sym;
           else if TCP.srcport is  port.mail
              record IP.srchost IP.dsthost in mail.hosts;
         }

        Using symbolic labels defined by an enum command, we can write
        this as:































         enum {
          *port* (20 "FTP data", 21 FTP, 23 Telnet, 25 SMTP,
             37 Time, 42 Name, 43 Whois, 53 Domains,
             69 TFTP, 79 Finger, 103 X.400, 104 "X.400-SND",
             109 POP2, 111 sunrpc, 115 SFTP, 119 NetNews,
             153 SGMP, 512 exec, 513 "rwho|rlogin", 514 shell,
             515 printer, 520 RIP)
         }

         attach {
           if TCP.dstport is port.telnet
                      setf("Telnet", "Whois", "Finger", "rwho|rlogin")
                record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

           else if TCP.srcport is port.telnet
                record IP.srchost, IP.dsthost in Telnet.hosts;

           else if TCP.dstport is port.ftp setf("FTP data", "FTP", "TFTP")
                record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

           else if TCP.srcport is port.ftp
                record IP.srchost IP.dsthost in ftp.hosts;

           else if TCP.dstport is port.mail
                      setf("SMTP", "X.400", "X.400-SND", "NetNews")
                record IP.srchost IP.dsthost in mail.hosts matrix-sym;

           else if TCP.srcport is port.mail
                record IP.srchost IP.dsthost in mail.hosts;
         }

        Now, all this needs to be conditional upon packets coming and
        going through a specific gateway.  This requires a very
        redundant configuration file, but the good news is that the
        redundancy does not affect either the CPU time or memory space
        required for data collection.  Assuming the enum command of the
        previous example, the complete attach command can be written
        as:













         attach {

         if  Ether.src is eqf(08:00:2b:03:4a:e7) {
           if TCP.dstport is port.telnet
                      setf("Telnet", "Whois", "Finger", "rwho|rlogin")
                record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

           else if TCP.srcport is port.telnet
                record IP.srchost, IP.dsthost in Telnet.hosts;

           else if TCP.dstport is port.ftp setf("FTP data", "FTP", "TFTP")
                record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

           else if TCP.srcport is port.ftp
                record IP.srchost IP.dsthost in ftp.hosts;

           else if TCP.dstport is port.mail
                      setf("SMTP", "X.400", "X.400-SND", "NetNews")
                record IP.srchost IP.dsthost in mail.hosts matrix-sym;

           else if TCP.srcport is port.mail
                record IP.srchost IP.dsthost in mail.hosts;
         }
         else if  Ether.dst is eqf(08:00:2b:03:4a:e7) {
           if TCP.dstport is port.telnet
                record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

           else if TCP.srcport is port.telnet
                record IP.srchost, IP.dsthost in Telnet.hosts;

           else if TCP.dstport is port.ftp
                record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

           else if TCP.srcport is port.ftp
                record IP.srchost IP.dsthost in ftp.hosts;

           else if TCP.dstport is port.mail
                record IP.srchost IP.dsthost in mail.hosts matrix-sym;

           else if TCP.srcport is port.mail
                record IP.srchost IP.dsthost in mail.hosts;
          }
         }

        In Release 3.0, there is even more good news.  This example can  |
        be streamlined using symif and/or select statements.             |

        First, we can use symif statements to collapse pairs of the      |
        inner if statements:                                             |

        attach {

         if  Ether.src is eqf(08:00:2b:03:4a:e7) {
           symif TCP.dstport is port.telnet
                      setf("Telnet", "Whois", "Finger", "rwho|rlogin")
              record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

           else symif TCP.dstport is port.ftp
                      setf("FTP data", "FTP", "TFTP")
              record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

           else symif TCP.dstport is port.mail
                      setf("SMTP", "X.400", "X.400-SND", "NetNews")
              record IP.srchost IP.dsthost in mail.hosts matrix-sym;
         }
         else if  Ether.dst is eqf(08:00:2b:03:4a:e7) {
           symif TCP.dstport is port.telnet
              record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

           else symif TCP.dstport is port.ftp
              record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

           else symif TCP.dstport is port.mail
              record IP.srchost IP.dsthost in mail.hosts matrix-sym;
         }
        }

        Now, due to the symmetry of the example, we can use another      |
        symif statement for the outer alternative:                       |

         attach {

         symif  Ether.src is eqf(08:00:2b:03:4a:e7) {
           symif TCP.dstport is port.telnet
                    setf("Telnet", "Whois", "Finger", "rwho|rlogin")
              record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

           else symif TCP.dstport is port.ftp
                    setf("FTP data", "FTP", "TFTP")
              record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

           else symif TCP.dstport is port.mail
                    setf("SMTP", "X.400", "X.400-SND", "NetNews")
              record IP.srchost IP.dsthost in mail.hosts matrix-sym;
         }
        }
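
        The effect of symif in these examples can be modelled outside
        statspy.  The following Python sketch is only an illustration
        inferred from the rewriting shown above: the body runs if the
        test holds for the named field or for its source/destination
        mirror.  The table SYMMETRIC and the functions symif() and
        classify() are hypothetical names, and the port numbers 23 and
        21 are the standard Telnet and FTP assignments, used here in
        place of the full setf() lists.

```python
# Toy model of the symif rewriting shown above; not statspy code.
# SYMMETRIC, symif() and classify() are hypothetical names.

SYMMETRIC = {
    "Ether.src": "Ether.dst", "Ether.dst": "Ether.src",
    "TCP.srcport": "TCP.dstport", "TCP.dstport": "TCP.srcport",
}

def symif(packet, field, predicate, body):
    """Run body if predicate holds for field or for its mirror field.

    Returns True when body ran, so callers can chain 'else symif'.
    """
    for name in (field, SYMMETRIC[field]):
        if predicate(packet[name]):
            body(packet)
            return True
    return False

def classify(packet, station, log):
    """Mirror of the final attach example, restricted to Telnet and FTP."""
    def inner(pkt):
        # 'else symif' chain: the first matching test wins.
        if symif(pkt, "TCP.dstport", lambda p: p == 23,
                 lambda q: log.append("Telnet.hosts")):
            return
        symif(pkt, "TCP.dstport", lambda p: p == 21,
              lambda q: log.append("ftp.hosts"))
    return symif(packet, "Ether.src", lambda a: a == station, inner)
```

        In this model a packet sent by the station and a packet sent
        to it take the same path, which is why the single outer symif
        can replace the two outer alternatives.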





        Alternatively, select statements can be used in the inner        |
        nesting.  Here is another equivalent program:                    |

        attach {
         symif  Ether.src is eqf(08:00:2b:03:4a:e7) {
           select TCP.dstport selectSport {
              case ("Telnet", "Whois", "Finger", "rwho|rlogin"):
                  record IP.srchost, IP.dsthost in Telnet.hosts matrix-sym;

              case ("FTP data", "FTP", "TFTP"):
                  record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

              case ("SMTP", "X.400", "X.400-SND", "NetNews"):
                  record IP.srchost IP.dsthost in mail.hosts matrix-sym;

              default:
                  select TCP.srcport  selectDport {
                      case ("Telnet", "Whois", "Finger", "rwho|rlogin"):
                         record IP.srchost, IP.dsthost in Telnet.hosts
                                                                   matrix-sym;

                     case ("FTP data", "FTP", "TFTP"):
                         record IP.srchost IP.dsthost in ftp.hosts matrix-sym;

                     case ("SMTP", "X.400", "X.400-SND", "NetNews"):
                         record IP.srchost IP.dsthost in mail.hosts matrix-sym;
                  }  # end of select TCP.srcport

            }  # end of select TCP.dstport
          }
        }
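
        One way to read the nested select statements is as a table
        lookup on the destination port, with the default case falling
        back to the same lookup on the source port.  The Python sketch
        below illustrates that reading only; it is not how statspy is
        implemented.  PORT_LABELS is an assumed enum table built from
        the standard well-known port numbers, and the labels whose
        numbers are not given in this section ("rwho|rlogin", "X.400",
        "X.400-SND", "NetNews") are left out.

```python
# Toy model of a select statement; not statspy code.
# PORT_LABELS is an assumed enum table using standard well-known ports.

PORT_LABELS = {
    20: "FTP data", 21: "FTP", 23: "Telnet", 25: "SMTP",
    43: "Whois", 69: "TFTP", 79: "Finger",
}

# Each case lists the labels it accepts and the object it records in.
CASES = [
    ({"Telnet", "Whois", "Finger"}, "Telnet.hosts"),
    ({"FTP data", "FTP", "TFTP"}, "ftp.hosts"),
    ({"SMTP"}, "mail.hosts"),
]

def select(port):
    """Return the object name for the case matching port, or None."""
    label = PORT_LABELS.get(port)
    for labels, obj in CASES:
        if label in labels:
            return obj
    return None  # corresponds to the default case

def classify(srcport, dstport):
    """Try the destination port first; fall back to the source port."""
    return select(dstport) or select(srcport)
```

        For example, classify(1025, 23) selects Telnet.hosts through
        the destination port, while classify(25, 1025) reaches
        mail.hosts only through the fallback on the source port.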



        _A_p_p_e_n_d_i_x _D - _A_t_t_a_c_h _E_r_r_o_r _M_e_s_s_a_g_e_s


        This section lists the error messages that may occur in
        processing an attach command.


        *    ATTACH error - Bad field name: <field name>

             The specified string is not the name of any defined field.
             The valid field names can be obtained at any time using
             "show ?".

        *    ATTACH error - Class Conflict for: <object name>

             Two invocations of the same object specify conflicting
             class names.

        *    ATTACH error - Parm list conflict for: <object name>

             Two invocations of the same object specify conflicting
             parameter lists.

        *    ATTACH error - Unknown class for new object: <object name>

             In the first invocation of an object, no class has been
             specified.

        *    ATTACH error - Conflicting data type: <object name>

             The same object is being invoked on different fields
             that have different types and are therefore incompatible.

        *    ATTACH error - Conflicting field size: <object name>

             The same object is being invoked on different fields that
             have different lengths and are therefore incompatible.

        *    ATTACH error - Cannot start with <input text>

             Syntax error.

        *    ATTACH error - Syntax error at <input text>

        *    ATTACH error - No matching enum for <text>

             Unable to find matching enum string for symbolic parameter
             value.

        *    ATTACH error - Unknown name: <string>

             Unknown host domain name used as a parameter value.
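
        The three object-definition errors above (class conflict,
        parameter list conflict, and unknown class) can be pictured as
        checks against a table of objects seen so far.  The Python
        sketch below is a hypothetical illustration, not the statspy
        implementation; the names AttachError, Registry and invoke()
        are invented for the example, and only the message texts are
        taken from this appendix.

```python
# Hypothetical registry illustrating the conflict checks; not statspy code.

class AttachError(Exception):
    pass

class Registry:
    def __init__(self):
        self.objects = {}  # object name -> (class name, parameter tuple)

    def invoke(self, name, cls=None, parms=None):
        """Record one invocation of an object, checking earlier ones."""
        if name not in self.objects:
            # The first invocation must say which class the object has.
            if cls is None:
                raise AttachError(
                    f"ATTACH error - Unknown class for new object: {name}")
            self.objects[name] = (cls, tuple(parms or ()))
            return
        old_cls, old_parms = self.objects[name]
        # Later invocations may omit the class and parameters, but may
        # not contradict what the first invocation specified.
        if cls is not None and cls != old_cls:
            raise AttachError(f"ATTACH error - Class Conflict for: {name}")
        if parms is not None and tuple(parms) != old_parms:
            raise AttachError(
                f"ATTACH error - Parm list conflict for: {name}")
```

        Under this model, invoking port.telnet once with setf(...) and
        again with no class is accepted, as in the examples of the
        previous appendix, while a second invocation naming a different
        class is rejected.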



        _A_p_p_e_n_d_i_x _E - _S_u_m_m_a_r_y _o_f _E_a_r_l_i_e_r _R_e_l_e_a_s_e_s


        Release 2.4 contained the following important changes:

        *    It incorporates the few minor bug fixes reported for
             Release 2.3.

        *    It includes four new frequency distribution object
             classes, to collect total bytes as well as a packet count
             in each bin.  These new objects are:

             _f_r_e_q-_o_n_l_y-_b_y_t_e_s (FOB)

             _f_r_e_q-_a_l_l-_b_y_t_e_s (FAB)

             _m_a_t_r_i_x-_a_l_l-_b_y_t_e_s (MAB)

             _m_a_t_r_i_x-_s_y_m-_b_y_t_e_s (MSB)

        The packet length is an IMPLICIT parameter to these objects; as
        a result, their usage is exactly the same as the corresponding
        objects _f_r_e_q-_o_n_l_y, _f_r_e_q-_a_l_l, _m_a_t_r_i_x-_a_l_l, and _m_a_t_r_i_x-_s_y_m.

        *    In connection with these new objects, a new virtual
             field named "length" contains the total packet length
             exclusive of the Ethernet header.  For an IP datagram,
             this will have the same value as "IP.length", which is
             retained for compatibility.

        *    The display format for a freq-only object has been changed
             slightly, to be consistent with that of the freq-only-bytes
             object.

        *    A mechanism has been added to allow a remote rspy or
             collect to query a statspy about its version number.  This
             scheme allows the introduction of version numbers
             compatibly with earlier versions.  Specifically, when rspy
             opens a new TCP connection, it first sends a new command
             VERSION; statspy replies with its version string.  Earlier
             versions of statspy will reply with an error, which
             identifies their antiquity.  This scheme will allow
             possible future changes in the network encoding of remote
             commands.

        *    A "terse" mode has been added to collect, to reduce the
             volume of data collected over a long time period.  The
             main changes are to suppress the percentages and to
             truncate trailing zero bytes in network addresses.
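
        The address truncation performed in terse mode, described in
        the last item above, can be illustrated as follows.  This
        Python sketch is a hypothetical helper, not code from collect;
        the function name terse_address() and the dotted-decimal output
        format are assumptions made for the example.

```python
# Hypothetical helper illustrating the truncation; not collect's code.

def terse_address(addr: bytes) -> str:
    """Format addr in dotted decimal, dropping trailing zero bytes."""
    trimmed = addr.rstrip(b"\x00")
    return ".".join(str(b) for b in trimmed)
```

        With this helper the network address 128.9.0.0 is written as
        128.9, while an embedded zero byte, as in 128.0.9.0, is kept
        and only the final zero is dropped.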





        Release 2.3 (November 6, 1989) contained the following
        important changes:

        *    Supports Sun 4 (SPARC) hardware, based on code provided by
             Phil Wood of the Los Alamos National Laboratory.

        *    Provides access controls, heavily based on code developed
             for the NSFnet backbone by Dave Katz of Merit.

        *    Includes support for running statspy on a PC RT.  This
             support was developed by Dave Katz for use in the
             IBM/Merit NSFnet backbone packet switches.

        *    Includes two new virtual fields for subnetted networks,
             inspired by code supplied by Alan Stebbens of UCSB.

        *    Includes additional 'collect' parameters:  -u (universal
             time) and -m (mode), and a new statspy parameter: -s
             scheduling_priority. These were provided by Dave Katz.

        *    Provides for statspy finding a default Ethernet interface
             by taking the first entry in the kernel's interface list.


