To build a parser you must provide a tokenizer and a grammar. After a parser is generated, it can be used to parse strings of the language.
We assume familiarity with some basic notions of formal languages, such as regular expressions, context-free grammars, LR grammars and LR parsing [ASU86,HMU00,GJ90]. Some knowledge of the Python language [Lut96] is also required.
The class Lexer implements a lexical analyser based on Python regular expressions. An instance must be initialised with a tokenizer, that is, a list of tuples:
(re,funct,op?)
where:
- re is a Python regular expression;
- funct is a function that takes the matched string and returns a (token, value) pair; the empty string "" means the match is discarded (e.g. white space);
- op, if present, is a tuple (operator, precedence, associativity) with precedence information for that token.
Restriction: if a keyword is a substring of another, its rule must appear after the rule for the larger keyword, for obvious reasons. The following list presents a tokenizer for regular expressions:
l = [(r"\s+",""),
     ("@epsilon",lambda x: (x,x)),
     ("@empty_set",lambda x: (x,x)),
     ("[A-Za-z0-9]",lambda x: ("id",x)),
     ("[+]",lambda x: ("+",x),("+",100,'left')),
     ("[*]",lambda x: (x,x),("*",300,'left')),
     (r"\(|\)",lambda x: (x,x)) ]
A lexical analyser is created by instantiating a Lexer class:
>>> from yappy.parser import *
>>> a=Lexer(l)
>>> a.scan("(a + b)* a a b*")
[('(', '('), ('id', 'a'), ('+', '+'), ('id', 'b'), (')', ')'), 
('*', '*'), ('id', 'a'), ('id', 'a'), ('id', 'b'), ('*', '*')]
>>>
See Yappy documentation for more details.
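To make the scanning strategy concrete, here is a minimal sketch of how such a tokenizer could drive a scanner using the plain re module. This is an illustration of the technique only, not Yappy's actual implementation; the function name scan and the toy tokenizer are made up, and precedence information (a third tuple element) is simply ignored here:

```python
import re

def scan(tokenizer, s):
    """Scan s with a Yappy-style tokenizer: a list of (regex, funct)
    pairs tried in order at each position of the input."""
    tokens, pos = [], 0
    while pos < len(s):
        for rule in tokenizer:
            m = re.match(rule[0], s[pos:])
            if m:
                if rule[1] != "":          # "" means: discard the match
                    tokens.append(rule[1](m.group()))
                pos += m.end()
                break
        else:
            raise ValueError("no rule matches at position %d" % pos)
    return tokens

toy = [(r"\s+", ""),
       ("[A-Za-z0-9]", lambda x: ("id", x)),
       ("[*]", lambda x: (x, x)),
       (r"\(|\)", lambda x: (x, x))]
print(scan(toy, "(a b)*"))
# [('(', '('), ('id', 'a'), ('id', 'b'), (')', ')'), ('*', '*')]
```

Note how trying the rules in order at each position is what makes the rule ordering (and the keyword restriction above) matter.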
The class LRParser implements an LR parser generator. An instance must be initialised with a grammar: a list of tuples of the form
(LeftHandSide,RightHandSide,SemFunc,Prec)
where Prec is optional precedence information for the rule.
Restriction: The first production is for the start symbol.
Here is an unambiguous grammar for regular expressions:
grammar = [("r",["r","|","c"],self.OrSemRule),
           ("r",["c"],self.DefaultSemRule),
           ("c",["c","s"],self.ConcatSemRule),
           ("c",["s"],self.DefaultSemRule),
           ("s",["s","*"],self.StarSemRule),
           ("s",["f"],self.DefaultSemRule),
           ("f",["b"],self.DefaultSemRule),
           ("f",["(","r",")"],self.ParSemRule),
           ("b",["id"],self.BaseSemRule),
           ("b",["@empty_set"],self.BaseSemRule),
           ("b",["@epsilon"],self.BaseSemRule)]
The previous description can easily be rephrased in a more user-friendly manner. We provide two ways: with the grules function, or with a grammar string.
The previous grammar can be rewritten as:
 grammar = grules([("r -> r | c",self.OrSemRule),
                   ("r -> c",self.DefaultSemRule),
                   ("c -> c s",self.ConcatSemRule),
                   ("c -> s",self.DefaultSemRule),
                   ("s -> s *",self.StarSemRule),
                   ("s -> f",self.DefaultSemRule),
                   ("f -> b",self.DefaultSemRule),
                   ("f -> ( r )",self.ParSemRule),
                   ("b -> id",self.BaseSemRule),
                   ("b -> @empty_set",self.BaseSemRule),
                   ("b -> @epsilon",self.BaseSemRule)])
We can also write an ambiguous grammar if we provide precedence information that allows conflicts (shift-reduce) to be resolved:
grammar = grules([("reg -> reg | reg",self.OrSemRule),
                  ("reg -> reg reg",self.ConcatSemRule,(200,'left')),
                  ("reg -> reg *",self.StarSemRule),
                  ("reg -> ( reg )",self.ParSemRule),
                  ("reg -> id",self.BaseSemRule),
                  ("reg ->  @empty_set",self.BaseSemRule),
                  ("reg ->  @epsilon",self.BaseSemRule)
                   ])
  grammar ="""  reg -> reg + reg {{ self.OrSemRule }} |
               reg reg {{ self.ConcatSemRule }} // 200 left  |
               reg * {{ self.StarSemRule }} |
               ( reg ) {{self.ParSemRule }} |
               id {{ self.BaseSemRule }};
            """
where:
- alternative right-hand sides of a rule are separated by |;
- semantic actions appear between {{ and }};
- precedence information appears after //.
These separators can be redefined in the tokenizer of the Yappy_grammar class. An empty right-hand side is also allowed. If no semantic rule is given, DefaultSemRule is assumed.
See Yappy documentation for more details.
As usual, the semantic value of an expression will be a function of the semantic values of its parts. The semantics of a token is defined by the tokenizer (Section 2.1). The semantic actions for grammar rules are specified by Python functions that can be evaluated in a given context. Our approach is essentially borrowed from the kjParsing package [rs00]: a semantic function takes as arguments a list with the semantic values of the RightHandSide of a rule and a context, and returns a value that represents the meaning of the LeftHandSide, possibly performing side effects on the context.
For instance, by default the semantic value of a rule can be the semantic value of the first element of the RightHandSide:
 def DefaultSemRule(list,context=None):
    """Default  semantic rule"""
    return list[0]
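Because the context is passed to every semantic function, rules can also carry state across reductions. A hypothetical rule that counts the base symbols it has reduced:

```python
def CountingBaseSemRule(lst, context):
    """Like a BaseSemRule, but records in the context how many base
    symbols were reduced (a side effect on the context). The key
    name "symbols" is made up for this illustration."""
    context["symbols"] = context.get("symbols", 0) + 1
    return lst[0]

context = {}
CountingBaseSemRule(["a"], context)
CountingBaseSemRule(["b"], context)
print(context["symbols"])   # 2
```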
Assuming the definition of some objects for regular expressions, trivial semantic rules for printing regular expressions can be:
     def OrSemRule(self,list,context):
         return "%s+%s" %(list[0],list[2])
     def ConcatSemRule(self,list,context):
         return list[0]+list[1]
     def ParSemRule(self,list,context):
         return "(%s)" %list[1]
     def BaseSemRule(self,list,context):
         return list[0]
     def StarSemRule(self,list,context):
         return list[0]+'*'
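To see how these functions compose, we can replay by hand the reductions the parser would perform on the input a+b*. The sketch below uses standalone copies of the rules above (without self) and applies them in the bottom-up order of the reductions:

```python
# Standalone copies of the semantic rules above, for illustration.
def OrSemRule(lst, context=None):
    return "%s+%s" % (lst[0], lst[2])

def BaseSemRule(lst, context=None):
    return lst[0]

def StarSemRule(lst, context=None):
    return lst[0] + "*"

# The parser reduces innermost parts first; for "a+b*" the
# semantic values are built bottom-up:
a = BaseSemRule(["a"])            # b -> id     gives "a"
b = BaseSemRule(["b"])            # b -> id     gives "b"
bstar = StarSemRule([b, "*"])     # s -> s *    gives "b*"
print(OrSemRule([a, "+", bstar])) # r -> r | c  prints "a+b*"
```

Each rule only ever sees the semantic values of its own right-hand side; the parser supplies them in the list argument at each reduction.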
Semantic actions can also be more Bison-like, if they are given as a string in which $n represents the semantics of the rule's n-th argument.
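A hypothetical way such string actions could be turned into functions is to rewrite each $n into an index into the argument list and evaluate the result. This is a sketch of the idea only, not Yappy's actual mechanism:

```python
import re

def string_action(action):
    """Build a semantic function from a Bison-like string action,
    where $n stands for the semantic value of the n-th RHS symbol
    (1-based). Sketch only; eval is acceptable for an illustration."""
    expr = re.sub(r"\$(\d+)",
                  lambda m: "lst[%d]" % (int(m.group(1)) - 1),
                  action)
    return lambda lst, context=None: eval(expr, {}, {"lst": lst})

or_rule = string_action('"%s+%s" % ($1, $3)')
print(or_rule(["a", "+", "b"]))   # a+b
```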
No error recovery is currently implemented. Errors are reported with rudimentary information, see the exception error classes in Yappy documentation.
>>> from yappy.parser import *
>>> parse = LRparser(grammar,table,no_table,tabletype,operators)
Some information about the generated LR table can be retrieved by printing some attributes:
>>> print parse.cfgr
0 | ('reg', ['reg', '+', 'reg'], RegExp2.OrSemRule, ('100', 'left')) 
1 | ('reg', ['reg', 'reg'],RegExp2.ConcatSemRule, ('200', 'left')) 
2 | ('reg', ['reg', '*'],RegExp2.StarSemRule) 
3 | ('reg', ['(', 'reg', ')'], RegExp2.ParSemRule) 
4 | ('reg', ['id'], RegExp2.BaseSemRule ) 
5 | ('@S', ['reg'], DefaultSemRule) 
>>> print parse
Action table:
 
State
        +       *       (       )       id      $       #
0                       s1              s2
1                       s1              s2
2       r4      r4      r4      r4      r4      r4
3       s5      s6      s1              s2      r[]
4       s5      s6      s1      s8      s2
5                       s1              s2
6       r2      r2      r2      r2      r2      r2
7       r1      s6      r1      r1      s2      r1
8       r3      r3      r3      r3      r3      r3
9       r0      s6      r0      r0      s2      r0
 Goto table:
State
        reg     @S
0       3
1       4
3       7
4       7
5       9
7       7
9       7
If _DEBUG is set, several comments are printed during the table construction, in particular the collection of LR items.
If any of these conflicts occurs, a list of the resolved conflicts is given and more information can be found in the Log attribute. The Log has the following attributes:
Currently no pretty-printing is available for these values.
>>> parse.parsing(a.scan("(a+b)*aab*"))
The attribute output records the grammar rules that were applied for parsing the string:
>>> parse.output
[4, 4, 0, 3, 2, 4, 1, 4, 1, 4, 1, 2]
If _DEBUG is set, it is possible to see each application of a table action and the values in the stack.
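The way the parser uses the action and goto tables, and how output is filled with the rules applied, can be sketched with a toy driver over hand-built tables. The grammar, tables and names below are illustrative, not Yappy's internals; the grammar is the tiny S -> ( S ) | id:

```python
# Productions: 0: @S -> S   1: S -> ( S )   2: S -> id
prods = {1: ("S", 3), 2: ("S", 1)}          # LHS and RHS length

# Hand-built LR tables for this grammar ("s" shift, "r" reduce).
action = {(0, "("): ("s", 1), (0, "id"): ("s", 2),
          (1, "("): ("s", 1), (1, "id"): ("s", 2),
          (2, ")"): ("r", 2), (2, "$"): ("r", 2),
          (3, "$"): ("acc", 0),
          (4, ")"): ("s", 5),
          (5, ")"): ("r", 1), (5, "$"): ("r", 1)}
goto = {(0, "S"): 3, (1, "S"): 4}

def toy_parse(tokens):
    """Shift-reduce loop; returns the list of rules applied,
    analogous to the parser's output attribute."""
    stack, out, i = [0], [], 0
    tokens = tokens + ["$"]
    while True:
        act, arg = action[(stack[-1], tokens[i])]
        if act == "s":                  # shift: push state, advance input
            stack.append(arg)
            i += 1
        elif act == "r":                # reduce: pop |RHS| states, goto on LHS
            lhs, size = prods[arg]
            del stack[-size:]
            stack.append(goto[(stack[-1], lhs)])
            out.append(arg)
        else:                           # accept
            return out

print(toy_parse(["(", "(", "id", ")", ")"]))   # [2, 1, 1]
```

Reading the result: the innermost id is reduced first (rule 2), then the two surrounding parentheses (rule 1 twice), which is exactly the bottom-up order recorded in output.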
The Yappy class is a wrapper for defining a parser and for parsing. Basically it creates the lexical analyser and the parser. This class is a subclass of LRparser and can also define the directories where the parsing tables are stored. It defines I/O functions such as input(), which scans and parses a given string (see the example below).
Here is a complete parser for regular expressions:
from yappy.parser import *
class ParseReg(Yappy):
     def __init__(self,no_table=0, table='tablereg'):
        grammar ="""
        reg -> reg + reg {{ self.OrSemRule }} |
               reg reg {{ self.ConcatSemRule }} // 200 left |
               reg * {{ self.StarSemRule }} |
               ( reg ) {{self.ParSemRule}} |
               id {{ self.BaseSemRule}} | 
               @empty_set {{ self.BaseSemRule}} | 
               @epsilon {{ self.BaseSemRule}} | ;
        """
        tokenize = [
        (r"\s+",""),
        ("@epsilon",lambda x: (x,x)),
        ("@empty_set",lambda x: (x,x)),
        ("[A-Za-z0-9]",lambda x: ("id",x)),
        ("[+]",lambda x: ("+",x),("+",100,'left')),
        ("[*]",lambda x: (x,x),("*",300,'left')),
        (r"\(|\)",lambda x: (x,x)) ]
        Yappy.__init__(self,tokenize,grammar,table,no_table)
     ##Semantic rules build a parse tree...
     def OrSemRule(self,list,context):
         return "(%s+%s)" %(list[0],list[2])
     def ConcatSemRule(self,list,context):
         return "(%s%s)" %(list[0],list[1])
     def ParSemRule(self,list,context):
         return "(%s)" %list[1]
     def BaseSemRule(self,list,context):
         return list[0]
     def StarSemRule(self,list,context):
         return "(%s*)" %list[0]
An instance is used as:
>>> d = ParseReg()
>>> d.input("(a+b)*aab*")
(a+b)*aab*
See Yappy documentation for more details.
Nelma Moreira, Rogério Reis 2010-06-10