TDParser Programmers Guide¶ ↑
TDParser is a Ruby component that helps us to construct a top-down parser using method calls. This document describes how to use TDParser in two styles. Both of styles are similar to one of JavaCC on the surface. However, one is a style in which we define rules of a grammar as methods (like shown in sample4.rb). The other is a style in which each rule is defined as if it is a property of a grammar (see also sample5.rb).
Defining Rules in Module¶ ↑
The following class is a parser class, and it accepts expressions that consists of digits and +.
class MyParser include TDParser def expr token(/\d+/) - token("+") - rule(:expr) >> proc{|x| x[0].to_i + x[2] } | token(/\d+/) >> proc{|x| x[0].to_i } end end
In this class, the method expr represents the following production rule.
expr := int '+' expr
| int
In addition, at the first line of the method expr, values accepted by token(/\d+/), token("+") and rule(:expr) are assigned to x[0], x[1] and x[2] respectively. After that, in order to parse 1 + 2, we first split it into an array of tokens like ["1", "+", "2"], and then call the parse method of a parser object, which is created by MyParser.new(), as follows.
parser = MyParser.new() parser.expr.parse(["1", "+", "2"])
Note that we can pass one of the following objects to the parse method.
-
an Enumerable object. E.g.:
expr.parse(["1", "+", "2"]) -
an object which has methods ‘shift’ and ‘unshift’. E.g.:
expr.parse(TDParser::TokenGenerator{|x| x.yield("1"); x.yield("+"); x.yield("2") }) -
a block. E.g.:
expr.parse{|x| x.yield("1"); x.yield("+"); x.yield("2") }
In that syntax, + is right-associative. However, we can’t write as follows.
def expr rule(:expr) - token("+") - token(/\d+/) >> proc{|x| x[0].to_i + x[2].to_i } token(/\d+/) >> proc{|x| x[0].to_i } end
This problem is called left-recursion problem. So we have to use one of the following rules instead.
def expr token(/\d+/) - (token("+") - token(/\d+/))*0 >> proc{|x| x[1].inject(x[0]){|acc,y| case y[0] when "+" acc + y[1] end } } end def expr # javacc style n = nil (token(/\d+/) >> proc{|x| n = x }) - (token("+") - rule(/\d+/) >> proc{|y| case y[0] when "+" n += y[1].to_i end })*0 >> proc{|x| n } end
In the rules, (...)*N represents N or more rules (...). x[1] has multiple sequences of tokens accepted by (...)*0. For example, if ["1", "+","1","+","2"] is parsed by the rule: token(/\d+/) - (token("+") - token(/\d+/))*0, we obtain [["+", "1"], ["+", "2"]] by x[1].
Defining Rules using TDParser.define()¶ ↑
The rule defined in the first sample script, shown in the previous section, can also be defined as follows.
parser = TDParser.define{|g| g.expr = g.token(/\d+/) - g.token("+") - g.expr >> proc{|x| x[0].to_i + x[2] } | g.token(/\d+/) >> proc{|x| x[0].to_i } }
(See also sample5.rb and sample6.rb)
Parser Combinators¶ ↑
-
Constructors
-
token(obj) -
rule(method) any()-
any token
none()-
no more token
empty()-
empty
fail()-
failure
backref(label)-
back reference
stackref(stack)-
stack reference
-
-
Operators
rule - rule-
sequence
rule | rule-
choice
rule * n-
iteration
rule * n..m-
iteration
rule / label-
label
rule % stack-
stack
~ rule-
negative lookahead
-
Utility Functions
leftrec(base, rule1, ..., ruleN, &action)-
This constructs the following rule:
base - ruleN* >> action' | ... | base - rule1* >> action' | fail()
rightrec(rule1, ..., ruleN, base, &action)-
This constructs the following rule:
ruleN* - base >> action' | ... | rule1* - base >> action' | fail()
-
chainl(base, infix1, ..., infixN, &action) -
chainr(base, infix1, ..., infixN, &action)
StringTokenizer¶ ↑
There is a simple tokenizer called TDParser::StringTokenizer in the library tdparser/utils. (See MyParser#parse in sample2.rb)