TDParser
Programmers Guide¶ ↑
TDParser
is a Ruby component that helps us to construct a top-down parser using method calls. This document describes how to use TDParser
in two styles. Both of styles are similar to one of JavaCC on the surface. However, one is a style in which we define rules of a grammar as methods (like shown in sample4.rb
). The other is a style in which each rule is defined as if it is a property of a grammar (see also sample5.rb
).
Defining Rules in Module¶ ↑
The following class is a parser class, and it accepts expressions that consists of digits and +
.
class MyParser include TDParser def expr token(/\d+/) - token("+") - rule(:expr) >> proc{|x| x[0].to_i + x[2] } | token(/\d+/) >> proc{|x| x[0].to_i } end end
In this class, the method expr
represents the following production rule.
expr := int '+' expr | int
In addition, at the first line of the method expr
, values accepted by token(/\d+/)
, token("+")
and rule(:expr)
are assigned to x[0]
, x[1]
and x[2]
respectively. After that, in order to parse 1 + 2
, we first split it into an array of tokens like ["1", "+", "2"]
, and then call the parse
method of a parser object, which is created by MyParser.new()
, as follows.
parser = MyParser.new() parser.expr.parse(["1", "+", "2"])
Note that we can pass one of the following objects to the parse method.
-
an Enumerable object. E.g.:
expr.parse(["1", "+", "2"])
-
an object which has methods ‘shift’ and ‘unshift’. E.g.:
expr.parse(TDParser::TokenGenerator{|x| x.yield("1"); x.yield("+"); x.yield("2") })
-
a block. E.g.:
expr.parse{|x| x.yield("1"); x.yield("+"); x.yield("2") }
In that syntax, +
is right-associative. However, we can’t write as follows.
def expr rule(:expr) - token("+") - token(/\d+/) >> proc{|x| x[0].to_i + x[2].to_i } token(/\d+/) >> proc{|x| x[0].to_i } end
This problem is called left-recursion problem. So we have to use one of the following rules instead.
def expr token(/\d+/) - (token("+") - token(/\d+/))*0 >> proc{|x| x[1].inject(x[0]){|acc,y| case y[0] when "+" acc + y[1] end } } end def expr # javacc style n = nil (token(/\d+/) >> proc{|x| n = x }) - (token("+") - rule(/\d+/) >> proc{|y| case y[0] when "+" n += y[1].to_i end })*0 >> proc{|x| n } end
In the rules, (...)*N
represents N or more rules (...)
. x[1]
has multiple sequences of tokens accepted by (...)*0
. For example, if ["1", "+","1","+","2"]
is parsed by the rule: token(/\d+/) - (token("+") - token(/\d+/))*0
, we obtain [["+", "1"], ["+", "2"]]
by x[1]
.
Defining Rules using TDParser.define()
¶ ↑
The rule defined in the first sample script, shown in the previous section, can also be defined as follows.
parser = TDParser.define{|g| g.expr = g.token(/\d+/) - g.token("+") - g.expr >> proc{|x| x[0].to_i + x[2] } | g.token(/\d+/) >> proc{|x| x[0].to_i } }
(See also sample5.rb
and sample6.rb
)
Parser Combinators¶ ↑
-
Constructors
-
token(obj)
-
rule(method)
any()
-
any token
none()
-
no more token
empty()
-
empty
fail()
-
failure
backref(label)
-
back reference
stackref(stack)
-
stack reference
-
-
Operators
rule - rule
-
sequence
rule | rule
-
choice
rule * n
-
iteration
rule * n..m
-
iteration
rule / label
-
label
rule % stack
-
stack
~ rule
-
negative lookahead
-
Utility Functions
leftrec(base, rule1, ..., ruleN, &action)
-
This constructs the following rule:
base - ruleN* >> action' | ... | base - rule1* >> action' | fail()
rightrec(rule1, ..., ruleN, base, &action)
-
This constructs the following rule:
ruleN* - base >> action' | ... | rule1* - base >> action' | fail()
-
chainl(base, infix1, ..., infixN, &action)
-
chainr(base, infix1, ..., infixN, &action)
StringTokenizer
¶ ↑
There is a simple tokenizer called TDParser::StringTokenizer
in the library tdparser/utils
. (See MyParser#parse
in sample2.rb
)