`TDParser` Programmers Guide¶ ↑

TDParser is a Ruby component that helps us to construct a top-down parser using method calls. This document describes how to use TDParser in two styles. Both of styles are similar to one of JavaCC on the surface. However, one is a style in which we define rules of a grammar as methods (like shown in sample4.rb). The other is a style in which each rule is defined as if it is a property of a grammar (see also sample5.rb).

Defining Rules in Module¶ ↑

The following class is a parser class, and it accepts expressions that consists of digits and +.

class MyParser
  include TDParser

  def expr
    token(/\d+/) - token("+") - rule(:expr) >> proc{|x| x[0].to_i + x[2] } |
    token(/\d+/) >> proc{|x| x[0].to_i }
  end
end

In this class, the method expr represents the following production rule.

expr := int '+' expr
      | int

In addition, at the first line of the method expr, values accepted by token(/\d+/), token("+") and rule(:expr) are assigned to x[0], x[1] and x[2] respectively. After that, in order to parse 1 + 2, we first split it into an array of tokens like ["1", "+", "2"], and then call the parse method of a parser object, which is created by MyParser.new(), as follows.

parser = MyParser.new()
parser.expr.parse(["1", "+", "2"])

Note that we can pass one of the following objects to the parse method.

an Enumerable object. E.g.: expr.parse(["1", "+", "2"])

an object which has methods ‘shift’ and ‘unshift’. E.g.:

expr.parse(TDParser::TokenGenerator{|x|
             x.yield("1"); x.yield("+"); x.yield("2")
           })

a block. E.g.: expr.parse{|x| x.yield("1"); x.yield("+"); x.yield("2") }

In that syntax, + is right-associative. However, we can’t write as follows.

def expr
  rule(:expr) - token("+") - token(/\d+/) >> proc{|x| x[0].to_i + x[2].to_i }
  token(/\d+/) >> proc{|x| x[0].to_i }
end

This problem is called left-recursion problem. So we have to use one of the following rules instead.

def expr
  token(/\d+/) - (token("+") - token(/\d+/))*0 >> proc{|x|
    x[1].inject(x[0]){|acc,y|
      case y[0]
      when "+"
        acc + y[1]
      end
    }
  }
end

def expr  # javacc style
  n = nil
  (token(/\d+/) >> proc{|x| n = x }) -
  (token("+") - rule(/\d+/) >> proc{|y|
    case y[0]
    when "+"
      n += y[1].to_i
    end
  })*0 >> proc{|x| n }
end

In the rules, (...)*N represents N or more rules (...). x[1] has multiple sequences of tokens accepted by (...)*0. For example, if ["1", "+","1","+","2"] is parsed by the rule: token(/\d+/) - (token("+") - token(/\d+/))*0, we obtain [["+", "1"], ["+", "2"]] by x[1].

Defining Rules using `TDParser.define()`¶ ↑

The rule defined in the first sample script, shown in the previous section, can also be defined as follows.

parser = TDParser.define{|g|
  g.expr =
    g.token(/\d+/) - g.token("+") - g.expr >> proc{|x| x[0].to_i + x[2] } |
    g.token(/\d+/) >> proc{|x| x[0].to_i }
}

(See also sample5.rb and sample6.rb)

Parser Combinators¶ ↑

Constructors
- token(obj)
- rule(method)
- any()
  
  any token
- none()
  
  no more token
- empty()
  
  empty
- fail()
  
  failure
- backref(label)
  
  back reference
- stackref(stack)
  
  stack reference
Operators
- rule - rule
  
  sequence
- rule | rule
  
  choice
- rule * n
  
  iteration
- rule * n..m
  
  iteration
- rule / label
  
  label
- rule % stack
  
  stack
- ~ rule
  
  negative lookahead
Utility Functions
- leftrec(base, rule1, ..., ruleN, &action)
  
  This constructs the following rule:
```
base - ruleN* >> action' |
... |
base - rule1* >> action' |
fail()
```
- rightrec(rule1, ..., ruleN, base, &action)
  
  This constructs the following rule:
```
ruleN* - base >> action' |
... |
rule1* - base >> action' |
fail()
```
- chainl(base, infix1, ..., infixN, &action)
- chainr(base, infix1, ..., infixN, &action)

`StringTokenizer`¶ ↑

There is a simple tokenizer called TDParser::StringTokenizer in the library tdparser/utils. (See MyParser#parse in sample2.rb)

TDParser Programmers Guide¶ ↑

Defining Rules in Module¶ ↑

Defining Rules using TDParser.define()¶ ↑

Parser Combinators¶ ↑

StringTokenizer¶ ↑

`TDParser` Programmers Guide¶ ↑

Defining Rules using `TDParser.define()`¶ ↑

`StringTokenizer`¶ ↑