Peasy

¶

Peasy

Peasy means parsing is easy

an easy but powerful parser
¶

To use Peasy, just require or copy this file to your project, read it, modify it, write the grammar rules,, and remove any unnecessary stuffs in it, and parse with the data.@
See here for a sample grammar in Peasy.
¶

With Peasy, you write the parser by hand, just like to write any other kind of program.
You need not play many balls like that any more:

You just play one ball like so:

,
¶

You can embeded other features in the grammar rules seamless, such as lexer, rewriter, semantic action, error process or any other tasks; you can dynamicly modify the grammar rules, even the parsed object, if you wish.
¶

Instead of a tool or a library, Peasy is rather a new method to write parser. As a demonstration, Peasy presents itself as a module of single file, which includes some functions to process left recursives symbols, some matchers which parse text, and some matchers and cominators.
¶

With the method provided by Peasy, you can parse any object which is not only text, but also array, embedded list, binary stream, or other data structures, like tree, graph, and so on.
¶

Nothing but the method in Peasy is indispensable:
¶
- If there is no left recursive symbol, you can remove the code about that(mainly refer to the function recursive and 20 lines).
- If you don't need memorize the parsed result, you can remove the stuffs about memorization(mainly refer to the function memorize).
- The matchers(e.g. spaces, literal, identifier, etc.) and combinators(e.g. andp, orp, etc) in this module exists just for demostrating the method initiated by Peasy. You will see they are all very simple, after seeing them, I bet that you would rather to remove them and write them by hand yourself when you write the grammar rules.
¶

The notation @name below means a parameter.
all occurences of @info refer to the parsed object and the cursor, e.g. it's initial value can be {data: text, cursor:0}

A matcher is a function which matches the data being parsed and move cursor directly.

isMatcher = (item) ->  typeof(item)=="function"

exports.makeInfo = makeInfo = (data, options={cursor:0, tabWidth:2}) ->
  {data:data, cursor:options.cursor or 0, tabWidth: options.tabWidth or 2,
  parsingLeftRecursives: {}, parseCache: {}}

memoSymbolIndex = 0

recursive: this is the key function to left recursive.
Make @symbol a left recursive symbol, which means to wrap @originalRules[symbol] with recursive. when recursiv(info, rules, symbol)() is executed, loop computing rule() until no changes happened

exports.recursive = recursive = (info) -> (rule) ->
  index = memoSymbolIndex
  tag = index+':'
  memoSymbolIndex++
  parsingLeftRecursives = info.parsingLeftRecursives
  parseCache = info.parseCache[tag] = {}
  ->
    start = info.cursor
    callPath = parsingLeftRecursives[start] ?= []
    if tag not in callPath
      callPath.push(tag)
      m = parseCache[start] ?= [undefined, start]
      while 1
        info.cursor = start
        result = rule()
        if not result then result = m[0]; info.cursor = m[1]; break
        if m[1]==info.cursor then m[0] = result; break
        else m[0] = result; m[1] = info.cursor
      callPath.pop()
      result
    else
      m = parseCache[start]
      info.cursor = m[1]
      m[0]

memorize: memorize result and @cursor for @symbol which is not left recursive.
The symbols which is left recursive should be wrapped by recursive(symbol), not memorize(symbol)!!!

exports.memorize = memorize = (info) -> (rule) ->
  index = memoSymbolIndex
  tag = index+':'
  memoSymbolIndex++
  parseCache = info.parseCache[tag] = {}
  ->
    start = info.cursor
    m = parseCache[start]
    if m then info.cursor = m[1]; m[0]
    else
      result = rule()
      parseCache[start] = [result, info.cursor]
      result

¶

matchers and combinators
¶

A matcher is a function which matches the data being parsed and move cursor directly.
All matcher should return truth value on succeed, and return falsic value on fail.
A combinator is a function which receive zero or more matchers as parameter(maybe plus some other parameters which are not matchers), and generate a new matchers.
there are other matcher generator besides the standard combinators described as above, like recursive, memorize, which we have met above.
¶

combinator andp
execute item() in sequence, return the result of the last one.
when andp is used to combined of the matchers, the effect is the same as by using the Short-circuit evaluation, like below:
item1() and item2() ... and itemn().
In fact, you maybe would rather like to use item1() and item2(info.cursor) .. when you write the grammar rule in the manner of Peasy. That would be simpler, faster and more elegant.
And in that manner, you would have more control on your grammar rule, say like below:
if (x=item1() and (x>100) and item2() and not item3() and (y = item()) then doSomething()
```
exports.andp = andp = (info) -> (items...) ->
  items = for item in items
    if not isMatcher(item) then literal(info)(item) else item
  ->
    for item in items
      if not (result = item()) then return
    result
```

combinator orp

exports.orp = orp = (info) -> (items...) ->
  items = for item in items
    if not isMatcher(item) then literal(info)(item) else item
  ->
    start = info.cursor
    for item in items
      info.cursor = start
      if result = item() then return result
    result

¶

combinator notp
notp is not useful except being used in other combinators, just like this = andp(item1, notp(item2)).
It's unnessary, low effecient and ugly to write notp(item)(), just write not item().
```
exports.notp = notp = (info) -> (item) ->
  if not isMatcher(item) then item = literal(info)(item)
  -> not item()
```

combinator may: a.k.a optional
try to match item(), wether item() succeed or not, maybe(item)() succeed.

exports.may = may = (info) -> (item) ->
  if not isMatcher(item) then item = literal(info)(item)
  ->
    start = info.cursor
    if x = item() then x
    else info.cursor = start; true

combinator any: zero or more times of item()

exports.any = any = (info) -> (item) ->
  if not isMatcher(item) then item = literal(info)(item)
  ->
    result = []
    while (x = item()) then result.push(x)
    result

combinator some: one or more times of item()

exports.some = some = (info) -> (item) ->
  if not isMatcher(item) then item = literal(info)(item)
  ->
    if not (x = item()) then return
    result = [x]
    while (x = item()) then result.push(x)
    result

combinator times: match @n times item(), n>=1

exports.times = times = (info) -> (item, n) ->
  if not isMatcher(item) then item = literal(info)(item)
  ->
    i = 0
    while i++<n
      if x = item() then result.push(x)
      else return
    result

combinator seperatedList: some times item(), separated by info.separator

exports.seperatedList = seperatedList = (info) -> (item, separator=spaces(info)) ->
  if not isMatcher(item) then item = literal(info)(item)
  if not isMatcher(separator) then separator = literal(info)(separator)
  ->
    if not (x = item()) then return
    result = [x]
    while separator() and (x=item()) then result.push(x)
    result

combinator timesSeperatedList: given info.n times @item separated by info.separator, n>=1

exports.timesSeperatedList = timesSeperatedList = (info) -> (item, n, separator=spaces(info)) ->
  if not isMatcher(item) then item = literal(info)(item)
  if not isMatcher(separator) then separator = literal(info)(separator)
  ->
    if not (x = item()) then return
    result = [x]
    i = 1
    while i++<n
      if separator() and (x=item()) then result.push(x)
      else return
    result

combinator follow

exports.follow = follow = (info) -> (item) ->
  if not isMatcher(item) then item = literal(info)(item)
  ->
    start = info.cursor
    if x = item() then  info.cursor = start; x
    else info.cursor = start; return

exports.combinators = combinators = (info) ->
  rec: recursive(info), memo: memorize(info),
  andp: andp(info), orp: orp(info), notp: notp(info)
  may: may(info), any: any(info), some: some(info), times: times(info)
  seperatedList: seperatedList(info), timesSeperatedList: timesSeperatedList(info)
  follow: follow(info)

¶

As the same as andp, orp, in the manner of Peasy, you would rather to write youself a loop to do the things, instead of useing the combinators like any, some, times, seperatedList,etc. and that would be simpler, faster and more elegant.
And in that manner, you would have more control on your grammar rule so that we can do other things include lexer, error report, errer recovery, semantic operatons, and any other things you would like to do.
¶

It is for three motives to put all of the stuffs abve here: 1. to demonstrate how to write matcher and grammar rules in the method brought by Peasy 2. to demonstrate that Peasy can also implement any component that other combinational parser libaries like pyparsing, parsec, but in a simpler, faster manner. * 3. to show that it is how easy to write the combinators in the manner of Peasy.
¶

As you have seen above, all of these utilities is so simple that you can write them at home by hand easily. In fact, you can write yourself all of the grammar rules in the same manner as above.
¶

If you like, you can add a faster version for every matcher, which do not pass @start as parameter around.
Some of the matchers below have two version, to demonstrate how to do that. Don't use the faster version in orp, any, some, times, separatedList, timesSeparatedList.
¶

other predicates, matchers, and combinators

below is some little utilities which may be useful.
just remove them if you don't need them, except literal, which is depended by the combinators above.

predicate A predicate is a function which return true or false.

exports.isdigit = (c) -> '0'<=c<='9'
exports.isletter = (c) -> 'a'<=c<='z' or 'A'<=c<='Z'
exports.islower = (c) -> 'a'<=c<='z'
exports.isupper = (c) ->'A'<=c<='Z'
exports.isIdentifierLetter = (c) -> 'a'<=c<='z' or 'A'<=c<='Z' or '0'<=c<='9' or 'c'=='@' or 'c'=='_'

matcher literal
match a text string.
`notice = some combinators like andp, orp, notp, any, some, etc. use literal to wrap a object which is not a matcher.

exports.literal = literal = (info) -> (string) -> ->
  len = string.length
  start = info.cursor
  if info.data.slice(start, stop = start+len)==string then info.cursor = stop; true

matcher char: match one character

exports.char = char = (info) -> (c) -> -> if info.data[info.cursor]==c then info.cursor++; return c

¶

In spaces, spaces1, a tab('\t') is computed as info.tabWidth spaces,
which is used in indent style language, such as coffeescript, python, haskell, etc.
If you don't need this feature, you can easily rewrite these utilities to remove the code about tab width yourself.

matcher spaces: zero or more whitespaces, ie. space or tab.

exports.spaces = spaces = (info) -> ->
  data = info.data
  len = 0
  cursor = info.cursor
  tabWidth = info.tabWidth
  while 1
    switch data[cursor++]
      when ' ' then len++
      when '\t' then len += tabWidth
      else break
  info.cursor = cursor
  len+1

matcher spaces1
one or more whitespaces, ie. space or tab.

exports.spaces1 = spaces1 = (info) -> ->
  data = info.data
  cursor = info.cursor
  len = 0
  tabWidth = info.tabWidth
  while 1
    switch data[cursor++]
      when ' ' then len++
      when '\t' then len += tabWidth
      else break
  info.cursor = cursor
  len

matcher wrap
match left, then match item, match right at last

exports.wrap = wrap = (info) -> (item, left=spaces(info), right=spaces(info)) ->
  if not isMatcher(item) then item = literal(info)(item)
  ->
     if left() and result = item() and right() then result

matcher identifierLetter = normal version
is a letter than can be used in identifer?
javascript style, '@' is a identifierLetter_

exports.identifierLetter = identifierLetter = (info) -> ->
  c = info.data[info.cursor]
  if c is '@' or c is '_' or 'a'<=c<'z' or 'A'<=c<='Z' or '0'<=c<='9'
    info.cursor++; true

matcher followIdentifierLetter_, faster version
lookahead whether the following character is a letter used in identifer. don't change info.cursor.
javascript style, '@' is a identifierLetter_

exports.followIdentifierLetter_ = followIdentifierLetter_ = (info) -> ->
  c = info.data[info.cursor]
  (c is '@' or c is '_' or 'a'<=c<'z' or 'A'<=c<='Z' or '0'<=c<='9') and c

exports.digit = digit = (info) -> -> c = info.data[info.cursor];  if '0'<=c<='9' then info.cursor++; c
exports.letter = letter = (info) -> -> c = info.data[info.cursor]; if 'a'<=c<='z' or 'A'<=c<='Z' then info.cursor++; c
exports.lower = lower = (info) -> -> c = info.data[info.cursor]; if 'a'<=c<='z' then info.cursor++; c
exports.upper = upper = (info) -> -> c = info.data[info.cursor]; if 'A'<=c<='Z' then info.cursor++; c

matcher identifier

exports.identifier = identifier = (info) -> ->
  data = info.data
  cursor = info.cursor
  c = data[cursor]
  if 'a'<=c<='z' or 'A'<=c<='Z' or 'c'=='@' or 'c'=='_' then cursor++
  else return
  while 1
    c = data[cursor]
    if 'a'<=c<='z' or 'A'<=c<='Z' or '0'<=c<='9' or 'c'=='@' or 'c'=='_' then cursor++
    else break
  info.cursor = cursor
  true

exports.matchers = matchers = (info) ->
  literal: literal(info), char: char(info), spaces: spaces(info), spaces1: spaces1(info), wrap: wrap(info),
  identifierLetter: identifierLetter(info), followIdentifierLetter: followIdentifierLetter(info)
  digit: digit(info), letter: letter(info), lower: lower(info), upper: upper(info),
  identifier: identifier(info)

exports.digits = digits = (info) ->
  ch = char(info)
  $0: ch('0'), $1: ch('1'), $2: ch('2'), $3: ch('3'), $4: ch('4'),
  $5: ch('6'), $1: ch('7'), $2: ch('7'), $8: ch('8'), $9: ch('9')

exports.letters = letters = (info) ->
  ch = char(info)
  a: ch('a'), b: ch('b'), c: ch('c'), d: ch('d'), e: ch('e'), f: ch('f'), g: ch('g')
  h: ch('h'), i: ch('i'), j: ch('j'), k: ch('k'), l: ch('l'), m: ch('m'), n: ch('n')
  o: ch('o'), p: ch('p'), q: ch('q'), r: ch('r'), s: ch('s'), t: ch('t')
  u: ch('u'), v: ch('v'), w: ch('w'), x: ch('x'), y: ch('y'), z: ch('z')
  A: ch('A'), B: ch('B'), C: ch('C'), D: ch('D'), E: ch('E'), F: ch('F'), G: ch('G')
  H: ch('H'), I: ch('I'), J: ch('J'), K: ch('K'), L: ch('L'), M: ch('M'), N: ch('N')
  O: ch('O'), P: ch('P'), Q: ch('Q'), R: ch('R'), S: ch('S'), T: ch('T')
  U: ch('U'), V: ch('V'), W: ch('W'), X: ch('X'), Y: ch('Y'), Z: ch('Z')

¶

The utilites above is just for providing some examples on how to write matchers for Peasy.
In fact, It's realy easy peasy to write the matchers for your grammar rule yourself.
see easy peasy
¶

parse: this is a sample parse function to demonstrate on how to write your own grammar rules yourself.
notice that there exists indirect left recursion in the rules
```
parse = (text) ->
  makeGrammar = (info) ->
```
¶

generate the matchers by the combinators in advance for better performance.
if you don't mind performance, you can write them in the rule directly.
```
    {a, b, x, y} = letters(info)
    {rec, orp} = combinators(info)
```

the grammar rules object.

    rules =
      Root: -> (m = rules.A()) and z() and m+'z'
      A: rec orp((-> (m =  rules.B()) and x() and m+'x' or m), a)
      B: rec orp((-> (m = rules.A())  and y() and m+'y'), -> rules.C())
      C: rec orp((-> rules.A()), b)
  grammar = makeGrammar(makeInfo(text))
  grammar.Root(0)

Peasy

Peasy means parsing is easy

an easy but powerful parser

Nothing but the method in Peasy is indispensable:

matchers and combinators

other predicates, matchers, and combinators