项目作者: frasieroh

项目描述 :
Packrat parser generator for C99
高级语言: C
项目地址: git://github.com/frasieroh/quxer.git
创建时间: 2019-12-07T06:50:09Z
项目社区:https://github.com/frasieroh/quxer

开源协议:MIT License

下载


quxer

quxer generates dependency-free C99-compliant PEG packrat parsers from text
files in a domain-specific language similar to EBNF. Semantic actions may be
optionally embeded to construct an abstract syntax tree. The backend of the
parser writer was used to boostrap the parser, so the core of the program is
self-hosted.

Compilation

Compile quxer with the standard cmake procedure:

  1. mkdir build && cd build
  2. CC=gcc cmake ..
  3. make

The parsers quxer generates must also be compiled with internal.c/h in
shared.

Example

quxer is self-hosted, so a brief rundown of its inner workings are a good
example of its usage. Internally and prior to generation, grammars are
represented as a set of parser trees comprised of operators, terminals,
nonterminals, etc. A procedure traverses these trees with a bunch of templates
to write the resulting source file. Writing grammars symbolically in C is
tedious, so the challenge is to convert a text file into the symbolic
representation. PEGs themselves can be used to solve this problem! The following
grammar parses the grammar of PEGs and adds semantic actions to produce the
symbolic representation (technically, a wrapper around it):

  1. ~grammar = (ws r : rule)+ ws {{ result = ast_grammar(r, countr); }}
  2. ~ws = (" " / "\n" / "\t")*
  3. ~wsp = (" " / "\n" / "\t")+
  4. ~rule = "~" <name> ws "=" ws b : body {{
  5. result = ast_rule(c(0), *b);
  6. }}
  7. ~name = [A-Za-z0-9_]+
  8. ~body = o : alt {{ result = *o; }}
  9. ~alt = first : seq (ws "/" ws rest : seq)* {{
  10. result = ast_Alt(first, rest, countrest + 1);
  11. }}
  12. ~seq = first : operator (wsp rest : operator)* {{
  13. result = ast_Seq(first, rest, countrest + 1);
  14. }}
  15. ~operator = o : prefix (ws <code>)? {{
  16. if (ccount) { result = ast_with_code(c(0), *o); } else { result = *o; }
  17. }}
  18. / o : postfix (ws <code>)? {{
  19. if (ccount) { result = ast_with_code(c(0), *o); } else { result = *o; }
  20. }}
  21. / o : group (ws <code>)? {{
  22. if (ccount) { result = ast_with_code(c(0), *o); } else { result = *o; }
  23. }}
  24. ~prefix = and ws o : group {{ result = ast_And(*o); }}
  25. / not ws o : group {{ result = ast_Not(*o); }}
  26. ~and = "&"
  27. ~not = "!"
  28. ~postfix = o : group ws star {{ result = ast_Star(*o); }}
  29. / o : group ws plus {{ result = ast_Plus(*o); }}
  30. / o : group ws option {{ result = ast_Option(*o); }}
  31. ~star = "*"
  32. ~plus = "+"
  33. ~option = "?"
  34. ~group = "(" ws o : body ws ")" {{ result = *o; }}
  35. / "<" ws o : body ws ">" {{ result = ast_Capture(*o); }}
  36. / o : final {{ result = *o; }}
  37. ~final = o : nonterminal {{ result = *o; }}
  38. / o : literal {{ result = *o; }}
  39. / o : cclass {{ result = *o; }}
  40. / o : dot {{ result = *o; }}
  41. ~nonterminal = <name> ws ":" ws <name> {{
  42. result = ast_AliasedNontm(c(1), c(0));
  43. }}
  44. / <name> {{
  45. result = ast_Nontm(c(0));
  46. }}
  47. ~literal = "\"" <(&"\\" . . / !"\"" .)*> "\"" {{
  48. result = ast_Literal(c(0));
  49. }}
  50. ~cclass = "[" <(&"\\" . . / !"]" .)*> "]" {{
  51. result = ast_CClass(c(0));
  52. }}
  53. ~dot = "." {{ result = ast_dot(); }}
  54. ~code = "{{" (&"\\" . . / !"}}" .)* "}}"

This grammar was first written symbolically (in src/peg/pegspec.c) but the
current parser.c was generated directly from this text.

Explanation

There are many resources for PEGs online to explain the basic stuff. Otherwise:

  • Semantic actions are delimited by {{ ... }}.
  • Rule definitions are prefixed with ~ but are referenced with their base name
    (this makes the grammar simpler).
  • Nonterminals can be referenced in semantic actions with the syntax alias : nonterminal. A single nonterminal node may parse multiple times (if its
    within a * or + operator), so alias is an array of the AST type. Its
    length is given by countalias. Results are stored by assigning result.
  • You can capture by bracketing with < ... >. Captures are anonymous and done
    sequantially. The n-th capture (of type uint8_t*) is given by c(n), its
    start index s(n), and its end index e(n). The number of captures is given by
    ccount.
  • quxer employs a visitor pattern, that is, semantic actions are not evaluated
    until parsing is complete. Consequently semantic actions cannot provide live
    updates on the parser’s progress.
  • When all is said and done, you can run your parser with parse_file(char* filename). This links against internal.h which will start parsing at the
    first grammar rule.

    TODO

  • Add failure traces
  • Add support for live actions