Project author: goodmami

Project description: Parsing Expressions
Language: Python
Project URL: git://github.com/goodmami/pe.git
Created: 2020-02-18T13:01:49Z
Project community: https://github.com/goodmami/pe

License: MIT License



Parsing Expressions




pe is a library for parsing expressions, including parsing expression
grammars (PEGs). It aims to join the expressive power of parsing
expressions with the familiarity of regular expressions. For example:

    >>> import pe
    >>> pe.match(r'"-"? [0-9]+', '-38')  # match an integer
    <Match object; span=(0, 3), match='-38'>

A grammar can be used for more complicated or recursive patterns:

    >>> float_parser = pe.compile(r'''
    ...   Start    <- INTEGER FRACTION? EXPONENT?
    ...   INTEGER  <- "-"? ("0" / [1-9] [0-9]*)
    ...   FRACTION <- "." [0-9]+
    ...   EXPONENT <- [Ee] [-+]? [0-9]+
    ... ''')
    >>> float_parser.match('6.02e23')
    <Match object; span=(0, 7), match='6.02e23'>
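
For comparison, here is a rough counterpart of the float grammar written with the standard re module. It is only a sketch: the name FLOAT_RE is hypothetical, and regular expressions backtrack where parsing expressions do not, so the two notations are not interchangeable in general.

```python
import re

# Rough regex counterpart of the float grammar (illustration only;
# FLOAT_RE is a hypothetical name, not part of pe).
FLOAT_RE = re.compile(
    r'''
    -?(?:0|[1-9][0-9]*)   # INTEGER
    (?:\.[0-9]+)?         # FRACTION?
    (?:[Ee][-+]?[0-9]+)?  # EXPONENT?
    ''',
    re.VERBOSE,
)

m = FLOAT_RE.match('6.02e23')
print(m.span(), m.group())  # (0, 7) 6.02e23
```

The named rules in the grammar take the place of the inline comments here, which is where the grammar form becomes easier to read as patterns grow.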

Features and Goals

  • Grammar notation is backward-compatible with standard PEG, with a few extensions
  • A specification describes the semantic
    effect of parsing (e.g., for mapping expressions to function calls)
  • Parsers are often faster than other parsing libraries, sometimes by
    a lot; see the benchmarks
  • The API is intuitive and familiar; it is modeled on the re module
    from Python’s standard library
  • Grammar definitions and parser implementations are separate

Syntax Overview

pe is backward compatible with standard PEG syntax and it is
conservative with extensions.

    # terminals
    .            # any single character
    "abc"        # string literal
    'abc'        # string literal
    [abc]        # character class

    # repeating expressions
    e            # exactly one
    e?           # zero or one (optional)
    e*           # zero or more
    e+           # one or more
    e{5}         # exactly 5
    e{3,5}       # three to five

    # combining expressions
    e1 e2        # sequence of e1 and e2
    e1 / e2      # ordered choice of e1 and e2
    (e)          # subexpression

    # lookahead
    &e           # positive lookahead
    !e           # negative lookahead

    # (extension) capture substring
    ~e           # result of e is matched substring

    # (extension) binding
    name:e       # bind result of e to 'name'

    # grammars
    Name <- ...  # define a rule named 'Name'
    ... <- Name  # refer to rule named 'Name'

    # (extension) auto-ignore
    X < e1 e2    # define a rule 'X' with auto-ignore
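
To make the semantics of sequence, ordered choice, and negative lookahead concrete, the following is a minimal sketch using plain Python functions. It is not how pe is implemented; the helper names (lit, seq, choice, not_) are hypothetical and exist only for this illustration.

```python
# Each parser is a function (text, pos) -> new position, or None on failure.
# These helpers are hypothetical and are not part of pe.

def lit(s):
    # String literal: consume s if the text continues with it at pos.
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def seq(*parsers):
    # Sequence (e1 e2): every item must match, one after another.
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # Ordered choice (e1 / e2): the first alternative that succeeds wins.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def not_(parser):
    # Negative lookahead (!e): succeed without consuming when e fails.
    return lambda text, pos: pos if parser(text, pos) is None else None

a_or_ab = choice(lit('a'), lit('ab'))    # "a" / "ab"
print(a_or_ab('ab', 0))                  # 1

a_not_b = seq(lit('a'), not_(lit('b')))  # "a" !"b"
print(a_not_b('ab', 0))                  # None
print(a_not_b('ac', 0))                  # 1
```

The first result shows why the order of alternatives matters: once "a" matches, "ab" is never tried, so longer alternatives usually go first.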

Matching Inputs with Parsing Expressions

When a parsing expression matches an input, it returns a Match object,
which is similar to the match objects of Python’s re module for regular
expressions. By default, nothing is captured, but the capture operator
(~) emits the substring matched by its expression, similar to capturing
groups in regular expressions:

    >>> e = pe.compile(r'[0-9] [.] [0-9]')
    >>> m = e.match('1.4')
    >>> m.group()
    '1.4'
    >>> m.groups()
    ()
    >>> e = pe.compile(r'~([0-9] [.] [0-9])')
    >>> m = e.match('1.4')
    >>> m.group()
    '1.4'
    >>> m.groups()
    ('1.4',)
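
For readers coming from regular expressions, the same contrast can be reproduced with the standard re module, where parentheses play the role of the capture operator. This sketch uses only re; the variable names are arbitrary.

```python
import re

# Without a capturing group, nothing is captured.
plain = re.match(r'[0-9][.][0-9]', '1.4')
print(plain.group(), plain.groups())        # 1.4 ()

# With a capturing group, the matched substring is captured.
captured = re.match(r'([0-9][.][0-9])', '1.4')
print(captured.group(), captured.groups())  # 1.4 ('1.4',)
```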

Value Bindings

A value binding extracts the emitted values of a match and associates
them with a name that is made available in the Match.groupdict()
dictionary. This is similar to named capture groups in regular
expressions, except that it extracts the emitted values and not the
substring of the bound expression.

    >>> e = pe.compile(r'~[0-9] x:(~[.]) ~[0-9]')
    >>> m = e.match('1.4')
    >>> m.groups()
    ('1', '4')
    >>> m.groupdict()
    {'x': '.'}
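
The difference from named groups in re can be seen by running the closest regular-expression equivalent. In re, a named group still appears in groups(), whereas in the pe example above the bound value is found only in groupdict(). The snippet below uses only the standard library.

```python
import re

# In re, a named group is also a positional group.
m = re.match(r'([0-9])(?P<x>[.])([0-9])', '1.4')
print(m.groups())     # ('1', '.', '4')
print(m.groupdict())  # {'x': '.'}
```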

Actions

Actions (also called “semantic actions”) are callables that transform
parse results. When an arbitrary function is given, it is called as
follows:

    func(*match.groups(), **match.groupdict())

The result of this function call becomes the only emitted value going
forward and all bound values are cleared.
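
The calling convention can be sketched in plain Python. The helper apply_action and the function make_float are hypothetical; they only model the behavior described here (emitted values become positional arguments, bound values become keyword arguments, and the result replaces both) and are not pe's own code.

```python
def apply_action(func, groups, groupdict):
    # Hypothetical model of a function action: call func with emitted values
    # as positional arguments and bound values as keyword arguments. The
    # result becomes the only emitted value and the bindings are cleared.
    value = func(*groups, **groupdict)
    return (value,), {}

def make_float(whole, frac, sep='.'):
    # Example action: build a float from its parts.
    return float(whole + sep + frac)

groups, groupdict = apply_action(make_float, ('1', '4'), {'sep': '.'})
print(groups, groupdict)  # (1.4,) {}
```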

For more control, pe provides the Action class and a number of
subclasses for various use-cases. These actions have access to more
information about a parse result and more control over the
match. For example, the Pack class takes a function and calls it
with the emitted values packed into a list:

    func(match.groups())

And the Join class joins all emitted strings with a separator:

    func(sep.join(match.groups()), **match.groupdict())
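
The two calling conventions can be modeled with ordinary functions to show how they differ from a plain function action. The helpers pack and join below are hypothetical stand-ins for illustration, not the Pack and Join classes themselves.

```python
def pack(func, groups, groupdict):
    # Model of the Pack convention: all emitted values are passed to func
    # as a single sequence argument; bound values are not passed along.
    return func(groups)

def join(func, sep, groups, groupdict):
    # Model of the Join convention: emitted strings are joined with sep
    # first, then passed to func along with any bound values.
    return func(sep.join(groups), **groupdict)

print(pack(list, ('5', '10'), {}))         # ['5', '10']
print(join(int, '', ('1', '2', '3'), {}))  # 123
```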

Auto-ignore

The grammar can be defined such that some rules ignore occurrences of
a pattern between sequence items. Most commonly, this is used to
ignore whitespace, so the default ignore pattern is simple whitespace.

    >>> pe.match("X <- 'a' 'b'", "a b")  # regular rule does not match
    >>> pe.match("X < 'a' 'b'", "a b")   # auto-ignore rule matches
    <Match object; span=(0, 3), match='a b'>

This feature can help to make grammars more readable.
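
One way to picture auto-ignore is as an ignore pattern interleaved between the items of a sequence. The sketch below models that idea with the standard re module. The helper sequence is hypothetical, and it assumes the ignore pattern applies only between items, as described above; how pe treats the edges of a rule is not covered here.

```python
import re

def sequence(items, ignore=None):
    # Hypothetical helper: build a regex for a sequence of literals,
    # optionally allowing an ignore pattern between neighboring items.
    sep = f'(?:{ignore})*' if ignore else ''
    return re.compile(sep.join(re.escape(item) for item in items))

plain = sequence(['a', 'b'])               # like X <- 'a' 'b'
auto = sequence(['a', 'b'], ignore=r'\s')  # like X <  'a' 'b'

print(plain.match('a b'))        # None
print(auto.match('a b').span())  # (0, 3)
```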

Example

Here is one way to parse a list of comma-separated integers:

    >>> from pe.actions import Pack
    >>> p = pe.compile(
    ...   r'''
    ...     Start  <- "[" Values? "]"
    ...     Values <- Int ("," Int)*
    ...     Int    <  ~( "-"? ("0" / [1-9] [0-9]*) )
    ...   ''',
    ...   actions={'Values': Pack(list), 'Int': int})
    >>> m = p.match('[5, 10, -15]')
    >>> m.value()
    [5, 10, -15]

Similar Projects