JavaScript monadic parser combinators
This is a tool for building parsers and parsing with them, so that you do not have to be a parser expert to do it.
const myParser=
skip(char('#'))
.then(many(letter).join())
.skip(char('-'))
.then(digits.join().as(parseInt))
parse(">")(myParser)("#AN-123")
outputs:
Right { value: [ 'AN', 123 ] }
All parsers can chain or group to form other parsers, which can in turn chain and group.
Some available metaparsers, like many(), some() and skip(), can accept other parsers or metaparsers.
Some parsers are already a composition with metaparsers; that is the case of digits, which performs many(digit).
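The composition idea can be sketched in plain JavaScript. This is only a minimal illustration, not PaCo's implementation: the parser representation (a function from an input string to { value, rest } or null) and the names satisfyP, digitP, manyP and digitsP are assumptions made for this example.

```javascript
// Hypothetical representation: a parser is a function
// input -> { value: [...], rest: "..." } on success, or null on failure.
const satisfyP = (pred) => (input) =>
  input.length > 0 && pred(input[0])
    ? { value: [input[0]], rest: input.slice(1) }
    : null;

const digitP = satisfyP((c) => c >= "0" && c <= "9");

// A metaparser takes a parser and returns a new parser.
// manyP collects zero or more matches, so it never fails.
const manyP = (p) => (input) => {
  let value = [];
  let rest = input;
  for (let r = p(rest); r !== null; r = p(rest)) {
    value = value.concat(r.value);
    rest = r.rest;
  }
  return { value, rest };
};

// "digits" is then just a composition of the two.
const digitsP = manyP(digitP);

console.log(digitsP("123a")); // { value: [ '1', '2', '3' ], rest: 'a' }
console.log(digitsP("abc"));  // { value: [], rest: 'abc' }
```

Because manyP returns a parser of the same shape as its argument, the result can itself be passed to other metaparsers.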
abbreviations
a single string can now be used in place of a non-chaining parser, and it will translate to either a char or a string parser.
digits.then('.') is valid as digits.then(char('.'))
.to(tag)
This extender grabs the current parsing group result and stores it under the object key tag.
If no object exists yet, one is created; if there is already an object at the tail of the results, it is reused.
const kchk=
string("temp: ")
.then(
option("",oneOf("-+"))
.then(digits)
.join().as(parseInt).to("temp")
.then(char('K').to("unit"))
.verify(o=>o[0].unit==='K'&&o[0].temp>=0)
.failMsg("positive Kelvin!")
)
#>res(">")(kchk.parse("temp: 12K")).value
[ 'temp: ', { temp: 12, unit: 'K' } ]
or failing:
#>res(">")(kchk.parse("temp: -12K")).value
'>error, expecting positive Kelvin! but found `-` here->-12K...'
this, along with .verify, .post and .as, allows event callbacks and all sorts of automation during parsing; if it does not cover your case, let me know.
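The tail-object rule described for .to(tag) can be sketched as follows. The helper storeUnderTag and the decision to unwrap a single-element group are assumptions made for this illustration; the library's internals may differ.

```javascript
// Hypothetical helper mirroring the description of .to(tag):
// `results` is the output collected so far, `group` is the current group result.
function storeUnderTag(results, group, tag) {
  const tail = results[results.length - 1];
  const isPlainObject =
    tail !== null && typeof tail === "object" && !Array.isArray(tail);
  // Assumption for this sketch: a single-element group is unwrapped.
  const value = group.length === 1 ? group[0] : group;
  if (isPlainObject) {
    // Reuse the object already sitting at the tail of the results.
    return [...results.slice(0, -1), { ...tail, [tag]: value }];
  }
  // Otherwise create a new object and append it.
  return [...results, { [tag]: value }];
}

let out = ["temp: "];
out = storeUnderTag(out, [12], "temp");
out = storeUnderTag(out, ["K"], "unit");
console.log(out); // [ 'temp: ', { temp: 12, unit: 'K' } ]
```

The second call finds the object created by the first one at the tail, so both keys end up on the same object.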
It is now possible to parse the example below; the feature is enabled or disabled through config.
Enable it with config.backtrackExclusions=true
#>config.optimize=true//turn on optimizations on construction
#>config.backtrackExclusions=true//track exclusions on optimization
#>digits.join().as(parseInt).then(count(2,digit).join()).parse("12345")
Right { value: Pair { a: [ 123, '45' ], b: '' } }
.then, .skip and others can inject exclusion checks on the chain at construction time.
We allow the parser base to be rewritten at construction time, keeping all checking away from parse time.
many will peek at these injected parameters and possibly exclude them from the sequence match.
One can still call .optim even with optimizations turned off; however, backtrack will still respect its flag.
The optimization chain is not very populated yet; there are many things to fit in…
module exported variable
now on by default (>=1.2)
var config={
optimize:false,//all optimizations
backtrackExclusions: false//exclude next selector root from current loop match
}
optimize disables all optimizations when false
backtrackExclusions excludes the next parser root from the current selection
backtrack can be dismissed for well-written parsers
(there is still a long way to go here)
The chaining is done with .then or .skip; the first combines the output, while the second drops it.
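The difference between the two chaining forms can be sketched in plain JavaScript. The function-based parser representation and the names charP, thenP and skipP are assumptions for this illustration, not the library's code.

```javascript
// Hypothetical representation: parser = input -> { value, rest } | null.
const charP = (c) => (input) =>
  input.startsWith(c) ? { value: [c], rest: input.slice(c.length) } : null;

// thenP runs p, then q, and concatenates both outputs.
const thenP = (p, q) => (input) => {
  const a = p(input);
  if (a === null) return null;
  const b = q(a.rest);
  return b === null ? null : { value: a.value.concat(b.value), rest: b.rest };
};

// skipP runs p, then q, but keeps only p's output; q must still match.
const skipP = (p, q) => (input) => {
  const a = p(input);
  if (a === null) return null;
  const b = q(a.rest);
  return b === null ? null : { value: a.value, rest: b.rest };
};

console.log(thenP(charP("a"), charP("-"))("a-b")); // { value: [ 'a', '-' ], rest: 'b' }
console.log(skipP(charP("a"), charP("-"))("a-b")); // { value: [ 'a' ], rest: 'b' }
console.log(skipP(charP("a"), charP("-"))("ab"));  // null
```

In both cases the input consumed is the same; only the collected output differs.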
Provides an alternative value when the parser fails; it can modify any parser to have a default so that it does not fail.
Makes the target parser optional; it fails silently.
Parsers can alternate with .or
Takes the target parser and collects a sequence of it, with an optional separator and/or terminator. It also checks min and max (0->Infinity by default).
digit.seq() => many(digit)
digit.seq(null,null,1) => some(digit)
A shortcut to .seq, to parse a specific quantity of the target parser
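The min/max check can be sketched with a small counting loop. This is an illustration under assumptions: seqP and its (p, min, max) signature are hypothetical, and the separator and terminator arguments are left out for brevity.

```javascript
// Hypothetical representation: parser = input -> { value, rest } | null.
const digitP = (input) =>
  /^[0-9]/.test(input) ? { value: [input[0]], rest: input.slice(1) } : null;

// Hypothetical seqP(p, min, max): collect between min and max matches of p.
const seqP = (p, min = 0, max = Infinity) => (input) => {
  let value = [];
  let rest = input;
  let n = 0;
  while (n < max) {
    const r = p(rest);
    if (r === null) break;
    value = value.concat(r.value);
    rest = r.rest;
    n += 1;
  }
  // Fail when fewer than min matches were found.
  return n >= min ? { value, rest } : null;
};

const manyDigits = seqP(digitP);      // behaves like many(digit)
const someDigits = seqP(digitP, 1);   // behaves like some(digit)
const twoDigits = seqP(digitP, 2, 2); // a fixed quantity

console.log(manyDigits("ab"));    // { value: [], rest: 'ab' }
console.log(someDigits("ab"));    // null
console.log(twoDigits("12345"));  // { value: [ '1', '2' ], rest: '345' }
```

With min and max set to the same number, the sequence behaves like a fixed-quantity parser.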
parser succeeds only if p fails
Checks p as a predicate, without consuming, before parsing; if p fails, the parsing fails
Checks p as a predicate before parsing; if p succeeds, the parsing fails
It must apply to a parser of the same level: using .excluding(char(..)) at character level on a string-level parser will have no effect
digits.excluding(oneOf("89"))//this will have no effect
many(digit.excluding(oneOf("89")))//but this will
If optimizing with exclusion backtracking, the first will have effect, as PaCo will rewrite the base to be exactly the second
Parse output can be formatted with .as; it applies to the parser or group where it is inserted. .as accepts an output transformer function.
Output transformations can stack up.
.join() and .join(«sep») are shortcuts for .as(mappend) and .as(o=>o.join(«sep»))
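Stacked transformations can be sketched as plain function wrapping. The names asP and joinP, and the convention that a transformer receives the group output as a list and its result is placed back in a list, are assumptions made for this example.

```javascript
// Hypothetical representation: parser = input -> { value: [...], rest } | null.
const digitsP = (input) => {
  const m = /^[0-9]*/.exec(input)[0];
  return { value: m.split(""), rest: input.slice(m.length) };
};

// asP maps the group output (a list) through f and keeps the result in a list.
const asP = (p, f) => (input) => {
  const r = p(input);
  return r === null ? null : { value: [f(r.value)], rest: r.rest };
};

// joinP is then only a shortcut for one particular transformer.
const joinP = (p, sep = "") => asP(p, (o) => o.join(sep));

// Transformations stack: first join the characters, then convert to a number.
const numberP = asP(joinP(digitsP), (o) => parseInt(o[0], 10));

console.log(digitsP("123a"));        // { value: [ '1', '2', '3' ], rest: 'a' }
console.log(joinP(digitsP)("123a")); // { value: [ '123' ], rest: 'a' }
console.log(numberP("123a"));        // { value: [ 123 ], rest: 'a' }
```

Each wrapper only sees the output of the parser it wraps, which is why the order of the transformations matters.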
Parsers can group by nesting, e.g. x.then( y.then(z).join() ); here the join will only apply to the (y.z) results.
TODO: this (grouping) is not fully generalized yet
Puts the result inside a list (same as .as(o=>[o]))
.verify(func,msg)
The function func receives the parse group result (a list) and should return true if approved, or false to finish in error with message msg.
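A verification step of this kind can be sketched as a wrapper that turns a successful parse into a failure when the check rejects it. The result shape ({ ok, value, rest } or { ok, error, rest }) and the names verifyP, numberP and hourP are assumptions for this illustration.

```javascript
// Hypothetical representation: a parser returns
// { ok: true, value, rest } or { ok: false, error, rest }.
const numberP = (input) => {
  const m = /^[0-9]+/.exec(input);
  return m
    ? { ok: true, value: [parseInt(m[0], 10)], rest: input.slice(m[0].length) }
    : { ok: false, error: "digits", rest: input };
};

// verifyP runs p, then lets func approve or reject the group result.
const verifyP = (p, func, msg) => (input) => {
  const r = p(input);
  if (!r.ok) return r; // earlier failures pass through untouched
  return func(r.value) ? r : { ok: false, error: msg, rest: input };
};

const hourP = verifyP(numberP, (o) => o[0] < 24, "an hour below 24");

console.log(hourP("12h")); // { ok: true, value: [ 12 ], rest: 'h' }
console.log(hourP("30h")); // { ok: false, error: 'an hour below 24', rest: '30h' }
```

The check runs on the already transformed output, so it can compare numbers rather than characters.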
.post(f)
Post-processes the result; this is still a static parser definition. The return value of function f replaces the previous result.
.onFailMsg(msg)
provides a message for a failing parser
.parse("...")
can be used to quickly feed a string to any parser.
The result will include both input and output state.
ex:
digits.parse("123a")
use the parse function to get only the output
all transformation definitions should be applied to the parser and not to the result, so .parse should be the last item of the group.
a parser can be stored, combined, passed around and perform parsing on many contents many times, all transitory state is kept outside.
this parse will fail as it expects at least one digit
#>parse(">")(some(digit))("#123")
Left { value: 'error, expecting digit but found `#` here->#123' }
parse(">")(
many(
some(digit.or(letter)).join()
.skip(spaces)
).join("-")
)("As armas e os baroes")
expected result
Right { value: [ 'As-armas-e-os-baroes' ] }
const nr=
skip(spaces)
.then(digits).join().as(parseInt)//get first digits as number
.then(many(//then seek many separated by `,` or '|'
skip(spaces)
.skip(char(',').or(char('|')))//drop the separators (not included in output)
.skip(spaces)
.then(digits.join().as(parseInt))
)).as(foldr1(a=>b=>a+b))//transform output by adding all values
parse(">")(nr)(" 12 , 2 | 1")
expected result
Right { value: [ 15 ] }
the parser above could be written using sepBy; we were just emphasizing the combinators
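The folding step used at the end of that parser can be reproduced on its own. The foldr1 below is a stand-alone sketch written for this example (a right fold without an initial value, taking a curried function); it is assumed, not confirmed, to match the helper the library exposes.

```javascript
// Right fold without an initial value; f is curried (a => b => ...).
const foldr1 = (f) => (xs) => {
  if (xs.length === 0) throw new Error("foldr1: empty list");
  return xs
    .slice(0, -1)
    .reduceRight((acc, x) => f(x)(acc), xs[xs.length - 1]);
};

console.log(foldr1((a) => (b) => a + b)([12, 2, 1])); // 15
console.log(foldr1((a) => (b) => a - b)([10, 3, 2])); // 9, i.e. 10 - (3 - 2)
```

For addition the fold direction does not matter; the subtraction line shows that the grouping is from the right.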
satisfy(f) uses a function char->bool to evaluate a character
char(c) matches character c
cases(c) case-insensitive match of character c
oneOf("…") matches any character of the given string
noneOf("…") matches any character not included in the string
range(a,z) matches characters between the given ones (inclusive)
digit any digit 0-9
lower lower case letters a-z
upper upper case letters A-Z
letter any letter a-z or A-Z
alphaNum letter or digit
hexDigit hexadecimal digit
octDigit octal digit
space single space
tab single tab
nl newline
cr carriage return
blank tab or space
spaces optional many space
blanks optional many white space
spaces1 one or more spaces -> use some(space)
blanks1 one or more white spaces -> use some(blank)
digits optional many digits
eof end of file
string("…") matches the given string
cis("…") case-insensitive string match
regex(expr) matches with a regular expression
#>parse(">")(regex("#([a-zA-Z]+)[ -]([0-9]+)"))("#an-123...")
Right { value: [ 'an', '123' ] }
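The capture-group output above corresponds to what a plain JavaScript RegExp gives for the same expression. The sketch below uses only the built-in RegExp; anchoring with ^ and slicing off the full match are assumptions about how a regex parser would plug into the parsing state, not a description of the library's code.

```javascript
// Anchor at the start: a parser only matches at the current position.
const re = /^#([a-zA-Z]+)[ -]([0-9]+)/;
const input = "#an-123...";

const m = re.exec(input);
const groups = m.slice(1);               // the capture groups only
const rest = input.slice(m[0].length);   // what is left of the input

console.log(groups); // [ 'an', '123' ]
console.log(rest);   // ...
```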
skip(…) ignores the group/parser output
many(p) optional many occurrences of parser p. This parser never fails as it can return an empty list.
some(p) one or more occurrences of parser p
manyTill(p,end) one or more occurrences of parser p terminating with parser end
optional(p) parses p if present, otherwise ignores it and continues parsing
choice[ps] parses from a list of alternative parsers; this is just an abbreviation of an .or sequence.
count(n)(p) parses n occurrences of p
between(open)(close)(p) parses p surrounded by open and close, dropping the delimiters.
Be sure to exclude the delimiters from the content or provide some other means of detecting the content end
#>parse(">")(between(space,space,some(noneOf(" "))).join())(" ab.12 ")
Right { value: [ 'ab.12' ] }
option(x)(p) parses p, or returns x if it fails; this parser never fails.
#>parse(">")(option(["0"])(digit))("1")
Right { value: [ '1' ] }
#>parse(">")(option(["0"])(digit))("")
Right { value: [ '0' ] }
#>parse(">")(option(["0"])(digit))("#")
Right { value: [ '0' ] }
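The three cases above follow from a very small rule: run the parser and, on failure, succeed with the default without consuming anything. A minimal sketch, assuming a function-based parser representation and the hypothetical names optionP and digitP:

```javascript
// Hypothetical representation: parser = input -> { value, rest } | null.
const digitP = (input) =>
  /^[0-9]/.test(input) ? { value: [input[0]], rest: input.slice(1) } : null;

// optionP(x)(p): run p; on failure succeed with x and consume nothing.
const optionP = (x) => (p) => (input) => {
  const r = p(input);
  return r !== null ? r : { value: x, rest: input };
};

const digitOrZero = optionP(["0"])(digitP);

console.log(digitOrZero("1")); // { value: [ '1' ], rest: '' }
console.log(digitOrZero(""));  // { value: [ '0' ], rest: '' }
console.log(digitOrZero("#")); // { value: [ '0' ], rest: '#' }
```

In the last case the `#` is still in the input, so whatever parser comes next gets a chance to handle it.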
optionMaybe(p) parses p and returns Just the result, or Nothing if it fails; this parser never fails
sepBy(p)(sep) parses zero or more occurrences of p separated by sep, dropping the separators; this parser never fails.
sepBy1(p)(sep) parses one or more occurrences of p separated by sep, dropping the separators; this parser never fails.
endBy(p)(sep)(end) parses zero or more occurrences of p separated by sep, dropping the separators and terminating with end
endBy1(p)(sep)(end) parses one or more occurrences of p separated by sep, dropping the separators and terminating with end
none non-consuming happy parser.
none is an identity parser: it just outputs the given input as a successful parse, so it never fails or consumes.
We use it to turn binary combinators into unary metaparsers. That is the case of .skip(...), which uses the none parser to be available as the unary modifier skip().
none can do so for any binary combinator and can appear wherever you want to disable a part.
Using none as sep with endBy(p,sep,end) will silently skip the need for sep.
Until now, failing parsers do not consume… let's see… while that holds, there is no need to implement try.
To be more accurate, failing parsers do consume, since we need the failing point in the reports; however, the upper parser may pick the starting point to move on, ignoring the consumption (as try does).
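The interplay between reporting the failing point and restarting from the original position can be sketched as follows. The position-based representation and the names charP, thenP and orP are assumptions for this example; they only illustrate the idea described above.

```javascript
// Hypothetical representation: parser = (input, pos) ->
// { ok: true, value, pos } or { ok: false, expected, pos }.
const charP = (c) => (input, pos) =>
  input[pos] === c
    ? { ok: true, value: [c], pos: pos + 1 }
    : { ok: false, expected: c, pos }; // pos is the failing point

const thenP = (p, q) => (input, pos) => {
  const a = p(input, pos);
  if (!a.ok) return a;
  const b = q(input, a.pos);
  return b.ok ? { ok: true, value: a.value.concat(b.value), pos: b.pos } : b;
};

// orP retries q from the original position, ignoring whatever
// p consumed before failing (the role try plays elsewhere).
const orP = (p, q) => (input, pos) => {
  const a = p(input, pos);
  return a.ok ? a : q(input, pos);
};

const ab = thenP(charP("a"), charP("b"));
const ac = thenP(charP("a"), charP("c"));

console.log(ab("ac", 0));          // { ok: false, expected: 'b', pos: 1 }
console.log(orP(ab, ac)("ac", 0)); // { ok: true, value: [ 'a', 'c' ], pos: 2 }
```

The failure keeps position 1 for the error report, while the alternative still starts at position 0.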
For now, parsers accept a state pair of (input,output) and return an Either.
*expect changes to this argument format (changed in v1.1)
testing a simple parser
#>digits.run(Pair([],"123"))
Right { value: Pair { a: [ '1', '2', '3' ], b: '' } }
This is the basic form of parsing (feeding a parser).
However, a parse function is available; it performs like the former but gives only the output state or a fancy error message.
#>parse(">")(digits)("123")
Right { value: [ '1', '2', '3' ] }
Same with
#>digits.parse("123")
Right { value: Pair { a: [ '1', '2', '3' ], b: '' } }
the only difference is that this last one, like the first, gives the full output, including the input state.
parse(filename)(parser)(input string or stream)
the filename is merely a decoration here, to be used in error reports
#>parse(">")(letter.or(digit))("1")
Right { value: [ '1' ] }
#>parse(">")(letter.or(digit))("a")
Right { value: [ 'a' ] }
#>parse(">")(letter.or(digit))("#123")
Left {
value: 'error, expecting letter or digit but found `#` here->#123' }
direct parse
#>letter.or(digit).parse("1")
Right { value: Pair { a: '', b: [ '1' ] } }
#>letter.or(digit).parse("a")
Right { value: Pair { a: '', b: [ 'a' ] } }
#>letter.or(digit).parse("#123")
Left { value: Pair { a: '#123', b: 'letter or digit' } }
desugared parse
#>letter.or(digit).run(Pair("1",[]))
Right { value: Pair { a: '', b: [ '1' ] } }
#>letter.or(digit).run(Pair("a",[]))
Right { value: Pair { a: '', b: [ 'a' ] } }
#>letter.or(digit).run(Pair("#123",[]))
Left { value: Pair { a: '#123', b: 'letter or digit' } }
Processes a parser's return value to produce a result or an error message, discarding the input state description.
#>res(">")(letter.then(digits).parse("123"))
Left { value: '>error, expecting letter but found `1` here->1...' }
without res() processing
#>letter.then(digits).parse("123")
Left { value: Pair { a: '123', b: 'letter' } }
as a consequence of the error report system, we got a parser description for free; no great effort was put into it, though
const p=
optional(skip(char('#')))
.then(some(letter).join())
.skip(char('-').or(spaces1))
.then(digits.join().as(parseInt))
description:
#>console.log(p.expect)
optional skip character `#`
then (at least one letter)->join()
skip character `-` or at least one space
then ((digits)->join())->as(parseInt)
using:
#>console.log(parse(">")(p)("#AN-123"))
Right { value: [ 'AN', 123 ] }
added:
.seq(...) super parser
.qt(min[,max]) parser modifier (quantification), based on .seq()
.group() as o=>[o]
.else() provide alternative values for failing parsers (they will never fail then)
Using character domain analysis to detect parser overlap
[0-9] ∩ ([0-9] ∪ [a-z])
<=> ([0-9] ∩ [0-9]) ∪ ([0-9] ∩ [a-z])
<=> ([0-9]) ∪ (∅)
<=> [0-9] ∪ ∅
<=> [0-9]
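The same overlap computation can be checked with explicit character sets. This sketch uses built-in Set objects; the helpers range, union and intersect are written for this example and are not taken from the library.

```javascript
// Build the set of characters in an inclusive range.
const range = (from, to) => {
  const s = new Set();
  for (let c = from.charCodeAt(0); c <= to.charCodeAt(0); c++) {
    s.add(String.fromCharCode(c));
  }
  return s;
};

const union = (a, b) => new Set([...a, ...b]);
const intersect = (a, b) => new Set([...a].filter((c) => b.has(c)));

const digit = range("0", "9");
const lower = range("a", "z");

// [0-9] ∩ ([0-9] ∪ [a-z])
const overlap = intersect(digit, union(digit, lower));

console.log([...overlap].join(""));        // 0123456789
console.log(intersect(digit, lower).size); // 0
```

A non-empty intersection means the two parsers can both accept the same next character, which is the overlap the analysis looks for.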
version 1.1 is a full re-write with focus on speed
many1 replaced by some
onFailMsg replaced by failMsg
.run instead of direct function call.
some experiments with composition and parser analysis, coding was easy with no performance care.
this parser is inspired by, but does not follow, “parsec”