parser — Parser module

This module contains functions that parse strings into Grammar, Import, Rule and Expansion objects.

Supported functionality

The parser functions support the following:

  • Alternative sets, e.g. a|b|c.
  • Alternative set weights (e.g. /10/ a | /20/ b | /30/ c).
  • C++ style single/in-line and multi-line comments (// ... and /* ... */ respectively).
  • Import statements.
  • Optional groupings, e.g. [this is optional].
  • Public and private/hidden rules.
  • Required groupings, e.g. (a b c) | (e f g).
  • Rule references, e.g. <command>.
  • Sequences, e.g. run <command> [now] [please].
  • Single or multiple JSGF tags, e.g. text {tag1} {tag2} {tag3}.
  • Special JSGF rules <NULL> and <VOID>.
  • Unary Kleene star and repeat operators (* and +).
  • Using Unicode alphanumeric characters for names, references and literals.
  • Using semicolons or newlines interchangeably as line delimiters.
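
Alternative set weights are written between slashes in front of each alternative. To illustrate how a flat weighted set breaks down, here is a minimal standard-library sketch. `split_weighted_alternatives` is a hypothetical helper, not part of pyjsgf. It assumes a flat set with no nested groups, references or tags and only simple decimal weights, and it returns `None` as the weight of an unweighted alternative.

```python
import re
from typing import List, Optional, Tuple

# Optional "/weight/" prefix, then the alternative's text (hypothetical pattern).
_WEIGHTED = re.compile(r"\s*(?:/(\d+(?:\.\d+)?)/)?\s*(.*?)\s*$", re.DOTALL)


def split_weighted_alternatives(s: str) -> List[Tuple[Optional[float], str]]:
    """Split a flat set such as '/10/ a | /20/ b' into (weight, text) pairs.

    Assumes no nested groups, references or tags; an unweighted
    alternative gets None as its weight.
    """
    pairs = []
    for part in s.split("|"):
        # Every piece of the pattern is optional, so it always matches.
        match = _WEIGHTED.match(part)
        weight, text = match.group(1), match.group(2)
        pairs.append((float(weight) if weight is not None else None, text))
    return pairs


print(split_weighted_alternatives("/10/ a | /20/ b | /30/ c"))
# [(10.0, 'a'), (20.0, 'b'), (30.0, 'c')]
```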

Limitations

This parser will fail to parse long alternative sets due to recursion depth limits. The simplest workaround for this limitation is to split long alternatives into groups. For example:

// Raises an error.
<n> = (0|...|100);

// Will not raise an error.
// As a side note, this will be parsed to '(0|...|100)'.
<n> = (0|...|50)|(51|...|100);

This workaround could be done automatically in a future release.
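
Until then, the workaround can be scripted when you generate grammar text yourself. The sketch below is an assumption about how you might do it: `group_alternatives` is a hypothetical helper (not part of pyjsgf) that splits a list of alternatives into parenthesised groups of at most `group_size` items.

```python
from typing import List, Sequence


def group_alternatives(alternatives: Sequence[str], group_size: int = 50) -> str:
    """Join alternatives as '(a|b|...)|(...)', at most group_size per group.

    Hypothetical helper that scripts the manual workaround; not part of pyjsgf.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    groups: List[str] = []
    for start in range(0, len(alternatives), group_size):
        chunk = alternatives[start:start + group_size]
        groups.append("(" + "|".join(chunk) + ")")
    return "|".join(groups)


print(group_alternatives(["a", "b", "c"], 2))  # (a|b)|(c)

# 0..50 in the first group and 51..100 in the second, as in the <n> example.
numbers = [str(n) for n in range(101)]
rule = "<n> = " + group_alternatives(numbers, group_size=51) + ";"
```

With `group_size=51`, `rule` holds the same grouping as the `<n>` example above.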

This limitation also applies to long sequences, but it is much more difficult to reach the limit.
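
To see why length matters, consider a toy right-recursive parser in which each `|` triggers one more recursive call, loosely mirroring how the `exp` rule in the EBNF below recurses after each `|`. This is an illustration only, not pyjsgf's pyparsing-based parser, and `tokenize` and `parse_alternatives` are hypothetical names. Its call depth grows linearly with the number of alternatives, so a long enough set exceeds Python's recursion limit (`sys.getrecursionlimit()`, 1000 by default in CPython).

```python
import sys
from typing import List


def tokenize(count: int) -> List[str]:
    """Build the tokens for '0 | 1 | ... | count-1'."""
    tokens: List[str] = []
    for n in range(count):
        if n:
            tokens.append("|")
        tokens.append(str(n))
    return tokens


def parse_alternatives(tokens: List[str], pos: int = 0) -> List[str]:
    """Toy right-recursive parser: one recursive call per '|' (illustration only)."""
    alternatives = [tokens[pos]]
    if pos + 1 < len(tokens) and tokens[pos + 1] == "|":
        alternatives += parse_alternatives(tokens, pos + 2)
    return alternatives


print(len(parse_alternatives(tokenize(100))))  # 100

try:
    parse_alternatives(tokenize(sys.getrecursionlimit() + 10))
except RecursionError:
    print("RecursionError: too many alternatives for this toy parser")
```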

Extended Backus–Naur form

Extended Backus–Naur form (EBNF) is a notation for defining context-free grammars. The following is the EBNF used by pyjsgf’s parsers:

alphanumeric = ? any alphanumeric Unicode character ? ;
weight = '/' , ? any non-negative number ? , '/' ;
atom = [ weight ] , ( literal | '<' , reference name , '>' |
       '(' , exp , ')' | '[' , exp , ']' ) ;
exp = atom , [ { tag | '+' | '*' | exp | '|' , [ weight ] , exp } ] ;
grammar = grammar header , grammar declaration ,
          [ { import statement } ] , { rule definition } ;
grammar declaration = 'grammar' , reference name , line end ;
grammar header = '#JSGF' , ( 'v' | 'V' ) , version , word ,
                 word , line end ;
identifier = { alphanumeric | special } ;
import name = qualified name , [ '.*' ] | identifier , '.*' ;
import statement = 'import' , '<' , import name , '>' , line end ;
line end = ';' | '\n' ;
literal = { word } ;
qualified name = identifier , { '.' , identifier } ;
version = ? an integer or floating-point number ? ;
reference name = identifier | qualified name ;
rule definition = [ 'public' ] , '<' , reference name , '>' , '=' ,
                  exp , line end ;
special = '+' | '-' | ':' | ';' | ',' | '=' | '|' | '/' | '$' |
          '(' | ')' | '[' | ']' | '@' | '#' | '%' | '!' | '^' |
          '&' | '~' | '\' ;
tag = '{' , { tag literal } , '}' ;
tag literal = { word character | '\{' | '\}' } ;
word = { word character } ;
word character = alphanumeric | "'" | '-' ;

I’ve not included comments for simplicity; they can appear almost anywhere, and pyparsing handles them for us.
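
To show roughly what handling comments amounts to, the sketch below strips both comment styles with a regular expression before any parsing. This is a swapped-in technique, not what pyjsgf does (it leaves comments to pyparsing), and `strip_comments` is a hypothetical name. Each comment is replaced with a single space so that tokens on either side of an inline `/* ... */` comment are not glued together; the sketch does not treat comment markers inside tags specially.

```python
import re

# "// ..." to the end of the line, or "/* ... */" spanning any number of lines.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(text: str) -> str:
    """Replace each C++-style comment with a single space.

    Regex stand-in for illustration; pyjsgf leaves comments to pyparsing.
    Comment markers inside tags are not treated specially.
    """
    return _COMMENT.sub(" ", text)


source = """#JSGF V1.0 UTF-8 en;
grammar example; // trailing comment
/* a multi-line
   comment */
public <greet> = hello /* inline */ there;
"""
print(strip_comments(source))
```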

Functions

jsgf.parser.parse_expansion_string(s)

Parse a string containing a JSGF expansion and return an Expansion object.

Parameters: s – str
Returns: Expansion
Raises: ParseException, GrammarError

jsgf.parser.parse_grammar_file(path)

Parse a JSGF grammar file and return a Grammar object with the defined attributes: name, imports and rules.

This function will not attempt to import rules or grammars defined in other files; that should be done by an import resolver, not a parser.

Parameters: path – str
Returns: Grammar
Raises: ParseException, GrammarError
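
As a hypothetical first step for such an import resolver, the sketch below splits an import name (the `import name` rule in the EBNF) into a grammar name and a rule name, treating the `.*` wildcard as "all public rules of that grammar". `split_import_name` and `ImportTarget` are illustrative names, not pyjsgf APIs, and the helper rejects a bare name such as `greet`, which the EBNF's `qualified name` rule would technically allow.

```python
from typing import NamedTuple, Optional


class ImportTarget(NamedTuple):
    grammar_name: str
    rule_name: Optional[str]  # None stands for the '.*' wildcard


def split_import_name(name: str) -> ImportTarget:
    """Split 'grammar.rule' or 'grammar.*' at its last dot (hypothetical helper)."""
    if name.endswith(".*"):
        grammar_name, rule_name = name[:-2], None
    else:
        grammar_name, _, rule_name = name.rpartition(".")
    if not grammar_name or rule_name == "":
        raise ValueError("expected '<grammar>.<rule>' or '<grammar>.*', got %r" % name)
    return ImportTarget(grammar_name, rule_name)


print(split_import_name("com.example.commands.greet"))
# ImportTarget(grammar_name='com.example.commands', rule_name='greet')
print(split_import_name("com.example.commands.*"))
# ImportTarget(grammar_name='com.example.commands', rule_name=None)
```
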
jsgf.parser.parse_grammar_string(s)

Parse a JSGF grammar string and return a Grammar object with the defined attributes: name, imports and rules.

Parameters: s – str
Returns: Grammar
Raises: ParseException, GrammarError
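
A grammar string begins with the `grammar header` and `grammar declaration` from the EBNF, for example `#JSGF V1.0 UTF-8 en;` followed by `grammar example;`. The sketch below checks just those two lines with a regular expression. `parse_header` and `Header` are hypothetical names; the pattern only approximates the EBNF (it treats the two header words as an optional character encoding and locale, as the JSGF specification does) and does not allow comments between the two lines.

```python
import re
from typing import NamedTuple, Optional

_WORD = r"(?:[^\W_]|['-])+"  # alphanumerics, apostrophes and hyphens
_HEADER_RE = re.compile(
    r"\s*#JSGF\s*[vV](\d+(?:\.\d+)?)"           # '#JSGF', 'v' | 'V', version
    r"(?:[ \t]+(" + _WORD + r"))?"                # optional word, e.g. an encoding
    r"(?:[ \t]+(" + _WORD + r"))?"                # optional word, e.g. a locale
    r"[ \t]*[;\n]"                                # line end
    r"\s*grammar\s+([^\s;]+)[ \t]*(?:;|\n|$)"     # grammar declaration
)


class Header(NamedTuple):
    version: str
    encoding: Optional[str]
    locale: Optional[str]
    grammar_name: str


def parse_header(text: str) -> Header:
    """Check the header and grammar declaration at the start of a grammar string."""
    match = _HEADER_RE.match(text)
    if match is None:
        raise ValueError("missing or malformed '#JSGF' header or grammar declaration")
    return Header(*match.groups())


print(parse_header("#JSGF V1.0 UTF-8 en;\ngrammar example;\npublic <a> = b;"))
# Header(version='1.0', encoding='UTF-8', locale='en', grammar_name='example')
```
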
jsgf.parser.parse_rule_string(s)

Parse a string containing a JSGF rule definition and return a Rule object.

Parameters: s – str
Returns: Rule
Raises: ParseException, GrammarError
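
Per the `rule definition` rule in the EBNF, a definition has the shape `[public] <name> = expansion;`. The regex-based sketch below splits a definition into those three parts without parsing the expansion itself. `split_rule_definition` and `RuleParts` are hypothetical names, and the sketch does not check that the name or expansion is valid JSGF.

```python
import re
from typing import NamedTuple

_RULE_RE = re.compile(r"\s*(public\s+)?<([^<>\s]+)>\s*=\s*(.*?)\s*;?\s*$", re.DOTALL)


class RuleParts(NamedTuple):
    public: bool
    name: str
    expansion: str


def split_rule_definition(s: str) -> RuleParts:
    """Split '[public] <name> = expansion;' without parsing the expansion."""
    match = _RULE_RE.match(s)
    if match is None or not match.group(3):
        raise ValueError("expected '[public] <name> = expansion;', got %r" % s)
    public, name, expansion = match.groups()
    return RuleParts(public is not None, name, expansion)


print(split_rule_definition("public <greet> = hello [there];"))
# RuleParts(public=True, name='greet', expansion='hello [there]')
print(split_rule_definition("<n> = (0|1)"))
# RuleParts(public=False, name='n', expansion='(0|1)')
```
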
jsgf.parser.valid_grammar(s)

Whether a string is a valid JSGF grammar string.

Note that this function will not return False for grammars that are otherwise valid but have out-of-scope imports.

Parameters: s – str
Returns: bool