Curbing ANTLR4 greediness (Building ANTLR4 Grammar for existing DSL)


I have a DSL and need to build an ANTLR4 grammar for it.

Here is an example of the DSL:

    rule isc {
        true  when o_m in [5, 6, 17, 34]
        false in other cases
    }

    rule iscontract {
        true  when o_c in ['xx','xy','yy']
        false in other cases
    }

    rule isfixed {
        true  when f3 ==~ '.*/.*/.*-f.*/.*'
        false in other cases
    }

    rule temp[1].future {
        false when o_of in ['c','p']
        true  in other cases
    }

    rule temp[0].scale {
        10 when o_m == 5 && o_c in ['yx']
        1  in other cases
    }

The way this DSL is currently parsed, with regular expressions, has become a total mess, so a grammar is needed.

It works as follows: it extracts the left part (before when) and the right part, and both are then evaluated by Groovy.

I still want them evaluated by Groovy, but I want to organize the parsing process with a grammar. So, in essence, I need to extract these left and right parts using some kind of wildcards.

Unfortunately, I cannot figure out how to do that. Here is what I have so far:

    grammar ruledsl;

    rules: basic_rule+ EOF;

    basic_rule: 'rule' rule_name '{' condition_expr+ '}';

    name: CHAR+;
    list_index: '[' DIGIT+ ']';
    name_expr: name list_index*;
    rule_name: name_expr ('.' name_expr)*;

    condition_expr: when_condition_expr | otherwise_condition_expr;

    condition: .*?;
    result: .*?;
    when_condition_expr: result WHEN condition;

    otherwise_condition_expr: result IN_OTHER_CASES;

    WHEN: 'when';
    IN_OTHER_CASES: 'in other cases';

    DIGIT: '0'..'9';
    CHAR: 'a'..'z' | 'A'..'Z';
    SYMBOL: '?' | '!' | '&' | '.' | ',' | '(' | ')' | '[' | ']' | '\\' | '/' | '%'
          | '*' | '-' | '+' | '=' | '<' | '>' | '_' | '|' | '"' | '\'' | '~';

    // Whitespace and comments

    WS: [ \t\r\n\u000C]+ -> skip;
    COMMENT: '/*' .*? '*/' -> skip;

This grammar is too greedy, and only one rule is processed. I mean, if I listen to the parsing with

    @Override
    public void enterBasic_rule(Basic_ruleContext ctx) {
        System.out.println("entering rule");
    }

    @Override
    public void exitBasic_rule(Basic_ruleContext ctx) {
        System.out.println(ctx.getText());
        System.out.println("leaving rule");
    }

I get the following output:

    entering rule
    -- tons of text
    leaving rule

How can I make it less greedy, so that when I parse the given input I get 5 rules? The greediness comes from condition and result, I suppose.


Update: it turned out that skipping whitespace wasn't the best idea, so after a while I ended up with the following: link to gist

Thanks to 280z28 for the hint!

Instead of using .*? in parser rules, try using ~'}'* to ensure that those rules won't try to read past the end of the rule.

Also, you skip whitespace in the lexer but use CHAR+ and DIGIT+ in parser rules. This means the following are equivalent:

  1. rule temp[1].future
  2. rule t e m p [ 1 ] . f u t u r e

Beyond that, you made in other cases a single token instead of 3, so the following are not equivalent:

    true  in other cases
    true  in  other cases

You should start by making the following lexer rules, and then making the CHAR and DIGIT rules fragment rules:

    ID : CHAR+;
    INT : DIGIT+;
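Putting these hints together, a grammar along the following lines might work. This is an untested sketch, not the grammar from the gist, and it rests on several assumptions of mine: the grammar name RuleDslSketch is made up, every case sits on its own line (so newlines are kept as an NL token instead of being skipped), in other cases is split into three tokens named IN, OTHER and CASES, and a STRING token is added so that a quoted '}' or when cannot end a rule or a result early.

```antlr
grammar RuleDslSketch;   // hypothetical name

rules      : NL* basic_rule+ EOF ;
basic_rule : 'rule' rule_name '{' NL+ condition_expr+ '}' NL* ;

rule_name  : name_expr ('.' name_expr)* ;
name_expr  : ID list_index* ;
list_index : '[' INT ']' ;

condition_expr
           : when_condition_expr
           | otherwise_condition_expr
           ;

when_condition_expr      : result WHEN condition NL+ ;
otherwise_condition_expr : result IN OTHER CASES NL+ ;

// Bounded wildcards: neither can run past the end of the line or the rule.
result     : ~(WHEN | IN | NL | '}')+ ;
condition  : ~(NL | '}')+ ;

WHEN   : 'when' ;
IN     : 'in' ;
OTHER  : 'other' ;
CASES  : 'cases' ;

ID     : CHAR+ ;
INT    : DIGIT+ ;
STRING : '\'' ~['\r\n]* '\'' ;   // assumption: single-quoted, no escapes

fragment CHAR  : [a-zA-Z] ;
fragment DIGIT : [0-9] ;

NL      : ('\r'? '\n')+ ;        // kept: newlines terminate a case
WS      : [ \t]+ -> skip ;       // only spaces and tabs are skipped
COMMENT : '/*' .*? '*/' -> skip ;
SYMBOL  : . ;                    // catch-all for operators and punctuation
```

Because ID and INT are now single tokens, rule t e m p [ 1 ] . f u t u r e no longer matches a rule name. The text of result and condition can still be handed to Groovy, but note that ctx.getText() omits the skipped spaces; reading the original characters from the input stream between the start and stop tokens of the context preserves them.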
