Curbing ANTLR4 greediness (Building ANTLR4 Grammar for existing DSL)
I have a DSL and want to build an ANTLR4 grammar for it.
Here is an example of the DSL:
rule isc {
    true when o_m in [5, 6, 17, 34]
    false in other cases
}

rule iscontract {
    true when o_c in ['xx','xy','yy']
    false in other cases
}

rule isfixed {
    true when f3 ==~ '.*/.*/.*-f.*/.*'
    false in other cases
}

rule temp[1].future {
    false when o_of in ['c','p']
    true in other cases
}

rule temp[0].scale {
    10 when o_m == 5 && o_c in ['yx']
    1 in other cases
}

Right now the DSL is parsed using regular expressions, and that has become a total mess, so a grammar is needed.
It works as follows: the left part (before "when") and the right part are extracted, and then they are evaluated by Groovy.
I still want them evaluated by Groovy, but I want to organize the parsing process using a grammar. So, in essence, I need to extract these left and right parts using some kind of wildcard.
Unfortunately, I cannot figure out how to do that. Here is what I have so far:
grammar ruledsl;

rules: basic_rule+ EOF;

basic_rule: 'rule' rule_name '{' condition_expr+ '}';

name: CHAR+;
list_index: '[' DIGIT+ ']';
name_expr: name list_index*;
rule_name: name_expr ('.' name_expr)*;

condition_expr: when_condition_expr | otherwise_condition_expr;

condition: .*?;
result: .*?;

when_condition_expr: result when condition;
otherwise_condition_expr: result in_other_cases;

when: 'when';
in_other_cases: 'in other cases';

DIGIT: '0'..'9';
CHAR: 'a'..'z' | 'A'..'Z';
SYMBOL: '?' | '!' | '&' | '.' | ',' | '(' | ')' | '[' | ']' | '\\' | '/' | '%' | '*' | '-' | '+' | '=' | '<' | '>' | '_' | '|' | '"' | '\'' | '~';

// whitespace and comments
WS: [ \t\r\n\u000c]+ -> skip;
COMMENT: '/*' .*? '*/' -> skip;

This grammar is "too" greedy, and only one rule is processed. I mean, if I listen to the parsing with
@Override
public void enterBasic_rule(Basic_ruleContext ctx) {
    System.out.println("entering rule");
}

@Override
public void exitBasic_rule(Basic_ruleContext ctx) {
    System.out.println(ctx.getText());
    System.out.println("leaving rule");
}

I get the following output:
entering rule
-- tons of text
leaving rule

How can I make it less greedy, so that parsing the given input gives me 5 rules? The greediness comes from condition and result, I suppose.
Update: it turned out that skipping whitespace wasn't the best idea, so after a while I ended up with the following: link to gist
Thanks to 280z28 for the hint!
Instead of using .*? in your parser rules, try using ~'}'* to ensure that those rules won't try to read past the end of the rule.
Also, you skip whitespace in your lexer but use CHAR+ and DIGIT+ in your parser rules. This means the following are equivalent:
rule temp[1].future
rule t e m p [ 1 ] . f u t u r e
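To see why the parser cannot tell these two lines apart, here is a minimal Java sketch. It is a hand-rolled stand-in for a lexer that emits one token per character and drops whitespace, not ANTLR code. The names CharTokenDemo and charTokens are made up for this illustration, and the keyword handling of a real ANTLR lexer is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics a lexer that produces one token per
// character and skips whitespace. This is not generated by or tied to ANTLR.
class CharTokenDemo {
    static List<String> charTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (!Character.isWhitespace(c)) {   // whitespace is dropped, like "-> skip"
                tokens.add(String.valueOf(c));  // every other character is its own token
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> compact = charTokens("rule temp[1].future");
        List<String> spaced = charTokens("rule t e m p [ 1 ] . f u t u r e");
        // Both inputs produce the same token sequence, so a parser rule built
        // from single-character tokens sees no difference between them.
        System.out.println(compact.equals(spaced)); // prints true
    }
}
```

Once whitespace is discarded and every character is a separate token, the spacing inside a name carries no information by the time the parser runs.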
Beyond that, you made 'in other cases' a single token instead of three, so the following are not equivalent (note the extra spaces in the second line):

true in other cases
true in  other   cases

You should start by making the following lexer rules, and then making the CHAR and DIGIT rules fragment rules:
ID : CHAR+;
INT : DIGIT+;
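As a rough illustration of what grouping characters in the lexer buys you, the following Java sketch groups consecutive letters into one ID token and consecutive digits into one INT token. It is a hand-written scanner, not ANTLR-generated code. The names WordTokenDemo and tokens are assumptions made for this example, and letters are limited to a-z and A-Z.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny scanner that mimics the effect of
// lexer rules that match a whole run of letters or digits as one token.
class WordTokenDemo {
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {    // whitespace separates tokens, then is dropped
                i++;
                continue;
            }
            int start = i;
            if (isLetter(c)) {                  // longest run of letters becomes one ID
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                out.add("ID(" + input.substring(start, i) + ")");
            } else if (isDigit(c)) {            // longest run of digits becomes one INT
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                out.add("INT(" + input.substring(start, i) + ")");
            } else {                            // any other character is a token by itself
                i++;
                out.add(String.valueOf(c));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [ID(temp), [, INT(1), ], ., ID(future)]
        System.out.println(tokens("temp[1].future"));
        // The spaced-out spelling now yields one ID per letter instead.
        System.out.println(tokens("t e m p [ 1 ] . f u t u r e"));
    }
}
```

Because the grouping happens before whitespace is thrown away, the two spellings of the name produce different token streams, and the parser can then work with whole identifiers instead of individual characters.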