CS 101 Description of the JavaCC Grammar File
This web page contains the complete syntax of Java Compiler Compiler (JavaCC) grammar files, with a detailed explanation of each construct.

Tokens in grammar files follow the same conventions as in Java. Hence identifiers, strings, characters, etc. used in the grammars are the same as Java identifiers, Java strings, Java characters, etc.

White space in grammar files also follows the same conventions as in Java. This includes the syntax for comments. Most comments present in a grammar file are carried over into the generated parser/lexical analyzer.

Grammar files are preprocessed for Unicode escapes just as Java files are (i.e., occurrences of strings such as \uxxxx, where xxxx is a hex value, are converted to the corresponding Unicode character before lexical analysis).

Exceptions to the above rules: the Java operators "<<", ">>", ">>>", "<<=", ">>=", and ">>>=" are left out of Java Compiler Compiler's input token list in order to allow convenient nested use of token specifications.

Finally, the following are the additional reserved words in Java Compiler Compiler grammar files:

    EOF  IGNORE_CASE  JAVACODE  LOOKAHEAD  MORE  options  PARSER_BEGIN  PARSER_END  SKIP  SPECIAL_TOKEN  TOKEN  TOKEN_MGR_DECLS

Any Java entities used in the grammar rules that follow are written with the prefix java_ (e.g., java_compilation_unit).

    javacc_input ::= javacc_options
                     "PARSER_BEGIN" "(" IDENTIFIER ")"
                     java_compilation_unit
                     "PARSER_END" "(" IDENTIFIER ")"
                     ( production )*
                     EOF

The grammar file starts with an optional list of options. This is followed by a Java compilation unit enclosed between "PARSER_BEGIN(name)" and "PARSER_END(name)". After this comes a list of grammar productions. Options and productions are described later.

The name that follows "PARSER_BEGIN" and the name that follows "PARSER_END" must be the same, and this name becomes the name of the generated parser.
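The name rule above can be illustrated with a small standalone sketch. It is not part of JavaCC and does not reflect how JavaCC performs the check. The class ParserNameCheck, its methods, and the sample grammar text are all hypothetical, and the regular expression only approximates Java identifiers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParserNameCheck {
    // Hypothetical helper: extracts the identifier inside KEYWORD( ... ).
    // \w+ is a simplification of the Java identifier rules.
    static String nameAfter(String keyword, String grammar) {
        Matcher m = Pattern.compile(keyword + "\\s*\\(\\s*(\\w+)\\s*\\)").matcher(grammar);
        if (!m.find()) {
            throw new IllegalArgumentException("missing " + keyword);
        }
        return m.group(1);
    }

    // Returns the parser name if PARSER_BEGIN and PARSER_END agree on it.
    static String parserName(String grammar) {
        String begin = nameAfter("PARSER_BEGIN", grammar);
        String end = nameAfter("PARSER_END", grammar);
        if (!begin.equals(end)) {
            throw new IllegalArgumentException("names differ: " + begin + " vs " + end);
        }
        return begin;
    }

    // Illustrative grammar text in the order javacc_input prescribes:
    // options, the compilation unit, then productions.
    static final String SAMPLE = String.join("\n",
        "options { LOOKAHEAD = 1; }",
        "PARSER_BEGIN(MyParser)",
        "public class MyParser {}",
        "PARSER_END(MyParser)",
        "TOKEN : { < NUMBER : ([\"0\"-\"9\"])+ > }",
        "void Start() : {} { < NUMBER > <EOF> }");

    public static void main(String[] args) {
        System.out.println(parserName(SAMPLE)); // prints MyParser
    }
}
```

The sample text doubles as a picture of the overall file layout. The token and BNF productions it contains are only placeholders for constructs described later.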
For example, if name is "MyParser", then the following files are generated:

    MyParser.java: The generated parser.
    MyParserTokenManager.java: The generated token manager (or scanner/lexical analyzer).
    MyParserConstants.java: A bunch of useful constants.

Other files such as "Token.java", "ParseError.java", etc. are also generated. However, these files contain boilerplate code, are the same for any grammar, and may be reused across grammars.

Between the PARSER_BEGIN and PARSER_END constructs is a regular Java compilation unit (a compilation unit in Java lingo is the entire contents of a Java file). This may be any arbitrary Java compilation unit so long as it contains a class declaration whose name is the same as the name of the generated parser ("MyParser" in the above example). Hence, in general, this part of the grammar file looks like:

    PARSER_BEGIN(parser_name)
    . . .
    class parser_name . . . {
      . . .
    }
    . . .
    PARSER_END(parser_name)

JavaCC does not perform detailed checks on the compilation unit, so it is possible for a grammar file to pass through JavaCC and generate Java files that produce errors when they are compiled.

If the compilation unit includes a package declaration, this is included in all the generated files. If the compilation unit includes import declarations, these are included in the generated parser and token manager files.

The generated parser file contains everything in the compilation unit and, in addition, the generated parser code, which is inserted at the end of the parser class. For the above example, the generated parser will look like:

    . . .
    class parser_name . . . {
      . . .
      // generated parser is inserted here.
    }
    . . .

The generated parser includes a public method declaration corresponding to each non-terminal (see javacode_production and bnf_production) in the grammar file. Parsing with respect to a non-terminal is achieved by calling the method corresponding to that non-terminal.
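To make the "one public method per non-terminal" idea concrete, here is a hand-written stand-in. It is not JavaCC output: the class TinyParser, its whitespace-based tokenizer, and the two-rule grammar (Sum and Term) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Toy tokenizer: tokens are separated by whitespace.
    public TinyParser(String input) {
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
    }

    // Non-terminal: Sum ::= Term ( "+" Term )*
    public int Sum() {
        int value = Term();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++; // consume "+"
            value += Term();
        }
        return value;
    }

    // Non-terminal: Term ::= NUMBER
    public int Term() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return Integer.parseInt(tokens.get(pos++));
    }

    public static void main(String[] args) {
        // Parse with respect to Sum.
        System.out.println(new TinyParser("1 + 2 + 3").Sum()); // prints 6
        // Parse with respect to Term instead.
        System.out.println(new TinyParser("42").Term());       // prints 42
    }
}
```

Each method recognizes exactly one non-terminal, so the caller chooses the entry point simply by choosing which method to call.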
Unlike yacc, there is no single start symbol in JavaCC: one can parse with respect to any non-terminal in the grammar.

The generated token manager provides one public method:

    Token getNextToken() throws ParseError;

For more details on how this method may be used, please read the description of the Java Compiler Compiler API.

    javacc_options ::= [ "options" "{" ( option_binding )* "}" ]

The options section, if present, starts with the reserved word "options" followed by a list of option bindings within braces. Each option binding specifies the setting of one option. The same option may not be set multiple times.

Options may be specified either here in the grammar file or on the command line. If an option is set on the command line, that setting takes precedence.

Option names are not case-sensitive.

    option_binding ::= "LOOKAHEAD" "=" java_integer_literal ";"
                     | "CHOICE_AMBIGUITY_CHECK" "=" java_integer_literal ";"
                     | "OTHER_AMBIGUITY_CHECK" "=" java_integer_literal ";"
                     | "STATIC" "=" java_boolean_literal ";"
                     | "DEBUG_PARSER" "=" java_boolean_literal ";"
                     | "DEBUG_LOOKAHEAD" "=" java_boolean_literal ";"
                     | "DEBUG_TOKEN_MANAGER" "=" java_boolean_literal ";"
                     | "OPTIMIZE_TOKEN_MANAGER" "=" java_boolean_literal ";"
                     | "ERROR_REPORTING" "=" java_boolean_literal ";"
                     | "JAVA_UNICODE_ESCAPE" "=" java_boolean_literal ";"
                     | "UNICODE_INPUT" "=" java_boolean_literal ";"
                     | "IGNORE_CASE" "=" java_boolean_literal ";"
                     | "USER_TOKEN_MANAGER" "=" java_boolean_literal ";"
                     | "USER_CHAR_STREAM" "=" java_boolean_literal ";"
                     | "BUILD_PARSER" "=" java_boolean_literal ";"
                     | "BUILD_TOKEN_MANAGER" "=" java_boolean_literal ";"
                     | "SANITY_CHECK" "=" java_boolean_literal ";"
                     | "FORCE_LA_CHECK" "=" java_boolean_literal ";"
                     | "COMMON_TOKEN_ACTION" "=" java_boolean_literal ";"
                     | "CACHE_TOKENS" "=" java_boolean_literal ";"
                     | "OUTPUT_DIRECTORY" "=" java_string_literal ";"

LOOKAHEAD: The number of tokens to look ahead before making a decision at a choice point during parsing. The default value is 1.
The smaller this number, the faster the parser. This number may be overridden for specific productions within the grammar, as described later. See the description of the lookahead algorithm for complete details on how lookahead works.

CHOICE_AMBIGUITY_CHECK: This is an integer option whose default value is 2. This is the number of tokens considered in checking choices of the form "A | B | ..." for ambiguity. For example, assume this option is set to 3. If A and B share a common two-token prefix but no common three-token prefix, JavaCC can tell you to use a lookahead of 3 for disambiguation purposes. If A and B share a common three-token prefix, JavaCC can only tell you that you need a lookahead of 3 or more. Increasing this value can give you more comprehensive ambiguity information at the cost of more processing time. For large grammars such as the Java grammar, increasing this number any further causes the checking to take too much time.

OTHER_AMBIGUITY_CHECK: This is an integer option whose default value is 1. This is the number of tokens considered in checking all other kinds of choices (i.e., of the