Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

For the last few weeks, I have been trying to write a parser for BibTeX (http://www.bibtex.org/Format/) files using flex and bison.

$ cat raw.l
%{
#include "raw.tab.h" 
%}
value [\"\{][a-zA-Z0-9 .\{\} \"\\]*[\"\}]
%%
[a-zA-Z]*               return(KEY);
\"                          return(QUOTE);
\{                          return(OBRACE);
\}                          return(EBRACE);
;                           return(SEMICOLON);
[ ]+                  /* ignore whitespace */;
{value}     {
    yylval.sval = malloc(strlen(yytext));
    strncpy(yylval.sval, yytext, strlen(yytext));
    return(VALUE);
}

$ cat raw.y
%{
#include <stdio.h>
%}

//Symbols.
%union
{
 char *sval;
};
%token <sval> VALUE
%token KEY
%token OBRACE
%token EBRACE
%token QUOTE
%token SEMICOLON 

%start Entry
%%

Entry:
     '@'KEY OBRACE VALUE ',' 
     KeyVal
     EBRACE
     ;

KeyVal:
      /* empty */
      | KeyVal '=' VALUE ','
      | KeyVal '=' VALUE 
      ;
%%

int yyerror(char *s) {
  printf("yyerror : %s\n",s);
}

int main(void) {
  yyparse();

}

%% A sample bibtex is:

@Book{a1,
    author = "a {\"m}ook, Rudra Banerjee",
    Title="ASR",
    Publisher="oxf",
    Year="2010",
    Add="UK",
    Edition="1",
}
@Article{a2,
    Author="Rudra Banerjee",
    Title="Fe{\"Ni}Mo",
    Publisher={P{\"R}B},
    Issue="12",
    Page="36690",
    Year="2011",
    Add="UK",
    Edition="1",
}

When I try to parse it, it gives a syntax error. With GDB, it looks like it expects the fields in KEY to be declared (probably):

Reading symbols from /home/rudra/Programs/lex/Parsing/a.out...done.
(gdb) Undefined command: "".  Try "help".
(gdb) Undefined command: "Author".  Try "help".
(gdb) Undefined command: "Editor".  Try "help".
(gdb) Undefined command: "Title".  Try "help".
.....

I would be grateful if someone could help me with this.

1 Answer

There are lots of problems. First, your lexer is confused: it tries to recognize quoted strings and braced things as a single VALUE while also trying to recognize single characters like " and {. For quotes, it makes sense to have the lexer recognize the whole string, but for structural things that you want to parse (like braced lists), the lexer needs to return single tokens for the parser to parse. Second, when allocating space for a string, you aren't allocating space for a NUL terminator: malloc(strlen(yytext)) is one byte too short, and strncpy with that length never writes the terminating NUL. Finally, your grammar looks odd: it wants to parse things like = VALUE = VALUE as a KeyVal, which doesn't correspond to anything in a BibTeX file.
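
To illustrate the NUL-terminator point, here is a minimal C sketch of a correct copy. The helper name copy_token is hypothetical (it is not part of flex); in practice strdup(yytext) does the same job.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a token's text into freshly allocated memory.
 * copy_token is a hypothetical helper name used only for illustration. */
char *copy_token(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);   /* +1 leaves room for the terminating NUL */

    if (copy == NULL)
        return NULL;                /* allocation failed; caller must check */
    memcpy(copy, text, len + 1);    /* copies the characters and the NUL */
    return copy;
}
```

In a flex action this would be used as yylval.sval = copy_token(yytext); and the parser is then responsible for freeing the string.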

So first, for the lexer. You want to recognize quoted strings and identifiers, but other things should be single characters:

[A-Za-z][A-Za-z0-9]*      { yylval.sval = strdup(yytext); return KEY; }
\"([^"\\]|\\.)*\"         { yylval.sval = strdup(yytext); return VALUE; }
[ \t\n]                   ; /* ignore whitespace */
[{}@=,]                   { return *yytext; }
.                         { fprintf(stderr, "Unrecognized character %c in input\n", *yytext); }

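The quoted-string rule is meant to treat a backslash followed by any character as part of the string, so an escaped quote does not end the VALUE. As a rough hand-written C equivalent of that scan (a sketch only; quoted_length is a hypothetical name, and flex generates its own matcher rather than code like this):

```c
#include <stddef.h>

/* Return the length of the quoted string at the start of s, including
 * both quote characters, or 0 if s does not start with a complete
 * quoted string. A backslash escapes the next character, except a
 * newline, mirroring the fact that "." in flex does not match a newline.
 * quoted_length is a hypothetical helper for illustration. */
size_t quoted_length(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;
    while (s[i] != '\0') {
        if (s[i] == '"')
            return i + 1;           /* closing quote found */
        if (s[i] == '\\') {
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;           /* dangling backslash: no match */
            i += 2;                 /* skip the escaped character */
        } else {
            i += 1;
        }
    }
    return 0;                       /* unterminated string */
}
```

For the Title value of the second sample entry, the scan skips the escaped quote inside the braces and stops at the final quote, so the whole string becomes one VALUE.
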
Now you need a parser for the entries:

Input: /* empty */ | Input Entry ;  /* input is zero or more entries */
Entry: '@' KEY '{' KEY ',' KeyVals '}' ;
KeyVals: /* empty */ | KeyVals KeyVal ; /* zero or more keyvals */
KeyVal: KEY '=' VALUE ',' ;

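That should parse the quoted-string fields in the example you give. The braced Publisher value in the second entry still needs its own grammar rule, since the lexer above returns { and } as single tokens and KeyVal only accepts a quoted VALUE.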
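
To see what the grammar accepts, it can help to write the same rules as a small hand-written recursive-descent check in C. This is only a sketch: bison generates an LALR table-driven parser, not code like this, and the token codes TOK_KEY and TOK_VALUE and all function names here are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical token codes; single-character tokens use their char value. */
enum { TOK_KEY = 258, TOK_VALUE = 259 };

/* Consume one token if it is the expected one. */
static int expect(const int *toks, size_t n, size_t *pos, int tok)
{
    if (*pos < n && toks[*pos] == tok) {
        (*pos)++;
        return 1;
    }
    return 0;
}

/* Entry: '@' KEY '{' KEY ',' KeyVals '}' */
static int parse_entry(const int *toks, size_t n, size_t *pos)
{
    if (!expect(toks, n, pos, '@') || !expect(toks, n, pos, TOK_KEY) ||
        !expect(toks, n, pos, '{') || !expect(toks, n, pos, TOK_KEY) ||
        !expect(toks, n, pos, ','))
        return 0;
    /* KeyVals: zero or more "KEY '=' VALUE ','" */
    while (*pos < n && toks[*pos] == TOK_KEY) {
        (*pos)++;
        if (!expect(toks, n, pos, '=') || !expect(toks, n, pos, TOK_VALUE) ||
            !expect(toks, n, pos, ','))
            return 0;
    }
    return expect(toks, n, pos, '}');
}

/* Input: zero or more entries. Returns 1 if the whole sequence matches. */
int parse_input(const int *toks, size_t n)
{
    size_t pos = 0;

    while (pos < n)
        if (!parse_entry(toks, n, &pos))
            return 0;
    return 1;
}
```

Note that, as written, the KeyVal rule requires a comma after every field, including the last one before the closing brace. The sample file follows that convention, but a file without the trailing comma would be rejected.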

