C - Elementary XML parser
I am trying to write an elementary XML parser in C, without using non-standard libraries, that is able to:
- detect several different tags
- detect empty tags
- detect tag mismatches
The main problem I have is how to differentiate between the beginning of a tag, the content, and the ending of a tag.
My idea is to implement a finite-state machine while reading the file, in order to know what I am currently reading.
Please tell me your ideas, and correct me if I am pointed in the wrong direction.
Edit: I added a chunk of code that detects elements and content.
    int tmp;        /* int, not char: fgetc() returns int so EOF stays distinguishable */
    int buff = -1;  /* last significant character seen */
    char *content = malloc(size + 1);
    int stage = -1; /* 1 = inside start tag, 2 = content, 3 = inside end tag */
    int i = 0;      /* number of characters buffered in content */

    while ((tmp = fgetc(file)) != EOF) {
        if (tmp == '<') {
            if (stage == 2 && buff != '>') {
                printf("content: ");
                printcont(content, i);
            }
            stage = 1;
            buff = tmp;
            i = 0;
            continue;
        } else if (tmp == '/' && buff == '<') {
            stage = 3;
            buff = tmp;
            i = 0;
            continue;
        } else if (tmp == '>') {
            if (stage == 1) {
                printf("tag_start: ");
            } else if (stage == 3) {
                printf("tag_end: ");
            } else if (stage == 2) {
                printf("content: ");
            }
            buff = tmp;
            printcont(content, i); /* prints the buffered content */
            stage = 2;
            i = 0;
            continue;
        }
        if (tmp != ' ' && tmp != '\n' && tmp != '\t') { /* simple whitespace filter */
            content[i] = (char)tmp;
            buff = tmp;
            i++;
        }
    }
I would be grateful if you could comment on the code above and tell me how to improve it. So far it detects tags and content, which is what I needed in the first place.
An FSM, by itself, is not enough. You need one to break the text into tokens as specified by the XML spec, but you'll need to use other techniques to recognize valid XML (or reject invalid XML).
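As a rough illustration of the tokenizing step, here is a minimal sketch of a scanner that splits an in-memory string into start-tag, end-tag, empty-tag and text tokens. The names `Token`, `TokType`, `tok_put` and `next_token` are made up for this example, and it assumes a NUL-terminated buffer, no comments, no CDATA, and no `>` inside attribute values. Attributes are skipped rather than parsed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOK_MAX 64

typedef enum { TOK_START, TOK_END, TOK_EMPTY, TOK_TEXT, TOK_EOF, TOK_ERROR } TokType;

typedef struct {
    TokType type;
    char text[TOK_MAX]; /* tag name or text content, truncated to fit */
} Token;

/* Append c to tok->text if there is room; silently truncate otherwise. */
static void tok_put(Token *tok, size_t *len, char c)
{
    if (*len + 1 < TOK_MAX) {
        tok->text[(*len)++] = c;
        tok->text[*len] = '\0';
    }
}

/* Read one token starting at *pp and advance *pp past it. */
static TokType next_token(const char **pp, Token *tok)
{
    const char *p = *pp;
    size_t len = 0;
    char prev = '\0';

    tok->text[0] = '\0';
    while (isspace((unsigned char)*p)) p++;      /* skip leading whitespace */

    if (*p == '\0') {
        tok->type = TOK_EOF;
    } else if (*p != '<') {
        /* Text state: collect everything up to the next '<'. */
        while (*p != '\0' && *p != '<') tok_put(tok, &len, *p++);
        while (len > 0 && isspace((unsigned char)tok->text[len - 1]))
            tok->text[--len] = '\0';             /* trim trailing whitespace */
        tok->type = TOK_TEXT;
    } else {
        p++;                                     /* consume '<' */
        tok->type = TOK_START;
        if (*p == '/') { tok->type = TOK_END; p++; }
        /* Name state: read up to whitespace, '/' or '>'. */
        while (*p != '\0' && *p != '>' && *p != '/' && !isspace((unsigned char)*p))
            tok_put(tok, &len, *p++);
        /* Skip attributes; remember the last character before '>'. */
        while (*p != '\0' && *p != '>') prev = *p++;
        if (*p != '>' || len == 0) {
            tok->type = TOK_ERROR;               /* unterminated or nameless tag */
        } else {
            if (prev == '/' && tok->type == TOK_START) tok->type = TOK_EMPTY;
            p++;                                 /* consume '>' */
        }
    }
    *pp = p;
    return tok->type;
}
```

Calling `next_token` repeatedly on `"<a> hi <b/></a>"` should yield a start tag `a`, the text `hi`, an empty tag `b`, an end tag `a`, and then `TOK_EOF`. A file-based version could keep the same states and replace the pointer with `fgetc`.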
You'll need to write a basic recursive descent parser to take the tokens and use them to recognize valid XML.
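To make the recursive descent idea concrete, here is a minimal sketch that checks nesting and tag matching for a very small XML subset. It is only an illustration under stated assumptions: the input is a NUL-terminated string, tags carry no attributes, and there are no comments, declarations or entities. The names `Parser`, `read_name`, `parse_element` and `is_well_formed` are hypothetical, and for brevity the sketch scans characters directly instead of consuming a separate token stream.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 64

/* Parser state: a cursor into a NUL-terminated input buffer. */
typedef struct { const char *p; } Parser;

static void skip_ws(Parser *ps)
{
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Read a tag name into name[]; returns 1 on success, 0 if empty or too long. */
static int read_name(Parser *ps, char *name)
{
    size_t n = 0;
    while (isalnum((unsigned char)*ps->p) || *ps->p == '_' || *ps->p == '-') {
        if (n + 1 >= NAME_MAX_LEN) return 0;
        name[n++] = *ps->p++;
    }
    name[n] = '\0';
    return n > 0;
}

/* element := '<' name '/>'  |  '<' name '>' content '</' name '>'
 * content := (text | element)*
 * Returns 1 if one well-formed element was consumed, 0 otherwise. */
static int parse_element(Parser *ps)
{
    char start_name[NAME_MAX_LEN], end_name[NAME_MAX_LEN];

    if (*ps->p != '<') return 0;
    ps->p++;
    if (!read_name(ps, start_name)) return 0;
    skip_ws(ps);
    if (ps->p[0] == '/' && ps->p[1] == '>') {    /* empty tag, e.g. <br/> */
        ps->p += 2;
        return 1;
    }
    if (*ps->p != '>') return 0;
    ps->p++;

    for (;;) {
        while (*ps->p != '\0' && *ps->p != '<') ps->p++;  /* skip text */
        if (*ps->p == '\0') return 0;            /* input ended before end tag */
        if (ps->p[1] == '/') break;              /* reached an end tag */
        if (!parse_element(ps)) return 0;        /* nested child: recurse */
    }

    ps->p += 2;                                  /* consume "</" */
    if (!read_name(ps, end_name)) return 0;
    skip_ws(ps);
    if (*ps->p != '>') return 0;
    ps->p++;
    return strcmp(start_name, end_name) == 0;    /* 0 on tag mismatch */
}

/* document := whitespace element whitespace end-of-input */
static int is_well_formed(const char *xml)
{
    Parser ps = { xml };
    skip_ws(&ps);
    if (!parse_element(&ps)) return 0;
    skip_ws(&ps);
    return *ps.p == '\0';
}
```

With this sketch, `is_well_formed("<a><b>text</b><c/></a>")` returns 1, while `is_well_formed("<a><b></a></b>")` returns 0 because the end tag `a` does not match the open element `b`. The recursion mirrors the nesting of the document, so the C call stack does the bookkeeping that an explicit stack of open tag names would otherwise do. Very deep nesting would need a depth limit.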
This sounds like a basic enough assignment that you don't have to worry about 80% of what's in the XML spec, but make sure you understand start tags and end tags. Even so, it is going to be a non-trivial amount of work.