parsing definitions
recursive descent
backtracking
left-recursive productions
LL(1)
context-free language: a BNF grammar can be used to express context-free
languages. Most syntactic constructs in modern programming languages can be
represented in BNF.
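To make the BNF idea and the outline topics above concrete, here is a minimal recursive descent parser for a tiny arithmetic grammar. The grammar, the class `Parser` and the function `evaluate` are illustrative assumptions, not anything from myowndictionary. Each nonterminal becomes one function. The natural left-recursive rule `<expr> ::= <expr> "+" <term>` is rewritten as a loop, because a recursive descent parser would otherwise call `expr()` on itself forever.

```python
import re

# Grammar (BNF, with the left-recursive rules rewritten as repetition):
#   <expr>   ::= <term> { "+" <term> }
#   <term>   ::= <factor> { "*" <factor> }
#   <factor> ::= NUMBER | "(" <expr> ")"

def tokenize(text):
    """Return numbers and the characters + * ( ); anything else, including whitespace, is skipped."""
    return re.findall(r"\d+|[+*()]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Look at the next token without consuming it (one token of lookahead)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.peek() == "+":  # loop replaces left recursion
            self.pos += 1
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return value

print(evaluate("2 + 3 * (4 + 1)"))  # 17
```

Every decision is made by looking at a single upcoming token through `peek()`, so the parser never has to backtrack; that is the LL(1) property listed in the outline.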
myowndictionary and parsing
Parsing is taking a string and decomposing it.
You need to do two things to parse something:
1. decompose it into tokens
2. keep the symbols that lie between the tokens generated by the decomposition in (1)
By doing both of those things, you can make some rules
and use them to recombine the original string.
I have done that on myowndictionary.com.
The text is split on spaces, periods and similar characters; in fact, the following structure is used:
if ($from_lang == "english") $split_pattern = '/((?:[.!\s?,;:-]|")+)/';
This splits on any run of one or more of the following: whitespace, periods,
exclamation points, question marks, commas, semicolons, colons, hyphens and
double quotes.
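Here is a rough Python port of that pattern, for experimenting outside PHP. The name `split_keep_separators` is hypothetical, and `;` is included in the character class because the description lists semicolons. Because the whole pattern is wrapped in a capturing group, `re.split()` keeps each run of separators in the result instead of throwing it away (PHP's `preg_split` needs the `PREG_SPLIT_DELIM_CAPTURE` flag for the same effect).

```python
import re

# One or more whitespace characters or . ! ? , ; : - " in a row.
# The outer parentheses capture the run, so re.split() keeps it.
SPLIT_PATTERN = re.compile(r'((?:[.!\s?,;:-]|")+)')

def split_keep_separators(paragraph):
    """Split a paragraph into words and separator runs, keeping both."""
    return SPLIT_PATTERN.split(paragraph)

text = 'Hello, world! "Quoted" text.'
parts = split_keep_separators(text)
print(parts)
# ['Hello', ', ', 'world', '! "', 'Quoted', '" ', 'text', '.', '']
print("".join(parts) == text)  # True
```

Since nothing is discarded, joining the pieces gives back the original paragraph exactly, which is what makes recombining the string possible later.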
It goes over a paragraph and then generates the data structure of splits and
words (in computer terms, "tokens").
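One possible shape for that data structure of splits and words, sketched in Python: a list of small records, each tagged as a word or a split. The `Token` type and the `"word"`/`"split"` labels are assumptions for illustration, not the actual structure used on myowndictionary.com.

```python
import re
from typing import NamedTuple

SPLIT_PATTERN = re.compile(r'((?:[.!\s?,;:-]|")+)')

class Token(NamedTuple):
    kind: str  # "word" or "split"
    text: str

def tokenize(paragraph):
    tokens = []
    # With one capturing group, re.split() alternates:
    # even indices are words, odd indices are separator runs.
    for i, piece in enumerate(SPLIT_PATTERN.split(paragraph)):
        if piece == "":
            continue  # empty strings only appear at the very start or end
        tokens.append(Token("split" if i % 2 else "word", piece))
    return tokens

for tok in tokenize("Hi, there."):
    print(tok)
# Token(kind='word', text='Hi')
# Token(kind='split', text=', ')
# Token(kind='word', text='there')
# Token(kind='split', text='.')
```

Keeping the separators as their own tokens means the rewrite rules only have to look at the `"word"` entries, while the splits are copied through untouched.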
Then there are rules to recombine the tokens into a string.
I make a new string, which is the original one plus HREF tags that show the
definition of a word when the mouse pointer is placed over it.
Most of the "application logic" comes from the database, which provides the
rules for rewriting the tokens.
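A minimal sketch of the recombination step, assuming a plain dictionary as a stand-in for the database. `DEFINITIONS`, `link_word`, `annotate`, the `/define/` URL scheme and the use of a `title` attribute for the hover text are all hypothetical choices for illustration.

```python
import html
import re
from urllib.parse import quote

SPLIT_PATTERN = re.compile(r'((?:[.!\s?,;:-]|")+)')

# Hypothetical stand-in for the database: word -> definition.
DEFINITIONS = {
    "dog": "a domesticated canine",
    "runs": "moves quickly on foot",
}

def link_word(word):
    """Wrap a known word in a link; unknown words are only HTML-escaped."""
    definition = DEFINITIONS.get(word.lower())
    if definition is None:
        return html.escape(word)
    # Assumed URL scheme and title-attribute tooltip, for illustration only.
    return (f'<a href="/define/{quote(word.lower())}" '
            f'title="{html.escape(definition)}">{html.escape(word)}</a>')

def annotate(paragraph):
    parts = SPLIT_PATTERN.split(paragraph)
    # Odd indices are separator runs: copy them through (HTML-escaped),
    # so the output is the original text with links added around words.
    return "".join(
        html.escape(p) if i % 2 else link_word(p)
        for i, p in enumerate(parts)
    )

print(annotate("The dog runs."))
```

Because every separator is copied back in place, the visible text of the output matches the original paragraph; only the link markup is new.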
lex
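It does the tokenizing described above: it splits the input into tokens using
regular expressions. It generates C code, and you supply a C action to run for each match.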
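To show what "an action for each match" means, here is a small lex-style scanner in Python. The rules, token names and the function `scan` are made up for illustration; real lex/flex generates C code from a rules file. The sketch does follow lex's matching rule: at each position the longest match wins, and if two rules match the same length, the one listed first wins.

```python
import re

# Each rule pairs a regular expression with an action, like a lex rule
# "pattern  { C code }". Here the actions are Python functions instead of C.
RULES = [
    (re.compile(r"[ \t\n]+"), lambda text: None),              # skip whitespace
    (re.compile(r"if|else|while"), lambda text: ("KEYWORD", text)),
    (re.compile(r"[A-Za-z_][A-Za-z0-9_]*"), lambda text: ("IDENT", text)),
    (re.compile(r"[0-9]+"), lambda text: ("NUMBER", int(text))),
    (re.compile(r"[-+*/=()<>]"), lambda text: ("OP", text)),
]

def scan(source):
    pos, tokens = 0, []
    while pos < len(source):
        best_len, best_action = 0, None
        # Longest match wins; on a tie the earlier rule is kept (strict >).
        for pattern, action in RULES:
            m = pattern.match(source, pos)
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_action is None:
            raise ValueError(f"no rule matches at position {pos}: {source[pos]!r}")
        token = best_action(source[pos:pos + best_len])  # run the rule's action
        if token is not None:
            tokens.append(token)
        pos += best_len
    return tokens

print(scan("if x1 > 10"))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('OP', '>'), ('NUMBER', 10)]
```

The longest-match rule is why `iffy` comes out as one identifier rather than the keyword `if` followed by `fy`.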
yacc
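yacc takes a grammar written in Backus-Naur form (BNF) and generates a parser for it.
As the parser proceeds, it shifts tokens onto a stack. Whenever the symbols on
top of the stack match the right-hand side of a grammar rule, they are combined
(reduced) into a single symbol, and then the parser proceeds.
If the input is valid, the whole string is reduced to a single symbol, the
grammar's start symbol, at the end of processing.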
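The shift and reduce steps can be seen in a tiny hand-written loop. This is only a sketch for a two-rule grammar chosen for illustration; yacc itself builds a table-driven LALR(1) parser and handles far more. The function `parse` and its trace format are hypothetical.

```python
# Grammar (BNF):
#   <E> ::= <E> "+" <T> | <T>
#   <T> ::= NUM

def parse(tokens):
    """Return (accepted, trace) for a token list such as ["1", "+", "2"]."""
    stack, trace = [], []
    for tok in tokens:
        stack.append("NUM" if tok.isdigit() else tok)  # shift
        trace.append(("shift", list(stack)))
        while True:  # reduce while a rule's right-hand side is on top
            if stack[-1] == "NUM":
                stack[-1] = "T"                    # T -> NUM
            elif stack[-3:] == ["E", "+", "T"]:
                stack[-3:] = ["E"]                 # E -> E + T
            elif stack[-1] == "T":
                stack[-1] = "E"                    # E -> T
            else:
                break
            trace.append(("reduce", list(stack)))
    # Valid input leaves exactly the start symbol on the stack.
    return stack == ["E"], trace

accepted, trace = parse(["1", "+", "2"])
for step in trace:
    print(step)
print(accepted)  # True
```

The trace shows the stack growing on each shift and shrinking on each reduce until only `E` is left, which is the "single token at the end" described above.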
(This sounds like making a calculator application, not processing natural
language in any way.)
I need a bitwise parser that compares each value against the database values.
Matching the language's regular expressions against the database
will take care of consuming the token sequence.
If there is nothing else, and also if there is a match for anything in the
"notes" fields, there will have to be a regular expression match on the
bit-wise type to check for any of those fields.