Having explored, in a very general way, the requirements for putting a natural language “front end” on a database, we will now look in more detail at the various modules within such systems and at their implementation in currently working programs. We will investigate:
— The parser
— Extensibility of the grammar
— The English dictionary
— Spelling correction
— The data model
— Query language
— DBMS independence
— Topic independence
— General queries
— Referring back to the user
NATURAL LANGUAGE ENQUIRY 21
2. THE PARSER
The “parser” is the program that analyses the grammar of a sentence. English grammar provides the relevant context for understanding each function word in the English language (see section 1.1 above). In the example
“What age is Robert and what is his salary?”
the grammatical analysis yields two clauses, the first of which, “What age is Robert ...”, provides the context for the function word “is”. There is not a sharp distinction between function words and content words, and a content word such as “horse” can also have different meanings in different contexts, like a function word. “Horse” can be a noun, as in “How fast is that horse?”; but it can also be a classifier, as in “horse fly”. A horse fly is not, of course, a type of horse but a type of fly.
English grammar, then, defines the function of each word in an English sentence.
From the descriptions of some currently implemented parsers it is very hard to tell what grammars they analyse. The LIFER system is, however, exceptionally comprehensible. Thus to include the sentence
“What <noun-phrase1> is <noun-phrase2>?”
in the grammar, the LIFER parser simply requires a ‘production’:
SENTENCE -> WHAT <NP1> IS <NP2> | ‘print NP1 of NP2’
“NP1” and “NP2” are ‘non-terminals’, which have their own productions†. The part after the vertical bar, ‘print NP1 of NP2’, is the meaning of the sentence, where NP1 is the meaning of noun-phrase1 and NP2 is the meaning of noun-phrase2.
In general the semantic part (i.e. the part after the vertical bar) in a LIFER production can be any expression that returns the meaning. This expression may be used to augment the grammar, since it can return the value *ERROR* which causes the parser to reject the production. The augmentation could be used to implement the applicability restrictions of section 1.3 above. Thus for
SENTENCE -> WHAT <NP1> IS <NP2> | expression
the “expression” might be used to check that NP2 referred to an entity, and NP1 referred to a property applicable to that entity.
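A minimal sketch of such an augmenting expression is given below, in Python rather than in LIFER's own notation. The table APPLICABLE, the function name what_is and the sample entities are all hypothetical, introduced only to illustrate how an expression might reject a production by returning the *ERROR* value.

```python
ERROR = "*ERROR*"  # sentinel named after LIFER's *ERROR* value

# Hypothetical data model: the properties that apply to each entity.
APPLICABLE = {
    "Robert": {"age", "salary"},
    "London": {"population"},
}


def what_is(np1, np2):
    """Semantic expression for SENTENCE -> WHAT <NP1> IS <NP2>.

    np1 is the meaning of the first noun-phrase (a property) and
    np2 is the meaning of the second noun-phrase (an entity).
    """
    if np2 not in APPLICABLE:
        return ERROR  # NP2 does not refer to a known entity
    if np1 not in APPLICABLE[np2]:
        return ERROR  # NP1 is not a property applicable to NP2
    return f"print {np1} of {np2}"
```

Under these assumptions, what_is("age", "Robert") yields a meaning, while what_is("population", "Robert") yields *ERROR* and the production would be rejected.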
† The actual syntax for defining productions to the LIFER parser is more complicated.
The LIFER parser is a top-down, left-to-right parser. When attempting to match the single production
SENTENCE -> WHAT <NP1> IS <NP2> | expression
onto an English sentence, the LIFER parser:
(1) Looks for the word “what”
(2) Looks for a noun-phrase, and extracts its meaning NP1
(3) Looks for the word “is”
(4) Looks for another noun-phrase, and extracts its meaning NP2
(5) Evaluates the expression.
At any stage the match may fail. Concentrating too much “intelligence” into the expression may cause the parser to do a great deal of matching which ultimately fails because the expression yields the *ERROR* result. In the LIFER grammar for the US Navy’s LADDER system, therefore, the applicability restrictions are built into the grammar rather than the augmentation.
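The five matching steps for the single production can be sketched as follows. This is an illustration in Python, not LIFER's implementation: the one-word lexicon NOUN_PHRASES and the helper names are assumptions, and a real noun-phrase would be matched by its own productions.

```python
ERROR = "*ERROR*"  # sentinel named after LIFER's *ERROR* value

# Hypothetical one-word noun-phrases, mapped to their meanings.
NOUN_PHRASES = {"age": "age", "salary": "salary", "robert": "Robert"}


def match_word(words, pos, literal):
    """Return the next position if words[pos] is the literal, else None."""
    if pos < len(words) and words[pos] == literal:
        return pos + 1
    return None


def match_noun_phrase(words, pos):
    """Return (meaning, next position) for a noun-phrase at pos, else None."""
    if pos < len(words) and words[pos] in NOUN_PHRASES:
        return NOUN_PHRASES[words[pos]], pos + 1
    return None


def parse_what_is(sentence, expression):
    """Match SENTENCE -> WHAT <NP1> IS <NP2> | expression, left to right."""
    words = sentence.lower().rstrip("?").split()
    pos = match_word(words, 0, "what")        # step 1
    if pos is None:
        return ERROR
    found = match_noun_phrase(words, pos)     # step 2
    if found is None:
        return ERROR
    np1, pos = found
    pos = match_word(words, pos, "is")        # step 3
    if pos is None:
        return ERROR
    found = match_noun_phrase(words, pos)     # step 4
    if found is None:
        return ERROR
    np2, pos = found
    if pos != len(words):                     # unmatched words remain
        return ERROR
    return expression(np1, np2)               # step 5
```

Note that the expression is evaluated only after all four matching steps have succeeded, so any restriction placed in the expression is checked last. This is the cost the text describes: the earlier matching work is wasted whenever the expression returns *ERROR*.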
Thus four example productions for SENTENCE are:
SENTENCE -> <PRESENT> THE <ATTRIBUTE> OF <SHIP> | e1
            <PRESENT> <SHIP’S> <ATTRIBUTE> | e2
            HOW MANY <SHIP> ARE THERE | e3
            HOW MANY <SHIP> ARE THERE WITH <PROPERTY> | e4
This grammar has no noun-phrases or verb-phrases, but rather a special set of categories for the particular task of providing a front end for the LADDER system. To avoid any unnecessary backtracking these productions are compiled into a “transition tree”, which can be diagrammed as in Fig. 2.1. The clarity of design of the LIFER parser is self-evident. The design also provides for excellent facilities to extend the grammar, a very clever spelling module, the power to deal with short follow-up questions, and a well-defined interface to the database.
<PRESENT> -> THE -> <ATTRIBUTE> -> OF -> <SHIP> -> e1
          -> <SHIP’S> -> <ATTRIBUTE> -> e2
HOW -> MANY -> <SHIP> -> ARE -> THERE -> e3
                                      -> WITH -> <PROPERTY> -> e4
Fig. 2.1 - A transition tree.
LIFER’s transition trees are based on a simplification of the ‘Augmented Transition Network’ (“ATN”). We can redraw the transition tree as a transition network (Fig. 2.2). This network is a “simple” network. It has no cycles, and a single start and end node. In fact any unaugmented transition network is equivalent to a set of simple networks which, in turn, can be translated into a LIFER transition tree.
Fig. 2.2 - A transition network.
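One way to picture the compilation of the four SENTENCE productions into a transition tree is as a prefix tree, in which productions that begin with the same symbols share a path. The Python sketch below is an illustration under stated assumptions, not LIFER's actual data structure: the one-word LEXICON and its entries are invented, the labels e1 to e4 stand in for the semantic expressions, and the matcher assumes that at most one branch at each node accepts the next word.

```python
# The four productions; <...> are categories, e1..e4 name the expressions.
PRODUCTIONS = [
    (["<PRESENT>", "THE", "<ATTRIBUTE>", "OF", "<SHIP>"], "e1"),
    (["<PRESENT>", "<SHIP'S>", "<ATTRIBUTE>"], "e2"),
    (["HOW", "MANY", "<SHIP>", "ARE", "THERE"], "e3"),
    (["HOW", "MANY", "<SHIP>", "ARE", "THERE", "WITH", "<PROPERTY>"], "e4"),
]

# Hypothetical lexicon: each category is matched by a single word.
LEXICON = {
    "<PRESENT>": {"PRINT", "SHOW"},
    "<ATTRIBUTE>": {"LENGTH", "SPEED"},
    "<SHIP>": {"KENNEDY", "CARRIERS", "SUBMARINES"},
    "<SHIP'S>": {"KENNEDY'S"},
    "<PROPERTY>": {"RADAR"},
}

END = "$end"  # key under which a node stores its expression label


def build_tree(productions):
    """Merge the productions into one tree that shares common prefixes."""
    root = {}
    for symbols, expr in productions:
        node = root
        for symbol in symbols:
            node = node.setdefault(symbol, {})
        node[END] = expr
    return root


def accepts(symbol, word):
    """A category accepts any word in its lexicon; a literal only itself."""
    if symbol.startswith("<"):
        return word in LEXICON.get(symbol, set())
    return symbol == word


def match(tree, sentence):
    """Walk the tree one word at a time, never revisiting a word."""
    node = tree
    for word in sentence.upper().split():
        for symbol, child in node.items():
            if symbol != END and accepts(symbol, word):
                node = child
                break
        else:
            return None  # no branch accepts this word
    return node.get(END)  # None if the sentence stops part-way
```

With tree = build_tree(PRODUCTIONS), the root has only two branches, <PRESENT> and HOW, and the two HOW MANY productions share every node up to THERE. Each word is therefore examined once, where trying the four productions one after another would re-match the shared prefix.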
An augmented transition network also includes a set of registers which can be set, tested and altered as the parser works its way through the network. When the ATN parser reaches a “subnet” (such as “<SHIP>” in the network above), it does three things: