- Natural language processing
- Generation
- Understanding
- Syntactic Analysis
- What is a language?
- What should a grammar tell us?
- A more complex grammar
- Components of a syntactic analysis program
- Additional grammar issues
- Results of syntactic analysis
- Semantic Analysis
- Reference
- Lexical Meaning
- Relationship among entities
Natural language processing (NLP)
Why do it?
- to make interfaces easier to use (e.g., database queries)
- to summarize large volumes of text
- to translate text
- to do better information retrieval
- to enable alternative interfaces (e.g., speech)
Note that NLP is concerned with both generation and understanding.
Generation
Natural language generation is a planning problem.
Need to:
- decide that a speech act is called for
- decide which speech act is the right one
- decide what information you want to convey
- make choices about vocabulary
- create language that is appropriate (i.e., syntactically correct
and unambiguous)
Speech acts are as follows:
- inform
- query
- answer
- request or command
- promise
- acknowledge
- share
Understanding
Goal: to determine what the speaker/writer is trying to say. Note
that we will consider only written text.
Need to:
- understand syntax - i.e., the structure of the text.
- understand semantics - i.e., a partial representation of the meaning
of the text
- understand pragmatics - i.e., the complete meaning of the text, determined by
using contextual information.
Note that we will focus on understanding, rather than
generation.
Syntactic Analysis
Why bother with syntactic analysis?
Because it helps us understand the roles played by different words in
a body of text. The words themselves are not enough. Consider
Innocent peacefully children sleep little vs
Innocent little children sleep peacefully
There is some evidence that human understanding of language is, in part,
based on structural analysis. Consider
"Twas brillig and the slithy toves did gyre and gimble in the wabe."
[Lewis Carroll]
Here we can understand the sentence on some level, even though most of
the words make no sense to us.
Colorless green ideas sleep furiously makes more sense to us than
Ideas green furiously colorless sleep
Finally, consider the sentence
The old dog the footsteps of the young.
In reading this sentence, you might have found yourself having to re-examine
the first part. In all likelihood, you thought that the word "dog"
was being used as a noun, when, in fact, it is a verb.
What is a language?
The most basic building blocks of language are words. Every
language has a large, but finite, set of words.
Words are formed into sentences. A sentence, then, is a well-formed
sequence of words.
The language is the set of all sentences that are well-formed, i.e.,
that follow a set of rules.
This set of rules is called a grammar.
To do syntactic analysis, we build a parser, i.e., a software
system that checks whether the rules are followed and that provides
an analysis based on the grammar.
What should a grammar tell us?
As much information as possible about the structure of sentences.
For example, the grammar might tell us that some legal sentence structures
are as follows:
- noun verb noun (i.e., a noun followed by a verb followed by a noun), as in
Dogs chase cats.
- determiner noun verb determiner noun, as in
The cat ate the fish.
- det noun prep adj noun verb prep det noun, as in
The dog with one eye ran from the cat.
But none of these helps us determine the relationships between the
words.
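To make the limitation concrete, here is a minimal sketch of checking a sentence against flat patterns like the ones above. The lexicon and the names `LEXICON`, `PATTERNS`, `tag`, and `is_legal` are made up for illustration, and each word is assumed to have exactly one part of speech.

```python
# Hypothetical lexicon: each word maps to one part of speech (a simplification).
LEXICON = {
    "dogs": "noun", "cats": "noun", "cat": "noun", "fish": "noun",
    "chase": "verb", "ate": "verb", "the": "det",
}

# Two of the flat sentence patterns listed above.
PATTERNS = [
    ["noun", "verb", "noun"],
    ["det", "noun", "verb", "det", "noun"],
]

def tag(sentence):
    """Look up the part of speech of each word (lower-cased, final period dropped)."""
    words = sentence.lower().rstrip(".").split()
    return [LEXICON[w] for w in words]

def is_legal(sentence):
    """Return True if the tag sequence matches one of the flat patterns."""
    return tag(sentence) in PATTERNS

print(is_legal("Dogs chase cats."))       # True
print(is_legal("The cat ate the fish."))  # True
print(is_legal("Cat the ate fish the."))  # False
```

The check answers only yes or no. It says nothing about which words group together or which noun is the subject, and that is the gap the phrase-structure rules are meant to fill.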
We want a grammar to give us more structural information. So, for
example, some better grammar rules for sentences might be:
- Sentence -> NP VP (i.e., a sentence can be formed from a noun phrase
followed by a verb phrase).
- NP -> noun (a noun phrase can be just a single noun)
- NP -> det noun (a noun phrase can be a determiner followed by a noun)
- VP -> verb (a verb phrase can be just a single verb)
- VP -> verb NP (a verb phrase can be a verb followed by a noun phrase)
Now if we were given the sentence
The cat ate the fish
our grammar could help us determine the following structure, which
we call a parse tree. This might be represented textually as:
(Sentence
  (NP
    (det the)
    (noun cat))
  (VP
    (verb ate)
    (NP
      (det the)
      (noun fish))))
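One possible way to hold such a parse tree in a program is as nested tuples. The sketch below assumes that representation; the helper names `words` and `child` are hypothetical. It shows how the tree, unlike a flat pattern, lets us pick out the subject and the object.

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are (tag, word).
TREE = ("Sentence",
        ("NP", ("det", "the"), ("noun", "cat")),
        ("VP", ("verb", "ate"),
               ("NP", ("det", "the"), ("noun", "fish"))))

def words(node):
    """Collect the words under a node, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [w for c in children for w in words(c)]

def child(node, label):
    """Return the first direct child with the given label, or None."""
    for c in node[1:]:
        if isinstance(c, tuple) and c[0] == label:
            return c
    return None

subject = child(TREE, "NP")             # the NP directly under Sentence
obj = child(child(TREE, "VP"), "NP")    # the NP inside the VP
print(" ".join(words(subject)))  # the cat
print(" ".join(words(obj)))      # the fish
```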
A more complex grammar
We could add complexity to the grammar to allow for prepositional
phrases:
- Sentence -> NP VP
- NP -> noun
- NP -> det noun
- NP -> adj noun
- NP -> det noun PP (where PP means "prepositional phrase")
- VP -> verb
- VP -> verb NP
- VP -> verb PP
- PP -> prep NP
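Now if we had the sentence
The dog with one eye ran from the cat.
a parser could produce the following parse tree, represented textually as:
(Sentence
  (NP
    (det the)
    (noun dog)
    (PP
      (prep with)
      (NP
        (adj one)
        (noun eye))))
  (VP
    (verb ran)
    (PP
      (prep from)
      (NP
        (det the)
        (noun cat)))))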
Components of a syntactic analysis program
In order to perform syntactic analysis, we need
- a parser - i.e., a program that takes as input a sentence and produces
the analysis.
- a grammar - i.e., a set of rules that the parser can use.
- a lexicon - i.e., a dictionary of legal words and their parts of
speech
Note that syntactic analysis is limited in the following ways:
- It relies on the grammar and lexicon -- if there are rules in the language
that are not specified in the grammar, certain sentences will not be
analyzed. If there are words that are not listed in the lexicon, they
will not be recognized as legal words.
- It will produce analyses of syntactically correct but meaningless
sentences.
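The three components and both limitations can be illustrated with a small sketch. It assumes a simple top-down, backtracking parser, which is only one of many possible parsing strategies. The grammar is the one from the previous section; the lexicon entries and the function names (`expand`, `expand_sequence`, `parse`) are hypothetical.

```python
# Grammar from the section above: each left-hand side maps to its alternatives.
GRAMMAR = {
    "Sentence": [["NP", "VP"]],
    "NP": [["noun"], ["det", "noun"], ["adj", "noun"], ["det", "noun", "PP"]],
    "VP": [["verb"], ["verb", "NP"], ["verb", "PP"]],
    "PP": [["prep", "NP"]],
}

# Hypothetical lexicon: word -> part of speech (one sense per word, for brevity).
LEXICON = {
    "the": "det", "dog": "noun", "cat": "noun", "eye": "noun", "fish": "noun",
    "ideas": "noun", "one": "adj", "green": "adj", "with": "prep",
    "from": "prep", "ran": "verb", "ate": "verb", "sleep": "verb",
}

def expand(symbol, words, i):
    """Yield (tree, next_index) for every way symbol can match words[i:]."""
    if symbol not in GRAMMAR:  # a part of speech: must match exactly one word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rule in GRAMMAR[symbol]:
        yield from expand_sequence(symbol, rule, words, i, [])

def expand_sequence(parent, symbols, words, i, children):
    """Match the symbols of one rule in order, backtracking over alternatives."""
    if not symbols:
        yield (parent, *children), i
        return
    for tree, j in expand(symbols[0], words, i):
        yield from expand_sequence(parent, symbols[1:], words, j, children + [tree])

def parse(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.lower().rstrip(".").split()
    for tree, end in expand("Sentence", words, 0):
        if end == len(words):
            return tree
    return None

print(parse("The dog with one eye ran from the cat."))
print(parse("Green ideas sleep."))         # parses, although meaningless
print(parse("The unicorn ate the fish."))  # None: "unicorn" is not in the lexicon
```

The last two calls correspond to the two limitations: a meaningless sentence still gets an analysis, and a word missing from the lexicon blocks the analysis entirely.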
Additional grammar issues
There's more to grammar than determining parts of speech and overall
sentence structure, including:
- subject-verb agreement
- distinguishing between transitive and intransitive verbs (i.e., those
that require a direct object and those that don't).
We can augment the grammar and the lexicon to help us deal with these issues.
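As a rough sketch of what such an augmentation might look like, the lexicon below attaches a number feature to nouns and verbs and a transitivity flag to verbs. The feature names and the `check` function are assumptions for illustration, not a standard scheme.

```python
# Hypothetical augmented lexicon: nouns carry number; verbs carry number
# and a flag saying whether they require a direct object.
LEXICON = {
    "dog":    {"pos": "noun", "number": "sg"},
    "dogs":   {"pos": "noun", "number": "pl"},
    "cats":   {"pos": "noun", "number": "pl"},
    "chases": {"pos": "verb", "number": "sg", "transitive": True},
    "chase":  {"pos": "verb", "number": "pl", "transitive": True},
    "sleeps": {"pos": "verb", "number": "sg", "transitive": False},
    "sleep":  {"pos": "verb", "number": "pl", "transitive": False},
}

def check(subject, verb, obj=None):
    """Return a list of violations for a subject-verb(-object) clause."""
    problems = []
    s, v = LEXICON[subject], LEXICON[verb]
    if s["number"] != v["number"]:
        problems.append("subject-verb agreement")
    if v["transitive"] and obj is None:
        problems.append("missing direct object")
    if not v["transitive"] and obj is not None:
        problems.append("unexpected direct object")
    return problems

print(check("dogs", "chase", "cats"))  # []
print(check("dog", "chase", "cats"))   # ['subject-verb agreement']
print(check("dogs", "sleep", "cats"))  # ['unexpected direct object']
print(check("dog", "chases"))          # ['missing direct object']
```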
Results of syntactic analysis
Syntactic analysis helps us to identify sentence structure and relationships
between entities. This gives us
- clues on word meaning:
The nurses hand the doctors the scalpels.
The nurse's hand was bandaged.
- clues on overall phrase meaning:
Flying planes is dangerous.
Flying planes are dangerous.
Semantic Analysis
Semantic analysis includes:
- problems of reference
- lexical meaning
- relationships (modifier attachment, etc.)
Reference
Here the goal is to determine what words and phrases refer to in
the real world. Issues include
Determining the type of thing referred to. For example, noun
phrases can refer to:
- entities (e.g., semantics)
- objects (e.g., the desk)
- events (e.g., the lecture)
while verbs can refer to:
- events (e.g., I gave a talk in Vancouver.)
- states (e.g., It was hot on Tuesday.)
- relations (e.g., I own a pet lovebird.)
Definite vs indefinite reference. For example,
Your parser should be able to handle 10 sentences. (indefinite)
Your parser should be able to handle the 10 sentences I gave you.
(definite)
Generic vs instance. For example,
Of all the breeds of dogs, the Dalmatian is my favorite.
My friend Jane has two dogs. My favorite is the Dalmatian.
Anaphora, i.e., pronouns and definite reference. For example,
The lizard's tail fell off and three days later it had grown
a new one.
Quantification. For example,
Jane bought every girl a yellow T-shirt.
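One way to see why quantification is an issue is to write out the readings this sentence allows. The two formulas below are a sketch in first-order logic; the predicate names are chosen here only for illustration.

```latex
% Reading 1: each girl gets a (possibly different) yellow T-shirt
\forall g\,\bigl(\mathrm{girl}(g) \rightarrow
  \exists t\,(\mathrm{yellowTshirt}(t) \land \mathrm{bought}(\mathrm{jane}, g, t))\bigr)

% Reading 2: there is one yellow T-shirt that Jane bought for every girl
\exists t\,\bigl(\mathrm{yellowTshirt}(t) \land
  \forall g\,(\mathrm{girl}(g) \rightarrow \mathrm{bought}(\mathrm{jane}, g, t))\bigr)
```

The two readings differ only in which quantifier has wider scope. Syntax alone does not choose between them.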
Lexical Meaning
A single word can have multiple meanings. For example, "fly" can mean:
(a) a winged insect, (b) a fish hook, (c) the action of flying, (d) a baseball
hit, (e) motion (as in "on the fly").
Kathy was so busy she ate her lunch on the fly.
Kathy was distracted and put her sandwich on the fly.
How can we do lexical disambiguation?
- Can use knowledge of context.
The waiter served the lasagna.
Venus Williams served first.
- One syntactic form can restrict another (called selectional restriction).
The gasoline killed all the flies.
If we have a rule that says that the object of the verb "kill" must be
animate, then we can determine the meaning of the noun "flies".
- Pure syntactic cues.
to keep
(1) meaning: continue to be
object: adjective phrase
(2) meaning: maintain
object: noun phrase
I kept calm. vs I keep a diary.
- Semantic association with nearby words, as in The dog's bark
woke me up. (not talking about tree bark here)
- Higher level inferences, as in I swung a hammer and the head flew
off.
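The selectional restriction idea from the list above can be sketched as a simple filter over word senses. The sense inventory, the `animate` feature, and the function name `object_senses` are hypothetical; a real system would need a much richer lexicon.

```python
# Hypothetical sense inventory: some senses of "fly", each with an animacy feature.
SENSES = {
    "fly": [
        {"gloss": "winged insect", "animate": True},
        {"gloss": "fish hook", "animate": False},
        {"gloss": "baseball hit", "animate": False},
    ],
}

# Hypothetical selectional restrictions: features a verb requires of its object.
OBJECT_RESTRICTIONS = {
    "kill": {"animate": True},
}

def object_senses(verb, noun):
    """Keep only the senses of noun that satisfy the verb's object restriction."""
    required = OBJECT_RESTRICTIONS.get(verb, {})
    return [s["gloss"] for s in SENSES[noun]
            if all(s.get(k) == v for k, v in required.items())]

print(object_senses("kill", "fly"))  # ['winged insect']
print(object_senses("see", "fly"))   # no restriction known: all senses remain
```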
Relationship among entities
Here the issue is determining the role played by each entity in a sentence.
Roles include:
- agent - carries out the action
- goal - entity that undergoes the action
- beneficiary - something that benefits from the action
- instrument - by which the action is done
- from-loc - location of origin
- to-loc - location to which an object goes
For example, I gave my students an assignment.
Syntax can sometimes signal a role.
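As a hedged illustration of syntax signalling a role, the sketch below assigns roles from position alone. The mapping used here (subject as agent, first object of a double-object verb as beneficiary, remaining object as goal) is an assumption for this example and will not hold for every verb; `assign_roles` is a hypothetical helper.

```python
def assign_roles(subject, objects):
    """Assign roles from syntactic position alone (an assumed, simplified mapping)."""
    roles = {"agent": subject}
    if len(objects) == 2:      # double-object pattern: verb NP NP
        roles["beneficiary"], roles["goal"] = objects
    elif len(objects) == 1:    # simple transitive pattern: verb NP
        roles["goal"] = objects[0]
    return roles

# "I gave my students an assignment."
print(assign_roles("I", ["my students", "an assignment"]))
# {'agent': 'I', 'beneficiary': 'my students', 'goal': 'an assignment'}
```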