What is the specific research problem used to measure the effectiveness of sentiment
analysis? - answer- It uses a corpus of movie reviews where the rating associated with
each review is known. Hence, there is an objective measure of whether a review was
positive or negative.
- Pang et al. balanced the corpus so that it had 50% positive and 50% negative reviews.
- The research problem is to assign sentiment automatically to each document in the
entire corpus so that it agrees with the known ratings.
How may a bag of words technique for sentiment analysis be improved? - answer- Using
a form of compositional semantics has had positive results.
- Neural network methods have shown high performance; it is hypothesized that they
induce some internal representation of syntactic and/or semantic structure.
- Many 'obvious' techniques, such as accounting for negating words, turn out not to be
so good, especially as deeper parsing is necessary to determine the scope of negation
correctly.
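The bag-of-words baseline and its evaluation against known ratings can be sketched as follows. This is only a minimal illustration: the word lists and the four reviews are invented for the example (a real system would learn weights from the corpus), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon; not taken from any real sentiment resource.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful"}

def bag_of_words(text):
    """Reduce a document to word counts; word order is discarded."""
    return Counter(text.lower().split())

def classify(text):
    """Label a review 'pos' or 'neg' from the counts of lexicon words."""
    bag = bag_of_words(text)
    score = sum(bag[w] for w in POSITIVE) - sum(bag[w] for w in NEGATIVE)
    return "pos" if score >= 0 else "neg"

def accuracy(labelled_reviews):
    """Fraction of reviews whose predicted label agrees with the known one."""
    correct = sum(classify(text) == label for text, label in labelled_reviews)
    return correct / len(labelled_reviews)

# Invented balanced corpus: 50% positive, 50% negative.
REVIEWS = [
    ("a great and enjoyable film", "pos"),
    ("good acting and a good script", "pos"),
    ("boring plot and awful dialogue", "neg"),
    ("not good at all", "neg"),
]

print(accuracy(REVIEWS))  # 0.75
```

The one error is "not good at all": because word order is thrown away, the bag contains "good" and the review is labelled positive. Handling this properly needs the scope of the negation, which is why simple negation fixes tend to disappoint.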
Why is knowing trivial information both a difficult and essential problem for information
retrieval systems? - answer- Trivial information is critical for human reasoning but tends
not to be explicitly stated anywhere, since humans find it trivial.
Why does limiting natural language interface systems to limited domains make it a
relatively easy problem? - answer- It removes a lot of ambiguity, e.g. LUNAR (the lunar
rock sample database querying system) only dealt with "rock" in the sense of the
material, never the music.
What is morphology? - answer- The study of the structure of words.
What is a morpheme? - answer- The minimal information-carrying unit of a word.
What is an affix? - answer- A morpheme which can only occur in conjunction with other
morphemes. Words are made up of a stem and zero or more affixes.
What are the different types of affixes? Which occur in English? - answer- Prefix, suffix,
infix and circumfix.
English has prefixes and suffixes.
What does it mean for a certain linguistic construct to be "productive"? - answer- It is
applied to new words.
Define the difference between inflectional and derivational morphology. -
answer- Inflectional morphology can be thought of as setting values of slots in some
paradigm (i.e. there is a fixed set of slots which can be thought of as being filled with
simple values). Inflectional morphology concerns properties such as tense, aspect,
number, person, gender and case.
Derivational affixes have a broader range of semantic possibilities, in that there
seems to be no principled limit on what they can mean, and they don't fit into neat
paradigms.
What is derivational morphology? - answer- Derivational affixes, such as un-, re-, anti-,
etc., have a broader range of semantic possibilities (there seems to be no principled
limit on what they can mean) and don't fit into neat paradigms. Inflectional affixes, by
contrast, may be combined (though not in English), but there are always obvious limits
to this, since once all the possible slot values are 'set', nothing else can happen.
What is inflectional morphology? - answer- Inflectional morphology can be thought of as
setting values of slots in some paradigm (i.e., there is a fixed set of slots which can be
thought of as being filled with simple values). Inflectional morphology concerns
properties such as tense, aspect, number, person, gender, and case, although not all
languages code all of these: English, for instance, has very little morphological marking
of case and gender.
What is a "full form lexicon"? - answer- A list of all inflected forms, treating derivational
morphology as non-productive. Since the vast majority of words in English have
regular morphology, a full-form lexicon can be regarded as a form of compilation: it is
redundant to have to specify the inflected form as well as the stem.
What is "stemming"? - answer- A technique in traditional information retrieval systems
which involves reducing all morphologically complex forms to a canonical form. The
canonical form may not be the linguistic stem, despite the name of the technique. The
most commonly used algorithm is the Porter stemmer, which uses a series of simple
rules to strip endings.
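The idea of stripping endings with a short series of rules can be sketched as below. This is a toy illustration, not the Porter stemmer itself: the rule list, the minimum stem length and the function name are all assumptions made for the example.

```python
# Toy suffix rules, tried in order; NOT the actual Porter rule set.
RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def toy_stem(word, min_stem=3):
    """Strip the first matching ending, leaving at least min_stem characters."""
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word

for word in ["walking", "walked", "cats", "ponies", "sing"]:
    print(word, "->", toy_stem(word))
```

Here "ponies" becomes "poni", which shows how the canonical form need not be the linguistic stem, while "sing" is left alone because stripping "ing" would leave too short a remainder.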
What is "lemmatization"? - answer- Another name for morphological analysis.
Describe English morphological structure - answer- Generally concatenative.
Describe the formation of spelling rules - answer- AKA orthographic rules.
In such rules, the mapping is always given from the 'underlying' form to the surface
form. The mapping is shown to the left of the slash and the context to the right, with the
'_' indicating the position in question. Example:
$\epsilon \to e \,/\, \{ s \} \;\text{\textasciicircum}\; \_\; s$
That is, an e is inserted after an affix boundary (^) which follows s and precedes s.
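A rule of this shape can be tried out with a regular expression. The sketch below is one possible reading of the e-insertion rule, assuming '^' marks the affix boundary in the underlying form; the function name and the `triggers` parameter are hypothetical, and passing "sxz" as the trigger set is an assumption going beyond the rule as written here.

```python
import re

def apply_e_insertion(underlying, triggers="s"):
    """Map an underlying form such as 'bus^s' to a surface form."""
    # Left context: a trigger character followed by the boundary '^'.
    # Right context: the character after the boundary is 's'.
    pattern = "([" + re.escape(triggers) + r"])\^(?=s)"
    with_e = re.sub(pattern, r"\1^e", underlying)
    # The affix boundary itself never appears in the surface form.
    return with_e.replace("^", "")

print(apply_e_insertion("bus^s"))                   # buses
print(apply_e_insertion("cat^s"))                   # cats
print(apply_e_insertion("box^s", triggers="sxz"))   # boxes
```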
What sort of lexical information is needed for full, high-precision morphological
processing? - answer- Affixes, plus the associated information conveyed by the affix
- Irregular forms, with associated information similar to that for affixes
- Stems with syntactic categories (plus more information if derivational morphology is to
be treated as productive)
Give a simple way to encode affix lexicons in a - answer- Pair each affix with an
encoding of its syntactic/semantic effect. E.g.:
ed PAST_VERB
ed PSP_VERB
s PLURAL_NOUN
A lexicon of irregular forms is also necessary. One approach is just a triple consisting of
inflected form, 'affix information' and stem, where 'affix information' corresponds to
whatever encoding is used for the regular affix. E.g.:
began PAST_VERB begin
begun PSP_VERB begin
This approach can be used for generation as well as analysis.
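The two lexicons above can be used directly for both analysis and generation, as in the following sketch. The stem lexicon, the function names and the convention that the affix information ends in the category it attaches to (e.g. PAST_VERB attaches to a VERB) are assumptions made for the example, and spelling rules are ignored.

```python
# Affix and irregular lexicons as given in the notes.
AFFIXES = [("ed", "PAST_VERB"), ("ed", "PSP_VERB"), ("s", "PLURAL_NOUN")]
IRREGULAR = [("began", "PAST_VERB", "begin"), ("begun", "PSP_VERB", "begin")]

# Hypothetical stem lexicon with syntactic categories.
STEMS = {"walk": "VERB", "cat": "NOUN", "begin": "VERB"}

def analyse(form):
    """Return all (stem, affix information) pairs for an inflected form."""
    analyses = [(stem, info) for irr, info, stem in IRREGULAR if irr == form]
    for affix, info in AFFIXES:
        if form.endswith(affix):
            stem = form[: -len(affix)]
            # Only accept stems whose category matches the affix's category.
            if STEMS.get(stem) == info.split("_")[-1]:
                analyses.append((stem, info))
    return analyses

def generate(stem, info):
    """Return the inflected form, preferring a listed irregular form."""
    for irr, irr_info, irr_stem in IRREGULAR:
        if (irr_stem, irr_info) == (stem, info):
            return irr
    for affix, affix_info in AFFIXES:
        if affix_info == info:
            return stem + affix
    return None

print(analyse("walked"))              # [('walk', 'PAST_VERB'), ('walk', 'PSP_VERB')]
print(analyse("began"))               # [('begin', 'PAST_VERB')]
print(generate("begin", "PSP_VERB"))  # begun
```

Checking the irregular lexicon first in `generate` is what stops the regular affix from producing a form like "begined".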
Give examples where the idea that morphology is purely concatenative breaks down -
answer- unkempt: could be un- kempt, but kempt is no longer a word
- feed: could be fee -ed, but fee is a noun
- corpus: could be corpu -s, but there is no such stem as "corpu"
What does it mean for a generative system to "overgenerate"? - answer- It generates
invalid output as well as valid output.
Why are FSTs more useful than FSAs for morpheme analysis? - answer- FSAs can be
used to recognise certain patterns but don't by themselves allow for any analysis of
word forms. Hence for morphology we use FSTs, which allow the surface structure to be
mapped into the list of morphemes. FSTs are useful for both analysis and generation,
since the mapping is bidirectional. This approach is known as "two-level morphology".
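What sort of formalism do spelling rules map to? - answer- Finite state transducers.
What does 'two level morphology' mean? - answer- An approach in which FSTs map
between the surface form and the underlying list of morphemes. Since the mapping is
bidirectional, it is good for both analysing and generating word forms.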
Describe a finite state transducer - answer- Transducers map between two
representations, so each transition corresponds to a pair of characters. As with the
spelling rule, we use the special character 'ε' to correspond to the empty character and
'ˆ' to correspond to an affix boundary. The abbreviation 'other : other' means that any
character not mentioned specifically in the FST maps to itself.
As with the FSA example, we assume that the FST only accepts an input if the end of the
input corresponds to an accept state (i.e., no 'left-over' characters are allowed).
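A transducer of this kind can be simulated in a few lines. The sketch below is not the FST from the course notes: the states q0 to q3 and the transition table are assumptions, written to encode e-insertion between s and an affix boundary followed by s. It runs only in the generation direction (underlying form in, surface form out).

```python
EPS = ""          # the empty character (epsilon)
OTHER = "other"   # any character not mentioned specifically in the FST
MENTIONED = {"s", "^"}

# state -> list of (underlying symbol, surface symbol, next state)
TRANSITIONS = {
    "q0": [("s", "s", "q1"), ("^", EPS, "q0"), (OTHER, OTHER, "q0")],
    # q1: the previous character was s
    "q1": [("s", "s", "q1"),
           ("^", "e", "q2"),   # insert e: only succeeds if s follows
           ("^", EPS, "q3"),   # no e: only succeeds if s does not follow
           (OTHER, OTHER, "q0")],
    "q2": [("s", "s", "q1")],
    "q3": [("^", EPS, "q0"), (OTHER, OTHER, "q0")],
}
ACCEPT = {"q0", "q1", "q3"}

def generate(underlying, state="q0", surface=""):
    """Return every surface form the FST pairs with the underlying form."""
    if not underlying:
        # Accept only if the input ends in an accept state.
        return [surface] if state in ACCEPT else []
    symbol, rest = underlying[0], underlying[1:]
    results = []
    for in_sym, out_sym, nxt in TRANSITIONS[state]:
        if in_sym == symbol:
            results += generate(rest, nxt, surface + out_sym)
        elif in_sym == OTHER and symbol not in MENTIONED:
            # 'other : other' maps the character to itself.
            results += generate(rest, nxt, surface + symbol)
    return results

print(generate("bus^s"))    # ['buses']
print(generate("cat^s"))    # ['cats']
print(generate("miss^ed"))  # ['missed']
```

The two competing transitions on '^' out of q1 make the machine non-deterministic; the wrong guess simply dies in q2 or q3 when the next character does not fit.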
List some uses of finite state techniques in NLP - answer- Morpheme
analysis/generation
- Grammars for simple dialogue systems
- Partial grammars for named entity recognition
- Dialogue models for spoken dialogue systems (SDS). SDS use dialogue models for a
variety of purposes, including controlling the way that the information acquired from
the user is instantiated (e.g., the slots that are filled in an underlying database) and
limiting the vocabulary to achieve higher recognition rates. FSAs can be used to record
possible transitions between states in a simple dialogue.
What useful additions can be made to FSAs? - answer- Transition probabilities.
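An FSA with transition probabilities can be represented as a nested table, and the probability of a path is the product of the probabilities along it. The dialogue states and the numbers below are invented purely for illustration, as is the function name.

```python
# Hypothetical dialogue states with made-up transition probabilities.
TRANSITIONS = {
    "greet": {"ask_date": 0.7, "ask_city": 0.3},
    "ask_date": {"ask_city": 0.6, "confirm": 0.4},
    "ask_city": {"ask_date": 0.5, "confirm": 0.5},
    "confirm": {},
}

def path_probability(states):
    """Multiply the transition probabilities along a sequence of states."""
    prob = 1.0
    for current, nxt in zip(states, states[1:]):
        # A transition missing from the table has probability zero.
        prob *= TRANSITIONS[current].get(nxt, 0.0)
    return prob

print(path_probability(["greet", "ask_date", "confirm"]))
print(path_probability(["greet", "confirm"]))
```

The second path is impossible in this automaton, so its probability is zero; a plain FSA would just reject it.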
Define 'corpus' - answer- A body of text that has been collected for some purpose.
Define 'balanced corpus' - answer- A corpus which contains texts representing
different genres (newspapers, fiction, textbooks, parliamentary reports, cooking recipes,
scientific papers, etc.). Early examples were the Brown corpus (US English: 1960s)
and the Lancaster-Oslo-Bergen (LOB) corpus (British English: 1970s), which are each
about 1 million words. The more recent British National Corpus (BNC: 1990s) contains
approximately 100 million words, including about 10 million words of spoken English.
Why did many mainstream linguists discount the use of corpora? - answer- Mainstream
linguists in the past mostly dismissed their use in favour of reliance on intuitive
judgements about whether or not an utterance is grammatical (a corpus can only
directly provide positive evidence about grammaticality). However, many linguists do
now use corpora.
What is a 'Wizard of Oz' experiment? - answer- For interface applications in particular,
collecting a corpus requires a simulation of the actual application: this has often been
done by a Wizard of Oz experiment, where a human pretends to be a computer.
Why are corpora needed in NLP? - answer- Firstly, we have to evaluate algorithms on
real language: corpora are required for this purpose for any style of NLP. Secondly,
corpora provide the data source for many machine-learning approaches.
Why do we want to use prediction in NLP? - answer- Some machine learning systems
can be trained using prediction on general text corpora in a way that also makes them
useful on other tasks where there is limited training data.
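Prediction needs no labels beyond the text itself, which is what makes general corpora usable as training data. As a deliberately simple illustration (a bigram counter, not the kind of machine learning system the answer refers to), the sketch below predicts the next word from counts over a tiny invented text; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count, for each word, which words follow it in the text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of word, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Invented toy text standing in for a general corpus.
TEXT = "the cat sat on the mat and the cat slept".split()
model = train_bigrams(TEXT)

print(predict_next(model, "the"))    # cat
print(predict_next(model, "slept"))  # None
```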