NLP TEST 1

classic parsing methods - answer 1. parse as search: top-down or bottom-up
2. shift-reduce
3. CKY
4. Earley

CKY parser - answer bottom-up; requires a binarized grammar
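
A minimal CKY recognizer sketch, assuming a toy grammar already binarized into CNF; the grammar, sentence, and function name are made up for illustration:

```python
# Minimal CKY recognizer: bottom-up over span lengths, grammar in CNF.
# Toy grammar and sentence are illustrative assumptions.
from itertools import product

binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cky_recognize(words):
    n = len(words)
    # table[i][j] holds the nonterminals that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, set()))
    for span in range(2, n + 1):            # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # try every split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n]

print(cky_recognize("the dog saw the cat".split()))  # True
```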

Earley parser - answer top-down; handles arbitrary context-free grammars without binarization, but is more complex

generative classifier - answer Naive Bayes. Builds a model of each class; given an
observation, it returns the class most likely to have generated the observation.
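
A minimal sketch of a generative classifier: Naive Bayes with add-1 smoothed word likelihoods. The tiny sentiment training set is a made-up illustration:

```python
# Naive Bayes: model P(class) and P(word | class) from counts, then
# return the class most likely to have generated the observation.
import math
from collections import Counter, defaultdict

docs = [("good great fun", "pos"), ("bad boring awful", "neg"),
        ("great acting", "pos"), ("awful plot", "neg")]

class_counts = Counter(label for _, label in docs)
word_counts = defaultdict(Counter)
for text, label in docs:
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / len(docs))   # log prior
        total = sum(word_counts[c].values())
        for w in text.split():
            if w in vocab:                           # skip unknown words
                # add-1 smoothed log likelihood of w under class c
                lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(predict("great fun"))  # pos
```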

discriminative classifier - answer Logistic regression (MaxEnt). Learns which features of
the input are most useful for discriminating between the different classes.
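
A minimal sketch of a discriminative classifier: binary logistic regression trained with gradient descent, learning one weight per feature. The data and learning rate are made up for illustration:

```python
# Logistic regression: learn weights that say how strongly each input
# feature discriminates between the two classes.
import math

# Each example: (feature vector, label in {0, 1}); toy data.
data = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.1, 1.0], 0), ([0.0, 0.8], 0)]
w, b, lr = [0.0, 0.0], 0.0, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(200):                         # simple gradient descent
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y                          # gradient of cross-entropy
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

print(w)  # feature 0 ends up with positive weight, feature 1 negative
```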

10-fold cross-validation - answer If we reserve too much data for the training set, the
test set becomes too small to be representative. Cross-validation lets us use all the data
for both training and testing:
1. Randomly choose a training and test set division of the data, train the classifier, and
compute the error rate on the test set.
2. Repeat with a different randomly selected training set and test set.
3. Do this 10 times.
4. Average the 10 runs to get an average error rate.
Because all the data is used for testing, we cannot examine the data to analyze which
features to use. To avoid this: create a fixed training set and test set, then do 10-fold
cross-validation inside the training set, and compute the error rate the normal way on the
test set. (A code sketch follows below.)
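
A sketch of the 10-fold loop; the majority-class baseline stands in for a real classifier, and the data is a made-up illustration:

```python
# 10-fold cross-validation: split into 10 folds, hold one out as the
# test set each round, train on the rest, and average the error rates.
import random
from collections import Counter

def train(examples):
    # majority-class baseline standing in for a real classifier
    return Counter(label for _, label in examples).most_common(1)[0][0]

def error_rate(model, examples):
    return sum(label != model for _, label in examples) / len(examples)

def cross_validate(data, k=10, seed=0):
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]   # k roughly equal folds
    errs = []
    for i in range(k):
        test = folds[i]
        train_set = [ex for j, f in enumerate(folds) if j != i for ex in f]
        errs.append(error_rate(train(train_set), test))
    return sum(errs) / k                     # average over the 10 runs

data = [((x,), "pos" if x > 50 else "neg") for x in range(100)]
print(cross_validate(data))
```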

overfitting - answer A model that learned the noise instead of the signal is considered
overfit: it fits the training dataset well but fits new datasets poorly.

two common architectures for corpus-based chatbots - answer 1. information retrieval
2. machine-learned sequence transduction

types of chatbots - answer rule-based, corpus-based, frame-based (task-based)

domain ontology - answer Modern frame-based dialogue systems are based on a domain
ontology.
The ontology defines one or more frames, each a collection of slots, and each
slot defines the values it can take.

frame-based chatbot / GUS architecture - answer based on hand-designed finite-state automata (FSAs).

NLU goals for filling frame-based chatbot slots - answer 1. domain classification
2. user intent determination
3. slot filling

language models - answer Models that assign probabilities to sequences of words. The
simplest model is the N-gram model.

N-gram model - answer Instead of computing the probability of a word given its entire
history, we approximate the history by just the last few words. It is based on the Markov
assumption: the probability of a word depends only on the previous few words (for a
bigram model, only the previous word).
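
In symbols, the chain rule gives the exact probability of a sequence, and the bigram Markov assumption approximates each history by just the preceding word:

```latex
P(w_1 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \dots w_{i-1})
                 \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```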

Markov models - answerthe class of probabilistic models that assume we can predict
the probability of some future unit without looking too far into the past.

maximum likelihood estimation - answer The procedure of choosing the parameter values
that give the observed training data the highest likelihood, i.e. the highest probability
under the model.
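
For N-grams, the MLE parameters are just normalized counts: for bigrams, P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}). A minimal sketch with a made-up three-sentence corpus:

```python
# MLE bigram probabilities: count bigrams, divide by the count of the
# preceding word. <s> and </s> mark sentence boundaries.
from collections import Counter

corpus = ["<s> i am sam </s>", "<s> sam i am </s>", "<s> i like ham </s>"]
bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def p_mle(w, prev):
    return bigrams[(prev, w)] / unigrams[prev]

print(p_mle("i", "<s>"))  # 2/3: "i" starts 2 of the 3 sentences
print(p_mle("am", "i"))   # 2/3: "am" follows "i" in 2 of its 3 occurrences
```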

evaluate language models - answer 1. extrinsic evaluation: embed the model in an
application and measure how much the application improves; expensive.
2. intrinsic evaluation: measure the quality of the model independent of any application,
typically with an 80% training / 10% development / 10% test split.

perplexity - answer In practice, we don't use raw probability as our metric for evaluating
language models but a variant called perplexity. It is the inverse probability of the test
set, normalized by the number of words. The lower the perplexity, the higher the
probability.

Perplexity can also be thought of as the weighted average branching factor of a language
(not just a branching factor).
The branching factor of a language is the number of possible next words that can follow
any word.
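
A sketch of computing perplexity in log space (to avoid underflow); with a uniform model over V words it comes out to exactly V, matching the branching-factor intuition. The uniform model is a made-up stand-in:

```python
# Perplexity = P(w_1 .. w_N) ** (-1/N): inverse probability of the test
# set, normalized by the number of predicted words.
import math

def perplexity(words, prob):
    log_p = sum(math.log(prob(w, prev)) for prev, w in zip(words, words[1:]))
    return math.exp(-log_p / (len(words) - 1))

# Illustrative assumption: a uniform bigram model over a 10-word vocabulary.
uniform = lambda w, prev: 1 / 10
print(perplexity("<s> a b c d </s>".split(), uniform))  # ≈ 10.0
```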

OOV - answer out of vocabulary, words that we haven't seen before.
The percentage of OOV words that appear in the test set is called the OOV rate.
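
A tiny sketch of computing the OOV rate; the vocabulary and test sentence are made-up illustrations:

```python
# OOV rate: percentage of test-set tokens missing from the vocabulary.
vocab = {"the", "cat", "sat"}               # assumed training vocabulary
test = "the dog sat on the mat".split()     # assumed test set
oov_rate = 100 * sum(w not in vocab for w in test) / len(test)
print(f"{oov_rate:.1f}% OOV")               # dog, on, mat -> 50.0%
```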

Smoothing - answer To keep a language model from assigning zero probability to these
unseen events, we shave off a bit of probability mass from some more frequent events
and give it to the events we've never seen. This modification is called smoothing or
discounting.

Common methods: Laplace/add-1 smoothing, add-k smoothing, stupid backoff, and
Kneser-Ney smoothing (the most useful for language modeling).

(Add-one and add-k are not good for language modeling, but good for classification.)
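
A minimal add-k sketch for bigrams (k=1 is Laplace/add-one): adding k to every count shaves mass from seen events and spreads it over unseen ones. The counts are made-up illustrations:

```python
# Add-k smoothed bigram: (C(prev, w) + k) / (C(prev) + k * V),
# where V is the vocabulary size. k=1 gives Laplace smoothing.
from collections import Counter

bigrams = Counter({("i", "am"): 2, ("i", "like"): 1})
unigrams = Counter({"i": 3, "am": 2, "like": 1, "ham": 1})
V = len(unigrams)                           # vocabulary size

def p_add_k(w, prev, k=1.0):
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * V)

print(p_add_k("am", "i"))   # (2+1)/(3+4) ≈ 0.43, down from 2/3 unsmoothed
print(p_add_k("ham", "i"))  # (0+1)/(3+4) ≈ 0.14, up from 0 unsmoothed
```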
