Natural Language Processing - Summary Slides

A summary of all the slides for the course Natural Language Processing, MSc AI.

December 30, 2024 · 87 pages · 2022/2023
Natural Language Processing - slides


Lecture 1
Natural Language
● Natural language is quite complex
● Yet, children acquire language easily and are able to understand and produce utterances
they have never heard (Poverty of the Stimulus)
● How is this possible?
○ Language is innate?
○ Learn through imitation?
○ Learn through interaction?
○ Language just like any cognitive faculty, but with more input?

Language Acquisition




Natural language is:
● Compositional
○ The meaning of a sentence is determined by the meanings of its individual words
and the way they are combined/composed
■ "The cat is on the mat." → the animal referred to as a "cat" is located "on"
top of the object referred to as a "mat."
● Arbitrary
○ There is no inherent or logical relationship between the form of a word or
expression and its meaning
■ "Dog" → refers to the domesticated four-legged animal we commonly
associate with the word "dog," but there is no inherent reason why the
sounds "d", "o", and "g" arranged in that particular order should convey
that meaning
● Creative
○ Ability of speakers of a natural language to generate new and meaningful
expressions that may not have been previously encountered or explicitly learned
■ “Selfie”
● Displaced
○ Ability of speakers to refer to things that are not directly perceivable or present
■ "Yesterday, I went to the store and bought some groceries."






What does an NLP system need to know?
● Language consists of many levels of structure.
● Humans fluently integrate all of these in producing and understanding language.
● Ideally, so would a computer!




● Morphology
○ Study of words and their internal parts: the smallest meaningful units of
language (morphemes)
■ prefixes, suffixes, and base words
● Parts of speech
○ Word classes or grammatical categories such as noun, verb, adjective, adverb,
pronoun, preposition, conjunction, interjection
● Syntax
○ Rules that govern the arrangement of words and phrases in a sentence, including
rules for word order, word agreement (e.g., subject-verb agreement), and the
formation of phrases (such as noun phrases, verb phrases, and adjective
phrases)
● Semantics
○ Meaning of words, phrases, sentences
● Pragmatics/discourse
○ Analysis of extended stretches of language use, such as conversations, texts,
and narratives, in their social and cultural contexts
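To make these levels concrete, one sentence can be annotated at several of them at once. The analysis below is hand-written for illustration (it is not the output of any tool, and the notation is a simplification):

```python
# Hand-written annotation of one sentence at several levels of
# linguistic structure, for illustration only.
sentence = "The cats slept"

analysis = {
    # Morphology: words split into morphemes (base words plus affixes).
    "morphology": {"The": ["the"],
                   "cats": ["cat", "-s"],
                   "slept": ["sleep", "-PAST"]},
    # Parts of speech: one grammatical category per word.
    "pos": {"The": "DET", "cats": "NOUN", "slept": "VERB"},
    # Syntax: bracketed phrase structure (NP = noun phrase, VP = verb phrase).
    "syntax": "(S (NP The cats) (VP slept))",
    # Semantics: a rough logical meaning representation.
    "semantics": "past(sleep(cats))",
}

for level, value in analysis.items():
    print(level, "->", value)
```

An NLP system would need to recover each of these layers automatically; pragmatics/discourse is omitted here because it only emerges across multiple sentences.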

What is Natural Language Processing?
● Core technologies:
○ Language modeling / text generation
○ Sequence / POS tagging
○ Syntactic parsing
○ Named Entity Recognition (NER)
○ Coreference resolution
○ Word disambiguation
○ Semantic role labeling





● Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers
the ability to understand text and spoken words in much the same way human beings
can.
● NLP combines computational linguistics—rule-based modeling of human language—with
statistical, machine learning, and deep learning models. Together, these technologies
enable computers to process human language in the form of text or voice data and to
‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.

What is Natural Language Processing?
● Represent language in a way that a computer can process it
○ representing input
● Process language in a way that is useful for humans
○ generating output
● Understanding language structure and language use
○ computational modelling

Why is NLP hard?
1. Ambiguity
2. Sparse data due to Zipf’s Law
3. Variation
4. Expressivity
5. Context dependence
6. Unknown representation

Ambiguity at many levels
● Word senses: bank (noun: a place where people deposit money, or the side of a
river; verb: to tilt or rebound off of something)
● Part of speech: chair (noun: seat, person in charge of an organization or verb: act as
chairperson)
● Syntactic structure: I saw a man with a telescope (either I had a telescope or the man)
● Quantifier scope: Every child loves some movie (every child loves at least one movie or
every child loves one particular movie)
● Multiple: I saw her duck (“saw” as past tense of see or as cutting with a hand
tool; “duck” as the bird or the act of ducking)
● How can we model ambiguity, and choose the correct analysis in context?

What can we do about ambiguity?
● Non-probabilistic methods (FSMs for morphology, CKY parsers for syntax)
○ Return all possible analyses
● Probabilistic models (HMMs for POS tagging, PCFGs for syntax) and algorithms (Viterbi,
probabilistic CKY)
○ Return the best possible analysis
● But the “best” analysis is only good if our probabilities are accurate. Where do they come
from?
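As an illustration of how a probabilistic model returns the best analysis, here is a minimal Viterbi sketch for HMM POS tagging. The tiny tag set and all probabilities are invented for the example; real taggers estimate them from annotated corpora:

```python
# Minimal Viterbi sketch for HMM POS tagging.
# The tag set and probabilities below are invented for illustration.

tags = ["NOUN", "VERB"]

# Transition probabilities P(tag | previous tag); "<s>" is the start state.
trans = {
    ("<s>", "NOUN"): 0.7, ("<s>", "VERB"): 0.3,
    ("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.7,
    ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4,
}

# Emission probabilities P(word | tag); "duck"/"ducks" are ambiguous.
emit = {
    ("NOUN", "duck"): 0.4, ("VERB", "duck"): 0.2,
    ("NOUN", "ducks"): 0.3, ("VERB", "ducks"): 0.3,
    ("NOUN", "swim"): 0.1, ("VERB", "swim"): 0.5,
}

def viterbi(words):
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (trans[("<s>", t)] * emit.get((t, words[0]), 0.0), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            # Extend whichever previous path gives the highest probability.
            p, path = max(
                (best[prev][0] * trans[(prev, t)] * emit.get((t, w), 0.0),
                 best[prev][1] + [t])
                for prev in tags)
            new[t] = (p, path)
        best = new
    return max(best.values())[1]  # path of the highest-probability end state

print(viterbi(["ducks", "swim"]))  # -> ['NOUN', 'VERB']
```

Instead of enumerating all tag sequences, Viterbi keeps only the best path into each state at each step, which is exactly the "return the best possible analysis" strategy above.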





Statistical NLP
● Like most other parts of AI, NLP is dominated by statistical methods
○ Typically more robust than earlier rule-based methods
○ Relevant statistics/probabilities are learned from data
○ Normally requires lots of data about any particular phenomenon

Sparse data due to Zipf’s Law
● Word counts/frequencies vary enormously across large text corpora




○ Takeaway: Rank-frequency distribution is an inverse relation
■ To really see what’s going on, use logarithmic axes:




● Assume “word” is a string of letters separated by spaces (a great oversimplification…)
● Zipf’s law
○ Summarizes the behaviour above: a word’s frequency f is roughly
inversely proportional to its rank r in the frequency table, f(r) ∝ 1/r

○ Implications
■ Regardless of how large our corpus is, there will be a lot of infrequent
(and zero-frequency) words
■ In fact, the same holds for many other levels of linguistic structure (e.g.,
syntactic rules in a CFG)
■ This means we need to find clever ways to estimate probabilities for
things we have rarely or never seen
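The skew is visible even in a tiny corpus. The sketch below (toy data, invented here for illustration) ranks words by frequency and counts how many occur only once:

```python
from collections import Counter

# Toy corpus; in practice you would use a large text collection.
corpus = """the cat sat on the mat the dog saw the cat
the cat ran and the dog ran after the cat""".split()

# Rank words by frequency (rank 1 = most frequent).
counts = Counter(corpus)
ranked = counts.most_common()

# Under Zipf's law, rank * frequency stays roughly constant,
# so frequency falls off quickly as rank grows.
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq, rank * freq)

# A handful of words account for most tokens, while many
# word types occur only once (the sparse-data problem).
singletons = sum(1 for f in counts.values() if f == 1)
print("words occurring once:", singletons, "of", len(counts))
```

Here "the" alone covers a third of all tokens while 6 of the 10 word types are singletons; on a real corpus the long tail of rare and unseen words is far larger, which is why probability estimates need smoothing.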

Variation


