Exam (worked solutions)

Natural Language Processing (CS4990). Top Exam Questions and answers, 100% Accurate, verified

Course
Natural Language Processing

Institution
Natural Language Processing

Preview: 4 of 46 pages

Uploaded on 13 June 2023
Number of pages: 46
Written in 2022/2023
Type: Exam (worked solutions)
Contains: Questions and answers

Seller: PassPoint02

Why Python?

- Shallow learning curve
- Good string handling
- Combines OO, aspect-oriented and FP paradigms
- Extensive standard library and third-party packages (e.g. NLTK)
- Great support for Deep Learning

Human language

Ultimate interface for interaction and communication.
But it is hard to understand, because it is:
- highly ambiguous at all levels
- complex, with a subtle use of context to convey meaning
- fuzzy and probabilistic

Understanding a language requires domain knowledge, discourse knowledge, world knowledge and
linguistic knowledge.

Word level ambiguity

- Spelling (e.g. colour vs color)
- Pronunciation
• 1 word can have multiple pronunciations (e.g. abstract, desert)
• Multiple words can share the same pronunciation (e.g. flower/flour)
- Meaning (1 word can have multiple meanings, i.e. homonyms; e.g. date, crane, leaves)

Natural Language Processing (NLP)

A subfield of linguistics, computer science, information engineering and AI concerned with the interactions
between computers and human (natural) languages, in particular how to program computers to process
and analyse large amounts of natural language data.

NLP tasks & applications

- Writing assistance (spell/grammar/style checking, auto-completion).
- Text classification (spam detection, sentiment analysis, fake news/propaganda detection, news topic
classification, customer review category classification).
- Information retrieval (search engines).
- NL understanding (argumentation mining, question answering, NL inference,
humorous/ironic/metaphoric language analysis).
- NL generation (document summarisation, machine translation, sentence paraphrasing/simplification,
dialogue/exercise generation).

NLP limits & outlook

- Language problems are hard: for most of them there is still no fully accurate solution (as is also the
case for problems in Physics, History and Psychology).

Data types (based on structures)

- Structured data
- Semi-structured data
- Unstructured data

Corpus (=body)

A large body of text.
It usually contains raw text and any metadata associated with the text (e.g. timestamp, source, index,
...).
It's also known as a dataset

Text cleaning & normalisation

Remove useless information (e.g. email headers) and extract useful information (e.g. words, word
sequences, verbs, nouns, adjectives, names, locations, orgs, ...).

1. Tokenization (sentence, words)
2. Stemming / Lemmatization
3. Stop-words removal
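
As a rough illustration of step 3, stop-word removal can be sketched as filtering tokens against a set of very frequent function words. The `STOP_WORDS` set and the `remove_stop_words` helper below are assumptions made for this example, not a standard resource; NLTK ships curated lists in its `stopwords` corpus.

```python
# Toy stop-word list: an assumption for this sketch, not a standard resource.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or"}


def remove_stop_words(tokens):
    """Return the tokens that are not stop words (comparison ignores case)."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


tokens = ["The", "corpus", "is", "a", "large", "body", "of", "text"]
content_tokens = remove_stop_words(tokens)
print(content_tokens)
```

Only the content-bearing tokens remain; which words count as stop words depends on the task.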

Tokenization

Process of splitting text into meaningful segments called tokens. In English this is generally done by
separating on white-space or punctuation characters.
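
A minimal sketch of separation on white-space and punctuation, using only the standard `re` module. The name `tokenize` and the pattern are assumptions for this example; this is not how any particular library tokenizes.

```python
import re

# \w+ matches a run of letters/digits/underscores; [^\w\s] matches one
# punctuation character. White-space is skipped because neither part matches it.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Hello, world! NLP is fun.")
print(tokens)
```

Each punctuation mark becomes its own token, which is usually what later processing steps expect.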

Type

Element in the vocabulary. Also known as the form or spelling of the token (including words and
punctuation) independently of its specific occurrences in a text.

Token

Instance of a type in a text, which is a sequence of characters that is treated as a single group (i.e. words
and punctuation).

E.g. "To be or not to be" has 6 tokens but only 4 types (ignoring case):
- 2x to, be
- 1x or, not
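
The type/token distinction for the example above can be checked with a short count. This is a sketch that assumes case is ignored and tokens are separated by white-space.

```python
from collections import Counter

text = "To be or not to be"

# Lower-case so that "To" and "to" count as the same type.
tokens = text.lower().split()
type_counts = Counter(tokens)

print(len(tokens))       # number of tokens
print(len(type_counts))  # number of types
print(type_counts)
```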

Simple tokenization

Split with white-space (for English texts).

Pros: simple and natively supported by Python.

Cons: it does not separate punctuation from words and cannot split hyphenated words (e.g. "state-of-the-art").
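
A short sketch of white-space tokenization with Python's built-in `str.split`; the sample sentence is made up for this example.

```python
sentence = "This model is state-of-the-art, isn't it?"

# str.split() with no arguments splits on any run of white-space.
tokens = sentence.split()
print(tokens)
```

The comma and the question mark stay glued to the neighbouring words, and the hyphenated word comes out as a single token.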

Natural Language Toolkit (NLTK)

A free and open-source (FOSS) Python library for building programs that work with natural language.
It can perform different operations such as tokenization, stemming, classification, parsing, tagging and
semantic reasoning.

Word tokenizer (from NLTK)

NLTK's standard tokenizer (`word_tokenize`).

Pros: successfully tokenizes punctuation and splits hashtags into separate tokens (e.g. #70thRepublic_Day
into "#" and "70thRepublic_Day").

Cons: it fails to recognise widely used symbol combinations (e.g. the emoticon ":)" is split into 2 symbols).

Tweet tokenizer (from NLTK)

Pros: correctly handles hashtags and mentions (`@someone`).

Cons: it fails on abbreviations (e.g. U.K.).
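
The idea behind a tweet-aware tokenizer can be sketched with one regular expression that tries social-media patterns before generic ones. This is a simplified assumption for illustration, not the implementation of NLTK's `TweetTokenizer`; the names `TWEET_PATTERN` and `tokenize_tweet` are hypothetical.

```python
import re

# Alternatives are tried left to right, so the social-media patterns win
# over the generic word/punctuation patterns at the same position.
TWEET_PATTERN = re.compile(
    r"""
    [@#]\w+               # mentions and hashtags, kept as one token
    | [:;=][-']?[()DPp]   # a few common emoticons
    | \w+                 # ordinary words and numbers
    | [^\w\s]             # any other single punctuation character
    """,
    re.VERBOSE,
)


def tokenize_tweet(text):
    """Tokenize text while keeping mentions, hashtags and emoticons whole."""
    return TWEET_PATTERN.findall(text)


print(tokenize_tweet("@someone loving #70thRepublic_Day :)"))
print(tokenize_tweet("U.K."))
```

Like the real tokenizer described above, this sketch has no rule for abbreviations, so "U.K." falls apart into letters and dots.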

Sentence tokenization

For long documents, we may be interested in sentences rather than individual words, for example to:
- Check whether a sentence's sentiment is positive or negative.
- Check whether a sentence contains propaganda content.
- Check the grammatical correctness of a sentence.
- ...
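
A naive sketch of sentence tokenization with the standard `re` module: split after `.`, `!` or `?` when the next sentence starts with a capital letter. The rule and the name `split_sentences` are assumptions for this example; NLTK provides a trained sentence tokenizer (`sent_tokenize`) that copes better with abbreviations.

```python
import re

# Split on white-space that follows ., ! or ? and precedes an upper-case letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Naively split text into sentences at ., ! or ? boundaries."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


doc = "I loved the film. The ending was weak! Would I watch it again?"
sentences = split_sentences(doc)
print(sentences)
```

Each sentence can then be passed on to a per-sentence check such as sentiment classification. The rule is fragile: an input like "Dr. Smith arrived." is wrongly split after "Dr.".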

Stemming

Process of reducing inflected words to their root form (stem), so that a group of related words maps to the
same stem, even if the stem itself isn't a valid word in the language.

NLTK includes 2 widely used ones: Porter Stemmer and Lancaster Stemmer (younger and more
aggressive); they both treat the input as a single word, so text has to be tokenized first.

Pros: quick to run (because it's based on simple rules) and suitable for processing a large amount of text.

Cons: the resulting stems may not carry any meaning (or even be actual words).
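
Rule-based stemming can be illustrated with a deliberately tiny suffix stripper. This is not the Porter or Lancaster algorithm (in NLTK those are `nltk.stem.PorterStemmer` and `nltk.stem.LancasterStemmer`); the suffix list, `toy_stem` and its `min_stem_len` parameter are assumptions made for this sketch.

```python
# Suffixes are checked in order, so longer suffixes come first.
SUFFIXES = ("ing", "ed", "es", "ly", "s")


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len chars."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["connected", "connecting", "connects", "studies", "was"]:
    print(w, "->", toy_stem(w))
```

The three forms of "connect" collapse to one stem, while "studies" becomes "studi", which is not an English word. That is the trade-off noted in the cons above.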
