
Summary - Text Mining (XB_0085)


Text mining summary for the Artificial Intelligence & Data Science minor. Summary of lectures and notes.


  • 30 May 2023
  • 9 pages
  • 2022/2023
  • Summary
simonvanrens
Text mining summary
1 – Search relies on two main index types: the metadata index and the text index. An event
index is a third, more specialised type.
Metadata index = index based on the metadata associated with a document: title, author, date
and keywords. Like a library computer where you can search on “author”.
Text index = index based on the full text of the documents themselves. Often used in search
engines, like finding a keyword in texts.
Event index = type of index used to organize and retrieve information about events, like
meetings or concerts. Typically involves metadata such as date, time and location, as well as
any related documents such as agendas or presentations. Like when planning a vacation, to
see what will happen during your time there.
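
The difference between a metadata index and a text index can be sketched in a few lines of Python. This is a minimal illustration, not how a real search engine is built: the two documents, the field names and the helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical toy collection: each document has metadata and full text.
DOCS = {
    1: {"author": "Jansen", "title": "Intro to text mining",
        "text": "Text mining turns unstructured text into structured data"},
    2: {"author": "Smith", "title": "Search engines",
        "text": "A search engine builds an index over the full text"},
}

def build_metadata_index(docs, field):
    """Map each value of one metadata field (e.g. author) to document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index[doc[field].lower()].add(doc_id)
    return index

def build_text_index(docs):
    """Inverted index: map each lower-cased word of the full text to document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
    return index

author_index = build_metadata_index(DOCS, "author")
text_index = build_text_index(DOCS)

print(sorted(author_index["smith"]))  # [2]: found via metadata
print(sorted(text_index["text"]))     # [1, 2]: found via the document body
```

The metadata index only answers questions about fields like author, while the text index answers "which documents contain this word".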

Anomalies = data points that differ significantly from the norm or expected behaviour,
e.g. a sentiment analysis model labels a piece of text as positive while most other
similar texts are negative
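
One simple way to flag such anomalies is a z-score check: a score counts as anomalous when it lies more than a chosen number of standard deviations from the mean. The sketch below assumes hypothetical sentiment scores between -1 and 1 and an arbitrary threshold of 2.0. The notes do not prescribe a method, so this is only one possible approach.

```python
from statistics import mean, pstdev

def find_anomalies(scores, threshold=2.0):
    """Return the positions of scores that deviate strongly from the rest."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    if sigma == 0:
        return []  # all scores identical, nothing stands out
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > threshold]

# Hypothetical sentiment scores for similar texts: five negative, one positive.
scores = [-0.8, -0.7, -0.9, -0.6, -0.75, 0.9]
print(find_anomalies(scores))  # [5]: the single positive text is flagged
```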

Even a short text is a complex message with many relations and a lot of information => NLP
technology is needed to extract this information

Computational linguistics = algorithms that model language data and define notions like
similarity, information value and sequence probabilities (e.g. to develop chatbots that can
understand and respond to user queries)
Natural Language Processing (NLP) = engineering to address aspects of natural language like
tokenisation, lemmatisation, compound splitting and sentiment analysis (=> uses ML algorithms
to determine the emotional tone of a given text)
NLP toolkits = software packages and resources that provide and/or combine collections of
NLP modules (e.g. NLTK in Python)
Language applications = machine translation, summarisation, chatbots, text mining (e.g. Google
Translate or Siri)
Text mining = from unstructured text to structured data (information or knowledge). Our
focus: understand the technology and its limitations, and build applications. Example: topic
modelling, which uses statistical algorithms to identify topics and patterns within a large
corpus of text
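
To make the notion of "similarity" concrete, here is a minimal sketch of cosine similarity over bag-of-words counts. It is one common way to define similarity, not necessarily the one used in the course, and the whitespace tokenisation is a deliberate simplification.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words that occur in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

print(cosine_similarity("the dog barks", "the dog runs"))  # 2 of 3 words shared
print(cosine_similarity("the dog barks", "stock prices fell"))  # 0.0, no overlap
```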

Week 2&3 – part 1:
NLP:
• Complex problem (extracting info from texts) is broken down into a number of
smaller problems
• Simple, structural problems are solved first and higher-level semantic tasks are solved
later, using the output of earlier modules as input
o So-called pipeline architecture with dependencies across modules
o Error propagation
• For each problem different techniques:
o Knowledge base & rules (linguistic knowledge)
o Machine learning (supervised and unsupervised), data-driven
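
The pipeline idea and its error propagation can be sketched with three toy modules. Everything here is hypothetical: the whitespace tokenizer, the three-word lexicon and the default tag are chosen only to show how an early mistake travels downstream.

```python
def tokenize(text):
    """Module 1: naive whitespace tokenizer (punctuation stays attached)."""
    return text.split()

def tag(tokens):
    """Module 2: toy lexicon-based POS tagger; unknown words default to NOUN."""
    lexicon = {"the": "DET", "dog": "NOUN", "barks": "VERB"}
    return [(token, lexicon.get(token.lower(), "NOUN")) for token in tokens]

def extract_verbs(tagged):
    """Module 3: higher-level task that relies on the tags from module 2."""
    return [token for token, pos in tagged if pos == "VERB"]

def run_pipeline(text):
    """Each module consumes the output of the previous one."""
    return extract_verbs(tag(tokenize(text)))

print(run_pipeline("The dog barks"))   # ['barks']
print(run_pipeline("The dog barks."))  # []: "barks." is unknown, so tagged NOUN
```

The second call fails because of the tokenizer, not the verb extractor: the trailing period makes the token unknown to the tagger, and the last module has no way to recover.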

We always need to do preprocessing:
• Even for the current state-of-the-art deep learning systems
• First problem: what is a word, what is a sentence? Not trivial

• Tokenization (examples of problems):
o 21st century, quotes, don’t, hyphens, $100,45, etc.
• Sentence splitting (examples of problems):
o Dr., bol.com, etc., white spaces, tables, HTML markup, <h1></h1>, new lines
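
Some of these problems show up even in a tiny example. The sketch below uses only the standard `re` module. The sample sentence and the three-entry abbreviation list are made up, and real toolkits handle many more cases than this.

```python
import re

TEXT = "Dr. Smith paid $100,45 on bol.com. He didn't regret it."

def naive_sentences(text):
    """Split after every period that is followed by whitespace."""
    return re.split(r"(?<=\.)\s+", text)

# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"Dr.", "Mr.", "etc."}

def better_sentences(text):
    """Same split, but glue a piece back on when the previous one ends in an abbreviation."""
    sentences = []
    for piece in naive_sentences(text):
        if sentences and sentences[-1].split()[-1] in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences

def naive_tokens(text):
    """Keep only runs of letters and digits; all punctuation acts as a separator."""
    return re.findall(r"\w+", text)

print(naive_sentences(TEXT))    # 3 pieces: "Dr." is wrongly treated as a sentence
print(better_sentences(TEXT))   # 2 sentences
print(naive_tokens("didn't"))   # ['didn', 't']
print(naive_tokens("$100,45"))  # ['100', '45']: the amount falls apart
```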

[Figures: named entity recognition pipeline example; sentiment analysis pipeline example]

Some issues:
• Dependencies across modules result in error propagation
• Ambiguities (multiple values with a confidence score, like POS tagging: 80% noun, 20%
verb) are often not exploited by the next levels
• Conflicts: different modules state information that is not compatible
• Complex and difficult to maintain, e.g. input and output need to be interoperable
across modules
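
The ambiguity issue can be illustrated with the 80% noun / 20% verb case. The tagger output below is hypothetical, as are both helper functions. The point is only the contrast between passing on a single best tag and passing on the alternatives.

```python
# Hypothetical tagger output: for each token, a probability per POS tag.
TAGGED = [
    ("they", {"PRON": 1.0}),
    ("run", {"NOUN": 0.8, "VERB": 0.2}),
]

def best_tags(tagged):
    """What a pipeline often passes on: only the most probable tag per token."""
    return [(token, max(dist, key=dist.get)) for token, dist in tagged]

def candidate_tags(tagged, min_prob=0.1):
    """Alternative: keep every tag above a threshold, most probable first."""
    result = []
    for token, dist in tagged:
        ranked = sorted(dist.items(), key=lambda item: -item[1])
        result.append((token, [tag for tag, prob in ranked if prob >= min_prob]))
    return result

print(best_tags(TAGGED))       # [('they', 'PRON'), ('run', 'NOUN')]
print(candidate_tags(TAGGED))  # [('they', ['PRON']), ('run', ['NOUN', 'VERB'])]
```

With `best_tags` the verb reading of "run" is lost for every later module. With `candidate_tags` a later module could still pick VERB because it follows a pronoun.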

Text mining is like solving a puzzle. You have to put all the pieces together to understand
what the puzzle is trying to show you.

But sometimes there are problems that can make solving the puzzle difficult. One problem is
when the different pieces of the puzzle depend on each other, so if one piece is wrong, it
affects all the other pieces.

Another problem is when there are different meanings for the same word, like "run" can
mean to jog or to manage. This can make it hard to understand what the puzzle is trying to
say.

Also, sometimes there can be different parts of the puzzle that don't match up or agree with
each other. This can make it even harder to understand the puzzle.

Lastly, text mining is complex and requires a lot of work to make sure everything fits together
properly. It's like putting together a big Lego castle where each block has to fit with the
others.

For example, imagine you are trying to understand a book about a dog named Max. One
piece of the puzzle might be the word "bark." Depending on how it's used, it could mean Max
is barking at someone, or he's barking up a tree. If the wrong meaning is chosen, it could
affect the rest of the puzzle.
