Data Science and Society End Term Summary

Institution: Universiteit Utrecht (UU)

This document provides an in-depth summary of the lectures, literature, and assignments in the course Data Science and Society.

Uploaded on 2 November 2019
Number of pages: 64
Written in 2019/2020
Type: Summary

Uploaded by samoyediran4

Data Science & Society
End Term Summary

End Term Design

Example Questions

Week 5
Assignment: Statistics in R (part 2)

Week 6
Lecture: Natural Language Processing
Lecture: NLP Embeddings
Assignment: NLP part 1
Literature: Computational Linguistics and NLP (Clark et al., 2013)

Week 7
Lecture: Automated Machine Learning
Lecture: Cloud Computing & Spot Pricing
Assignment: NLP Part 2
Literature: AutoML (Hutter et al., 2019)

Week 9
Lecture: Trends in Data Science & Society

End Term Design
50 questions on workshop assignments
  Mostly NLP
  R/Spark
  AutoML
50 questions on literature and lectures
  25 questions on NLP
  15 questions on AutoML
  10 questions on cloud computing

Example Questions
Complete this script by selecting the appropriate command at ____
A. Match
B. Fit (correct)
C. Solve
D. Map

How do contemporary neural networks generally implement language models?
A. They treat each different word (or at least each different lemma) as a distinct atomic category
B. They use a single hidden layer at each position in a sequence
C. They exploit similarities between words by training feature-based representations of them (correct)
D. They consist of two components, one which models the similarities between words and one which
models the individual probabilities

What does this figure visualise?
A. Linguistic Regression Model
B. Projective Dependency Grammar
C. Social Network Analysis
D. Multi-layer Perceptron (correct)

A. CFG
B. PCFG (correct)
C. CNF
D. PPDG

Which of the following phrase structure statements (between quotes) is correct?
A. “They” is an NP (noun phrase) (correct)
B. “The garden with flowers” is a PP (prepositional phrase)
C. “Below sea level fish thrive.” is a VP (verb phrase)
D. “Come on!” is a GP (GP is not a thing)

What is Parsing in NLP?
A. The algorithms to automatically...
B. The process of automatically analyzing a given sentence to determine underlying syntactic structures
(correct)
C. ..
D. ..

What is k-means?
A. Clustering Algorithm (correct)
B. Meaning Abstraction Algorithm
C. Document embedding algorithm
D. …
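
As a brief illustration of k-means as a clustering algorithm, the following is a minimal sketch using the kmeans() function from base R. The toy points in `pts` are made up for this example and are not part of the course material; k = 2 is chosen because the data were constructed as two groups.

```r
# Hypothetical toy data: two well-separated groups of 2-D points
pts <- rbind(
  c(1.0, 1.1), c(0.9, 1.0), c(1.1, 0.9),
  c(8.0, 8.1), c(7.9, 8.0), c(8.1, 7.9)
)

set.seed(42)                                  # k-means starts from random centres
fit <- kmeans(pts, centers = 2, nstart = 10)  # ask for k = 2 clusters

print(fit$cluster)   # cluster label assigned to each point
print(fit$centers)   # coordinates of the two cluster centres
```

The numeric labels themselves are arbitrary; what matters is which points share a label.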

Which Python package did we use to retrieve PubMed abstracts?
A. ..
B. ..
C. Biopython (correct)
D. ..

Week 5
Assignment: Statistics in R (part 2)
In this tutorial, you will learn about the popular statistical program R (https://www.r-project.org)
and about how to use some of the machine learning capabilities of R and Spark.

Code

install.packages("tidyverse")

Explanation

There are many ways to do data analysis in R. An especially easy way to do your “data
wrangling” is to use the so-called Tidyverse, a collection of packages that includes dplyr.
Because dplyr is not just a set of functions but a grammar of data manipulation, the same
grammar works regardless of whether you use R directly or use R to send commands to Spark.
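
To illustrate what a grammar of data manipulation looks like in practice, here is a minimal sketch with dplyr on a small local data frame. The `delays` data and its column names are made up for this example and are not part of the assignment; the sketch assumes dplyr is installed, for instance through the tidyverse installation above.

```r
library(dplyr)

# Hypothetical example data: one row per flight (not from the assignment)
delays <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "DL"),
  dep_delay = c(10, -5, 30, 0, 20),
  stringsAsFactors = FALSE
)

# Each verb does one job; the pipe chains them into a readable "sentence"
summary_tbl <- delays %>%
  filter(dep_delay >= 0) %>%           # keep flights that did not leave early
  group_by(carrier) %>%                # one group per carrier
  summarise(
    mean_delay = mean(dep_delay),      # average delay within each group
    flights    = n()                   # number of flights within each group
  ) %>%
  arrange(desc(mean_delay))            # largest average delay first

print(summary_tbl)
```

The verbs used here (filter, group_by, summarise, arrange) are the kind of commands that can also be applied to a table that lives in Spark.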

Code

library(sparklyr)

# Install a local copy of Spark (only needed once)
spark_install(version = "2.1.0")

# Connect to the local Spark instance
sc <- spark_connect(master = "local")

Explanation

To use R and Spark together, we will make use of a package called sparklyr. If a Spark instance
is already running, you can connect to it using the function spark_connect(). If you work with a
local installation instead, you can install Spark from within R using spark_install() and connect
to it immediately.

Code

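library(nycflights13)

# Drop rows with missing values before copying the data to Spark
flights <- na.omit(flights)

# Copy the local data frame to Spark as a table named "flights"
flights_tbl <- copy_to(sc, flights, "flights", overwrite = TRUE)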

Explanation
