Week 9
Lecture: Trends in Data Science & Society
End Term Design
50 questions on the workshop assignments:
Mostly NLP
R/Spark
AutoML
50 questions on the literature and lectures:
25 questions on NLP
15 questions on AutoML
10 questions on cloud computing
Example Questions
Complete this script by selecting the appropriate command at ____
A. Match
B. Fit (correct)
C. Solve
D. Map
How do contemporary neural networks generally implement language models?
A. They treat each different word (or at least each different lemma) as a distinct atomic category
B. They use a single hidden layer at each position in a sequence
C. They exploit similarities between words by training feature-based representations of them (correct)
D. They consist of two components, one which models the similarities between words and one which models the individual probabilities
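Option C describes learned word embeddings: every word is mapped to a dense vector of features, and words used in similar contexts end up with similar vectors. The sketch below illustrates only the similarity part, assuming hand-picked, hypothetical 3-dimensional vectors (a trained model learns its vectors from data and uses far more dimensions). Cosine similarity then shows that "cat" is closer to "dog" than to "car".

```python
import math

# Hypothetical toy embeddings; a real language model learns these from a corpus.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word):
    """Return the other vocabulary word whose vector is closest to `word`."""
    target = EMBEDDINGS[word]
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]))

print(most_similar("cat"))  # prints "dog"
```

With one-hot vectors, the atomic-category approach of option A, any two distinct words have a cosine similarity of 0, so no similarity between words can be exploited.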
What does this figure visualise?
A. Linguistic Regression Model
B. Projective Dependency Grammar
C. Social Network Analysis
D. Multilayer Perceptron (correct)
A. CFG
B. PCFG (correct)
C. CNF
D. PPDG
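A PCFG (probabilistic context-free grammar) is a CFG in which every rule carries a probability, with the probabilities of all rules sharing a left-hand side summing to 1. The probability of a parse tree is then the product of the probabilities of the rules used to build it. A minimal sketch, assuming a small hypothetical grammar with invented probabilities:

```python
# Hypothetical PCFG: for each left-hand side, the rule probabilities sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("they",)): 0.4,
    ("NP", ("fish",)): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("eat",)): 0.5,
    ("V", ("thrive",)): 0.5,
}

def tree_probability(tree):
    """Probability of a parse tree = product of the probabilities of its rules.

    A tree is a tuple (label, child1, child2, ...); leaves are plain strings.
    """
    label, *children = tree
    # Right-hand side of the rule used at this node: child labels or words.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = PCFG[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob

# "they eat fish": S -> NP VP, NP -> they, VP -> V NP, V -> eat, NP -> fish
tree = ("S", ("NP", "they"), ("VP", ("V", "eat"), ("NP", "fish")))
print(round(tree_probability(tree), 3))  # 1.0 * 0.4 * 0.7 * 0.5 * 0.6 = 0.084
```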
Which of the following phrase structure statements (between quotes) is correct?
A. “They” is an NP (noun phrase) (correct)
B. “The garden with flowers” is a PP (prepositional phrase)
C. “Below sea level fish thrive.” is a VP (verb phrase)
D. “Come on!” is a GP (GP is not a thing)
What is Parsing in NLP?
A. The algorithms to automatically...
B. The process of automatically analyzing a given sentence to determine underlying syntactic structures (correct)
C. ..
D. ..
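A classic parsing algorithm is CKY, which requires the grammar to be in Chomsky Normal Form (CNF: every rule is either A → B C or A → word) and fills a table recording which nonterminals can derive each substring of the sentence. A minimal recognizer sketch, assuming a tiny hypothetical grammar; it only reports whether the sentence can be derived from S and does not build the parse tree itself.

```python
from itertools import product

# Hypothetical toy grammar in CNF.
# Binary rules A -> B C, indexed by the right-hand side (B, C).
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
# Lexical rules A -> word, indexed by the word.
LEXICAL = {
    "they": {"NP"},
    "eat": {"V", "VP"},
    "the": {"Det"},
    "fish": {"N", "NP"},
}

def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start` (CKY recognizer)."""
    n = len(words)
    # table[i][j] holds the nonterminals that can derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, set()))
    for span in range(2, n + 1):          # substring length
        for i in range(n - span + 1):     # substring start
            j = i + span
            for k in range(i + 1, j):     # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]

print(cky_recognize("they eat the fish".split()))  # True
print(cky_recognize("fish they".split()))          # False
```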
What is k-means?
A. Clustering Algorithm (correct)
B. Meaning Abstraction Algorithm
C. Document embedding algorithm
D. …
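k-means partitions points into k clusters by repeating two steps until nothing changes: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it (Lloyd's algorithm). A minimal pure-Python sketch, assuming the initial centroids are passed in by hand; library implementations such as scikit-learn's KMeans choose the starting centroids automatically (k-means++ by default).

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                new_centroids.append(old)  # empty cluster: keep its centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(clusters)  # the three points near (1, 1) and the three near (8, 8)
```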
Which Python package did we use to retrieve PubMed abstracts?
A. ..
B. ..
C. Biopython (correct)
D. ..
Week 5
Assignment: Statistics in R (part 2)
In this tutorial, you will learn more about the popular statistical program R (http://www.r-project.com) and about how to use some of the machine learning capabilities of R and Spark.
Code
install.packages("tidyverse")
Explanation
There are many ways to do data analysis in R. An especially easy way to do your “data wrangling” is to use the so-called Tidyverse, a collection of packages that includes dplyr. Because dplyr is not just a set of functions but a grammar of data manipulation, the same grammar works regardless of whether you are using R directly or using R to send commands to Spark.
Code
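library(sparklyr)                      # R interface to Apache Spark
spark_install(version = "2.1.0")       # download a local copy of Spark 2.1.0 (only needed once)
sc <- spark_connect(master = "local")  # connect to the local Spark instance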
Explanation
To use R and Spark, we will make use of a package called sparklyr. If Spark is already running, you can connect to that Spark instance with the function spark_connect(). If you use a local installation instead, you can install Spark from within R with spark_install() and connect to it immediately.