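Logistic Regression - answer A model that relates the independent variables (features) to the
expected dependent variable (label) by passing a weighted sum of the features through the
sigmoid function, which yields a probability between 0 and 1.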
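INPUT x^(i) = column feature vector, e.g. [[1],[8],[11]] (bias, sum of positive frequencies, sum of negative frequencies)
PARAMETERS θ
PREDICTION = sigmoid(θ^T x^(i))
LABEL Y, e.g. positive sentiment = 1, negative sentiment = 0
PREDICTED LABEL Y'
COST FUNCTION TO MINIMIZE L(Y, Y')
GRADIENT DESCENT θ = θ - alpha * (gradient of L with respect to θ)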
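The pieces listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: the notes do not name the cost function, so binary cross-entropy is assumed here, and the function names, the starting θ of zeros, and alpha = 0.01 are all hypothetical choices.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Predicted label Y' = sigmoid(theta^T x)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

def cross_entropy(y, y_hat):
    """Assumed cost L(Y, Y'): binary cross-entropy for one example."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(y_hat + eps) + (1 - y) * math.log(1 - y_hat + eps))

def gradient_step(theta, x, y, alpha):
    """One update: theta = theta - alpha * gradient.

    For cross-entropy with a sigmoid, the gradient with respect to
    theta_j is (Y' - Y) * x_j.
    """
    y_hat = predict(theta, x)
    return [t - alpha * (y_hat - y) * xi for t, xi in zip(theta, x)]

x = [1.0, 8.0, 11.0]     # feature vector from the notes: bias, pos sum, neg sum
y = 0                    # negative sentiment
theta = [0.0, 0.0, 0.0]  # assumed starting point

loss_before = cross_entropy(y, predict(theta, x))
theta = gradient_step(theta, x, y, alpha=0.01)
loss_after = cross_entropy(y, predict(theta, x))
print(loss_before, loss_after)  # the loss should drop after the step
```

With θ at zero the model predicts 0.5 for every input, so a single step on this negative example already pushes the prediction toward 0.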
vocabulary - answer List of the unique words that appear across the documents in a corpus
sentiment analysis - answer an automated process of analyzing and categorizing social
media to determine the amount of positive, negative, and neutral online comments a
brand receives
From the vocabulary you can create a positive frequency and a negative frequency for every
word: the number of times the word appears in positive examples and in negative examples.
sentiment analysis: Positive Frequency Dictionary - answer Positive examples:
"I am happy because I am learning NLP"
"I am happy"
vocabulary: I  am  happy  because  learning  nlp  sad  not
pos:        3  3   2      1        1         1    0    0
Feature Extraction: Sparse Representation - answer A representation that contains a lot
of zeros.
Example: a vector of 1s and 0s, each entry marking whether the corresponding vocabulary
word appears in the text.
CONS: The feature vector is as long as the vocabulary. This can result in longer
training time and longer prediction time.
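A sparse vector of this kind can be sketched as follows. The helper name `sparse_vector` is hypothetical, and the tokenization is a plain lowercase whitespace split, which is an assumption made only for this example.

```python
def sparse_vector(text, vocab):
    """Return one entry per vocabulary word: 1 if the word occurs in text, else 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocab]

# Vocabulary taken from the notes, lowercased.
vocab = ["i", "am", "happy", "because", "learning", "nlp", "sad", "not"]

vec = sparse_vector("I am happy", vocab)
print(vec)       # [1, 1, 1, 0, 0, 0, 0, 0]
print(len(vec))  # 8, always the size of the vocabulary
```

Even for a three-word text the vector has one slot per vocabulary word, which is why it grows unwieldy for a realistic vocabulary.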
sentiment analysis: Negative Frequency Dictionary - answer Negative examples:
"I am sad, I am not learning NLP"
"I am sad"
vocabulary: I  am  happy  because  learning  nlp  sad  not
neg:        3  3   0      0        1         1    2    1
vocab: I  am  happy  because  learning  nlp  sad  not
pos:   3  3   2      1        1         1    0    0
neg:   3  3   0      0        1         1    2    1
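The table above can be reproduced from the four example sentences. The sketch below is one possible way to do it, assuming lowercasing and punctuation stripping as the only preprocessing. The names `tokenize` and `build_freqs` are hypothetical, as is the choice to key the dictionary by (word, label) pairs.

```python
from collections import Counter
import string

def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def build_freqs(tweets, labels):
    """Count how often each word appears under each label (1 = pos, 0 = neg)."""
    freqs = Counter()
    for tweet, label in zip(tweets, labels):
        for word in tokenize(tweet):
            freqs[(word, label)] += 1
    return freqs

tweets = [
    "I am happy because I am learning NLP",
    "I am happy",
    "I am sad, I am not learning NLP",
    "I am sad",
]
labels = [1, 1, 0, 0]

freqs = build_freqs(tweets, labels)
vocab = ["i", "am", "happy", "because", "learning", "nlp", "sad", "not"]
pos = [freqs[(w, 1)] for w in vocab]
neg = [freqs[(w, 0)] for w in vocab]
print(pos)  # [3, 3, 2, 1, 1, 1, 0, 0]
print(neg)  # [3, 3, 0, 0, 1, 1, 2, 1]
```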
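Sum over the unique words w in "I am sad, I am not learning NLP":
Σ freqs(w, 1) (pos) = I:3 + am:3 + sad:0 + not:0 + learning:1 + NLP:1 = 8
Σ freqs(w, 0) (neg) = I:3 + am:3 + sad:2 + not:1 + learning:1 + NLP:1 = 11
Feature vector for this sentence: [1, 8, 11] (bias, pos sum, neg sum)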
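The two sums can be checked with a short sketch. The frequency table is typed in directly from the notes, and the function name `extract_features` is hypothetical. Each distinct word is counted once, since that is what reproduces the totals 8 and 11.

```python
def extract_features(words, freqs):
    """Return [bias, sum of positive freqs, sum of negative freqs] for one text."""
    unique_words = set(words)  # each distinct word contributes once
    pos_sum = sum(freqs.get((w, 1), 0) for w in unique_words)
    neg_sum = sum(freqs.get((w, 0), 0) for w in unique_words)
    return [1, pos_sum, neg_sum]

# Frequency table from the notes.
vocab = ["i", "am", "happy", "because", "learning", "nlp", "sad", "not"]
pos_counts = [3, 3, 2, 1, 1, 1, 0, 0]
neg_counts = [3, 3, 0, 0, 1, 1, 2, 1]

freqs = {}
for word, p, n in zip(vocab, pos_counts, neg_counts):
    freqs[(word, 1)] = p
    freqs[(word, 0)] = n

words = "i am sad i am not learning nlp".split()
features = extract_features(words, freqs)
print(features)  # [1, 8, 11]
```

Whatever the size of the vocabulary, this representation has only three entries, in contrast to a sparse vector with one entry per word.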
Feature Extraction Preprocessing: Stop Words - answer Frequently used words that are
part of a sentence but add little value, such as conjunctions. Punctuation is usually
removed in the same step.
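As a rough illustration, stop-word and punctuation removal might look like the sketch below. The stop-word set here is a tiny hypothetical list invented for the example, and real lists are much longer. The function name `remove_stop_words` is also hypothetical.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "is", "and", "a", "of"}

def remove_stop_words(text):
    """Lowercase, strip punctuation, and drop any token found in STOP_WORDS."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [token for token in cleaned.split() if token not in STOP_WORDS]

tokens = remove_stop_words("The weather is sunny and hot!")
print(tokens)  # ['weather', 'sunny', 'hot']
```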
Feature Extraction Preprocessing: Stemming - answer Reducing words to their base
form (stem), for example by removing tense endings: "learning" becomes "learn"
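To make the idea concrete, here is a toy suffix stripper. It is not the Porter stemmer or any library's algorithm. The suffix list, the minimum stem length of 3, and the name `naive_stem` are all assumptions chosen so the example stays short.

```python
# Hypothetical suffix list; real stemmers apply many more rules.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["learning", "learned", "learns", "learn", "is"]:
    print(w, "->", naive_stem(w))
# learning -> learn
# learned -> learn
# learns -> learn
# learn -> learn
# is -> is
```

After stemming, the different forms of "learn" share one vocabulary entry, which shrinks the vocabulary.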
Assumptions of Naïve Bayes - answer Words in a sentence are assumed to be independent of
one another.
Bad: Words are often used together to describe or reference another word in a sentence and
do not necessarily stand alone, for example "sunny and hot" or "cold and snowy".
Relies on the data distribution of the training set. Good training sets have equal
frequencies of each class.
Example: a set of training tweets can be biased toward one sentiment.
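Written out, the independence assumption says that the probability of a whole sentence given a class factorizes into one term per word. The second line shows why balanced training sets matter: with equal class frequencies the prior ratio contributes nothing to the score. The symbols w_i (the i-th word) and n (the sentence length) are notation introduced here for illustration.

```latex
P(w_1, \dots, w_n \mid \text{class}) = \prod_{i=1}^{n} P(w_i \mid \text{class})

P(\text{pos}) = P(\text{neg}) \;\Longrightarrow\; \log \frac{P(\text{pos})}{P(\text{neg})} = 0
```

Under this factorization "sunny and hot" is scored as if "sunny" and "hot" occurred independently, even though in practice they tend to appear together.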