Summary Machine Learning - PYTHON PART (complete walkthrough)
Complete guide through all notebooks. Each type of exercise clearly explained.
June 3, 2020
Python for Machine Learning
Huge potential helper: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html


Notebook 1: Evaluation
MAE & MSE
We want to calculate the MAE and the MSE to evaluate the model.

MAE: to calculate the mean absolute error, we take three steps:
1. Transform the predicted values and the actual values to arrays
2. Calculate the difference between the two and take its absolute value
3. Take the mean of the absolute errors
In code that looks something like this:

import numpy as np

def MAE(pred, actual):
    abs_error = abs(np.array(actual) - np.array(pred))
    mae = sum(abs_error) / len(actual)
    return mae

MSE: we do the exact same, except now we square the error instead of taking its absolute value.

def MSE(pred, actual):
    sq_error = (np.array(actual) - np.array(pred)) ** 2
    mse = sum(sq_error) / len(actual)
    return mse
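A quick sanity check of both metrics on a tiny example (the values are chosen here purely for illustration; the formulas are recomputed inline so the snippet runs on its own):

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.5, 7.0])
pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = np.mean(np.abs(actual - pred))   # mean of [0.5, 0, 1.5, 1]
mse = np.mean((actual - pred) ** 2)    # mean of [0.25, 0, 2.25, 1]
print(mae, mse)
```

Note that the MSE punishes the large error (1.5) much harder than the MAE does, which is exactly why the two metrics can rank models differently.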



Binary classification
In this exercise we need to calculate the accuracy of a spam filter. This spam filter classifies messages as spam or non-spam. To calculate the accuracy, we need to see how many times the filter was (not) correct. The trick is to check when the prediction is equal to the actual value. We can take two routes:
A) For loop:
The steps we need to undertake are:
1. Make a range of the length of the dataset
2. Iterate over each element in the dataset and check if y_pred == y_true. Count += 1 if True
3. Divide the count by the total number of predictions.
def accuracy(y_true, y_pred):
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_pred[i]:
            count += 1
    return count / len(y_true)
B) NumPy
Steps:

1. We transform both the predicted and the actual values into arrays
2. We compare the two arrays element-wise. The output is an array of booleans:
True, True, True, False, True, False, False etc.
3. Because booleans count as 1 and 0, we can use np.mean() to get the fraction
of positions where y_pred == y_true → this fraction is equal to the
accuracy.

def accuracy_np(y_true, y_pred):
    acc = np.mean(np.array(y_true) == np.array(y_pred))
    return acc
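Both routes give the same number. A tiny worked example (labels invented here for illustration, with the NumPy route recomputed inline so it runs standalone):

```python
import numpy as np

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]

# Element-wise comparison gives [True, False, True, False, True]
acc = np.mean(np.array(y_true) == np.array(y_pred))
print(acc)  # 3 of 5 predictions match -> 0.6
```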

Building a confusion matrix
If we want to calculate the recall and precision, we will need a confusion matrix. We start off
by making an empty matrix, which we are going to fill with values. This works for
both binary and multi-class problems.
1. First, we check how many unique classes the list has. The function np.unique
collects all unique values in an array; len() then gives the number of classes N.
2. We make an empty matrix using the np.zeros() function, with N x N as its shape
3. We use a for loop to iterate over two zipped lists: y_true & y_pred.
4. We use the values in each iteration step to index the position in the matrix, and we
add 1 to that position.

def confusion_matrix(y_true, y_pred):
    N = len(np.unique(y_true))
    M = np.zeros((N, N))
    for i, j in zip(y_true, y_pred):
        M[i, j] += 1
    return M

def precision(M):
    TP = M[1, 1]
    FP = M[0, 1]
    return TP / (TP + FP)
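Recall follows the same indexing pattern: with actual values on the rows and predictions on the columns, the false negatives sit at M[1, 0]. A sketch mirroring the precision function above (the example matrix is invented for illustration):

```python
import numpy as np

def recall(M):
    TP = M[1, 1]  # actual 1, predicted 1
    FN = M[1, 0]  # actual 1, predicted 0
    return TP / (TP + FN)

# Example matrix: rows are actual values, columns are predictions.
# 2 true positives and 1 false negative -> recall of 2/3
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
print(recall(M))
```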



You can also use set() operations. Check notebook 1 exercise 7 for this.

Notebook 2: Decision Trees
Decision trees have a recursive structure: if condition A holds, then move on to the following
check. The example below shows how recursive functions work in Python. Essentially, you
call the function within the function; however, the input is different from the first call. This is
an example of a recursive function calculating the factorial:

def factorial(n):
    if n == 0:
        print("This I know! (the base case)")
        return 1
    else:
        print("I don't know the factorial for", n, "let's try", n - 1)
        return n * factorial(n - 1)

factorial(5)

In the if-statement you define the base case. This is relevant because the function will keep
calling itself until it reaches the base case. Under the hood, Python keeps every pending call
on the call stack. Once the base case returns, those calls are resolved one by one, each
combining its own n with the result it received for 'n-1'.

Example 2:

def rec_sum(a):
    if len(a) == 1:
        return a[0]
    else:
        return a[0] + rec_sum(a[1:])

rec_sum([1, 2, 3, 4, 5, 6])

Example 3: We need to count the number of brackets in this nested list:
nested = [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[13]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
To do so we use a function that checks whether its content is a list or not. It keeps
doing this until its content is an integer:

def search(a, depth=0):
    if isinstance(a, list):
        return search(a[0], depth + 1)
    else:
        return depth
a[0] returns the 0th element of the list, thereby removing one pair of brackets. If you do
this recursively, it evaluates a[0][0], a[0][0][0] and so on, until the content is not a list
anymore, because 13 is an int. Meanwhile, each recursion step increases the depth by 1.
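Calling the function on a small example shows the idea (the function is repeated here so the snippet runs on its own):

```python
def search(a, depth=0):
    if isinstance(a, list):
        return search(a[0], depth + 1)
    else:
        return depth

print(search([[[7]]]))  # three pairs of brackets -> 3
print(search(13))       # not a list at all -> 0
```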

Recursion in decision trees
Recursive functions are very useful when dealing with tree structures, which are recursive
structures themselves. We do not know how deep the tree is. All we can see is if the node we are
currently looking at has any children, and if it does we can try to visit those, and repeat this.
Decision trees are usually full binary trees which means that every node has either 0 or 2
children. If it has 0 then it is a leaf node.

We start off by creating a function that builds a node:

def Node(left=None, right=None, feature=None, value=None, predict=None):
    """Return a node in a binary decision tree"""
    return dict(left=left, right=right, feature=feature, value=value, predict=predict)

def isLeaf(node):
    """Helper function to check if the current node is a leaf"""
    return node['left'] is None and node['right'] is None

Now that we have a function for creating nodes, we can start giving the tree
content:

# We want to first ask about value Round in column at index 2.
root = Node(feature=2, value="Round",

            # If false, in the left branch, which is a leaf node, we'll predict Banana
            left=Node(predict="Banana"),

            # If true, in the right branch we'll ask about the color Red
            right=Node(feature=1, value="Red",

                       # Based on the answer to question about color Red,
                       # we'll predict either Lime
                       left=Node(predict="Lime"),

                       # or Apple
                       right=Node(predict="Apple")))
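The notes stop after building the tree, but this is exactly where the recursion comes back: to classify a row we walk from the root to a leaf. A minimal traversal sketch (the name classify and the convention "go right when the feature equals the value" are assumptions matching the comments above, not code from the notebooks; Node, isLeaf, and root are repeated so the snippet runs on its own):

```python
def Node(left=None, right=None, feature=None, value=None, predict=None):
    return dict(left=left, right=right, feature=feature, value=value, predict=predict)

def isLeaf(node):
    return node['left'] is None and node['right'] is None

def classify(node, x):
    # At a leaf, return the stored prediction
    if isLeaf(node):
        return node['predict']
    # Otherwise compare the row's feature against the node's value:
    # go right when it matches, left when it does not
    if x[node['feature']] == node['value']:
        return classify(node['right'], x)
    return classify(node['left'], x)

root = Node(feature=2, value="Round",
            left=Node(predict="Banana"),
            right=Node(feature=1, value="Red",
                       left=Node(predict="Lime"),
                       right=Node(predict="Apple")))

print(classify(root, ["Fruit", "Red", "Round"]))     # Apple
print(classify(root, ["Fruit", "Green", "Round"]))   # Lime
print(classify(root, ["Fruit", "Yellow", "Long"]))   # Banana
```

Just like factorial, the function calls itself with a smaller problem (a subtree) until it hits the base case (a leaf).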

Seller: jeroenverboom