University of California, Berkeley, DS 100: sp18/hw2_solution.ipynb at master · DS-100/sp18 · GitHub
4/18/2018 sp18/hw2_solution.ipynb at master · DS-100/sp18 · GitHub
Homework 2: Food Safety
Course Policies
Here are some important course policies. These are also located at http://www.ds100.org/sp18/.
Collaboration Policy
Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If
you do discuss the assignments with others, please include their names at the top of your solution.
Due Date
This assignment is due at 11:59pm Tuesday, February 6th. Instructions for submission are on the website.
Homework 2: Food Safety
Cleaning and Exploring Data with Pandas
<img src="scoreCard.jpg" width=400>
In this homework, you will investigate restaurant food safety scores for restaurants in San Francisco. Above is a sample score card for a restaurant.
The scores and violation information have been made available by the San Francisco Department of Public Health, and we have made these data
available to you via the DS 100 repository. The main goal for this assignment is to understand how restaurants are scored. We will walk through the
various steps of exploratory data analysis to do this. To give you a sense of how we think about each discovery we make and what next steps it
leads to, we will provide comments and insights along the way.
As we clean and explore these data, you will gain practice with:
Reading simple csv files
Working with data at different levels of granularity
Identifying the type of data collected, missing values, anomalies, etc.
Exploring characteristics and distributions of individual variables
Question 0
To start the assignment, run the cell below to set up some imports and the automatic tests that we will need for this assignment:
In many of these assignments (and your future adventures as a data scientist) you will use os, zipfile, pandas, numpy, matplotlib.pyplot, and
seaborn.
1. Import each of these libraries under its commonly used abbreviation where one exists (e.g., pd, np, plt, and sns).
2. Don't forget to use the Jupyter notebook "magic" to enable inline matplotlib plots
(http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-matplotlib).
3. Add the line sns.set() to make your plots look nicer.
In [1]: import os
import zipfile
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()
In [2]: import sys
assert 'zipfile' in sys.modules
assert 'pandas' in sys.modules and pd
assert 'numpy' in sys.modules and np
assert 'matplotlib' in sys.modules and plt
assert 'seaborn' in sys.modules and sns
Downloading the data
As you saw in lectures, we can download data from the internet with Python.
Using the utils.py file from the lectures (see link (http://www.ds100.org/sp18/assets/lectures/lec05/utils.py)), define a helper function
fetch_and_cache to download the data with the following arguments:
data_url: the web address to download
file: the file in which to save the results
data_dir: (default="data") the location to save the data
force: if true the file is always re-downloaded
This function should return a pathlib.Path object representing the file.
In [3]: import requests
import time
from pathlib import Path

def fetch_and_cache(data_url, file, data_dir="data", force=False):
    """
    Download and cache a url and return the file object.

    data_url: the web address to download
    file: the file in which to save the results.
    data_dir: (default="data") the location to save the data
    force: if true the file is always re-downloaded

    return: The pathlib.Path object representing the file.
    """
    ### BEGIN SOLUTION
    data_dir = Path(data_dir)
    data_dir.mkdir(exist_ok=True)
    file_path = data_dir / Path(file)
    # If the file already exists and we want to force a download then
    # delete the file first so that the creation date is correct.
    if force and file_path.exists():
        file_path.unlink()
    if force or not file_path.exists():
        print('Downloading...', end=' ')
        resp = requests.get(data_url)
        with file_path.open('wb') as f:
            f.write(resp.content)
        print('Done!')
    else:
        # time.ctime formats the timestamp in local time, not UTC.
        last_modified_time = time.ctime(file_path.stat().st_mtime)
        print("Using cached version last modified (local time):", last_modified_time)
    return file_path
    ### END SOLUTION
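The effect of the force argument can be checked without network access. The following is only a sketch of the same caching pattern, not the course solution: cache_to_file and fake_download are hypothetical names, and fake_download stands in for requests.get(data_url).content so that the number of "downloads" can be counted.

```python
import tempfile
from pathlib import Path


def cache_to_file(file_path, download, force=False):
    """Write download() to file_path unless a cached copy can be reused.

    `download` is a hypothetical zero-argument callable returning bytes;
    it stands in for `requests.get(data_url).content`.
    """
    file_path = Path(file_path)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    if force or not file_path.exists():
        file_path.write_bytes(download())
    return file_path


calls = []  # one entry is appended per simulated download


def fake_download():
    # Made-up payload, used only to have something to write to disk.
    calls.append(1)
    return b"business_id,name\n1,Example Cafe\n"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "example.csv"
    cache_to_file(target, fake_download)              # no file yet: downloads
    cache_to_file(target, fake_download)              # cached copy is reused
    cache_to_file(target, fake_download, force=True)  # force: downloads again
    cached_bytes = target.read_bytes()

download_count = len(calls)
print("downloads performed:", download_count)
```

Three calls lead to two downloads: the second call finds the cached file, and only force=True triggers a fresh download.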
Now use the previously defined function to download the data from the following URL:
http://www.ds100.org/sp18/assets/datasets/hw2-SFBusinesses.zip
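Once a zip archive is on disk, its members can be listed with the zipfile module without extracting anything. The sketch below does not touch the homework data: it builds a small stand-in archive in a temporary directory, and the member names and contents are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in archive; member names and contents are made up.
    zip_path = Path(tmp) / "example.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("example_a.csv", "business_id,name\n1,Example Cafe\n")
        zf.writestr("example_b.csv", "business_id,score\n1,94\n")

    # Map each member name to its uncompressed size, without extracting.
    with zipfile.ZipFile(zip_path) as zf:
        members = {info.filename: info.file_size for info in zf.infolist()}

for name, size in members.items():
    print(name, size, "bytes")
```

With a real download, zip_path would instead be the pathlib.Path returned by fetch_and_cache.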