Summary: Computational biology guide, code and explanation
Course: Computational biology
Institution: Katholieke Universiteit Leuven (KU Leuven)
This document contains the material discussed in the classes and exercises (DataCamp). It includes the code, together with some explanation of how to use it.
Introduction to Python
General
• The symbol for exponentiation is **
• Select an element from a list or array by its index using [ ]
• When using [… : …] to make a subset, the element at the last index is not included
Example: calculate BMI
height_in (heights in inches) and weight_lb (weights in pounds) are existing lists.
# Import numpy
import numpy as np
# Create array from height_in with metric units: np_height_m
np_height_m = np.array(height_in) * 0.0254
# Create array from weight_lb with metric units: np_weight_kg
np_weight_lb = np.array(weight_lb)
np_weight_kg = np_weight_lb * 0.453592
# Calculate the BMI: bmi
bmi = np_weight_kg / np_height_m ** 2
# Print out bmi
print(bmi)
SIDE EFFECTS
• NumPy arrays cannot contain elements of different types. If you try to build such an array, the types of some elements are changed so that the array ends up homogeneous = type coercion.
List methods
Lists have methods = built-in functions that belong to the list
• Index: gives the index of the first occurrence of an element
# Print out the index of the element 20.0:
print(areas.index(20.0))
areas is the list name in this case
• Count: gives the number of occurrences of an element
# Print out how often 9.50 appears in areas:
print(areas.count(9.50))
• Append: adds an element to the end of the list it is called on
areas.append(24.5)
• Remove: removes the first element of the list that matches the input
• Reverse: reverses the order of the elements in the list it is called on
areas.reverse()
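A minimal sketch to try the general rules and list methods above. The areas values and square_area are hypothetical, chosen so that 9.50 appears twice; only the method calls (index, count, append, remove, reverse) come from the notes.

```python
# Hypothetical room areas; 9.50 appears twice on purpose
areas = [11.25, 18.0, 20.0, 10.75, 9.50, 9.50]

# Exponentiation with **
side = 4.5
square_area = side ** 2           # 20.25

# Slicing: the element at the end index (2) is not included
first_two = areas[0:2]            # [11.25, 18.0]

# index(): position of the first element equal to 20.0
idx_20 = areas.index(20.0)        # 2

# count(): how often 9.50 occurs
n_950 = areas.count(9.50)         # 2

# append(): adds 24.5 at the end (changes the list in place)
areas.append(24.5)

# remove(): deletes only the FIRST element equal to 9.50
areas.remove(9.50)

# reverse(): reverses the list in place
areas.reverse()

print(square_area, first_two, idx_20, n_950)
print(areas)
```

Note that append, remove and reverse change the list itself and return None, so writing areas = areas.reverse() would lose the list.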
Packages
Once a package is installed, import it using:
import numpy
import numpy as np
When using a function from that package, always put the name of the package (or its alias) in front of it:
numpy.array()
np.array()
Math package
import math
• Pi: the constant pi (π)
math.pi
• Radians function: converts degrees into radians
math.radians(degrees)
NumPy
• Array function: creates an array, which is similar to a list, but you can perform calculations on an array element by element, which is not possible with a list.
np.array([…])
• The typical arithmetic operators, such as +, -, * and /, have a different meaning for regular Python lists and NumPy arrays: for example, + glues two lists together but adds two arrays element by element.
• bmi > 23 creates an array of booleans that is False where the BMI is 23 or lower and True where it is above 23.
• bmi[bmi > 23] creates an array containing only the values above 23.
2D ARRAYS
You can have multi-dimensional arrays (a variable name cannot start with a digit, so a name like 2d_array is invalid):
np_2d = np.array([[… , … , …], [… , … , …]])
• This array has two rows and three columns, which you can retrieve with np_2d.shape. The output is (2, 3).
• You can select one row using [ ]. You can also select a specific element from that row with a second set of [ ]: np_2d[0][2] or np_2d[0, 2]
my_array[rows, columns]
OTHER FUNCTIONS
• Mean function (np.mean): gives the average
• Median function (np.median): gives the middle value when the values are sorted from small to large
• Corrcoef function: checks the correlation between, for example, height and weight
np.corrcoef(np_city[:, 0], np_city[:, 1])
• Std function (np.std): calculates the standard deviation
• Column_stack function: combines arrays into one array, each as a column
np.column_stack((height, weight))
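A small sketch of the NumPy behaviour described above: type coercion, list vs. array arithmetic, boolean indexing, 2D indexing and the other functions. All numbers (bmi, height, weight, np_2d) are hypothetical example values, not course data.

```python
import numpy as np

# Type coercion: a float, a boolean and a string all become strings
mixed = np.array([1.0, True, "a"])
print(mixed, mixed.dtype.kind)                   # kind 'U' = unicode string

# + concatenates lists but adds arrays element by element
list_sum = [1, 2] + [3, 4]                       # [1, 2, 3, 4]
array_sum = np.array([1, 2]) + np.array([3, 4])  # array([4, 6])

# Boolean indexing on a hypothetical bmi array
bmi = np.array([21.9, 24.3, 23.0, 27.1])
mask = bmi > 23          # 23.0 itself gives False
high = bmi[mask]         # only the values above 23

# 2D array: 2 rows (heights, weights) and 3 columns
np_2d = np.array([[1.73, 1.68, 1.71],
                  [65.4, 59.2, 63.6]])
print(np_2d.shape)                 # (2, 3)
print(np_2d[0][2], np_2d[0, 2])    # same element, two notations
second_column = np_2d[:, 1]        # all rows, column 1

# Other functions on hypothetical heights (m) and weights (kg)
height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
print(np.mean(height), np.median(height), np.std(height))

# column_stack: each 1D array becomes one column -> shape (5, 2)
np_city = np.column_stack((height, weight))

# corrcoef returns a 2x2 matrix; [0, 1] is the correlation coefficient
r = np.corrcoef(height, weight)[0, 1]
r_city = np.corrcoef(np_city[:, 0], np_city[:, 1])[0, 1]
print(r, r_city)
```

np.std uses the population formula by default (ddof=0); pass ddof=1 for the sample standard deviation.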
Evolutionary biology, Laura van den End
Example: soccer
• Convert heights and positions, which are regular
lists, to numpy arrays. Call them np_heights and
np_positions.
• Extract all the heights of the goalkeepers. You can
use a little trick here: use np_positions == 'GK' as
an index for np_heights. Assign the result to
gk_heights.
• Extract all the heights of all the other players. This
time use np_positions != 'GK' as an index for
np_heights. Assign the result to other_heights.
# positions and heights are existing lists; import numpy first
import numpy as np
# Convert positions and heights to numpy arrays: np_positions, np_heights
np_positions = np.array(positions)
np_heights = np.array(heights)
# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']
# Heights of the other players: other_heights
other_heights = np_heights[np_positions != 'GK']
# Print out the median height of goalkeepers
print("Median height of goalkeepers: " + str(np.median(gk_heights)))
# Print out the median height of other players
print("Median height of other players: " + str(np.median(other_heights)))
Intermediate Python
Data visualisation
Use the subpackage pyplot from matplotlib, imported as plt:
import matplotlib.pyplot as plt
• Line plot
plt.plot(x, y)
• Scatter plot
plt.scatter(x, y)
• Histogram
plt.hist(data, bins=nr)
• Show the plot
plt.show()
• Put the x-axis on a logarithmic scale
plt.xscale('log')
• Clean up the plot (clear the current figure)
plt.clf()
Customization
• Add labels
plt.xlabel('label')
plt.ylabel('label')
• Add a title
plt.title('title')
• Change the y-axis ticks: the first list holds the tick values, the second list the names of the ticks (one name per value)
plt.yticks([0, 2, 4, 6, 8, 10], ["names of the ticks"])
• Add more data (put extra values in front of the existing lists)
year = [1800, 1850, 1900] + year
pop = [1.0, 1.262, 1.650] + pop
• Add text
plt.text(1550, 71, 'India')
• Add a grid
plt.grid(True)
Dictionary of dictionaries
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03},
          'germany': {'capital': 'berlin', 'population': 80.62},
          'norway': {'capital': 'oslo', 'population': 5.084}}
Create a dictionary, named data, with the keys 'capital' and 'population'. Set them to 'rome' and 59.83, respectively.
data = {'capital': 'rome', 'population': 59.83}
Add data to europe under the key 'italy':
europe['italy'] = data
Pandas
Make tables → DataFrame
Rows and columns have labels, and the columns can hold different types of data.
Manually (from a dictionary; naming it dict would hide Python's built-in dict):
brics_dict = {
    "country": ["Brazil", "Russia", "India"],
    "capital": ["Brasilia", "Moscow", "New Delhi"],
    "area": [8.516, 17.10, 3.286],
    "population": [200.4, 143.5, 1252]}
import pandas as pd
brics = pd.DataFrame(brics_dict)
Change the labels of the rows:
brics.index = ["BR", "RU", "IN"]
Import from an external file:
brics = pd.read_csv("path/to/brics.csv", index_col=0)
Select one column as a DataFrame:
brics[["country"]]
Select rows as a DataFrame:
brics[1:4]
brics.loc[["RU"]]
brics.loc[["RU", "IN", "CH"]]
Rows and columns combined (iloc works the same way, but with integer positions instead of labels):
brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
brics.loc[:, ["country", "capital"]]
loc needs every label in the list to exist, so "CH" only works on a brics table that has a "CH" row, not on the three-row table made manually.
Dictionaries (also used to add color to a plot)
Instead of 2 separate lists with countries and populations:
world = {'Afghanistan': 30.55, 'Albania': 2.77, 'Algeria': 29.21}
Key:value → the key opens the door to the value (keys are case-sensitive):
world['Albania'] gives 2.77
Add elements to a dictionary (can also be used to update values):
world['sealand'] = 0.000027
Delete elements from a dictionary:
del world['sealand']
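A self-contained sketch combining the dictionary and pandas notes above. It reuses the values from the notes but builds only the three-row brics table, so the selections use "RU" and "IN" instead of "CH"; the iloc line is shown as the position-based counterpart of loc and is not taken from the notes.

```python
import pandas as pd

# Dictionary of dictionaries (values from the notes)
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03},
          'germany': {'capital': 'berlin', 'population': 80.62},
          'norway': {'capital': 'oslo', 'population': 5.084}}

# Add a sub-dictionary under the new key 'italy'
data = {'capital': 'rome', 'population': 59.83}
europe['italy'] = data
italy_capital = europe['italy']['capital']   # chained keys

# Flat dictionary: add (or update) a key, then delete it again
world = {'Afghanistan': 30.55, 'Albania': 2.77, 'Algeria': 29.21}
world['sealand'] = 0.000027
del world['sealand']

# DataFrame from a dictionary of lists: each key becomes a column
brics_dict = {
    "country": ["Brazil", "Russia", "India"],
    "capital": ["Brasilia", "Moscow", "New Delhi"],
    "area": [8.516, 17.10, 3.286],
    "population": [200.4, 143.5, 1252],
}
brics = pd.DataFrame(brics_dict)
brics.index = ["BR", "RU", "IN"]    # row labels

one_col = brics[["country"]]    # double brackets: DataFrame
col_series = brics["country"]   # single brackets: Series
rows = brics[1:3]               # rows at positions 1 and 2 (end excluded)

# Rows and columns by label (loc) and by integer position (iloc)
sub_loc = brics.loc[["RU", "IN"], ["country", "capital"]]
sub_iloc = brics.iloc[[1, 2], [0, 1]]
print(sub_loc)
```

Both sub_loc and sub_iloc select the same two rows and two columns; loc is usually easier to read, iloc is handy when you only know the positions.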