"Your Python code may run correctly, but what if you need it to run faster? This practical book shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By explaining the fundamental theory behind design choices, this expanded edition of Hig...
Chapter 2: Profiling to Find Bottlenecks
Chapter 3: Lists and Tuples
Chapter 4: Dictionaries and Sets
Chapter 5: Iterators and Generators
Chapter 6: Matrix and Vector Computation
Chapter 7: Compiling to C
Chapter 8: Asynchronous I/O
Chapter 9: The multiprocessing Module
Chapter 10: Clusters and Job Queues
Chapter 11: Using Less RAM
Chapter 12: Lessons from the Field
Chapter 1. Understanding Performant Python
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form—the author’s raw and unedited
content as they write—so you can take advantage of these technologies long before the official
release of these titles.
This will be the 1st chapter of the final book. Please note that the GitHub repo will be made active
later on.
If you have comments about how we might improve the content and/or examples in this book, or
if you notice missing material within this chapter, please reach out to the editor
at shunter@oreilly.com.
QUESTIONS YOU’LL BE ABLE TO ANSWER AFTER THIS CHAPTER
What are the elements of a computer’s architecture?
What are some common alternate computer architectures?
How does Python abstract the underlying computer architecture?
What are some of the hurdles to making performant Python code?
What strategies can help you become a highly performant programmer?
Programming computers can be thought of as moving bits of data and transforming them in special
ways to achieve a particular result. However, these actions have a time cost. Consequently, high
performance programming can be thought of as the act of minimizing these operations either by
reducing the overhead (i.e., writing more efficient code) or by changing the way that we do these
operations to make each one more meaningful (i.e., finding a more suitable algorithm).
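As a rough illustration of the two strategies, here is a minimal sketch assuming a toy task of summing the integers below n (the task and the function names are hypothetical). The first two functions run the same algorithm with different overhead; the third changes the algorithm itself.

```python
def sum_loop(n: int) -> int:
    # Explicit Python loop: every iteration executes several bytecode instructions
    total = 0
    for i in range(n):
        total += i
    return total


def sum_builtin(n: int) -> int:
    # Same O(n) algorithm, but the loop runs inside the C implementation of sum()
    return sum(range(n))


def sum_formula(n: int) -> int:
    # Different algorithm: the closed form n(n - 1) / 2 takes constant time
    return n * (n - 1) // 2


if __name__ == "__main__":
    import timeit

    n = 100_000
    for fn in (sum_loop, sum_builtin, sum_formula):
        seconds = timeit.timeit(lambda: fn(n), number=10)
        print(f"{fn.__name__}: {seconds:.4f} s for 10 calls")
```

Exact timings will vary by machine and Python version, but the ordering illustrates that reducing overhead helps, while choosing a better algorithm can help far more.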
Let’s focus on reducing the overhead in code in order to gain more insight into the actual hardware
on which we are moving these bits. This may seem like a futile exercise, since Python works quite
hard to abstract away direct interactions with the hardware. However, by understanding both the
best way that bits can be moved in the real hardware and the ways that Python’s abstractions force
your bits to move, you can make progress toward writing high performance programs in Python.
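One way to see these abstractions at work, assuming CPython on a 64-bit platform, is to compare how much memory the same integers occupy in a list versus in the standard library’s array module. A list holds pointers to separate integer objects, while an array packs raw machine values contiguously. This is a sketch for illustration; the exact byte counts depend on the interpreter build.

```python
import sys
from array import array

values = list(range(1_000))

# A list stores pointers; each element is a separate int object elsewhere in memory
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("q") stores signed 64-bit integers back to back, with no per-element object
packed = array("q", values)
array_bytes = sys.getsizeof(packed)

print(f"size of one int object: {sys.getsizeof(1)} bytes")
print(f"list plus int objects:  {list_bytes} bytes")
print(f"array('q'):             {array_bytes} bytes")
```

Beyond the extra bytes, reading the list means following a pointer to each object, which is exactly the kind of data movement the hardware discussion below is concerned with.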
The Fundamental Computer System
The underlying components that make up a computer can be simplified into three basic parts: the
computing units, the memory units, and the connections between them. In addition, each of these
units has different properties that we can use to understand them. The computational unit has the
property of how many computations it can do per second, the memory unit has the properties of
how much data it can hold and how fast we can read from and write to it, and finally, the
connections have the property of how fast they can move data from one place to another.
Using these building blocks, we can talk about a standard workstation at multiple levels of
sophistication. For example, the standard workstation can be thought of as having a central
processing unit (CPU) as the computational unit, connected to both the random access memory
(RAM) and the hard drive as two separate memory units (each having different capacities and
read/write speeds), and finally a bus that provides the connections between all of these parts.
However, we can also go into more detail and see that the CPU itself has several memory units in
it: the L1, L2, and sometimes even the L3 and L4 cache, which have small capacities (from several
kilobytes to a dozen megabytes) but very fast speeds. Furthermore, new computer architectures
generally come with new configurations (for example, Intel’s Skylake server CPUs replaced the
QuickPath Interconnect with the Intel Ultra Path Interconnect and restructured many connections). Finally,
in both of these approximations of a workstation we have neglected the network connection, which
is effectively a very slow connection to potentially many other computing and memory units!
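To make the cost of a slow connection concrete, here is a back-of-the-envelope sketch. The bandwidth figures are hypothetical order-of-magnitude values chosen for illustration, not measurements; only the 1 Gbit/s to 125 MB/s conversion is exact. Latency and protocol overhead are ignored.

```python
# Hypothetical, order-of-magnitude bandwidths in bytes per second;
# real figures vary widely between machines and hardware generations.
BANDWIDTH = {
    "RAM": 20e9,
    "SSD": 500e6,
    "1 Gbit/s network": 125e6,  # 1e9 bits/s divided by 8 bits per byte
}


def transfer_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Idealized transfer time: size divided by bandwidth, ignoring latency."""
    return num_bytes / bytes_per_second


payload = 1e9  # 1 GB
for name, bandwidth in BANDWIDTH.items():
    print(f"{name:>18}: {transfer_seconds(payload, bandwidth):8.3f} s")
```

Even with these rough numbers, the same gigabyte that moves in a fraction of a second from RAM takes several seconds over the network.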
To help untangle these various intricacies, let’s go over a brief description of these fundamental
blocks.
Computing Units
The computing unit of a computer is the centerpiece of its usefulness—it provides the ability to
transform any bits it receives into other bits or to change the state of the current process. CPUs are
the most commonly used computing unit; however, graphics processing units (GPUs) are gaining
popularity as auxiliary computing units. They were originally used to speed up computer graphics
but are increasingly used for numerical applications, thanks to their intrinsically parallel
nature, which allows many calculations to happen simultaneously. Regardless
of its type, a computing unit takes in a series of bits (for example, bits representing numbers) and
outputs another set of bits (for example, bits representing the sum of those numbers). In addition
to the basic arithmetic operations on integers and real numbers and bitwise operations on binary
numbers, some computing units also provide very specialized operations, such as the “fused
multiply add” operation, which takes in three numbers, A, B, and C, and returns the value A * B +
C.
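A hardware fused multiply-add computes A * B + C with a single rounding step, whereas separate multiply and add instructions round twice. The sketch below emulates that difference in pure Python by computing the exact result with rational arithmetic and rounding once; the helper names are hypothetical, and recent Python versions (3.13 and later) also provide math.fma directly.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    # Emulated FMA: compute a * b + c exactly as a rational, then round once to float
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    # Ordinary float arithmetic: a * b is rounded, then the sum is rounded again
    return a * b + c


print(unfused_multiply_add(0.1, 10.0, -1.0))  # 0.0
print(fused_multiply_add(0.1, 10.0, -1.0))    # 5.551115123125783e-17
```

Because the stored value of 0.1 is slightly larger than one tenth, the exact product is slightly above 1.0; the unfused version rounds that tiny excess away before subtracting, while the fused version keeps it.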
The main properties of interest in a computing unit are the number of operations it can do in one
cycle and the number of cycles it can do in one second. The first value is measured by
its instructions per cycle (IPC), while the latter value is measured by its clock speed. These two
measures are always competing with each other when new computing units are being made. For
example, the Intel Core series has a very high IPC but a lower clock speed, while the Pentium 4
chip has the reverse. GPUs, on the other hand, have a very high IPC and clock speed, but they
suffer from other problems like the slow communications that we discuss in “Communications
Layers”.
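The trade-off between IPC and clock speed comes down to a simple product: peak instructions per second equals IPC times cycles per second. The two designs below are hypothetical, chosen only to show that a lower-clocked chip can still come out ahead if its IPC is high enough; real throughput is lower because of stalls, cache misses, and branch mispredictions.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    # Peak throughput of one core: instructions per cycle times cycles per second
    return ipc * clock_hz


# Two hypothetical designs, chosen only to illustrate the trade-off
high_ipc = instructions_per_second(ipc=4, clock_hz=2.5e9)
high_clock = instructions_per_second(ipc=1, clock_hz=3.8e9)

print(f"high-IPC design:   {high_ipc:.2e} instructions/s")
print(f"high-clock design: {high_clock:.2e} instructions/s")
```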
Furthermore, although increasing clock speed almost immediately speeds up all programs running
on that computational unit (because they are able to do more calculations per second), having a
higher IPC can also drastically affect computing by changing the level of vectorization that is
possible. Vectorization occurs when a CPU is provided with multiple pieces of data at a time and
is able to operate on all of them at once. This sort of CPU instruction is known as single instruction,
multiple data (SIMD).
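The following is a conceptual model of SIMD, not real vectorization: pure Python does not emit SIMD instructions, so the sketch simply counts hypothetical “instructions” to show why operating on several values at once helps. The lane width of 4 is an assumption (for example, a 256-bit register holds four 64-bit doubles); compiled numerical libraries such as NumPy can use real SIMD instructions in their inner loops.

```python
from typing import List, Tuple

LANES = 4  # assumed lane width, e.g., four 64-bit doubles in a 256-bit register


def scalar_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    # Scalar model: one "instruction" per pair of elements
    out: List[float] = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a: List[float], b: List[float], lanes: int = LANES) -> Tuple[List[float], int]:
    # SIMD model: one "instruction" adds a whole chunk of `lanes` elements
    # (assumes a and b have the same length)
    out: List[float] = []
    instructions = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return out, instructions
```

For 10 elements, the scalar model issues 10 instructions while the 4-lane model issues 3, producing the same results.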
In general, computing units have advanced quite slowly over the past decade (see Figure 1-1).
Clock speeds and IPC have both been stagnant because of the physical limitations of making
transistors smaller and smaller. As a result, chip manufacturers have been relying on other methods
to gain more speed, including simultaneous multithreading (where multiple threads can run at
once), more clever out-of-order execution, and multicore architectures.
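From Python, the simplest way to see how many of these parallel execution contexts the operating system exposes is os.cpu_count(). It reports logical CPUs, so on a machine with simultaneous multithreading each hardware thread is typically counted separately, and it may return None if the count cannot be determined. The standard library does not report physical cores directly; the third-party psutil package offers psutil.cpu_count(logical=False) for that.

```python
import os

# Logical CPUs visible to the OS; with simultaneous multithreading
# this typically counts each hardware thread separately
logical_cpus = os.cpu_count()

if logical_cpus is None:
    print("CPU count could not be determined")
else:
    print(f"Logical CPUs visible to the OS: {logical_cpus}")
```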
Hyperthreading (Intel’s implementation of simultaneous multithreading) presents a virtual second CPU to the host operating system (OS), and clever
hardware logic tries to interleave two threads of instructions into the execution units on a single