4.2 SORTING AND SEARCHING
Binary Search:
Example of a binary search: questions.py
Algorithm: start with lo = 0 and hi = n (notation -> [lo,hi)) and use the following recursive
strategy:
- Base case: If hi-lo equals 1, then the secret number is lo.
- Recursive step: Otherwise, ask whether the secret number is greater than or equal to
the number mid = (hi + lo) / 2. If so, look for the number in [mid , hi); if not, look for
the number in [lo , mid).
Correctness proof:
Check that the strategy is correct by establishing the following facts:
- The interval always contains the secret number. (enforced by code)
- The interval lengths are the powers of 2, decreasing from 2ᵏ.
Analysis of running time:
We have n = 2ᵏ, where k = logn. Now, let T(n) be the number of questions. We get T(n) =
logn.
Therefore, the running time of a program that uses binary search is logarithmic.
*Note: Binary search and questions.search() work even when n is not a power of 2
Linear-logarithmic chasm:
The alternative to using binary search is to guess 0, then 1, then 2, then 3, and so forth, until
we hit the secret number -> brute-force algorithm. the running time of the brute-force
algorithm is sensitive to the secret number; can take as many as n questions in the worst
case. Whereas binary search guarantees to take no more than logn questions (huge cost
difference).
Binary representation:
Thinking in terms of the binary representation is another way to understand the linear–
logarithmic chasm: when we have a program whose running time is linear in a parameter n,
its running time is proportional to the value of n, whereas a logarithmic running time is just
proportional to the number of digits in n
Inverting a function:
Given an increasing function f and a value y, and an open interval (lo, hi), find a value x
within the interval such that f(x)=y using the same approach as the questions.py problem.
Algorithm: start with an interval (lo , hi) known to contain x and use the following recursive
procedure:
- Compute mid = (hi + lo) / 2.
- Base case: If hi - lo is less than δ, then return mid as an estimate of x.
- Recursive step: Otherwise, test whether f(mid) > y. If so, look for x in (lo , mid); if not,
look for x in (mid , hi).
This approach in often called bisection search with running time also equal to logn.
, Weighing an object:
With binary search, you can determine the weight of an object with weights that are powers
of 2 (and just one weight of each type). Algorithm: Put the object on the right side of the
balance and try the weights in decreasing order on the left side. If a weight causes the
balance to tilt to the left, remove it; otherwise, leave it.
Binary search in a sorted array:
A brute-force solution known as sequential search is to start at the beginning, examine each
key one at a time, and continue until you find the word. Applying binary search: Whether
you look exactly in the middle is immaterial, as long as you eliminate at least a constant
fraction of the keys each time that you look, your search will be logarithmic.
Exception filter:
Reads in a sorted array of strings from a file (whitelist) and an arbitrary sequence of strings
from standard input, and writes those in the sequence that are not in the whitelist -> can
require a fast algorithm like binary search.
Insertion sort:
- Type of brute-force algorithm
- contains an implementation of a sort() function
- the outer for loop sorts the first i elements in the array. The inner while loop
completes the sort by putting a[i] into its proper position in the array
Analysis of running time:
The sort() function contains a while loop nested inside a for loop -> running time is
quadratic, unless the array is already sorted -> running time is linear
Sensitivity to input:
the running time of insertion sort is sensitive to its input values. When performance is
sensitive to input values, you might not be able to make accurate predictions without taking
the input values into account, therefore running time of an insertion sort could be linear,
based on the application’s input.