Data structures
An introduction to algorithms and data structures
Convention: log(n) = log₂(n)
Sorting
In the sorting problem there is:
- Input: a sequence of n numbers ⟨a₁, …, aₙ⟩
- Output: a permutation ⟨a′₁, …, a′ₙ⟩ of the input such that a′₁ ≤ … ≤ a′ₙ
The input is stored in arrays: a collection of objects stored in linear order,
accessible by an integer index. A[1] is the first element of the array; A[1..k] is
the subarray of array A with elements 1 through k. The numbers being sorted are
the keys. Additional information carried along with a key is called satellite data.
set(A, i, x) → A[i] = x; this is one operation
get(A, i) → x = A[i]; this is one operation
Create array A → A = new Array[n]; n + 1 operations
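A minimal sketch of these operations (not in the original notes; Python lists stand in for the arrays, and Python is 0-indexed rather than 1-indexed):

    A = [None] * 8    # create an array of n = 8 slots (n + 1 operations)
    A[3] = 42         # set(A, i, x): one operation
    x = A[3]          # get(A, i): one operation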
An Abstract Data Type (ADT) for the sorting problem maintains a set S of n
elements, tracks the order of the elements, and lets us exchange the order of the
elements. An ADT is a set of data values and associated operations that are
precisely specified, independent of any particular implementation.
A description of an algorithm should contain:
1. The algorithm; English/Pseudocode
2. The correctness; proof
3. The running time; derivation
InsertionSort: like sorting a hand of playing cards.
- Good for sorting a small number of elements.
- It is an incremental algorithm: process the input elements one by one and
maintain the solution for the elements processed so far.
- It is also an in-place algorithm: the numbers are rearranged within the array
with only constant extra space (see the sketch below).
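One possible Python rendering (a sketch, not the notes' own pseudocode; since Python is 0-indexed, the sorted prefix is A[0..j−1]):

    def insertion_sort(A):
        # Incremental and in-place; Θ(n²) worst case.
        for j in range(1, len(A)):        # A[0..j-1] is already sorted
            key = A[j]                    # next "card" to insert
            i = j - 1
            while i >= 0 and A[i] > key:  # shift larger elements one slot right
                A[i + 1] = A[i]
                i -= 1
            A[i + 1] = key                # drop key into its place
        return A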
Use a loop invariant to understand why an algorithm gives the correct answer.
An example:
Loop invariant for InsertionSort: at the start of each iteration j of the “outer”
for-loop, the subarray A[1…j−1] consists of the elements originally in A[1…j−1],
but in sorted order.
To prove correctness with a loop invariant, show three things:
1. Initialization: the loop invariant is true prior to the first iteration of the loop.
2. Maintenance: if the loop invariant is true before an iteration of the loop, it
remains true before the next iteration.
3. Termination: when the loop terminates, the loop invariant gives us a useful
property that helps show that the algorithm is correct.
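The three steps can be checked empirically by instrumenting the sketch above with assertions (again an illustration, not part of the notes):

    def insertion_sort_checked(A):
        original = list(A)                    # snapshot of the input
        for j in range(1, len(A)):
            # Initialization/Maintenance: the prefix A[0..j-1] holds the
            # original first j elements, in sorted order.
            assert A[:j] == sorted(original[:j])
            key, i = A[j], j - 1
            while i >= 0 and A[i] > key:
                A[i + 1] = A[i]
                i -= 1
            A[i + 1] = key
        # Termination: the invariant for j = n says the whole array is sorted.
        assert A == sorted(original)
        return A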
MergeSort: a divide-and-conquer sorting algorithm.
Divide-and-conquer: first, break the problem into two or more subproblems. Then,
solve the subproblems recursively. Finally, combine these solutions to create a
solution to the original problem (see the sketch below).
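One way this might look in Python (a sketch under the same caveats as above; this version returns a new list instead of sorting in place):

    def merge_sort(A):
        if len(A) <= 1:                 # base case: already sorted
            return A
        mid = len(A) // 2
        left = merge_sort(A[:mid])      # solve the subproblems recursively
        right = merge_sort(A[mid:])
        return merge(left, right)       # combine the two sorted halves

    def merge(left, right):
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:     # take the smaller front element
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        out.extend(left[i:])            # at most one of these is non-empty
        out.extend(right[j:])
        return out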
Induction is a way of proving correctness. It contains a proof that the base
case is solved correctly, and a proof that if all subproblems are solved correctly,
then the complete problem is solved correctly. It should contain: base case,
inductive hypothesis, inductive step, conclusion.
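Applied to MergeSort, for example, the template might read (my illustration, not from the notes):
- Base case: for n ≤ 1 the array is trivially sorted.
- Inductive hypothesis: MergeSort correctly sorts every input of length smaller than n.
- Inductive step: both halves have length smaller than n, so by the hypothesis the recursive calls return sorted sequences; merging two sorted sequences yields a sorted sequence.
- Conclusion: by strong induction, MergeSort sorts every input.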
Analysis of algorithms
To analyse the running time as a function of n, you should look at the:
- Best case
- Average case
- Worst case
An algorithm has worst-case running time T(n) if for any input of size n the
maximal number of elementary operations executed is T(n). Elementary operations
are: add, subtract, multiply, divide, load, store, copy, branch, return, etc.
The rate of growth (order of growth) of the running time is far more important
than the constants; this is what the Θ-notation captures.
- InsertionSort: Θ(n²); faster for small values of n
- MergeSort: Θ(n log(n)); faster for large values of n
The Θ-notation: concentrate on the leading term and ignore constants. For
example, 19n³ + 17n² − 3n becomes Θ(n³), and 2n·log(n) + 5n^1.1 − 5 becomes
Θ(n^1.1), since polynomial terms dominate logarithmic ones. If the function does
not grow (it goes down), there is no order of growth.
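This dominance is easy to see numerically (an illustration, not from the notes): the ratio of the full expression to its leading term approaches the leading constant.

    f = lambda n: 19 * n**3 + 17 * n**2 - 3 * n
    for n in (10, 100, 1000, 10000):
        print(n, f(n) / n**3)   # tends to 19 as n grows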
Logarithmic properties, for a, b, c > 0:
- log_c(ab) = log_c(a) + log_c(b)
- log_c(a^b) = b · log_c(a)
- log_a(b) = log_c(b) / log_c(a)
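These identities can be spot-checked in Python (the values are chosen arbitrarily):

    import math

    a, b, c = 3.0, 4.0, 2.0
    assert math.isclose(math.log(a * b, c), math.log(a, c) + math.log(b, c))
    assert math.isclose(math.log(a ** b, c), b * math.log(a, c))
    assert math.isclose(math.log(b, a), math.log(b, c) / math.log(a, c))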
Properties for growth:
- Logarithmic functions grow slower than polynomial functions
- Polynomial functions grow slower than exponential functions
Linear search:
- Input: an increasing sequence of n numbers A = ⟨a₁, …, aₙ⟩ and a value v
- Output: an index i such that A[i] = v, or NIL if v is not in A
- Running time:
  o Best case: 1
  o Average case: n/2 (if successful)
  o Worst case: n
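A direct Python rendering (a sketch; None stands in for NIL):

    def linear_search(A, v):
        for i in range(len(A)):
            if A[i] == v:
                return i      # best case: found at the first position
        return None           # worst case: v is not in A, n comparisons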
Binary search:
- Input: an increasing sequence of n numbers A = ⟨a₁, …, aₙ⟩ and a value v
- Output: an index i such that A[i] = v, or NIL if v is not in A
- Running time:
  o Best case: 1
  o Average case: log(n)
  o Worst case: log(n)
It repeatedly halves the sequence, which is only possible log(n) times.
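A possible Python version (a sketch; assumes A is sorted increasingly, with None for NIL):

    def binary_search(A, v):
        lo, hi = 0, len(A) - 1
        while lo <= hi:             # search range halves each iteration
            mid = (lo + hi) // 2
            if A[mid] == v:
                return mid          # best case: hit on the first probe
            if A[mid] < v:
                lo = mid + 1        # discard the left half
            else:
                hi = mid - 1        # discard the right half
        return None                 # after ~log(n) halvings: not found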
The Θ-notation: Θ(g(n)) is the set of functions that grow as fast as g(n).
It is an asymptotically tight bound. Let g(n): ℕ → ℕ be a function. Then we have
Θ(g(n)) = {f(n) : there exist positive constants c₁, c₂, and n₀ such that
c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀}.
Notation: f(n) = Θ(g(n))
Example:
- Claim: 19n³ + 17n² − 3n = Θ(n³)
- Proof: choose c₁ = 19, c₂ = 19 + 17 = 36 and n₀ = 1. Then we have for all
n ≥ n₀: 19n³ ≤ 19n³ + 17n² − 3n ≤ 36n³, since 17n² ≥ 3n and 17n² ≤ 17n³ for n ≥ 1.
- Claim: 19n³ + 17n² − 3n ≠ Θ(n²)
- Proof by contradiction: assume there are positive constants c₁, c₂, n₀ such
that for all n ≥ n₀: c₁·n² ≤ 19n³ + 17n² − 3n ≤ c₂·n². Dividing the upper bound
by n² gives 19n + 17 − 3/n ≤ c₂ for all n ≥ n₀; the left-hand side grows without
bound, so no such constant c₂ can exist.
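The chosen constants can be spot-checked numerically (an illustration only; it verifies finitely many n, it does not replace the proof):

    f = lambda n: 19 * n**3 + 17 * n**2 - 3 * n
    for n in range(1, 1000):            # c1 = 19, c2 = 36, n0 = 1
        assert 19 * n**3 <= f(n) <= 36 * n**3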
The O-notation: O(g(n)) is the set of functions that grow at most as fast as g(n).
It is an asymptotic upper bound of Θ. Let g(n): ℕ → ℕ be a function. Then we have
O(g(n)) = {f(n) : there exist positive constants c, n₀ such that f(n) ≤ c·g(n) for
all n ≥ n₀}.
There is another notation: o(g(n)) → grows strictly slower than g(n).
The Ω-notation: Ω(g(n)) is the set of functions that grow at least as fast as g(n).
It is an asymptotic lower bound of Θ. Let g(n): ℕ → ℕ be a function. Then we have
Ω(g(n)) = {f(n) : there exist positive constants c, n₀ such that c·g(n) ≤ f(n) for
all n ≥ n₀}.
There is another notation: ω(g(n)) → grows strictly faster than g(n).
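A quick worked example (mine, not the notes'): 2n + 5 = O(n²), with c = 3 and n₀ = 2, since 2n + 5 ≤ 3n² for all n ≥ 2. But 2n + 5 ≠ Ω(n²), because no constant c > 0 keeps c·n² ≤ 2n + 5 for all large n. In fact 2n + 5 = o(n²): it grows strictly slower than n².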