INTRODUCTION
Communication pipeline: Agent → Input device → Data reduction → Source coding → Encryption → Channel coding → Modulation → Channel → (the inverse operations at the receiver side).
Binary Symmetric Channel: bitflips occur with probability p. Binary Asymmetric Channel: only 1's can flip to 0's, with probability p. Binary Error-Erasure Channel: bitflips occur with probability p1, and erasures happen with probability p2.
Random errors: noise influences each of the transmitted symbols/bits independently in a memoryless channel. Burst errors: errors in subsequent symbols are correlated in a channel with memory.
Gilbert model: state diagram that models burst errors with a good (low p) and a bad state (high p), both BSCs. State transition probabilities and error probabilities are obtained from experiments.
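A minimal Python sketch of the Gilbert model as a two-state Markov chain; the probabilities below are illustrative assumptions, not measured values:

    import random

    P_ERR = {'good': 0.001, 'bad': 0.2}     # per-state bitflip probability (BSC)
    P_SWITCH = {'good': 0.01, 'bad': 0.1}   # probability of leaving each state

    def gilbert_errors(n_bits, state='good'):
        errors = []
        for _ in range(n_bits):
            errors.append(1 if random.random() < P_ERR[state] else 0)
            if random.random() < P_SWITCH[state]:
                state = 'bad' if state == 'good' else 'good'
        return errors   # the 1s cluster into bursts while in the bad state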
Repetition code repeats the info bits (message) n times. Often not the most efficient.
Codes can be represented as spheres in the vector space, each with a codeword as center and radius t, containing the vectors which differ in at most t positions from the codeword. The idea of error
correction is choosing the largest t s.t. the spheres are still disjoint (else we can't unambiguously decode). For error detection, the spheres can overlap, but a sphere center can't lie in another sphere.
Bit error probability Pe : probability of bitflip. Depends on modulation and coding methods, channel behaviour and the received signal-to-noise ratio (SNR).
Forward Error Correction (FEC) protocol corrects errors. Automatic Repeat reQuest (ARQ) protocol detects errors and re-requests along a return channel (variable transmission time, low decoding complexity, low Pe).
MATHEMATICS
Group: set G with an operation ∗ such that:
(i) ∀a, b ∈ G : a ∗ b ∈ G (closedness)
(ii) ∀a, b, c ∈ G : a ∗ (b ∗ c) = (a ∗ b) ∗ c (associativity)
(iii) ∀a ∈ G, ∃e ∈ G : a = a ∗ e = e ∗ a (identity/neutral element)
(iv) ∀a ∈ G, ∃a^(−1) ∈ G : a ∗ a^(−1) = a^(−1) ∗ a = e (inverse element)
An abelian group has the above plus (v) ∀a, b ∈ G : a ∗ b = b ∗ a (commutativity).
Field: set F with operations + and × such that:
(i) F, + is an abelian group with identity element e = 0
(ii) F\{0}, × is an abelian group
(iii) ∀a, b, c ∈ F : a × (b + c) = a × b + a × c (distributivity)
Finite field GF(q) is a field with a finite number (order) q of elements, which is always a power of a prime: q = p^m. Elements a ≠ 0 also have an order, which is the smallest n > 0 s.t. a^n = 1.
Primitive element/generator of GF(q) is any element a of order q − 1. A primitive element generates GF(q): GF(q) = {0, a^1, a^2, ..., a^(q−1) = 1}.
Prime finite fields GF(p) always have a primitive element, every element has an additive inverse, and every nonzero element has a multiplicative inverse.
Polynomial over GF(q): fn x^n + ... + f1 x + f0 with fi ∈ GF(q). When adding/subtracting/multiplying/dividing polynomials, all arithmetic is done in GF(q) (e.g., in binary, addition and subtraction are the same).
Division is similar to integers: u(x)/v(x) → u(x) = q(x)v(x) + r(x), where q(x) is the quotient and r(x) the remainder polynomial (degree(r(x)) < degree(v(x))). v(x) divides u(x) iff r(x) = 0.
To get q(x) and r(x), sort terms by power and iteratively add a term q(x)i to q(x) s.t. the leading term of u(x) equals the leading term of v(x) · q(x)i, and subtract v(x)q(x)i from u(x). Repeat until degree(u(x)) < degree(v(x)); what remains of u(x) is r(x).
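A minimal Python sketch of this long division over GF(2) (coefficient lists, lowest power first; names are my own):

    def poly_divmod_gf2(u, v):
        """Divide u(x) by v(x) over GF(2): return (q, r) with u = q*v + r."""
        assert v[-1] == 1, "leading coefficient of v(x) must be nonzero"
        def deg(p):  # degree of p(x); -1 for the zero polynomial
            return max((i for i, c in enumerate(p) if c), default=-1)
        u = u[:]     # work on a copy; it is reduced into the remainder
        q = [0] * max(len(u) - len(v) + 1, 1)
        while deg(u) >= deg(v):
            shift = deg(u) - deg(v)    # align the leading terms
            q[shift] ^= 1              # add x^shift to the quotient
            for i, c in enumerate(v):  # subtract (= XOR) v(x) * x^shift
                u[i + shift] ^= c
        return q, u

    # (x^3 + x + 1) / (x + 1): quotient x^2 + x, remainder 1
    print(poly_divmod_gf2([1, 1, 0, 1], [1, 1]))  # -> ([0, 1, 1], [1, 0, 0, 0])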
Irreducibility of a polynomial of degree n: not divisible by any polynomial of degree 0 < d < n. Polynomials can be decomposed into the product of irreducible polynomials. (Analogous to primes.)
If q = p is a prime, GF(p) = {0, ..., p − 1} with calculations modulo p. If q = p^m is not a prime, this approach does not yield a field, because the element p would not have a multiplicative inverse.
Instead, we let α generate GF(q = p^m), where α is defined as a root of p(x) s.t. p(α) = 0, where p(x) is an irreducible polynomial (so normally it wouldn't have roots) over GF(p) of degree m
and a divisor of x^(q−1) − 1 but not of x^(j−1) − 1 for any j < q (analogous to how i is defined as a root of x^2 + 1: i = √−1). There are tables that list irreducible polynomials for given p and m.
The q elements of GF(q) are 0, α, ..., α^(q−1) = 1, but all elements with power ≥ m are rewritten as b0 + b1 α + ... + bm−1 α^(m−1), where bi ∈ GF(p) (likely binary), using the expression for α^m obtained from p(α) = 0.
This is the power representation, used for multiplication, but we use the m-tuple (b0 , ..., bm−1 ) as the tuple representation, used for vector addition mod p.
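A minimal Python sketch building both representations for GF(2^4), assuming the primitive polynomial p(x) = x^4 + x + 1 (so α^4 = α + 1); each element's tuple representation is packed into an integer whose bit j is bj:

    M = 4
    PRIM = 0b10011            # x^4 + x + 1, bit i = coefficient of x^i
    power_to_tuple = {}       # exponent i -> tuple representation of alpha^i
    elem = 1                  # alpha^0 = 1
    for i in range(2**M - 1):
        power_to_tuple[i] = elem
        elem <<= 1                 # multiply by alpha
        if elem & (1 << M):        # degree m appeared: rewrite via alpha^4 = alpha + 1
            elem ^= PRIM
    tuple_to_power = {v: k for k, v in power_to_tuple.items()}

    def gf_mul(a, b):  # multiply in the power representation (add exponents)
        if a == 0 or b == 0:
            return 0
        return power_to_tuple[(tuple_to_power[a] + tuple_to_power[b]) % (2**M - 1)]

    def gf_add(a, b):  # add in the tuple representation (mod-2 vector addition)
        return a ^ b

    print(power_to_tuple[4])                             # -> 0b0011, i.e. 1 + alpha
    print(gf_mul(power_to_tuple[7], power_to_tuple[9]))  # alpha^16 = alpha^1 -> 2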
Minimal polynomial of β ∈ GF(p^m) is the smallest degree polynomial φ(x) over GF(p) with β as root: φ(β) = 0. If e is the smallest positive integer s.t. β^(p^e) = β, then φ(x) = (x − β)(x − β^p)(x − β^(p^2))...(x − β^(p^(e−1))).
Primitive polynomial is the minimal polynomial of a primitive element.
Vector space over field F is a vector set V with a + operator, such that, ∀a, b ∈ F and ∀u, v ∈ V :
(i) V, + is an abelian group
(ii) av ∈ V
(iii) a(u + v) = au + av & (a + b)v = av + bv
(iv) (ab)v = a(bv)
(v) 1v = v
Subspace is a subset S of V s.t. ∀a, b ∈ F and ∀u, v ∈ S : au + bv ∈ S, i.e. any linear combination of elements in S is an element in S. The all-0 vector is in any subspace (e.g. in binary, due to self-summation: u + u = 0).
Subspace basis is a set of linearly independent vectors v1, ..., vk ∈ V that generate S (not unique), where k is the dimension of S and linear independence means a1 v1 + ... + ak vk = 0 ⇐⇒ ∀i : ai = 0.
Dual space S⊥ of a k-dimensional subspace S is formed by the set of vectors which are orthogonal (u · v = 0) to all vectors in S. S⊥ has dimension n − k and may contain elements from S (e.g. the all-0 vector).
BLOCK CODES I – FUNDAMENTALS
Block code C ⊂ Vn = {(a1, ..., an)}, where ai ∈ A = {α1, ..., αq} (often A = {0, 1}). Message u (all vectors are row-wise/horizontal in this course) is one-to-one mapped to a codeword c ∈ C.
Code rate R = (logq |C|)/n is a measure of efficiency (ratio of #bits needed to encode to #bits sent), where |C| is the code cardinality/size (#codewords/#messages) and n is the code length.
If C is a k-dimensional subspace of Vn, then it is called a linear (n, k) block code, in which case |C| = q^k and hence R = (logq |C|)/n = k/n.
Hamming distance is the number of positions in which two vectors differ: d(a, b) = |{i : ai ≠ bi}|. The weight of a vector is its Hamming distance to the all-0 vector: w(a) = |{i : ai ≠ 0}|.
Hamming distance of a code d is the smallest Hamming distance between any two different codewords: d(C) = d = min{d(a, b) : a, b ∈ C, a ≠ b}.
For linear codes, this is the same as the minimum weight of all codewords except the all-0 codeword: d(C) = d = min{w(a) : a ∈ C, a ≠ 0}.
A code of distance d offers: • correction of t = ⌊(d − 1)/2⌋ errors • detection of d − 1 errors. If d is known, we can denote linear block codes as (n, k, d) codes.
Generator matrix G of C is k × n with basis vector bi ∈ C as row i. C is the row space of G; any c ∈ C is a linear combination of the basis vectors/rows: c = u1 b1 + ... + uk bk with ui ∈ A.
If u is the information vector and c is the codeword: c = uG. G has rank k (the dimension of C).
For a systematic code, G has the format G = (Ik | P), where Ik is the identity matrix (for the first k info bits in a codeword) and P (for "parity") is for the n − k check bits.
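A minimal Python sketch of encoding c = uG over GF(2), using the systematic generator matrix of the (7, 4, 3) Hamming code as an example (names are my own):

    G = [
        [1, 0, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 1],
    ]

    def encode(u, G):
        """Encode message u (row vector) as c = uG over GF(2)."""
        n = len(G[0])
        return [sum(u[i] * G[i][j] for i in range(len(u))) % 2 for j in range(n)]

    # the first 4 bits are the message itself (systematic), then 3 check bits:
    print(encode([1, 0, 1, 1], G))  # -> [1, 0, 1, 1, 0, 1, 0]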
Dual code C⊥, an (n, n − k) code, is the dual space of the linear (n, k) code C.
Parity-check matrix H = (−P^T | In−k) of C (rank n − k) is an (n − k) × n generator matrix of C⊥. c ∈ C iff cH^T = 0 (orthogonality); so every row i of H defines a parity-check equation: c · Hi = 0.
Hamming codes have H consist of all 2^m − 1 possible columns ≠ 0^T, with n − k = m ≥ 2 being the height of H, which leads to (2^m − 1, 2^m − 1 − m, 3) codes.
Coset of a code C : C + a = {c + a | c ∈ C} for any a ∈ Vn . In each coset, the vector of minimum weight is the coset leader.
Syndrome decoding decodes output y = x + e for input x and bitflip vector e. If the syndrome s = yH^T (= eH^T) ≠ 0, it detects an error. For correction, find the smallest weight e with syndrome s, using a:
Standard array, where each row maps to one of the q^(n−k) syndromes. The top row (s = 0) contains all codewords. Each row with syndrome s contains the coset of the codewords with a = 0 except
at the index of the column in H that equals s. If a row's s does not appear as a column in H, let a have 1s at multiple indices of columns that sum to s. For decoding, take the coset leader of s's row as ê.
For Hamming codes, if s ≠ 0, obtain x̂ by flipping y's j-th bit, where j is the column index in H of s. This is much more efficient than brute-force searching all 2^k codewords for a closest match.
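A minimal Python sketch of this shortcut for the (7, 4, 3) Hamming code, with H = (P^T | I3) matching the generator matrix sketched above:

    H = [
        [1, 1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 0, 1],
    ]

    def hamming_decode(y, H):
        """Correct up to one bit error: flip the bit whose column of H equals s."""
        s = [sum(row[j] * y[j] for j in range(len(y))) % 2 for row in H]
        if any(s):
            cols = [[row[j] for row in H] for j in range(len(y))]
            y = y[:]
            y[cols.index(s)] ^= 1   # flip the erroneous bit
        return y

    # bit 2 of the codeword [1, 0, 1, 1, 0, 1, 0] was flipped on the channel:
    print(hamming_decode([1, 0, 0, 1, 0, 1, 0], H))  # -> [1, 0, 1, 1, 0, 1, 0]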
Three tactics for changing a code’s parameters (e.g. for adapting to a particular application):
• Shortening: (n, k, d) → (n − a, k − a, d) for any integer 1 ≤ a < k, achieved by using the subset of codewords whose information symbols are prefixed by a 0s.
• Puncturing: (n, k, d) → (n − a, k, d − a) for any integer 1 ≤ a < d, achieved by removing a parity bits.
• Extension: (n, k, d) → (n + 1, k, d + 1) for any odd d, achieved by adding one bit to each row in G such that the number of 1’s in each row is even.
Interleaving rearranges the symbols deterministically to change burst errors into random errors, at the cost of buffering and delay.
Block interleaving buffers the next s codewords, where s is the interleaving depth, and transmits their bits in turn. Any burst of length up to s · ⌊(d − 1)/2⌋ = st can be corrected.
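A minimal Python sketch of a block interleaver of depth s (names are my own): write s codewords row-wise into a buffer and transmit column-wise, so a burst is spread over s codewords.

    def interleave(codewords):   # codewords: list of s equal-length lists
        return [cw[j] for j in range(len(codewords[0])) for cw in codewords]

    def deinterleave(stream, s):
        n = len(stream) // s
        return [[stream[j * s + i] for j in range(n)] for i in range(s)]

    cws = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0]]
    tx = interleave(cws)          # a burst in tx hits each codeword only briefly
    assert deinterleave(tx, 3) == cws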
BLOCK CODES II – CYCLIC CODES
Polynomial code representation, where ui , gi , ci ∈ GF (q):
• Information polynomial: u(x) = u0 + u1 x + ... + uk−1 x^(k−1), where the ui contain the information, i.e. the k message symbols.
• Generator polynomial: g(x) = g0 + g1 x + ... + gn−k x^(n−k), where gn−k = 1.
• Code polynomial: c(x) = c0 + c1 x + ... + cn−1 x^(n−1) = u(x)g(x) (similar to multiplying the information vector with the generator matrix). However, encoding with c(x) = u(x)g(x) is not systematic.
Systematic encoding, with the k info bits at the end: c(x) = u(x) · x^(n−k) − r(x) (= q(x)g(x)), where r(x) is the remainder of the division (u(x) · x^(n−k)) / g(x)
→ u(x) · x^(n−k) = q(x)g(x) + r(x).
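A minimal Python sketch of this systematic encoding for the (7, 4) cyclic code with g(x) = 1 + x + x^3 (coefficient lists, lowest power first; names are my own):

    def cyclic_encode(u, g, n):
        k = n - (len(g) - 1)
        shifted = [0] * (n - k) + u           # u(x) * x^(n-k)
        r = shifted[:]                        # reduce to the remainder mod g(x)
        for i in range(n - 1, n - k - 1, -1):
            if r[i]:
                for j, coef in enumerate(g):
                    r[i - (len(g) - 1) + j] ^= coef
        # c(x) = u(x)*x^(n-k) - r(x); over GF(2), minus equals XOR
        return [shifted[i] ^ r[i] for i in range(n)]

    g = [1, 1, 0, 1]                          # 1 + x + x^3
    # check bits first, then the 4 info bits:
    print(cyclic_encode([1, 0, 1, 1], g, 7))  # -> [1, 0, 0, 1, 0, 1, 1]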
Cyclic codes are linear codes with the additional property that a cyclic shift of any codeword gives again a codeword: c0 + c1 x + ... + cn−1 x^(n−1) ∈ C =⇒ cn−1 + c0 x + ... + cn−2 x^(n−1) ∈ C.
Easy way to check cyclicity: the code C with length n and generator g(x) is cyclic iff g(x) is a divisor of x^n − 1.
For any n, we can construct a slew of (binary) cyclic codes along the k-d trade-off spectrum by factorizing 1 + x^n into irreducible polynomials and taking any combination of these as g(x).
For any of these codes, we can obtain k = n − degree(g(x)), but d is more difficult to obtain and requires enumerating the codewords.
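A minimal sketch of the factorization step, assuming the sympy library is available:

    from sympy import symbols, factor_list

    x = symbols('x')
    print(factor_list(x**7 + 1, modulus=2))
    # -> (1, [(x + 1, 1), (x**3 + x + 1, 1), (x**3 + x**2 + 1, 1)]) (up to ordering)
    # e.g. g(x) = x^3 + x + 1 gives the (7, 4) Hamming code, and
    # g(x) = (x + 1)(x^3 + x + 1) gives a (7, 3) code.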
To get G from g(x), put its coefficients g0 , ..., gn−k in the first row, and right-shift the remaining k − 1 rows. Add/subtract rows from each other to get a systematic matrix (containing Ik ).
Parity-check polynomial: h(x) = h0 + h1 x + ... + hk x^k = (x^n − 1)/g(x). To get H, insert the k + 1 coefficients of h(x) in reverse order (hk, ..., h0) in the first row, and cyclically shift the rest. The dual code's generator is g⊥(x) = x^k h(x^(−1)).
Syndrome decoding: the syndrome s(x) is the remainder of y(x)/g(x) (= ê(x)/g(x)). ĉ(x) = y(x) − ê(x), where ê(x) is the error polynomial with syndrome s(x) (remainder of ê(x)/g(x)) and minimal weight.
Both systematic encoding and syndrome decoding involve dividing polynomials by g(x), which is implemented using shift register circuits. Cyclicity allows an efficient decoder implementation.
BCH codes are cyclic codes with two parameters m ≥ 2 (determines length) and 0 < t < 2^(m−1) (minimum number of correctable errors). g(x) is the least common multiple of the minimal polynomials
of α, α^2, ..., α^(2t) in GF(2^m). So for these powers of α, list the minimal polynomials and take g(x) as the product of the unique ones (a minimal polynomial can be shared by multiple powers of α).
n = 2^m − 1, k = n − degree(g(x)) ≥ 2^m − 1 − mt, d ≥ 2t + 1, and d ≤ the weight of the generator polynomial (which is a code polynomial itself). If t = 1, it is a Hamming code.
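A minimal Python sketch computing g(x) for m = 4, t = 2 (the (15, 7, 5) BCH code), assuming the primitive polynomial x^4 + x + 1; the powers α, ..., α^(2t) are grouped into conjugacy classes {i, 2i, 4i, ...}, whose products of (x − α^j) are the minimal polynomials:

    M, T = 4, 2
    N = 2**M - 1
    alpha = [1]                    # alpha[i] = tuple representation of alpha^i
    for _ in range(N - 1):
        e = alpha[-1] << 1                          # multiply by alpha
        alpha.append(e ^ 0b10011 if e & 16 else e)  # reduce via alpha^4 = alpha + 1
    log = {v: i for i, v in enumerate(alpha)}

    def gf_mul(a, b):
        return 0 if a == 0 or b == 0 else alpha[(log[a] + log[b]) % N]

    def poly_mul(p, q):            # polynomials over GF(16), lowest power first
        r = [0] * (len(p) + len(q) - 1)
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                r[i + j] ^= gf_mul(a, b)   # addition in GF(2^m) is XOR
        return r

    g, done = [1], set()
    for i in range(1, 2 * T + 1):  # powers alpha^1 .. alpha^2t
        j = i
        while j not in done:       # walk the conjugacy class of alpha^i
            done.add(j)
            g = poly_mul(g, [alpha[j], 1])   # multiply in the factor (x - alpha^j)
            j = (2 * j) % N
    print(g)  # -> [1, 0, 0, 0, 1, 0, 1, 1, 1]: g(x) = 1 + x^4 + x^6 + x^7 + x^8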
Reed-Solomon (RS) codes are cyclic (2^m − 1, 2^m − 1 − 2t, 2t + 1) codes (same m and t as BCH) that work with symbols ∈ GF(2^m) instead of bits (often m = 8 for bytes).
g(x) = (x + α)(x + α^2)...(x + α^(2t)) over GF(2^m), so we can use α's as coefficients now. To get the k × n G, put g(x) in the first row (padded with 0s), and shift the other rows. #codewords = (2^m)^k.
To encode, map the binary message to symbols (tuple representation to power representation), use G to get the symbol codeword (remember to use the tuple representation for summing the products),
map that back to a binary codeword, and transmit it. The symbol representation also allows the use of shift registers, which is less complex. G becomes m times as large in binary form.
RS codes always correct ≤ t symbols (or bits) of random errors, but burst errors of only ≤ m(t − 1) + 1 bits (every symbol is m bits). Burst errors of ≤ mt bits are corrected iff the burst starts at a symbol boundary.
BLOCK CODES III – CO-OPERATING CODES
Product codes are co-operating codes over the same alphabet: C1 ⊗ C2 is (n1 n2 , k1 k2 , d1 d2 ). Assume C1 and C2 are systematic in the first positions. Codewords are rectangular arrays, where the
top-left k1 × k2 quadrant contains the info bits, the top-right (n1 − k1) × k2 quadrant contains the check bits for the info rows according to "row code" C1, the bottom-left quadrant contains the
check bits for the info columns according to "column code" C2, and the bottom-right quadrant contains the checks-on-checks: either the check bits of C2 on the check bits of C1 or vice versa (same
resulting codeword). For decoding, decode the rows with C1 and then the columns with C2. Two benefits: 1) low complexity, due to decoding one constituent C1 or C2 codeword at a time; 2) handling of
both random and burst errors, since the random errors will mostly be corrected by the row decoder, while the uncorrectable burst errors in the rows may be dealt with by the column decoder.
The maximum number of errors that can be corrected is t1 (n2 − t2 ) + t2 n1 , which is the case if t2 rows have all errors and n2 − t2 rows have t1 errors.
The minimum number of errors that may be incorrectly decoded is (t1 + 1)(t2 + 1). This may be lower than the theoretical t = ⌊(d1 d2 − 1)/2⌋, which is achieved when decoding with the entire array
at once, but that is much slower and doesn't handle both random and burst errors as well.
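A minimal Python sketch of product encoding, using two single-parity-check (3, 2, 2) codes as C1 and C2 (giving a (9, 4, 4) product code; names are my own):

    def spc_encode(bits):                 # append an even-parity check bit
        return bits + [sum(bits) % 2]

    def product_encode(info):             # info: 2x2 array of bits
        rows = [spc_encode(r) for r in info]                  # row code C1
        cols = [[r[j] for r in rows] for j in range(3)]
        full = [spc_encode(c) for c in cols]                  # column code C2
        return [[full[j][i] for j in range(3)] for i in range(3)]

    for row in product_encode([[1, 0], [1, 1]]):
        print(row)  # 3x3 array: info, row checks, column checks, check-on-check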
Concatenated codes are co-operating codes C1: (n, k, d) over GF(q) and C2: (N, K, D) over GF(q^k), where the channel of the outer code C2 is C1. Brackets denote the inner channel per q^k-ary symbol:
kK q-ary bits −map→ K q^k-ary symbols a −C2 Enc→ N q^k-ary symbols c −(→ k q-ary bits u −C1 Enc→ n q-ary bits x −channel→ y −C1 Dec→ û −)→ d −C2 Dec→ â −map→ kK q-ary bits
Similar to product codes, we have code parameters (Nn, Kk, Dd) and the benefits of low complexity and the fact that the inner code tackles random errors and the outer code tackles burst errors.
BLOCK CODES IV – SOFT-DECISION DECODING
The examples use Binary Phase Shift Keying (BPSK): bit 0 is represented as real BPSK value (χ) +1 and bit 1 as −1, over Additive White Gaussian Noise (AWGN) channels: N(µ = χ, σ^2 = N0/2) noise is added.
The result is a received real/soft vector ρ, of which the signs indicate the bit predictions and the magnitudes α = |ρ| are measures of the predictions’ reliabilities.
• Hard-decision decoding (low complexity, moderate performance) demodulates bit 0 if ρ ≥ 0 (1 if ρ < 0) and decodes to the codeword at smallest Hamming distance from the bit decisions. Ignores α.
• Soft-decision decoding (higher complexity, better performance) decodes to the codeword with smallest Euclidean distance between ρ and its BPSK representation. This is maximum likelihood (ML)
estimation: picking the codeword ĉ that maximizes P(c | ρ), which is better in case of Gaussian noise.
However, a hard decision decoder may be right where a soft decision decoder is wrong if there are many small correct values and few large incorrect values.
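A minimal Python sketch contrasting the two decoders on the length-3 repetition code {000, 111}, illustrating how a few strong values can override the majority of signs:

    CODE = [[0, 0, 0], [1, 1, 1]]

    def bpsk(c):
        return [1.0 - 2.0 * b for b in c]  # bit 0 -> +1, bit 1 -> -1

    def hard_decode(rho):
        y = [0 if r >= 0 else 1 for r in rho]          # sign decisions only
        return min(CODE, key=lambda c: sum(a != b for a, b in zip(c, y)))

    def soft_decode(rho):                              # ML under AWGN
        return min(CODE, key=lambda c: sum((r - x) ** 2
                                           for r, x in zip(rho, bpsk(c))))

    rho = [0.1, 0.2, -1.5]    # two weakly positive, one strongly negative value
    print(hard_decode(rho))   # -> [0, 0, 0] (majority of signs)
    print(soft_decode(rho))   # -> [1, 1, 1] (closest in Euclidean distance)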
Chase algorithms are a hybrid: low complexity due to the algebraic decoding approach and good performance due to the probabilistic decoding approach. They do multiple trials, in which the bits indicated by a
test pattern are flipped in the hard-decision vector, which is then decoded; the decoding/trial with the lowest Euclidean distance of its BPSK representation to ρ is picked (see the sketch after this list). In order of complexity:
• Chase 1 uses (n choose ⌊d/2⌋) trials: all possible test patterns with weight ⌊d/2⌋, which are a lot. Not used in practice.
• Chase 2 uses 2^⌊d/2⌋ trials: all possible test patterns with 0s in the n − ⌊d/2⌋ most reliable positions.
• Chase 3 uses (d + 1)/2 trials: inverting the {0, 2, 4, 6, ..., d − 1} least reliable bits. Linear complexity in d.
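A minimal Python sketch of a Chase 2 style decoder for the same {000, 111} repetition code (d = 3, so 2^⌊d/2⌋ = 2 trials); the algebraic decoder and names are my own:

    from itertools import product

    CODE = [[0, 0, 0], [1, 1, 1]]

    def nearest_codeword(bits):            # algebraic (hard-decision) decoder
        return min(CODE, key=lambda c: sum(a != b for a, b in zip(c, bits)))

    def chase2(rho, d=3):
        y = [0 if r >= 0 else 1 for r in rho]   # hard-decision vector
        # indices of the floor(d/2) least reliable (smallest |rho|) positions
        weak = sorted(range(len(rho)), key=lambda i: abs(rho[i]))[:d // 2]
        best, best_dist = None, float('inf')
        for pattern in product([0, 1], repeat=len(weak)):   # 2^(d//2) trials
            trial = y[:]
            for flip, i in zip(pattern, weak):
                trial[i] ^= flip                            # apply test pattern
            c = nearest_codeword(trial)
            # keep the trial whose BPSK image is Euclidean-closest to rho
            dist = sum((r - (1.0 - 2.0 * b)) ** 2 for r, b in zip(rho, c))
            if dist < best_dist:
                best, best_dist = c, dist
        return best

    print(chase2([0.1, 0.2, -1.5]))  # -> [1, 1, 1], matching the ML decision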
At a high signal-to-noise ratio (SNR) Eb/N0, the bit error probabilities Pe for all Chase decoders and an ML decoder converge (asymptotically, Chase is as good as ML). A hard-decision decoder loses 3 dB.
TU Delft MSc theses found:
• there are test sets with #trials < d/2 that perform better than Chase 3, and asymptotically optimal performance can be achieved with d/4 trials
• optimal test set design should take channel characteristics into account, e.g. the best performing test sets differ between AWGN and Rayleigh fading channels
• asymptotically optimal performance can be achieved with d/4 trials when using multiple test sets and dynamically choosing among them depending on received reliability values