Lecture 1. Introduction
Lectures on Tuesday & Friday. Problem sessions on Wednesday.
The idea is that you want to send some information over a “noisy” channel. “Noisy” here
means that it will create errors: what comes out of the channel does not have to be exactly
what came in (e.g., telephone lines or writing to hard disk).
From an information source, data is put into an encoder (channel encoding), which adds
extra information to help correct errors later on.
This encoded information is passed over the channel, which might affect the data due to the
noise on the channel. When finally passed to the decoder, you want to correct the errors
introduced by the channel’s noise.
The decoding process consists of two steps:
1. Detect and correct errors (if possible); this is also simply called decoding, as the actual channel decoding is fairly trivial.
2. Channel decoding: undo the channel encoding.
Some considerations for good codes are:
- Easy channel encoding for transmission
- Easy channel decoding
- Easy error detection + correction when decoding.
- Good error correcting/detecting capabilities.
- Good information rate (e.g., there could be a code where the channel encoder returns 000 for input 0 and 111 for input 1. This is called “triple repetition” and has a bad rate of information transfer).
If p = chance of correct transmission of each symbol, then the chance of correct transmission of pure information (a single 0 or 1) is p.
Using triple repetition encoding, a message is decoded as 0 if it contains more 0’s than 1’s.
Hence, the chance of a correct transmission is p^3 (all three digits remain correct) + 3p^2(1−p) (any of the 3 cases where one digit flips and two remain correct: 001, 010, 100).
For a channel where p = 0.9, triple repetition increases the chance of correct transmission over the channel from 0.9 to 0.972.
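A minimal Python sketch of this calculation (the function name is just illustrative):

def triple_repetition_success(p):
    # chance that a triple-repetition word is decoded correctly over a BSC:
    # either all three digits arrive correctly, or exactly one of them flips
    return p**3 + 3 * p**2 * (1 - p)

print(triple_repetition_success(0.9))  # ≈ 0.972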
Assumptions:
- We send information as a sequence of 0’s and 1’s, which are called digits.
- A word is a sequence of digits. The length of a word is the number of digits in it.
- We send words across a binary channel (it only knows 0’s and 1’s and is unable to
produce 2, 3, 4, …), one after the other.
- A block code is a set C of words (the number of words is denoted |C|) where every word has the same length (e.g., triple repetition has block length 3).
Channel assumptions:
- The probability of an error is the same for any digit (a 0 flipping to a 1 is equally likely
as a 1 flipping to a 0). If this is the case, the channel is symmetric.
- The probability of an error in one digit is independent from errors in any of the other
digits (an error in the first digit does not make it more likely for there to be an error
in the second digit).
BSC = Binary Symmetric Channel.
Suppose a codeword v is sent across a BSC and a word w is received.
If w is in the code (e.g., it’s 000 or 111 when using triple repetition), we accept it; if in that case w ≠ v, the errors go undetected.
If w is not in the code, we have to try to detect and correct the error.
The (information) rate of a code C of length n is (1/n) · log₂|C| (if |C| = 2^k, then we could write the information in k digits but use n). This number expresses what part of the redundant message is actually meaningful. In the |C| = 2^k example, the information rate is k/n.
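As a small illustration (the helper name is made up here), the rate can be computed directly from |C| and n:

from math import log2

def information_rate(C, n):
    # rate = (1/n) * log2(|C|)
    return log2(len(C)) / n

print(information_rate(["000", "111"], 3))  # ≈ 0.333 (= 1/3) for triple repetition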
Say you have a code C₂ that contains all the length-2 words (00, 01, 10, 11) with a parity bit added: {000, 011, 101, 110}; the rate is 2/3. C₂ can detect errors in one digit only. Correction is hopeless: if a word with a single error is received, flipping any one of its digits gives a word that is in the code, so you don’t know which one was sent originally.
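A small sketch (the set and function names are made up here) of how C₂ detects, but cannot correct, single errors:

C2 = {"000", "011", "101", "110"}  # length-2 words with a parity bit: every codeword has even weight

def error_detected(w):
    # any single bit flip makes the weight odd, so the result falls outside the code
    return w not in C2

print(error_detected("001"))  # True: a single error is detected
print(error_detected("011"))  # False: either no error, or an undetected error pattern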
The chance of exactly i errors occurring in a word of length n is given by the formula below:
(n choose i) · p^(n−i) · (1−p)^i
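For example, in Python (using the standard math.comb for the binomial coefficient; the function name is illustrative):

from math import comb

def prob_exactly_i_errors(n, i, p):
    # probability of exactly i errors in a word of length n over a BSC of reliability p
    return comb(n, i) * p**(n - i) * (1 - p)**i

print(prob_exactly_i_errors(3, 1, 0.9))  # ≈ 0.243, the one-error term in the triple-repetition example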
Lecture 2.
What is the chance ϕ p ( v , w) that, if a codeword v is sent, the word w is received?
We know p = the reliability of the BSC, n = the length of the word, and d = the number of digits in which v and w differ. Then ϕ_p(v, w) = p^(n−d) · (1−p)^d.
Let’s decode a received word w to a codeword v such that ϕ_p(v, w) is maximal. That is,
ϕ_p(v, w) = max{ϕ_p(u, w) : u ∈ C}.
Suppose we have a BSC with reliability 1/2 < p < 1. Suppose v₁, v₂ are codewords of length n and w was received. If v₁ and w differ in d₁ digits and v₂ and w differ in d₂ digits, then ϕ_p(v₁, w) ≤ ϕ_p(v₂, w) ⇔ d₁ ≥ d₂.
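One way to see this (still assuming 1/2 < p < 1):
ϕ_p(v₁, w) / ϕ_p(v₂, w) = p^(n−d₁)(1−p)^(d₁) / [p^(n−d₂)(1−p)^(d₂)] = ((1−p)/p)^(d₁−d₂).
Since 1/2 < p < 1, we have 0 < (1−p)/p < 1, so this ratio is at most 1 exactly when d₁ ≥ d₂.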
Algebra
K={0,1 } with 1+1=0 .
For n ≥ 1, K^n denotes the set of all words of length n using the elements of K (a vector space over K). Pages 10-11 of the book show the operations that can be done in this space.
Weights/distance
For v ∈ K^n, the (Hamming) weight wt(v) is the number of non-zero positions in v.
For v, w ∈ K^n, the (Hamming) distance d(v, w) is the number of positions where v and w differ. Note that d(v, w) = wt(v − w) = wt(v + w) (due to our K-space).
d(v, w) ≥ 0, and d(v, w) = 0 ⇔ v = w
d(v, w) = d(w, v)
d(v, w) ≤ d(v, u) + d(u, w)
d(v, w) = wt(v + w)
ϕ_p(v, w) = p^(n−d(v,w)) · (1−p)^(d(v,w)) = p^(n−wt(v+w)) · (1−p)^(wt(v+w))
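A short Python sketch of these two notions, representing words as strings of '0'/'1' (the function names are chosen here):

def wt(v):
    # Hamming weight: number of non-zero digits
    return sum(digit != "0" for digit in v)

def dist(v, w):
    # Hamming distance: number of positions where v and w differ
    return sum(a != b for a, b in zip(v, w))

print(wt("0110"))            # 2
print(dist("0110", "0101"))  # 2: the last two positions differ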
Maximum likelihood decoding (MLD)
A received word w ∈ K n is decoded to v ∈C as follows:
1. Complete maximum likelihood decoding (CMLD): decode w to a v ∈ C for which d(v, w) is minimal. If several v are tied, just make a choice.
2. Incomplete maximum likelihood decoding (IMLD): decode w to v ∈ C only if it is the unique element of C for which d(v, w) is minimal; we may even want to limit the number of changes we allow between v and w. If we can’t decode, we don’t decode (and might ask for retransmission, or interpolate if it concerns a large data sequence). Incompleteness here means that not every w maps to a v ∈ C.
MLD is really nearest-neighbor decoding, because you compare w to all v ∈C , and pick the
nearest neighbor.
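A small Python sketch of IMLD as nearest-neighbor decoding (names are illustrative):

def dist(v, w):
    return sum(a != b for a, b in zip(v, w))

def imld(C, w):
    # decode w to the unique nearest codeword; return None when there is a tie
    best = min(dist(v, w) for v in C)
    nearest = [v for v in C if dist(v, w) == best]
    return nearest[0] if len(nearest) == 1 else None

print(imld({"000", "111"}, "001"))                # '000'
print(imld({"000", "011", "101", "110"}, "001"))  # None: three codewords tie at distance 1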
For a given code C ⊆ K^n and v ∈ C, what is the chance of decoding the received word to v?
Say we decode using IMLD, where we only decode to the unique nearest neighbor v' ∈ C (if it exists).
We get v back only if
w ∈ L(v) = { u ∈ K^n : d(u, v) < d(u, v') for all v' ∈ C with v' ≠ v }.
In words: w lies in the set L(v) of words u ∈ K^n for which v is the unique nearest neighbor that is in the code C.
θ_p denotes the probability that, if v is sent over a BSC of reliability p, then IMLD correctly concludes that v was sent:
θ_p(C, v) = ∑_{w ∈ L(v)} ϕ_p(v, w)
Suppose n = 3 and C = {000, 111}. Then L(000) = {000, 100, 010, 001}, so θ_p(C, 000) = p^3 + 3p^2(1−p), which is 0.972 for p = 0.9 (and similarly for 111).
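The sets L(v) and the probabilities θ_p(C, v) can be checked with a short brute-force sketch (helper names made up here):

from itertools import product

def dist(v, w):
    return sum(a != b for a, b in zip(v, w))

def L(C, v, n):
    # words for which v is the strictly nearest codeword
    words = ("".join(t) for t in product("01", repeat=n))
    return [u for u in words if all(dist(u, v) < dist(u, x) for x in C if x != v)]

def theta(C, v, n, p):
    # probability that IMLD returns v when v was sent
    return sum(p**(n - dist(v, w)) * (1 - p)**dist(v, w) for w in L(C, v, n))

C = {"000", "111"}
print(L(C, "000", 3))           # ['000', '001', '010', '100']
print(theta(C, "000", 3, 0.9))  # ≈ 0.972, matching the triple-repetition calculation above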
Error detection and correction
Let d = d(C) = min d(v, v') with v, v' ∈ C and v ≠ v'. In other words, for a code C containing at least two words, the distance of the code C is the smallest of the numbers d(v, w) as v and w range over all pairs of different codewords in C.
C detects an error pattern u ∈ K^n with u ≠ 0 (if there are no errors, you can’t detect them) if v + u ∉ C for all v ∈ C (that is, you know there’s an error if what comes out of the channel is not in the code).
C is t-error detecting if it detects all error patterns of weight at most t, and it fails to detect some error pattern of weight t+1 (otherwise it would be (t+1)-error detecting).
If d = d(C), then the code is (d−1)-error detecting.
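The distance d(C) can be checked by brute force; a small sketch, assuming the string representation used in the earlier sketches:

from itertools import combinations

def dist(v, w):
    return sum(a != b for a, b in zip(v, w))

def code_distance(C):
    # d(C): the smallest distance between two different codewords
    return min(dist(v, w) for v, w in combinations(C, 2))

print(code_distance({"000", "011", "101", "110"}))  # 2, so this code is (d-1) = 1-error detecting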
Let’s now correct some errors. C corrects the error pattern u if v+u for any v ∈C is decoded
to v and not to some other v' ∈C . Hence,
d(v, v + u) < d(v', v + u) for all v' ∈ C with v' ≠ v.
That is, a code C corrects the error pattern u if, for all v in C , v+u is closer to v than to any
other word in C .
C is t-error correcting if it corrects every error pattern u of weight at most t and it fails to correct some error pattern of weight t+1 (otherwise it would be (t+1)-error correcting).
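Finally, a brute-force sketch that checks whether a code corrects all error patterns up to a given weight (t-error correcting then means: this holds for t but fails for t+1). Names are chosen here for illustration:

from itertools import product

def dist(v, w):
    return sum(a != b for a, b in zip(v, w))

def wt(u):
    return sum(digit != "0" for digit in u)

def add(v, u):
    # componentwise addition in K^n, with 1 + 1 = 0
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(v, u))

def corrects(C, u):
    # C corrects u when, for every v in C, v + u is strictly closer to v than to any other codeword
    return all(dist(v, add(v, u)) < dist(x, add(v, u)) for v in C for x in C if x != v)

def corrects_all_up_to_weight(C, t, n):
    patterns = ("".join(p) for p in product("01", repeat=n))
    return all(corrects(C, u) for u in patterns if wt(u) <= t)

C = {"000", "111"}
print(corrects_all_up_to_weight(C, 1, 3))  # True
print(corrects_all_up_to_weight(C, 2, 3))  # False: some weight-2 patterns are not corrected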