Famke Nouwens
Theoretical Computer Science
Lecture 1
If we have a program, we can do 5 things with it:
1. Lexical analysis – break the program up into variables, names, numbers etc.
2. Parsing – create a tree of the sequence of operations that should be executed.
3. Optimization – check what can be left out and what can be pre-computed.
4. Termination – does the program stop (= halt) at some point?
5. Interpretation – figure out what it does (and if it’s useful).
In this course, we use a framework (an abstraction) to analyze a diverse set of problems.
The framework used is language recognition.
String – finite sequence of symbols drawn from some alphabet .
The empty string is .
The set of all possible strings over an alphabet is * (this contains ∞ strings).
Functions on strings s:
- Length: |s| gives the number of symbols in s.
- Occurrence of c: #c(s) gives the number of times that c occurs in s.
- Concatenation: st (or s ‖ t) adds t to the end of s. It is associative and s = s = s.
- Replication: si gives s repeated i times, where i is a natural number.
- Reverse: sR is s backwards. Theorem: (wx)R = xRwR.
- Substring: t is a substring of s if it appears in s in the exact same sequence.
• Proper substring: t is in s but it is not equal to s.
- Prefix: t is a prefix of s if it is a substring of s starting at the beginning of s (in the
same sequence).
• Proper prefix: t is a prefix of s but it is not equal to s.
- Suffix: t is a suffix of s if it is a substring of s containing the end of s (in the same
sequence).
• Proper prefix: t is a suffix of s but it is not equal to s.
- The empty string is always a proper substring, prefix and suffix of every string.
Language: a (possibly infinite) set of finite length strings over a finite alphabet . If L = { … }
then all possible strings are {…}*.
We can define a language in different ways, for example:
Language L = {x {a, b}* : all a’s precede all b’s}, meaning all strings in L have only a’s, b’s or
the empty string and all a’s are before the b’s.
It is important to know that L = { } = is not the same as L = {𝜀}, as well as that every prefix,
suffix and substrings starts with the empty string!
Languages are sets, and can be computationally defined in two ways:
1. Generator (enumerator): it lists all elements of the language. This can be done in
two ways:
a. Arbitrary order (since a language is a set)
, Famke Nouwens
b. Lexicographic order: shortest strings first, strings of same length are ordered
in dictionary order ( = alphabetic order).
2. Recognizer: decides whether a candidate string is in the language. It uses predicates
that can be true or false.
The size of language cannot be easily determined by one formula, however, we know the
smallest language has cardinality 0, since it’s the empty set . The largest language has size
*, but depends on the alphabet .
Theorem: If then * is countably infinite. Idea: enumeration is infinite, since there is
no longest string.
Theorem: If then the set of languages over is uncountably infinite.
Functions on languages:
- Set operations:
• Union: L1 ∪ L2
• Intersection: L1 ∩ L2
• Difference: L1 \ L2 (or L1 – L2)
• Complement: ¬ L
- Language operators:
• Concatenation: every string in L1 is combined with every string in L2 to form
new strings. Careful: L1L2 ≠ L2L1.
▪ L{} = {}L = L
▪ L = L = (because we cannot take an element from )
• Kleen star: L* is the language obtained by concatenating zero or more
elements from L.
Example: L = {dog, cat, fish} so L* = {𝜀, dog, cat, fish, dogdog, dogcat, dogfish,
…, dogcatfish, …, fishdogdogfishcat, …}
• + operator: L+ = L L*, so L+ is the language obtained by concatenating one or
more elements from L. L+ is the closure of L under concatenation.
▪ L+ = L* - {} iff L
+
▪ L = L* iff L
• Reverse: L is the set of strings formed by taking some string from L and
R
reversing it (and doing this for all strings).
▪ Theorem: (L1 L2)R = L2R L1R.
A semantic interpretation function assigns meanings to the strings of a language.
Lecture 2
Decision problem: a problem with a yes or no answer.
It is/can be answered by a decision procedure.
The focus of this course is the language recognition problem: given a language L and a
string w, is w in L?
Different kind of problems can be:
- Pattern matching: does a search string match a documents?
- Halting problem: does a program halt on all inputs?
, Famke Nouwens
Notation of a string encoding of X: < X >. Likewise for pairs X, Y: < X, Y >.
Turning problems into decision problems is not very hard, but you must describe the
language to hold all possible options and turn the problem into a verification problem (is w
in the language, yes or no?)
Equivalent problems: problems that can be reduced to the other.
There exist four classes of languages:
- Regular languages: class of languages that can be accepted by some finite state
machine.
• FSM: graphical representation using accepting and rejecting states.
- Context-free languages: class of languages that can be accepted by some pushdown
automata.
• PDA: FSM + a single stack with unlimited capacity. The stack must be empty
to accept a string.
- (Semi) decidable languages: class of languages that can be accepted by a Turing
machine (explanation comes later).
• Decidable: program stops with a decision true or false
• Semi-decidable: program stops with only decision true and fails to reject all
strings not in the language.
Rule of least power: use the least powerful language suitable for expressing information,
constraints or programs. Meaning, use the simplest machine if possible.
Grammar defines a language, while a machine accepts a language.
Nondeterminism: there are multiple options to choose from (for the same input).
A language is closed under a function if that function is carried out over the language, and
the newly formed language is again finite/infinite (depends on what it first was).
Lecture 3
Deterministic FSM: a finite state machine defined by a 5-tuple: M = (K, , , s, A), where:
- K is a finite set of states
- is an alphabet
- is the transition function from (K ) to K
- s K is the initial state
- A K is the set of accepting states
M accepts a string if it ends up in an element of A after reading the string. The set of all
accepted strings is denoted by LM or L(M).
Configuration: an element of K *. It captures the current state and the current string,
both of which can make a difference to M’s future behaviour.
Initial configuration on input w: (sM, w).
The Yields relations:
- Yields-in-one-step relation |-M: from the current state, we can reach the next state
and string by deleting/adding one symbol.