Theoretical Computer Science II
COS2601
Languages and words
Basics
L 1 = { x xx .x.xx xxxx . . . }
L1 is the language while x is the words which can be written as:
L1 = { x'n for n =I 2 3 . . . }
Length and reverse
If a = x a b c
Then length(a) = 4
Then there is reverse(1 45 ) = 54 1
Kleene’s Theorem
Kleene’s star
∑*
This notation is sometimes known as the Kleene star after the logician who was one of the
founders of this subject.
If ∑ = { x } , then
∑* = L4 = { Λ x xx xxx . . . }
When doing so make sure the smallest words are first
More info
If ∑ = ᴓ (the empty set),
then ∑ * = { Λ }
and S* = S * *
Kleene’s Plus
If for some reason we wish to modify the concept of closure to refer to only the concatenation
of some (not zero) strings from a set S, we use the notation + instead of *. For example:
If ∑ = { x } , then ∑+ = { x xx xxx . . . }
Recursive Definitions
Definition
• A recursive definition is characteristically a three-step process
1. We specify some basic objects in the set.
2. We give rules for constructing more objects in the set from the ones we already know.
3. We declare that no objects except those constructed in this way are allowed in the set.
Example
Rule I) 2 is in EVEN.
Rule 2) If x is in EVEN, then so is x + 2.
Rule 3) The only elements in the set EVEN are those that can be produced from the two rules above.
,Strings
The number n is the length of the string. Suppose, for example, X = {a, b, c, d}. Then the function f: {1, 2, 3} ® X
defined by f(1) = c, f(2) = a and f(3) = d is the string we would normally represent by writing cad and the length of
the string is 3.
CONCAT
What does one do with strings? Basically, all we want to do with these things is concatenate them. One can think
of concatenation as a binary operation on the set of all strings. If we call this operation CONCAT
Principal benefit of recursive definitions
Recursive definition
• Z+ is the smallest subset of R such that
• 1 Î Z+ and
• if k Î Z+ then k + 1 Î Z+.
Another correct recursive definition is:
• Rule 1: 1 Î Z+
• Rule 2: If k Î Z+, then also k+1 Î Z+.
• Rule 3: Only elements generated by the above rules are in Z+.
Induction principle
If a subset A of Z+ is such that 1 Î A and if k Î A, then also k + 1 Î A, then A = Z+.
Proof:
Step 1:
Define A ⊆ Z+ as follows:
A = { n ∈ Z+ ? 13 + 23 + 33 + … + n3 = (1 + 2 + … + n)2}.
We want to prove that A is actually equal to Z+. By definition we know that A ⊆ Z+. We now have to prove that Z+
A. (We will use the fact, previously shown, that 1 + 2 + … + k = k(k + 1)/2.)
Step 2:
1 ∈ Z+. Is 1 ∈ A? Yes, since 13 = 1 and 12 = 1.
Step 3:
Assume k ∈ A, thus 13 + 23 + 33 + … + k3 = (1 + 2 + … + k)2
Is k + 1 ∈ A? Consider
(1 + 2 + ... + k + (k + 1))2 = (1 + 2 + ... + k)2 + 2·(1 + 2 + ... + k)·(k + 1) + (k + 1)2 ?
= (1 + 2 + ... + k)2 + 2·(k(k + 1)/2)·(k + 1) + (k + 1)2
= (1 + 2 + ... + k)2 + (k3 + 2k2 + k) + (k2 + 2k + 1)
= (1 + 2 + ... + k)2 + (k3 + 3k2 + 3k + 1)
= (13 + 23 + ... + k3) + (k + 1)3. (from the assumption)
We have proven that k + 1 ∈ A.
? Remember, (a + b)2 = (a2 + 2ab + b2) with a = (1 + 2 + ... + k) and b = (k + 1).
Step 4:
We can now conclude that Z+⊆ A, and therefore that Z+= A.
,Regular Expressions
Kleen’s star in letters
We now introduce the use of the kleen’s star applied not to a set, but directly to the letter x and written as a
superscript as if it were an exponent:
x*
The simple expression x* will be used to indicate some sequence of x's (maybe none at
all). This x is intentionally written in boldface type to distinguish it from an alphabet character.
x* = Λ or x or x^2 or x^3 or x^4 . .
= x^n for some n = 0 I 2 3 4 . . .
The star operator applied to a letter is analogous to the star operator applied to a set. It represents an arbitrary
concatenation of copies of that letter (maybe none). This notation can be used to help us define languages by
writing
L4 = language(x*)
We should not confuse x*, which is a language-defining symbol. with L4, which is the name we have given to a
certain language. Therefore, we use the word "language" in the equation. We shall soon give a name to the world
in which this symbol x* lives. but not quite yet. Suppose that we wished to describe the language L over the
alphabet ∑ = {a b}. where
L = {a ab abb abbb abbbb . . .}
We could summarize this language by the English phrase "all words of the form one a followed
by some number of b's (maybe no b's at all)."
Using our star notation and boldface letters, we may write
L = language(a b*) or L = language(ab*)
BUT ALSO
We can apply the Kleene star to the whole string ah if we want, as follows:
(ab)* = Λ or ab or abab or ababab…
Another example of this is:
Language(ab*a) = {aa aba abba abbba abbbba. . .}
Important small details
Language(a*b*) = {Λ a b aa ab bb aaa aab abb bbb aaaa. . .)
Here we should again be very careful to observe that
a*b* ≠ (ab)*
Plus symbol in this crap
We now introduce another use for the plus sign. By the expression x + y where x and y are strings of characters
from an alphabet, we mean "either x or y." This means that x + y offers a choice, much the same way that x* does.
Care should be taken so as not to confuse this with " as an exponent.
T = language((a + c)b*)
= language(either a or c then some b 's)
= T = {a c ab cb abb cbb abbb cbbb abbbb cbbbb . . .}
Another example of this is
L = {aaa aab aba abb baa bab bba bbb}
L = language((a + b)(a + b)(a + b))
L = language((a + b)^3)
Another way which could make it more is (a + b)* instead of (a + b)^3
, Regular expressions are as follow:
The set of regular expressions is defined by the following rules:
Rule I
Every letter of ∑ can be made into a regular expression by writing it in boldface; Λ itself is a regular expression.
Rule 2
If r 1 and r 2 are regular expressions, then so are:
(i) (r 1 )
(ii) r1 r2
(iii) r 1 + r2
(iv) r 1 *
Rule 3
Nothing else is a regular expression.
Example
r1 = aa + b
r1* = (r 1)* = (aa + b)*
Phi φ
To the purely logical Vulcan mind, that would be the only answer, but since we have already employed the
boldface lambda (Λ) to mean the regular expression defining the word lambda, we take the liberty of using the
boldface phi (φ) to be the regular expression for the null language.
r + φ= r
and
φr = φ
Different expressions
V = {Λ a b ab bb abb bbb abbb bbbb . . .}
We can define V by the expression
b* + ab*
OR
(Λ + a)b*
AS
Λb* + ab* = (Λ + a)b*
Facts to remember
ab = ba
ab ≠ ba
in algebra, they are the same numerical product but in formal languages, they are different words
Distributive law caution
Let us reconsider the language
T ={a c ab cb abb cbb . . .}
T can be defined as above by
(a + c)b*
but it can also be defined by
ab* + cb*
This is another example of the distributive law.
However, the distributive l aw must be used with extreme caution. Sometimes, it is difficult to determine whether
if the law is applicable. Expressions may be distributed but operators cannot. Certainly, the star alone cannot
always be distributed without changing the meaning of the expression. For example, as we h ave noted earlier,
(ab)* ≠ a*b*. The language associated with (ab)* is words with alternating a's and b's, whereas the language
associated with a*b* is only strings where all the a's (if any ) precede al l the b's (also if any ). To make the
identification between the regular expressions and their associated languages more explicit, we need to define the
operation of multiplication of sets of words, a concept we have used informally already.