Compiler Construction 11/24/2021
Compiler Construction
Lecture 4
Topics Covered in Lecture 3
Role of Lexical Analyzer
Errors generated by Lexical Analyzer
Tokens
Lexemes
Patterns
Lexical Analyzer
Part 2
Specification of Tokens
Lexemes are simple sequences of characters
Tokens are sets of lexemes
So: Tokens form a REGULAR LANGUAGE
Regular expressions: an important notation for specifying patterns
Use a REGULAR EXPRESSION to precisely describe which strings each type of token can match
(Reference: Page 94 onwards)
Learn by Example:
Token to be specified = Identifier of C
letter → A | B | C | … | Z | a | b | … | z
digit → 0 | 1 | 2 | … | 9
identifier → letter ( letter | digit )*
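The regular definition above can be tried out directly. The following is a minimal sketch in Python, assuming the standard `re` module; the names `LETTER`, `DIGIT`, `IDENTIFIER` and `is_identifier` are chosen here for illustration only. It follows the slide's definition exactly, so the underscore that real C identifiers also allow is not accepted.

```python
import re

# The regular definition from the slide, written as Python regex fragments.
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"
IDENTIFIER = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT})*")


def is_identifier(text: str) -> bool:
    """Return True when the whole string matches letter ( letter | digit )*."""
    return IDENTIFIER.fullmatch(text) is not None


for sample in ["position", "Sal123", "x", "9lives", ""]:
    print(repr(sample), is_identifier(sample))
```

`fullmatch` is used so that the entire lexeme has to fit the pattern, not just a prefix of it.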
Another Example:
Let the grammar fragment be:
if expr then stmt
What are the patterns (Regular Expressions) for the following tokens?
Terminals Set of strings
if → if
then → then
else → else
relop → < | <= | > | >= | = | <>
(Reference: Exp 3.6, Page 98)
Learn by Doing
Pattern for all strings that start with “tab” or end with “bat”?
Answer
tab {A,…,Z,a,…,z}* | {A,…,Z,a,…,z}* bat
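One way to check the answer is to translate it into a Python regular expression and test a few strings. This is only a sketch: it assumes the alphabet is the upper-case and lower-case English letters, as in the answer above, and the names `LETTERS`, `PATTERN` and `starts_tab_or_ends_bat` are made up for this example.

```python
import re

# {A,...,Z,a,...,z}* from the answer, as a character class with a star.
LETTERS = r"[A-Za-z]*"
PATTERN = re.compile(rf"tab{LETTERS}|{LETTERS}bat")


def starts_tab_or_ends_bat(text: str) -> bool:
    """Return True when the whole string starts with 'tab' or ends with 'bat'."""
    return PATTERN.fullmatch(text) is not None


for sample in ["table", "combat", "tab", "bat", "batter", "stable"]:
    print(sample, starts_tab_or_ends_bat(sample))
```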
Token Recognition
Tokens can be recognized using a Transition diagram
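Depicts the sequence of actions a lexical analyzer takes when called by the parser to get the next token
Used to keep track of information about characters during scanning of the input
Example: Tokens to be specified: >= and >
[Transition diagram: one accepting state returns (relop, GE); the other, marked * (the lookahead character is retracted), returns (relop, GT)]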
Learn by Example
Relational Operators (Pascal-style, as in the textbook; Java writes = and <> as == and !=)
< <=
> >=
= <>
Specification of token relop
relop → < | <= | > | >= | = | <>
Recognition of token relop
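A transition diagram for relop can be coded by hand as a small decision procedure. The sketch below is one possible reading of such a diagram, not the diagram from the slide itself: the function name `next_relop` is hypothetical, and the attribute names LT, LE, NE and EQ are assumptions added to go with the GE and GT shown earlier. A starred state corresponds to returning without consuming the lookahead character.

```python
def next_relop(source: str, pos: int):
    """Try to recognize one relop starting at source[pos].

    Returns ((token, attribute), next_pos), or None when no relop starts here.
    """
    def peek(i: int) -> str:
        # "" stands for end of input and never equals an operator character.
        return source[i] if i < len(source) else ""

    first, second = peek(pos), peek(pos + 1)
    if first == "<":
        if second == "=":
            return ("relop", "LE"), pos + 2
        if second == ">":
            return ("relop", "NE"), pos + 2
        return ("relop", "LT"), pos + 1  # starred state: lookahead not consumed
    if first == "=":
        return ("relop", "EQ"), pos + 1
    if first == ">":
        if second == "=":
            return ("relop", "GE"), pos + 2
        return ("relop", "GT"), pos + 1  # starred state: lookahead not consumed
    return None


print(next_relop("a >= b", 2))
print(next_relop("a > b", 2))
```

The returned position tells the caller where scanning of the next token should resume.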
Learn by Doing
Identifiers in Java
position
Sal123
ab
x
Specification of token identifier
identifier → letter ( letter | digit )*
Recognition of token identifier ?
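One possible answer, sketched in Python: a two-state transition diagram in which state 0 expects a letter and state 1 loops on letters and digits until some other character appears. The function name `scan_identifier` and the token name "id" are assumptions made for this example.

```python
import string

# Sets are used so that the empty string (end of input) is never a member.
LETTERS = set(string.ascii_letters)
LETTERS_OR_DIGITS = LETTERS | set(string.digits)


def scan_identifier(source: str, pos: int = 0):
    """Simulate the identifier diagram starting at source[pos].

    Returns (("id", lexeme), next_pos), or None if no identifier starts here.
    """
    state = 0
    i = pos
    while True:
        ch = source[i] if i < len(source) else ""
        if state == 0:
            if ch in LETTERS:
                state = 1          # first letter seen
                i += 1
            else:
                return None        # no transition out of the start state
        else:  # state 1: loop on letter | digit
            if ch in LETTERS_OR_DIGITS:
                i += 1
            else:
                # Accept without consuming ch (the starred state).
                return ("id", source[pos:i]), i


print(scan_identifier("Sal123 + x"))
print(scan_identifier("9ab"))
```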
Learn by Doing
Terminologies: Automata & Language Theory
Finite State Automata (FSA)
A recognizer that takes an input string and determines
whether it’s a valid string of the language.
Non-Deterministic FSA (NFA)
May have several alternative transitions out of a state for the same input symbol
Deterministic FSA (DFA)
Has at most one transition out of each state for any given input symbol
Representing NFA
1) Transition Graph:
Number states (circles), arcs, final states, …
What language is defined?
(a|b)*abb
Representing NFA
2) Transition Tables:
More suitable for representation within a
computer
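As an illustration, here is a sketch of a transition table in Python for an NFA that accepts (a|b)*abb. It assumes the usual four-state NFA for this language (states 0 to 3, with state 3 accepting); the names `NFA_TABLE`, `START`, `ACCEPTING` and `move` are chosen for this example. Each entry maps a state and an input symbol to a set of next states, which is what makes the table non-deterministic.

```python
# Rows are states, columns are input symbols, cells are SETS of next states.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def move(state: int, symbol: str) -> set:
    """Look up the table; a missing entry means the move is undefined."""
    return NFA_TABLE[state].get(symbol, set())


print(move(0, "a"))
print(move(2, "a"))
```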
Learn by Example
Given the regular expression :
(a (b*c)) | (a (b | c+)?)
Find a transition diagram NFA that
recognizes it
Learn by Example – NFA construction
Step 1: (a (b*c)) | (a (b | c+)?)
(a (b*c))
Learn by Example – NFA construction
Step 2: (a (b*c)) | (a (b | c+)?)
(a (b | c+)?)
Learn by Example – NFA construction
Step 3: (a (b*c)) | (a (b | c+)?)
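Before drawing the NFA, it can help to see which strings it has to accept. The sketch below does not build an NFA; it only hands the same regular expression to Python's `re` module and tests a few candidate strings. The names `EXPRESSION` and `in_language` are made up for this example.

```python
import re

# The expression from the slide with the spaces removed.
EXPRESSION = re.compile(r"(a(b*c))|(a(b|c+)?)")


def in_language(text: str) -> bool:
    """Return True when the whole string is matched by the expression."""
    return EXPRESSION.fullmatch(text) is not None


for sample in ["a", "ab", "ac", "acc", "abc", "abbc", "abcc", "b", ""]:
    print(repr(sample), in_language(sample))
```

Any NFA drawn for the exercise should agree with these answers on the same strings.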
Working of NFA
Learn by Example
Input: ababb
First sequence of moves:
1. move(0, a) = 1
2. move(1, b) = 2
3. move(2, a) = ? (undefined)
REJECT !
OR, a second sequence of moves:
1. move(0, a) = 0
2. move(0, b) = 0
3. move(0, a) = 1
4. move(1, b) = 2
5. move(2, b) = 3
ACCEPT !
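The two traces above follow one guess at a time. A simulator can avoid guessing by tracking the set of all states the NFA could be in after each symbol. The sketch below assumes the four-state NFA for (a|b)*abb that is consistent with the moves in the trace, with no epsilon transitions; the names `NFA_TABLE`, `START`, `ACCEPTING` and `accepts` are chosen for this example.

```python
# NFA for (a|b)*abb: state -> symbol -> set of next states.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def accepts(text: str) -> bool:
    """Follow every possible move at once; accept if any path ends in state 3."""
    current = {START}
    for symbol in text:
        # Union of move(s, symbol) over every state s we might be in.
        current = {t for s in current for t in NFA_TABLE[s].get(symbol, ())}
    return bool(current & ACCEPTING)


print(accepts("ababb"))
print(accepts("abab"))
```

With this approach a valid input is never rejected because of an unlucky choice, at the cost of tracking several states per input symbol.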
The NFA Problem
Two problems
– Valid input may not be accepted
– Non-deterministic behavior from run to run…
Solution ?
The DFA Saves The Day
A DFA is an NFA with a few restrictions
No epsilon transitions
For every state s, there is only one transition (s,x) from s for any symbol x in Σ
How does this all fit together ?
1. Reg. Expr. → NFA construction
2. NFA → DFA conversion
3. DFA simulation for lexical analyzer
Point to Remember
Both NFAs and DFAs can be used to recognize tokens, but DFAs are faster and easier to optimize than NFAs
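Steps 2 and 3 can be sketched in a few lines of Python. The example uses the subset construction, a standard way to convert an NFA to a DFA, on a four-state NFA for (a|b)*abb that has no epsilon transitions (so no epsilon-closure step is needed). All names here (`NFA`, `ALPHABET`, `subset_construction`, `dfa_accepts`) are hypothetical, and the input is assumed to contain only symbols from the alphabet.

```python
from collections import deque

# NFA for (a|b)*abb: state -> symbol -> set of next states.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
ALPHABET = "ab"


def subset_construction(nfa, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa[s].get(symbol, ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {state for state in dfa if state & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start, accepting, text):
    """Step 3: one table lookup per input symbol, no sets to track."""
    state = start
    for symbol in text:
        state = dfa[state][symbol]
    return state in accepting


DFA, DFA_START, DFA_ACCEPTING = subset_construction(NFA, 0, {3}, ALPHABET)
print(len(DFA), "DFA states")
print(dfa_accepts(DFA, DFA_START, DFA_ACCEPTING, "ababb"))
```

The simulation loop does a single dictionary lookup per character, which is the speed advantage the point above refers to.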
Tokens can be specified by using regular expressions
Tokens can be recognized through transition diagrams generated from regular expressions
A transition diagram may be an NFA or a DFA, but a DFA is preferable because of its speed and optimization
Home Work
Write down a regular definition for:
“Unsigned numbers in C are strings such as 6250, 36.25, 6.235E4 or 1.25E-3.”
Use the shorthand notation.
Home Work
Try to study and understand Input Buffering, Section 3.2, Page 88 of your book
Specify a pattern for white spaces using regular expressions and make a transition diagram to recognize it.
Specify a pattern for all strings in which {1,2,3} exist in ascending order and make a transition diagram to recognize it.