
Compiler Construction: Lexical Analysis Basics

The document discusses topics from a lecture on compiler construction including lexical analysis and tokenization. It provides examples of specifying tokens using regular expressions and constructing finite state automata to recognize tokens. Key points covered include defining tokens as regular languages, using regular expressions to precisely describe token patterns, building transition diagrams for token recognition, and representing non-deterministic finite state automata. Examples are provided of specifying and building automata for relational operators, identifiers, and regular expressions.

Uploaded by

M Rustăm

Compiler Construction 11/24/2021

Compiler Construction

Lecture 4

Topics Covered in Lecture 3


• Role of Lexical Analyzer
• Errors generated by Lexical Analyzer
• Tokens
• Lexemes
• Patterns

Lexical Analyzer

Part 2


Specification of Tokens

• Lexemes are simple sequences of characters
• Tokens are sets of lexemes
• So: tokens form a REGULAR LANGUAGE
• Regular expressions: an important notation for specifying patterns
• Use REGULAR EXPRESSIONS to precisely describe which strings each type of token can match
(Reference: Page 94 onwards)

Learn by Example:

• Token to be specified = Identifier of C
• letter → A | B | C | … | Z | a | b | … | z
• digit → 0 | 1 | 2 | … | 9
• identifier → letter ( letter | digit )*
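
The regular definition above maps almost directly onto a regular expression in a programming language. A minimal sketch in Python, assuming the standard `re` module; the names `IDENTIFIER` and `is_identifier` are illustrative only, and the pattern follows the slide's simplified definition (real C also allows an underscore wherever a letter is allowed):

```python
import re

# identifier -> letter ( letter | digit )*
# letter -> A..Z | a..z and digit -> 0..9 (the slide's simplified definition)
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(lexeme: str) -> bool:
    """Return True if the whole lexeme matches the identifier pattern."""
    return IDENTIFIER.fullmatch(lexeme) is not None

for lexeme in ["position", "Sal123", "x", "9lives", ""]:
    print(repr(lexeme), is_identifier(lexeme))
```

`fullmatch` is used so that the entire string must fit the pattern, which mirrors asking whether a lexeme belongs to the token's language.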


Another Example:
• Let the grammar fragment be:
    if expr then stmt
• What are the patterns (regular expressions) for the following tokens?
    Terminal   Set of strings
    if       → if
    then     → then
    else     → else
    relop    → < | <= | > | >= | = | <>
(Reference: Example 3.6, Page 98)
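
As a rough illustration, the four patterns can be combined into one scanner expression. This is only a sketch in Python's `re` module; the group names `IF`, `THEN`, `ELSE`, `RELOP` and the helper `match_token` are assumptions made for the example. Python's alternation picks the first alternative that matches rather than the longest, so the two-character operators are listed before their one-character prefixes:

```python
import re

# One named group per token. The \b after each keyword stops "thenx"
# from being read as the keyword "then".
TOKEN = re.compile(r"""
    (?P<IF>if\b) | (?P<THEN>then\b) | (?P<ELSE>else\b)
  | (?P<RELOP><=|<>|>=|<|>|=)
""", re.VERBOSE)

def match_token(text: str, pos: int = 0):
    """Return (token_name, lexeme) for the token starting at pos, or None."""
    m = TOKEN.match(text, pos)
    return (m.lastgroup, m.group()) if m else None

print(match_token("<= b"))
print(match_token("then x"))
print(match_token("thenx"))
```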

Learn by Doing

• Pattern for all strings that start with "tab" or end with "bat"?
• Answer
    tab {A,…,Z,a,…,z}* | {A,…,Z,a,…,z}* bat
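
The answer can be checked quickly by translating it into a regular expression. A small sketch in Python, assuming the alphabet is just the letters A..Z and a..z as in the answer; `TAB_OR_BAT` and `matches` are illustrative names:

```python
import re

# tab letter* | letter* bat, with letter = A..Z | a..z
TAB_OR_BAT = re.compile(r"tab[A-Za-z]*|[A-Za-z]*bat")

def matches(s: str) -> bool:
    """Return True if the whole string fits one of the two alternatives."""
    return TAB_OR_BAT.fullmatch(s) is not None

for s in ["table", "acrobat", "tab", "bat", "batch", "stable"]:
    print(s, matches(s))
```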


Token Recognition
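• Tokens can be recognized using a transition diagram
• It depicts the sequence of actions a lexical analyzer takes when called by the parser to get the next token
• It is used to keep track of information about characters during scanning of the input
• Example: tokens to be specified are >= and >
  [Transition diagram: after >, an = leads to a final state that returns (relop, GE); any other character leads to a final state marked *, which retracts the input by one character and returns (relop, GT)]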



Learn by Example

• Relational operators (Pascal-style; Java writes == and != in place of = and <>)
  • <    <=
  • >    >=
  • =    <>
• Specification of token relop
    relop → < | <= | > | >= | = | <>
• Recognition of token relop
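
A transition diagram for relop can be hand-coded as a small function that inspects one character at a time. The following is a sketch in Python, not the lecture's own code: the function name `relop` and the attribute names `LE`, `NE`, `LT` and `EQ` are assumptions (only `GE` and `GT` appear on the slides), and "retract" is modelled by simply not consuming the lookahead character.

```python
def relop(text: str, pos: int = 0):
    """Follow the relop transition diagram starting at text[pos].

    Returns (attribute, next_pos), or None if no relop starts at pos.
    """
    def peek(i: int) -> str:
        return text[i] if i < len(text) else ""  # "" stands for end of input

    c = peek(pos)
    if c == "<":
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("LE", pos + 2)
        if nxt == ">":
            return ("NE", pos + 2)
        return ("LT", pos + 1)   # other character: retract (do not consume it)
    if c == "=":
        return ("EQ", pos + 1)
    if c == ">":
        if peek(pos + 1) == "=":
            return ("GE", pos + 2)
        return ("GT", pos + 1)   # other character: retract, as for LT
    return None

print(relop(">= 1"))
print(relop(">x"))
```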


Learn by Doing

• Identifiers in Java
  • position
  • Sal123
  • ab
  • x
• Specification of token identifier
    identifier → letter ( letter | digit )*
• Recognition of token identifier?
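
One possible way to recognize this token is a two-state diagram: the start state moves to a second state on a letter, the second state loops on letters and digits, and any other character ends the token without being consumed. A minimal sketch in Python under that assumption; `scan_identifier` and its helpers are hypothetical names:

```python
def scan_identifier(text: str, pos: int = 0):
    """Return (lexeme, next_pos) if an identifier starts at pos, else None."""
    def is_letter(c: str) -> bool:
        return ("A" <= c <= "Z") or ("a" <= c <= "z")

    def is_digit(c: str) -> bool:
        return "0" <= c <= "9"

    # State 0: start; only a letter leads to state 1.
    if pos >= len(text) or not is_letter(text[pos]):
        return None
    end = pos + 1
    # State 1: loop while the next character is a letter or a digit.
    while end < len(text) and (is_letter(text[end]) or is_digit(text[end])):
        end += 1
    # Any other character: accept without consuming it (retract).
    return (text[pos:end], end)

print(scan_identifier("Sal123 = 5"))
print(scan_identifier("123"))
```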

Learn by Doing


Terminologies: Automata & Language Theory

• Finite State Automaton (FSA)
  • A recognizer that takes an input string and determines whether it is a valid string of the language.
• Non-Deterministic FSA (NFA)
  • May have several alternative actions for the same input symbol
• Deterministic FSA (DFA)
  • Has one action for any given input symbol


Representing NFA

1) Transition Graph:
   Numbered states (circles), arcs, final states, …

• What language is defined?
    (a|b)*abb

Representing NFA
2) Transition Tables:
   More suitable for representation within a computer
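
To make the table idea concrete, here is a sketch of a transition table for an NFA recognizing (a|b)*abb, written as a Python dictionary. The state numbering 0 to 3 is an assumption chosen to agree with the moves listed on the "Working of NFA" slide; `NFA_TABLE` and `move` are illustrative names. Each entry holds a set of next states, because an NFA may have several choices for one symbol.

```python
# Rows are states, columns are input symbols,
# each entry is the SET of possible next states.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPTING = 0, {3}

def move(state: int, symbol: str) -> set:
    """Look up the table; a missing entry means no transition (empty set)."""
    return NFA_TABLE[state].get(symbol, set())

print(move(0, "a"))  # two possible next states
print(move(2, "a"))  # empty set: the move is undefined
```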


Learn by Example

• Given the regular expression:
    (a (b*c)) | (a (b | c+)?)
• Find a transition diagram (NFA) that recognizes it


Learn by Example – NFA construction

Step 1: (a (b*c)) | (a (b | c+)?)

(a (b*c))


Learn by Example – NFA construction
Step 2: (a (b*c)) | (a (b | c+)?)

(a (b | c+)?)


Learn by Example – NFA construction
Step 3: (a (b*c)) | (a (b | c+)?)


Working of NFA

Learn by Example
Input: ababb

One sequence of choices:
1. move(0, a) = 1
2. move(1, b) = 2
3. move(2, a) = ? (undefined)
REJECT!

OR another sequence of choices:
1. move(0, a) = 0
2. move(0, b) = 0
3. move(0, a) = 1
4. move(1, b) = 2
5. move(2, b) = 3
ACCEPT!
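
Instead of guessing one path, a simulator can track every state the NFA could be in after each symbol and accept if any of them is final. The sketch below does this in Python for the same moves listed above; the dictionary `NFA` and the function `accepts` are assumptions made for illustration, and the NFA is assumed to have no epsilon transitions.

```python
NFA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}

def accepts(text: str) -> bool:
    """Track the set of all states reachable after each input symbol."""
    current = {START}
    for symbol in text:
        current = {t for s in current for t in NFA.get((s, symbol), set())}
    return bool(current & ACCEPTING)

print(accepts("ababb"))  # True
print(accepts("abab"))   # False
```

For ababb the tracked sets are {0}, {0,1}, {0,2}, {0,1}, {0,2}, {0,3}; the last one contains the final state 3, so the input is accepted regardless of which individual path would have failed.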


The NFA Problem

• Two problems
  – Valid input may not be accepted if the simulation follows the wrong choice of transitions
  – Non-deterministic behavior from run to run…
• Solution?


The DFA Saves The Day

• A DFA is an NFA with a few restrictions
  • No epsilon transitions
  • For every state s, there is only one transition (s, x) from s for any symbol x in Σ
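
With these restrictions, a table maps each (state, symbol) pair to a single next state, so simulation is one lookup per input character. A sketch in Python of one DFA for (a|b)*abb; the state numbering and the names `DFA` and `run` are assumptions for the example, not taken from the slides:

```python
# Each (state, symbol) pair has exactly one next state: no sets, no epsilon.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START, ACCEPTING = 0, {3}

def run(text: str) -> bool:
    """Simulate the DFA; reject symbols outside the alphabet {a, b}."""
    state = START
    for symbol in text:
        if symbol not in DFA[state]:
            return False
        state = DFA[state][symbol]
    return state in ACCEPTING

print(run("ababb"))  # True
print(run("abab"))   # False
```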


How does this all fit together ?

1. Regular expression → NFA construction
2. NFA → DFA conversion
3. DFA simulation for lexical analyzer

• Point to Remember
  • Both NFAs and DFAs can be used to recognize tokens, but DFAs are faster to simulate and easier to optimize than NFAs
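
Step 2 is commonly done with the subset construction, in which each DFA state stands for a set of NFA states. The following Python sketch applies it to an NFA for (a|b)*abb; it assumes the NFA has no epsilon transitions (so no epsilon-closure step is needed), and all names (`NFA`, `subset_construction`, `dfa_accepts`) are hypothetical.

```python
from collections import deque

NFA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START, NFA_ACCEPTING, ALPHABET = 0, {3}, "ab"

def subset_construction():
    """Build a DFA whose states are frozensets of NFA states."""
    start = frozenset({NFA_START})
    dfa, queue = {}, deque([start])
    while queue:
        state_set = queue.popleft()
        if state_set in dfa:
            continue  # already expanded
        dfa[state_set] = {}
        for symbol in ALPHABET:
            target = frozenset(
                t for s in state_set for t in NFA.get((s, symbol), set())
            )
            dfa[state_set][symbol] = target
            queue.append(target)
    accepting = {s for s in dfa if s & NFA_ACCEPTING}
    return start, dfa, accepting

def dfa_accepts(text: str) -> bool:
    """Run the constructed DFA on a string over the alphabet {a, b}."""
    start, dfa, accepting = subset_construction()
    state = start
    for symbol in text:
        state = dfa[state][symbol]
    return state in accepting

print(dfa_accepts("ababb"))  # True
```

For this NFA the construction reaches four DFA states: {0}, {0,1}, {0,2} and {0,3}, of which only {0,3} is accepting.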

• Tokens can be specified using regular expressions
• Tokens can be recognized through transition diagrams generated from regular expressions
• A transition diagram may be an NFA or a DFA, but a DFA is preferable because of its speed and optimization

Home Work

• Write down the specification of a regular definition for:
    "Unsigned numbers in C are strings such as 6250, 36.25, 6.235E4 or 1.25E-3."
• Use the shorthand notation


Home Work

• Try to study and understand Input Buffering, Section 3.2, Page 88 of your book
• Specify a pattern for white space using regular expressions and make a transition diagram to recognize it.
• Specify a pattern for all strings in which {1,2,3} exist in ascending order and make a transition diagram to recognize it.
