A compiler is a program that translates high-level source code into low-level machine language, with correctness and speed as key goals. Compilers come in several types, including single-pass, two-pass, and multipass compilers, and work in stages such as lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. Error detection and recovery are crucial in compiler construction, with mechanisms to handle lexical, syntax, and semantic errors.
Compiler Construction Complete Notes
Compiler Construction 1: Compiler Techniques and Methodology
What is a Compiler? A compiler is a computer program that transforms source code written in a high-level language into low-level machine language. It translates code written in one programming language into another language without changing the meaning of the code.
Features of Compilers:
1. Correctness
2. Speed of compilation
3. Speed of the target code
4. Code debugging help
Types of Compiler: The following are the different types of compiler:
1) Single Pass Compilers
2) Two Pass Compilers
3) Multipass Compilers

Single Pass Compiler:
In a single-pass compiler, the source code is transformed directly into machine code in one pass over the input. Pascal is an example of a language designed to allow single-pass compilation.
Two Pass Compiler:
A two-pass compiler is divided into two sections:
1. Front end: It maps legal source code into an Intermediate Representation (IR).
2. Back end: It maps the IR onto the target machine.
Multipass Compilers:
The multipass compiler processes the source code or syntax tree of a program several times. It divides a large program into multiple smaller passes and processes them one after another, producing multiple intermediate codes. Each pass takes the output of the previous pass as its input, so the compiler requires less memory at any one time. It is also known as a 'wide compiler'.
Steps for Language Processing Systems:
Before studying compilers in detail, you first need to understand a few other tools that work together with compilers, such as preprocessors, assemblers, linkers, and loaders.
Compiler techniques and methodology are the principles and practices that guide the design and implementation of compilers.
Stages of Compiler Techniques and Methodology:
Scanning and parsing: These are the processes of analyzing the syntax and structure of the source code and building an intermediate representation, such as an abstract syntax tree, that captures its meaning.
Semantic analysis: This is the process of checking the validity and consistency of the source code, such as type checking, scope checking, and name resolution.
Code generation: This is the process of translating the intermediate representation into executable code for the target machine or platform, such as assembly language or bytecode.
Optimization: This is the process of improving the quality and performance of the executable code by applying various techniques, such as data flow analysis, register allocation, instruction scheduling, and loop transformation.
2: Organization of Compilers The organization of compilers in compiler construction involves breaking down the compiler into several distinct phases or components, each responsible for specific tasks in the process of translating a high-level programming language into machine code or an intermediate representation. The traditional organization of compilers follows a structure known as the "compiler front end" and "compiler back end."
Structure of a compiler: Any large software system is easier to understand and implement if it is divided into well-defined modules.
Front End:
Lexical Analysis (Scanner): This is the first phase, where the source code is broken down into a sequence of tokens.
Syntax Analysis (Parser): This phase checks whether the sequence of tokens adheres to the grammatical structure of the programming language.
Semantic Analysis: This phase checks the meaning of the statements and expressions in the program. It ensures that the program follows the language's semantics and performs tasks like type checking.
Intermediate Code Generation:
After the front end, the compiler may generate an intermediate representation (IR) of the program. The IR is an abstraction that simplifies the source code while preserving its essential meaning.
Optimization:
The compiler performs various optimizations on the intermediate code to improve the efficiency of the generated machine code.
Back End:
Code Generation: In this phase, the compiler generates the target machine code or assembly code from the optimized intermediate code.
Code Optimization (Machine-Dependent): This phase optimizes the generated machine code for the specific target architecture. It may include instruction scheduling, register allocation, and other architecture-specific optimizations.
Code Emission: The final step involves emitting the machine code or generating an executable file from the optimized code.
Additional Considerations:
Error Handling: Throughout the compilation process, compilers must handle errors gracefully, providing meaningful error messages.
Debugging Information: Compilers often include information in the executable to aid in debugging, such as source code line numbers or variable names.
Cross-Compilation: Some compilers support generating code for a different target architecture than the one on which the compiler itself runs.
3: Lexical and Syntax Analysis
Lexical analysis and syntax analysis are two crucial phases in compiler construction. They are responsible for analyzing the source code of a programming language and converting it into a form that can be further processed by the compiler.

Lexical Analysis:
1. Purpose:
Tokenization: The main goal of lexical analysis is to break down the source code into a sequence of tokens. Tokens are the smallest units of meaning in a programming language, such as keywords, identifiers, literals, and operators.
2. Components:
Lexer/Tokenizer: This is the component responsible for scanning the source code and identifying the tokens.
Regular Expressions: These rules define the patterns for different types of tokens.
3. Steps in Lexical Analysis:
Scanning: The lexer scans the source code character by character.
Token Recognition: It recognizes and categorizes sequences of characters into tokens based on predefined rules.
Error Handling: Lexical analysis also involves detecting and reporting lexical errors, such as invalid characters or tokens.
4. Output: The output of lexical analysis is a stream of tokens that serves as input for the subsequent phases of the compiler.
Syntax Analysis:
1. Purpose:
Grammar Verification: Syntax analysis checks whether the sequence of tokens generated by lexical analysis conforms to the grammatical structure of the programming language.
AST Construction: It builds a hierarchical structure called the Abstract Syntax Tree (AST) that represents the syntactic structure of the program.
2. Components:
Parser: The parser is responsible for analyzing the arrangement of tokens and ensuring that it follows the syntax rules of the language.
Context-Free Grammar (CFG): Syntax rules are often specified using a CFG, which describes the syntactic structure of the language.
Error Handling: The syntax analysis phase detects and reports syntax errors.
3. Steps in Syntax Analysis:
Parsing: The parser processes the stream of tokens generated by the lexer and checks whether it conforms to the language's syntax rules.
Error Reporting: Syntax analysis also involves reporting detailed error messages when syntax errors are encountered.
4. Output: The output of syntax analysis is the AST, which serves as the basis for subsequent phases like semantic analysis, optimization, and code generation.
Example: Consider the ambiguous grammar
E → E + E
E → E – E
E → id
For the string id + id – id, this grammar generates two parse trees: one that groups id + id first, and one that groups id – id first.
Special Symbols: Most high-level languages contain some special symbols, as shown below:

Name                  Symbols
Punctuation           Comma (,), Semicolon (;)
Assignment            =
Special Assignment    +=, -=, *=, /=
Comparison            ==, ≠, <, >, ≤, ≥
Preprocessor          #
Location Specifier    &
Logical               &&, ||, !
Shift Operator        >>, <<, >>>
Now let us look at a small example in C++ code:

#include <iostream>

int maximum(int x, int y) {
    // This will compare two numbers
    if (y > x)
        return y;
    else
        return x;
}
4: Parsing Techniques
The process of transforming data from one format to another is called parsing, and it is carried out by the parser. The parser is the component of the translator that organizes a linear text stream according to a set of defined rules known as a grammar.
The process of Parsing:
Types of Parsing:
There are two types of parsing:
1) Top-down Parsing
2) Bottom-up Parsing
Top-down Parsing: When the parser generates a parse tree by top-down expansion, tracing out the left-most derivation of the input, it is called top-down parsing. Top-down parsing starts with the start symbol and ends at the terminals. Such parsing is also known as predictive parsing.
Recursive Descent Parsing: Recursive descent parsing is a type of top-down parsing technique. It uses one procedure for each terminal and non-terminal entity, reads the input from left to right, and constructs the parse tree from the top down.
Back-tracking: A parsing technique that starts from the root node with an initial production rule; if the derivation fails, it restarts the process with a different rule.
Bottom-up Parsing: Bottom-up parsing works in just the reverse of top-down parsing. It traces the rightmost derivation of the input in reverse, reducing the input until it reaches the start symbol.
Shift-Reduce Parsing: Shift-reduce parsing works in two steps: the shift step and the reduce step.
a. Shift step: The shift step advances the input pointer to the next input symbol; the current symbol is shifted onto the stack.
b. Reduce step: When the top of the stack matches the complete right-hand side of a grammar rule, the parser replaces it with the rule's left-hand side.
LR Parsing: The LR parser is one of the most efficient syntax analysis techniques, and it works with a large class of context-free grammars. In LR parsing, L stands for left-to-right scanning of the input, and R stands for constructing a rightmost derivation in reverse.
Why is parsing useful in compiler design?
In the world of software, each tool or component has its own requirements for the data it processes. Parsing transforms data so that it can be understood by that specific software.
Technologies that use parsers:
Programming languages like Java.
Database languages like SQL.
Protocols like HTTP.
Markup languages like XML and HTML.

5: Object Code Generation and Optimization
Object code generation and optimization are crucial phases in compiler construction. These phases are responsible for translating high-level programming languages into machine code that can be executed efficiently by a computer's hardware.
Code generation and optimization involve several stages:
1. Intermediate Code Generation: The front end of the compiler generates an intermediate representation of the source code.
2. Intermediate Code Optimization: Some compilers perform initial optimization on the intermediate code before generating the object code.
3. Object Code Generation: The optimized intermediate code is translated into machine code or assembly language.
4. Final Code Optimization: Further optimizations are applied to the generated object code to improve performance.

Example of object code generation and optimization for a C program:
// C program
int x = 10;
int y = 20;
int z = x + y;
// Intermediate code (three-address code)
t1 = 10
t2 = 20
t3 = t1 + t2
x = t1
y = t2
z = t3

// Object code (x86 assembly)
mov eax, 10   ; t1 = 10
mov ebx, 20   ; t2 = 20
mov [x], eax  ; x = t1
mov [y], ebx  ; y = t2
add eax, ebx  ; t3 = t1 + t2
mov [z], eax  ; z = t3
// Optimized object code (x86 assembly)
mov eax, 10   ; x = 10
mov ebx, 20   ; y = 20
mov [x], eax  ; store x
mov [y], ebx  ; store y
add eax, ebx  ; z = x + y
mov [z], eax  ; store z
Code optimization is done in the following different ways:
1. Compile Time Evaluation:
(i) A = 2*(22.0/7.0)*r
Perform the constant part 2*(22.0/7.0) at compile time (r is not known until run time).
(ii) x = 12.4
y = x/2.3
Evaluate x/2.3 as 12.4/2.3 at compile time.
2. Variable Propagation:
//Before Optimization
c = a * b
x = a
d = x * b + 4

//After Optimization
c = a * b
x = a
d = a * b + 4
3. Constant Propagation: If the value of a variable is known to be a constant, replace the variable with that constant. The compiler must first verify that the variable's value does not change before the point of use.
Example:
(i) A = 2*(22.0/7.0)*r
Performs the constant part 2*(22.0/7.0) at compile time.
(ii) x = 12.4
y = x/2.3
Evaluates x/2.3 as 12.4/2.3 at compile time.
4. Copy Propagation: This is an extension of constant propagation. After a copy statement such as x = a, later uses of x are replaced by a, which reduces redundant copying.
Example:
//Before Optimization
c = a * b
x = a
d = x * b + 4

//After Optimization
c = a * b
x = a
d = a * b + 4
5. Common Sub-Expression Elimination:
In the above example, a*b and x*b (which becomes a*b after copy propagation) form a common sub-expression, so the second computation can reuse the result of the first.
6. Dead Code Elimination:
Copy propagation often turns assignment statements into dead code, which can then be removed.
Example:
//Before Optimization
c = a * b
x = a
d = a * b + 4

//After elimination:
c = a * b
d = a * b + 4
7. Function Cloning: Here, specialized codes for a function are created for different calling parameters.
Example: Function Overloading
7: Detection and Recovery from Errors
In compiler construction, error detection and recovery mechanisms play a crucial role in ensuring that a compiler can handle erroneous input and produce meaningful output. Errors can occur at various stages of the compilation process, such as lexical analysis, syntax analysis, semantic analysis, and code generation.
Error Detection and Recovery in Compiler Construction:
1. Error Detection:
Lexical Errors:
Definition: Lexical errors involve invalid characters or token sequences.
Detection: Lexical analyzers (scanners) examine the source code and identify errors by recognizing characters that do not form valid tokens or that violate lexical rules.
Syntax Errors:
Definition: Syntax errors occur when the input source code violates the grammar rules of the programming language.
Detection: Syntax analyzers (parsers) detect these errors during the parsing phase by analyzing the structure of the code.
Semantic Errors:
Definition: Semantic errors involve violations of the language's semantics, such as using a variable before it is declared.
Detection: Semantic analysis identifies these errors during the semantic analysis phase.
2. Panic Mode Recovery:
Definition: Panic mode recovery involves discarding tokens until a synchronizing token is found.
Purpose: It helps the compiler recover from a syntax error and continue parsing the rest of the source code.
3. Code Generation and Optimization Errors:
Definition: Errors in later stages may involve incorrect translations or inefficient code generation.
Handling: The compiler detects and reports these errors to ensure the generation of correct and optimized machine code.
4. User-Defined Errors:
Definition: Compilers may allow programmers to define custom error-handling routines or specify error-handling behavior.
Purpose: This provides flexibility in handling errors based on the specific requirements of a programming project.
8: Contrast between Compilers and Interpreters
Compiler: The compiler is a translator that takes high-level language as input and produces low-level language, i.e., machine or assembly language, as output. The work of a compiler is to transform code written in a programming language into machine code (a format of 0s and 1s) so that computers can understand it. A compiler is more intelligent than an assembler: it checks all kinds of limits, ranges, errors, and so on. However, compilation takes more time, and the compiler occupies a larger part of memory.
Advantages of Compiler: Compiled code runs faster in comparison to Interpreted code. Compilers help in improving the security of Applications.
Disadvantages of Compiler: The compiler can catch only syntax errors and some semantic errors. Compilation can take more time in the case of bulky code.
Interpreter: An interpreter is a program that translates a programming language into a comprehensible form. The interpreter converts the high-level language into an intermediate form and can work with source code, pre-compiled code, and so on. It translates only one statement of the program at a time. Interpreters are, more often than not, smaller than compilers.
Advantages of Interpreter: Programs written in an Interpreted language are easier to debug. Interpreted Language is more flexible than a Compiled language.
Disadvantages of Interpreter: The interpreter can run only the corresponding interpreted program. Interpreted code runs slower in comparison to compiled code.