Lexical Analysis in simple words – token generation, blank space, symbol table, lexical error
Contents
- 1 Lexical Analysis in Simple Words
- 2 How Lexical Analysis Works?
- 3 Example of Token Generation
- 4 Handling Blank Spaces & Comments
- 5 Symbol Table
- 6 Lexical Errors
- 7 Summary
Lexical Analysis in Simple Words
Lexical analysis is the first phase of a compiler. It breaks the source code into smaller meaningful units called tokens. The program that performs this step is called the lexical analyzer (also known as a lexer or scanner).
How Lexical Analysis Works?
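The source code is read character by character.
Groups of characters that match a token pattern (the matched text is called a lexeme) are converted into tokens.
Characters that carry no meaning for later phases, such as spaces, tabs, and comments, are discarded.
Identifiers are entered into a symbol table; some lexers also preload keywords into it so they can be told apart from identifiers.
If a character cannot be part of any valid token, a lexical error is reported.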
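The steps above can be sketched as a small hand-written scanner. This is only an illustration under stated assumptions: the keyword set, the token names, and the function name `scan` are hypothetical and do not come from any particular compiler.

```python
# Minimal hand-written scanner: a sketch, not a production lexer.
KEYWORDS = {"int", "float", "if", "else", "return"}   # assumed keyword set
SINGLE_CHAR = {"=": "ASSIGN", "+": "PLUS", "-": "MINUS", "*": "STAR",
               ";": "SEMICOLON", "(": "LPAREN", ")": "RPAREN"}


def scan(source):
    """Return (tokens, symbol_table) for the given source string."""
    tokens = []
    symbol_table = {}      # identifier name -> order of first appearance
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                         # whitespace is discarded
            i += 1
        elif ch.isalpha() or ch == "_":          # identifier or keyword
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            if word in KEYWORDS:
                tokens.append(("KEYWORD", word))
            else:
                tokens.append(("IDENTIFIER", word))
                # Record each identifier once, the first time it is seen.
                symbol_table.setdefault(word, len(symbol_table))
        elif ch.isdigit():                       # integer literal
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch in SINGLE_CHAR:                  # one-character operator or separator
            tokens.append((SINGLE_CHAR[ch], ch))
            i += 1
        else:                                    # nothing matches: lexical error
            raise SyntaxError(f"Unexpected character {ch!r} at position {i}")
    return tokens, symbol_table
```

Calling `scan("int x = 10;")` returns five tokens (keyword, identifier, assignment, number, semicolon) together with a symbol table that contains only `x`.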
Example of Token Generation
Input Code:
Lexical Analyzer Output (Tokens):
Each part of the code is classified into a specific token type.
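As a hedged illustration of this classification, the sketch below tokenizes the hypothetical statement `int x = 10;` with Python's standard `re` module. The token type names and patterns are assumptions chosen for the example; a real language defines its own.

```python
import re

# Hypothetical token categories, tried in this order.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SEPARATOR",  r"[;(),{}]"),
    ("SKIP",       r"\s+"),
]
# One combined pattern with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(code):
    """Yield (token_type, lexeme) pairs, dropping whitespace."""
    for match in MASTER.finditer(code):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()


for token_type, lexeme in tokenize("int x = 10;"):
    print(f"{lexeme:<4} {token_type}")
```

For this input the loop reports `int` as KEYWORD, `x` as IDENTIFIER, `=` as OPERATOR, `10` as NUMBER, and `;` as SEPARATOR. Note that this sketch silently skips characters that match no pattern; reporting them is a separate concern.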
Handling Blank Spaces & Comments
Whitespace (spaces, tabs, newlines) separates tokens but is otherwise ignored by the lexical analyzer in most languages.
Comments (// and /* ... */) are removed, as they don’t affect program execution.
Example:
The lexer ignores "// This is a comment"
and processes only:
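A minimal sketch of comment removal, assuming C-style comments and the hypothetical input line `int x = 10; // This is a comment`. The function name `strip_comments` is made up for this example, and comment markers inside string literals are deliberately not handled.

```python
import re

# Matches // line comments and /* ... */ block comments.
# Simplification: comment markers inside string literals are not recognized.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(source):
    """Replace every comment with one space so neighboring tokens stay apart."""
    return COMMENT.sub(" ", source)


line = "int x = 10; // This is a comment"   # hypothetical input line
print(strip_comments(line).rstrip())         # prints: int x = 10;
```

Replacing a comment with a space rather than with nothing keeps input such as `a/* note */b` from collapsing into the single identifier `ab`.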
Symbol Table
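A symbol table stores identifiers (variable names, function names) and their attributes, such as data type, memory location, and scope. The lexer typically records only the name; later phases fill in the remaining attributes.
It lets every compiler phase look up what is known about a name in one place instead of re-scanning the source.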
Example Symbol Table:
Identifier | Type | Memory Location |
---|---|---|
x | int | 1001 |
y | float | 1002 |
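The table above could be represented roughly as follows. This is a sketch under assumptions: the class names `Symbol` and `SymbolTable`, the `declare` and `lookup` methods, and the one-address-per-variable numbering are illustrative choices that mirror the example table, not a real compiler's layout.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str       # e.g. "int" or "float"
    address: int    # illustrative memory location


class SymbolTable:
    """Maps identifier names to their attributes."""

    def __init__(self, base_address=1001):
        self._symbols = {}
        self._next_address = base_address

    def declare(self, name, type_):
        """Add a new identifier; declaring the same name twice is an error."""
        if name in self._symbols:
            raise KeyError(f"{name!r} is already declared")
        symbol = Symbol(name, type_, self._next_address)
        self._symbols[name] = symbol
        self._next_address += 1   # simplification: one address per variable
        return symbol

    def lookup(self, name):
        """Return the Symbol for name, or None if it was never declared."""
        return self._symbols.get(name)


table = SymbolTable()
table.declare("x", "int")
table.declare("y", "float")
print(table.lookup("y"))   # Symbol(name='y', type='float', address=1002)
```

A dictionary keyed by name gives constant-time lookup on average, which is why hash tables are a common choice for this structure.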
Lexical Errors
A lexical error occurs when the lexer meets a character that cannot be part of any valid token, such as a stray symbol that the language does not use.
Example:
Lexical Error: “Unexpected character ‘@’ found.”
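One way such a message could be produced is sketched below. The allowed character set, the `LexicalError` class, the `check_characters` function, and the faulty input `int x = 10 @ 5;` are all assumptions made for illustration.

```python
class LexicalError(Exception):
    """Raised when a character cannot be part of any token."""


# Hypothetical alphabet: letters, digits, underscore, whitespace, and these symbols.
ALLOWED_SYMBOLS = set("=+-*/;(){}<>,.")


def check_characters(source):
    """Raise LexicalError at the first character outside the assumed alphabet."""
    line, column = 1, 1
    for ch in source:
        if not (ch.isalnum() or ch == "_" or ch.isspace() or ch in ALLOWED_SYMBOLS):
            raise LexicalError(
                f"Unexpected character {ch!r} found at line {line}, column {column}"
            )
        if ch == "\n":
            line, column = line + 1, 1
        else:
            column += 1


try:
    check_characters("int x = 10 @ 5;")   # hypothetical faulty input
except LexicalError as error:
    print(f"Lexical Error: {error}")
```

This prints `Lexical Error: Unexpected character '@' found at line 1, column 12`. Tracking the line and column lets the message point the programmer at the exact spot.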
Summary
Lexical Analysis breaks code into tokens.
Blank spaces & comments are ignored.
A symbol table stores variable names and types.
Lexical errors occur due to unknown characters.