Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I was reading about tokens and counting the number of tokens in a program.

Previously I read somewhere that preprocessor commands are not counted as tokens. But when I read about tokens on GeeksforGeeks, the "special symbols" section says:

pre processor(#): The preprocessor is a macro processor that is used automatically by the compiler to transform your program before actual compilation.

So I am confused: if we write #define in a program, does it count as a token?

For example:

#include<stdio.h> 
#define max 100 
int main() 
{ 
    printf("max is %d", max); 
    return 0; 
} 

How many tokens are in this example?


1 Answer

The linked article is full of basic errors, and should not be relied upon.

The process of parsing C or C++ is defined as a series of transformations [1]:

  1. Backslash-newline is replaced with nothing whatsoever -- not even a space.
  2. Comments are removed and replaced with a single space each.
  3. The surviving text is converted into a series of preprocessing tokens. These are less specific than the tokens used by the language proper: for instance, the keyword if is an IF token to the language proper, but just an IDENT token to the preprocessor.
  4. Preprocessing directives are executed and macros are expanded.
  5. Each preprocessing token is converted into a token.
  6. The stream of tokens is parsed into an abstract syntax tree, and the rest of the compiler takes it from there.

Your example program

#include<stdio.h> 
#define max 100 
int main() 
{ 
    printf("max is %d", max); 
    return 0; 
}

will, after transformation 3, be this series of 23 preprocessing tokens:

PUNCT:# IDENT:include INCLUDE-ARG:<stdio.h>
PUNCT:# IDENT:define IDENT:max PP-NUMBER:100
IDENT:int IDENT:main PUNCT:( PUNCT:)
PUNCT:{
IDENT:printf PUNCT:( STRING:"max is %d" PUNCT:, IDENT:max PUNCT:) PUNCT:;
IDENT:return PP-NUMBER:0 PUNCT:;
PUNCT:}

The directives are still present at this stage. Please notice that #include and #define are each two tokens: the # and the directive name are separate. That is why whitespace is allowed between them: some people like to write complex #if nests with the hashmarks all in column 1 but the directive names indented.

After transformation 5, though, the directives are gone and we have this series of 16+n tokens:

[ ... some large volume of tokens produced from the contents of stdio.h ... ]
INT IDENT:main LPAREN RPAREN
LBRACE
IDENT:printf LPAREN STRING:"max is %d" COMMA DECIMAL-INTEGER:100 RPAREN SEMICOLON
RETURN DECIMAL-INTEGER:0 SEMICOLON
RBRACE

where 'n' is however many tokens came from stdio.h.

Preprocessing directives (#include, #define, #if, etc.) are always removed from the token stream and perhaps replaced with something else, so after transformation 5 you will never have tokens that directly result from the text of a directive line. But you will usually have tokens that result from the effects of each directive, such as the contents of stdio.h, and DECIMAL-INTEGER:100 replacing IDENT:max.

Finally, C and C++ do this series of operations almost, but not quite, the same, and the specifications are formally independent. You can usually rely on preprocessing operations to behave the same in both languages, as long as you're only doing simple things with the preprocessor, which is best practice nowadays anyway.


[1] You will sometimes see people talking about translation phases, which are the way the C and C++ standards officially describe this series of operations. My list is not the list of translation phases; it includes separate bullet points for some things that are grouped as a single phase by the standards, and leaves out several steps that aren't relevant to this discussion.

