The C programming language – Lecture 6.

Life and scope rules

In imperative programming languages variables have two important properties:

  1. Life - the TIME during run-time when the memory area is valid and usable.
  2. Scope - the AREA of the program where a name is bound to a memory area.

Life and scope are defined by the declaration of the variable. More precisely, the place of the declaration and the specified storage class determine them. The declaration also specifies the type of the object.

Life categories in C

In C objects are mapped into memory based on their storage class. Different storage classes mean different life rules.

String literals

String literals are values known at compile-time. The type of a string literal is array of N characters, where N is the length of the string plus one for the terminating '\0'. An attempt to write to a string literal is undefined behavior. These character arrays are not allocated in the writable area of the program, and they should be treated as read-only memory. (In many implementations they really are allocated in read-only memory.)

char *hello1 = "Hello world";
// ...
// BAD!
hello1[1] = 'a'; // likely run-time error!

To avoid this situation, declare pointers to string literals as const char *:

const char *hello2 = "Hello world";
// ...
hello2[1] = 'a'; // compile-time error!

$ gcc -Wall -std=c99 s.c
s.c: In function ‘main’:
s.c:11:13: error: assignment of read-only location ‘*(hello2 + 1u)’
   hello2[1] = 'a'; // likely run-time error!
             ^

Moreover, identical string literals may be stored only once, so the pointer values of hello1 and hello2 may be equal:

hello1 == hello2

This is different from using arrays to store strings. Arrays are allocated in the writable area of the program, and they can be both read and written.

char t1[] = {'H','e','l','l','o','\0'};
char t2[] = "Hello";
char t3[] = "Hello";

t1[1] = 'a'; // ok

The addresses of t1, t2 and t3 are all different.
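To make the contrast concrete, here is a minimal sketch. The helper name patch_copy is hypothetical (it does not appear in the lecture); it copies a literal into a writable array supplied by the caller and modifies only the copy, never the literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Copies the literal "Hello" into dst (dstsize bytes) and changes
   the second character of the copy. The literal itself is never written. */
void patch_copy(char *dst, size_t dstsize)
{
    const char *greeting = "Hello";       /* points to a read-only literal */

    if (dstsize == 0)
        return;                           /* nothing can be stored */
    strncpy(dst, greeting, dstsize - 1);  /* copy into writable storage */
    dst[dstsize - 1] = '\0';              /* strncpy may not terminate */
    if (strlen(dst) > 1)
        dst[1] = 'a';                     /* OK: modifies the array copy */
}
```

With an array such as char t[16], the call patch_copy(t, sizeof t) leaves the string "Hallo" in t, while the literal "Hello" stays untouched.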

Automatic life

Objects local to a block (and not declared static) have automatic life. Such objects are created on the stack. Each thread has its own stack, so automatic objects are safe in a multithreaded environment. An object is created when its declaration is encountered and destroyed when control leaves the enclosing block.

void f()
{
    int i = 2;  // life starts here with initialization
    // ...
}               // life ends here

There should be no reference to a variable after its life has ended.

//
// This is BAD!!!
// 
int *f()
{
    int i = 2;  // life starts here with initialization
    // ...
    return &i;  // likely a run-time error
}               // life finished here

Using the return value is invalid: the life of i has already ended, and the memory area of i might be reused for other purposes.
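Two common ways to avoid the dangling pointer are sketched below. The function names f_fixed and f_by_value are hypothetical and chosen only for this illustration: the first writes the result into an object owned by the caller, the second simply returns the value, which is copied before i dies.

```c
/* Hypothetical fix 1: the caller owns the int, the function only fills it. */
void f_fixed(int *out)
{
    int i = 2;    /* automatic life, ends at the closing brace */
    *out = i;     /* copy the value out while i is still alive */
}

/* Hypothetical fix 2: return the value itself, not its address. */
int f_by_value(void)
{
    int i = 2;
    return i;     /* the value is copied to the caller: always safe */
}
```

In both variants no pointer to i survives the function, so nothing can refer to i after its life has ended.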

Dynamic Life

Objects with dynamic life are created in the free store, also called the heap. The life starts with a call to the malloc function, whose only parameter is the number of bytes to allocate. The life ends with a call to the free function, which may happen in a different place of the program.

char *buffer = (char*) malloc(1024); // life starts here, alloc 1024 chars
double *dbls = (double*) malloc(10*sizeof(double)); // allocate 10 doubles
//...
free(buffer);                 // life finished here
free(dbls);                   // life finished here

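The malloc function allocates a memory area and returns a pointer to the beginning of the allocated area. Since malloc returns void *, the result has to be converted to the destination pointer type. In C this conversion is implicit, so the explicit cast shown here is optional; it is required only if the code must also compile as C++.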

double *dbls = (double*) malloc(10*sizeof(double));

Also, it is a good idea to check the return value against the null pointer, since malloc returns NULL when it is not able to allocate enough memory.

double *dbls = (double*) malloc(10*sizeof(double));
if ( dbls )
{
  // ok, successful allocation
}
else
{
  // malloc failed, dbls is NULL pointer
}

We can change the size of the allocated memory.

char *buffer1 = (char*) malloc(1024); // life starts, alloc 1024 chars
// ...
char *buffer2 = (char*) realloc(buffer1,2048); // re-alloc, keep content
// ...
free(buffer2);                 // life finishes here

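The content of buffer1 is preserved: if realloc has to allocate a new memory area, the old content is copied to the new place and the old area is freed. After a successful call only buffer2 may be used. If realloc fails, it returns NULL and the original area stays valid.

If the first parameter of realloc() is a NULL pointer, then a new area is allocated, as with malloc. The second parameter may be bigger or smaller than the original size; the content is preserved up to the smaller of the two sizes.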
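Because realloc can fail, assigning its result directly to the only pointer that holds the old area would lose that area on failure. The sketch below shows one defensive pattern; the helper name resize_buffer is hypothetical, and newsize is assumed to be greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
   Resizes *buf to newsize bytes (newsize is assumed to be > 0).
   Returns 1 on success and 0 on failure. On failure *buf is unchanged,
   still valid, and still has to be freed by the caller. */
int resize_buffer(char **buf, size_t newsize)
{
    char *tmp = realloc(*buf, newsize);  /* keep the old pointer for now */

    if (tmp == NULL)
        return 0;                        /* old area is untouched */
    *buf = tmp;                          /* old pointer must not be used again */
    return 1;
}
```

The caller keeps a single pointer variable, passes its address, and calls free on it exactly once at the end, whether or not the resize succeeded.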

Static life

Global variables, including static global variables, have static life. Static life starts at the beginning of the program and ends when the program ends.

char buffer[80]; // static life is initialized automatically to '\0's
static int j;    // static life is initialized automatically to 0

int main()
{
// ...
}   // life finished here

The order of creation is well-defined inside a compilation unit, but it is not defined between source files. This can lead to static initialization order problems.

Local Static Variables

Local statics are declared inside a function like local variables, but with the static keyword. The life starts (and the initialization happens) when the declaration is first encountered, and ends when the program finishes.

void f()
{
  static int cnt = 0;  // life starts here on the first occurrence
  ++cnt;
}
// ...
int main()
{
  while (... )
  {
    f();
  }
}   // life finished here
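The difference between static and automatic life can be observed directly. In the sketch below the function names count_calls and count_auto are hypothetical; they differ only in the static keyword.

```c
/* Hypothetical variant of f() that returns the counter. */
int count_calls(void)
{
    static int cnt = 0;   /* initialized once, keeps its value between calls */
    return ++cnt;
}

/* The same body with an automatic variable, for comparison. */
int count_auto(void)
{
    int cnt = 0;          /* created and initialized again on every call */
    return ++cnt;
}
```

Successive calls of count_calls() return 1, 2, 3 and so on, while count_auto() returns 1 every time.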

Scope rules

Scope rules define the section of the source where a name is bound to some C entity (e.g. variable, type name, enum, etc.).

// file1.cpp
#include <stdio.h>

int i;          // global, with external linkage
static int j;   // global, without external linkage
extern int n;   // global, defined in somewhere else
extern double fahr2cels(double); // function declaration

void f()
{
  int i;          // local i
  static int k;   // local k  
  {
    int j = i;  // local j, but i is the one declared above
    int i = j;  // local i, a different i than above
  }
}
static void g()  // static function: no external linkage
{
  extern int i;  // declaration only
  ++n;           // global n declared in other translation unit
}
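The hiding rules in the listing above can be checked with a small sketch that returns the values instead of only declaring them. The function names and the initial values are assumptions made for this illustration.

```c
int i = 10;               /* file scope */

int innermost(void)
{
    int i = 1;            /* hides the file-scope i */
    {
        int i = 2;        /* hides the i declared in the enclosing block */
        return i;         /* the innermost i: 2 */
    }
}

int outer_after_block(void)
{
    int i = 1;
    int seen;
    {
        int i = 2;        /* visible only until the closing brace */
        seen = i;         /* 2 */
    }
    return i + seen;      /* outer i is visible again: 1 + 2 */
}

int global_value(void)
{
    return i;             /* no local i here: the file-scope i, 10 */
}
```

A name always refers to the declaration in the nearest enclosing scope, and an inner declaration never changes the object declared outside.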

Too wide scope can be the root of errors.

int i;  // dangerous
int main()
{
    if ( ... )
    {
        int j = 1;
        //...
        ++i;    // mistake: ++j was intended
    }
    int j = 0;
    while ( j < 10 )
    {
        //...
        i++; // instead of j++
    }
    for (i = 0; i < 10; ++i)
    {
        //...
    }
    ++i;    // global i   
}

Minimizing the scope is always a good idea!

int main()
{
    for ( int i = 0; i < 10; ++i )
    {
        // i is local here
    }
}

How (not) to make scope-life errors

The task:

  1. write a question to stdout
  2. read a string as answer from stdin
  3. print the answer to stdout

//
//  This is a VERY BAD program
//
#include <stdio.h>
char *answer( const char *question);

int main()
{
  printf( "answer = %s\n", answer( "How are you? "));
  return 0;
}

// very bad code!!
char *answer( const char *question)
{
  printf( "%s\n", question);
  char buffer[80];   // local scope, automatic life
                     // char[] converts to char*
  gets(buffer);      // ERROR1: possible buffer overrun!!
  return buffer;     // ERROR2: return pointer to local: never do this!
}

There are two big errors in the code above:

  1. The gets(buffer) function reads characters into buffer until the first newline character, without knowing the size of buffer. The buffer can (and sooner or later will) overflow. Buffer overflow is perhaps the most critical class of security errors in C; gets was removed from the standard library in C11 for this reason.
  2. The function returns a pointer to a local variable with automatic life. When we try to use the pointer, the memory behind it is already gone: as the life of the local buffer is over, it may be overwritten by other values.
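The first error has a standard remedy: fgets takes the buffer size and never writes past it. Unlike gets, it keeps the newline in the buffer, so the sketch below also strips it. The helper name read_line and its parameters are hypothetical; the input stream is passed in so that the function can be used with any FILE *, not only stdin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Reads one line from in into buf, storing at most size-1 characters,
   and removes the trailing newline if there is one.
   Returns buf, or NULL on end-of-file or read error. */
char *read_line(char *buf, int size, FILE *in)
{
    if (size <= 0 || fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut the string at the newline */
    return buf;
}
```

The buffer still belongs to the caller, so this helper addresses only the overflow; the life problem of the second error is a separate issue.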

Let's try to fix it by making buffer global.

//
//  This works, but hard to maintain
//
#include <stdio.h>
#define BUFSIZE 80
char buffer[BUFSIZE];   // global scope, static life
char *answer( const char *question);

int main()
{
  printf("answer = %s\n", answer("How are you?"));
  return 0;
}
char *answer( const char *question)
{
  printf("%s\n", question);
  fgets(buffer, BUFSIZE, stdin); // OK: reads at most BUFSIZE-1 chars
  return buffer;                 // OK: buffer has static life
}

This works (in this example), but buffer is visible in too many places, which is a maintenance nightmare. In fact, buffer is not a global concept in this program; it is only an implementation detail of the answer function.

We should try to narrow the scope of buffer.

//
//  This works, but not thread-safe
//
#include <stdio.h>
#define BUFSIZE 80
char *answer( const char *question);

int main()
{
  printf("answer = %s\n", answer("How are you?"));
  return 0;
}
char *answer( const char *question)
{
  printf("%s\n", question);
  static char buffer[BUFSIZE];   // OK: local scope, static life       
  fgets(buffer, BUFSIZE, stdin); // OK: reads at most BUFSIZE-1 chars
  return buffer;                 // OK: buffer has static life
}

This works as we expected, and the scope of buffer is minimal. However, this solution is far from perfect:

//
//  This works, but not thread-safe
//
#include <stdio.h>
#define BUFSIZE 80
char *answer( const char *question);

int main()
{
  // printf("%s\n%s\n", answer("How are you?"), answer("Sure?"));
  printf("answer = %s\n%s\n", answer("Sure?"), answer("How are you?"));
  return 0;
}
char *answer( const char *question)
{
  printf( "%s\n", question);
  static char buffer[BUFSIZE];   // OK: local scope, static life       
  fgets(buffer, BUFSIZE, stdin); // OK: reads at most BUFSIZE-1 chars
  return buffer;                 // OK: buffer has static life
}

$ gcc -ansi -std=c99 -Wall -W howareyou.c
$ ./a.out 
How are you?
fine
Sure?
yes
answer = yes

yes

The problem is that we have only one buffer for the two answers, and the second answer overwrites the first one. The second answer will be printed twice.
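The same aliasing effect can be reproduced without any input. In the sketch below the function label is hypothetical: like answer, it fills a local static buffer and returns a pointer to it, so every call returns the same address.

```c
#include <stdio.h>

/* Hypothetical function, for illustration only: formats n into a
   static buffer and returns a pointer to that buffer. */
const char *label(int n)
{
    static char buffer[32];   /* one buffer shared by all calls */

    snprintf(buffer, sizeof buffer, "item %d", n);
    return buffer;            /* always the same address */
}
```

If the results of label(1) and label(2) are stored in two pointers, both pointers are equal and both show the text written by the later call.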

In the real world the same situation happens in concurrent programs executing multiple threads.

We need a separate buffer for each simultaneous call of answer.

//
// BAD!!! Memory leak!
// 
#include <stdio.h>
#include <stdlib.h>   // for malloc()
#define BUFSIZE 80
char *answer( const char *question);

int main()
{
  printf("answer = %s\n%s\n",answer("Sure?"),answer("How are you?"));
  return 0;
}
char *answer( const char *question)
{
  printf( "%s\n", question);
  char *buffer = (char *) malloc(BUFSIZE);// who will free???       
  fgets(buffer, BUFSIZE, stdin); 
  return buffer;                
}

$ ./a.out
How are you?
fine
Sure?
yes
answer = yes

fine

We have finally got two separate answers (in the wrong order, because the order in which function arguments are evaluated is unspecified). But the real problem is hidden: no one frees the allocated buffers. In a long-running program with many calls of answer() we will run out of memory!

This phenomenon is called a memory leak.
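If dynamic allocation is kept, the leak disappears only when some part of the program is clearly responsible for calling free. The sketch below is one possible arrangement, not the lecture's own solution: the hypothetical function answer_alloc documents that the caller must free the result, checks the return value of malloc, and reads from a stream given as a parameter (an assumption made so that it does not depend on stdin).

```c
#include <stdio.h>
#include <stdlib.h>
#define BUFSIZE 80

/* Hypothetical variant of answer(), for illustration only.
   Returns a malloc'ed buffer that the CALLER must free,
   or NULL if the allocation or the read fails. */
char *answer_alloc(const char *question, FILE *in)
{
    printf("%s\n", question);
    char *buffer = malloc(BUFSIZE);          /* life starts here */
    if (buffer == NULL)
        return NULL;
    if (fgets(buffer, BUFSIZE, in) == NULL) {
        free(buffer);                        /* nothing to hand over */
        return NULL;
    }
    return buffer;                           /* ownership passes to the caller */
}
```

A caller would store the result in a variable, use it, and then call free on it; calling the function directly inside a printf argument list, as in the leaking version, leaves no pointer to free.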

The right solution is to

  1. Have exactly one owner for every memory area. In this case the owner is the caller function (here main()).
  2. Use sequence points to separate sequential events.

//
// OK
// 
#include <stdio.h>
#include <stdlib.h>
#define BUFSIZE 80
char *answer( const char *question, char *buffer, int len);

int main()
{
  char buffer1[BUFSIZE], buffer2[BUFSIZE];
  printf("answer1 = %s\n",answer("How are you?", buffer1, BUFSIZE));
  printf("answer2 = %s\n",answer("Sure?", buffer2, BUFSIZE));
  return 0;
}
char *answer( const char *question, char *buffer, int len)
{
  printf( "%s\n", question);
  fgets(buffer, len, stdin); 
  return buffer;                
}