14. User-defined operators

A good, clear interface is of primary importance when defining a class. Since documentation frequently lags behind the actual code, often the only reliable information about a class is the source itself. Suppose we want to use the Date class we defined in the previous chapter. The natural way would be something like the following:

#include <iostream>     // standard header files
#include "date.h"       // declares the Date class

int main()
{
  Date d1( 2016, 4);  // this should be 2016.04.01
  Date d2 = d1;       // copy constructor
  d2 += 40;           // add 40 days
  d1 = d2;            // assignment, now d1 is 2016.05.11

  std::cout << "d1++ == " << d1++ << '\n'; // 2016.05.11
  std::cout << " d1  == " <<  d1  << '\n'; // 2016.05.12
  std::cout << "++d2 == " << ++d2 << '\n'; // 2016.05.12
  std::cout << " d2  == " <<  d2  << '\n'; // 2016.05.12

  Date d3( 2016, 3, 1);  // 2016.03.01
  std::cout << "++d3 == " << ++d3 << '\n'; // 2016.03.02
  return 0;
}

This is more concise, natural and understandable than the interface we provided earlier for the Date class.

Overloaded operators

User-defined operators can be implemented as member functions or as namespace (non-member) functions. However, not every operator can be a non-member: the assignment operator (operator=), the function call operator (operator()), the subscript operator (operator[]) and the arrow operator (operator->) must be defined as member functions. Compound assignment operators such as operator+= may be non-members, but they are usually implemented as members, because they modify their left operand.

The scope resolution operator (::), the dot operator (.), the pointer-to-member operator (.*) and the conditional operator (?:) cannot be overloaded at all.

When an operator is overloaded as a member function, its leftmost operand is the object on which the function is called, and the remaining operands are passed as arguments. When it is a namespace function, all operands are passed to the function as ordinary arguments:

// when operator+ and operator~ are member functions:
a+b  ->  a.operator+(b)
~a   ->  a.operator~()

// when operator+ and operator~ are namespace functions:
a+b  ->  operator+(a,b)
~a   ->  operator~(a)

Overloading some operators can be misleading. The logical operators || and && are short-circuit operators: they first evaluate the left operand, and evaluate the right operand only when it is needed. When the left operand of || is true, or the left operand of && is false, the right operand is not evaluated at all. A user-defined operator cannot behave this way: it is an ordinary function, so all its operands are evaluated before the function is called. A similar problem arises when one overloads the comma (,) operator, whose built-in version has special evaluation-order guarantees.

The other operators can be overloaded without such surprises. We declare the operators we need in the Date class definition in date.h:

class Date
{
public:
  // public methods as earlier ...

  Date& operator++();       // pre-fix increment
  Date  operator++(int);    // post-fix increment
  Date& operator+=( int n); // add n days
  /* operator--(), ... */
private:
  void checkDate( int y, int m, int d);
  int year;
  int month;
  int day;
};

As with the built-in integral types (int, etc.), we want separate prefix and postfix ++ operators with different semantics. C++ distinguishes them by their signatures: operator++ with no parameter is the prefix increment, and operator++ with an int parameter is the postfix version. This int parameter is only a marker: the compiler passes 0 for it, and the implementation usually ignores it (it is often left unnamed).

In the cpp file we implement the three new methods:

Date& Date::operator++()
{ 
  next(); 
  return *this;  // return reference to the object
}
Date Date::operator++(int)
{ 
  Date old(*this); // save a copy of the old value
  next(); 
  return old;    // return a copy of the object
}
Date& Date::operator+=( int n)
{ 
  add(n);
  return *this;  // return reference to the object 
}

The two increment functions differ in an important way. The semantics of the prefix increment is to increment the value first and then return the new value, which is straightforward with the existing next function. The postfix increment also has to increment the value, but it must return the old value. To do that, we have to store the old value somewhere: we copy it into the local variable old, increment the current object, and finally return old.

The prefix increment and operator+= return a reference to the object itself (*this), which is cheap and straightforward. The postfix operator, however, returns the local variable old, which is destroyed when the function returns. A reference to it would dangle, so this operator must return by value.

The built-in operators show the same difference: for an int i, ++i refers to i itself (it is an lvalue), while i++ yields a temporary copy of the old value.

Operators on input and output streams

Input and output operators are natural targets of operator overloading. The standard library provides many overloads of these operators (available through the <iostream> header) for the built-in types and for several library types. Some of them are implemented as member functions of the standard std::ostream and std::istream classes.

However, when we overload the input and output operators for a custom type, we have to consider that a member operator is always called on its left operand. Here the left operand is a standard stream object, and we cannot add members to a standard library class. Therefore, our only option is to implement the input and output operators as namespace functions.

#include <iostream>  // to declare istream and ostream
class Date
{
public:
  // public and private methods as earlier ...
};
// namespace operators for input/output
std::istream& operator>>( std::istream& is, Date& d);
std::ostream& operator<<( std::ostream& os, const Date &d);

Notice the signatures and the return types of the operators. The leftmost parameter is the stream object we read from or write to. It is always passed by reference, because streams cannot be copied (since C++11, concrete streams such as file streams can be moved, but still not copied). The second parameter of the input operator is modified by the call, so it is also passed by reference. The output operator does not intend to change its Date parameter, so it takes it by const reference.

In both cases, the return value is a reference to the leftmost parameter. The reason is that these operators can be, and frequently are, used in chains: each call must return the (possibly modified) stream object for the next input/output operation:

int i1 = 10, i2 = 20;
Date d1(2016), d2(2015);
std::cout << i1 << i2 << std::endl;
std::cout << d1 << d2 << std::endl;
// means:
std::cout.operator<<(i1).operator<<(i2).operator<<(std::endl);
operator<<(operator<<(std::cout, d1), d2).operator<<(std::endl);

The implementation of the operators is straightforward: they delegate the real work to the get and put member functions of Date. Notice the return statements!

std::istream& operator>>( std::istream& is, Date& d)
{
  d.get( is);
  return is;  // important for chained reads
}
std::ostream& operator<<( std::ostream& os, const Date &d)
{
  d.put( os);
  return os;  // important for chained writes 
}

Namespace vs. member operator

We want to improve our Date class further. Some clients want to compare two dates, where the earlier date counts as the smaller one. As the comparison operators can be defined either as member operators or as namespace operators, we have two choices:

Date d1(2016), d2(2015);
d1 < d2   //  d1.operator<(d2)  if member operator
d1 < d2   //  operator<(d1,d2)  if namespace operator

Some object-oriented textbooks suggest the member function version, as it better expresses the cohesion between the data structure and the operations executed on it.

With member operators, users can use the Date class in the following ways:

1
2
3
4
5
6
7
8
9
10
11
12
#include <iostream>
#include "date.h"

int main()
{
  Date d1(2016), d2(2015);
  //...
  if ( d1 < d2 )   { /* ... */ } // means: d1.operator<(d2)
  if ( d1 < 2000 ) { /* ... */ } // means: d1.operator<(Date(2000))
  if ( 2000 < d2 ) { /* ... */ } // COMPILE ERROR! no 2000.operator<(d2)
  // ...
}

In line 8 we compare two Date objects; the earlier date is the smaller. In line 9 we compare d1 to 2000.01.01. This works because the Date constructor accepts a single integer (the year) as an argument, and the month and day parameters default to 1. A constructor of class X that can be called with a single argument of type Y acts as an implicit conversion from Y to X (unless it is declared explicit). When we pass the integer argument 2000, the compiler first looks for an overload whose parameters match the arguments exactly. When there is no exact match, but the arguments can be converted, the implicit conversion is applied.

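However, 2000 < d2 does not compile. The reason is that implicit conversions are applied only to function arguments, such as the parameter of Date::operator<(const Date&). They are never applied to the object on which a member function is called: the compiler will not create a temporary Date from 2000 just to call a member function on it.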

Therefore, we recommend implementing symmetric binary operators (such as comparisons and arithmetic operators) as namespace functions. When we implement less-than as a namespace function, bool operator<(const Date&, const Date&), the two operands are in symmetric positions, and the same conversions can be applied to both of them. With the member function implementation, the left operand is special: no conversion is applied to it.

To complete the task, first we declare the appropriate operators in date.h:

class Date
{
  //...
};
bool operator<(const Date &d1, const Date &d2);
bool operator<=(const Date &d1, const Date &d2);
bool operator>(const Date &d1, const Date &d2);
bool operator>=(const Date &d1, const Date &d2);
bool operator==(const Date &d1, const Date &d2);
bool operator!=(const Date &d1, const Date &d2);

Then we implement the operators in date.cpp:

bool operator<( const Date &d1, const Date &d2)
{
  return (d1.getYear() <  d2.getYear())
      || (d1.getYear() == d2.getYear() 
                  && d1.getMonth() <  d2.getMonth())
      || (d1.getYear() == d2.getYear() 
                  && d1.getMonth() == d2.getMonth()
                  && d1.getDay()   <  d2.getDay());
}

bool operator==( const Date &d1, const Date &d2) { return !(d1<d2 || d2<d1); }
bool operator!=( const Date &d1, const Date &d2) { return d1<d2 || d2<d1; }
bool operator<=( const Date &d1, const Date &d2) { return !(d2<d1); }
bool operator>=( const Date &d1, const Date &d2) { return !(d1<d2); }
bool operator>( const Date &d1, const Date &d2)  { return d2<d1; }

Implementing the other five operators in terms of operator< is a safety strategy: it guarantees that the operators are consistent with each other. We need not worry much about efficiency here: the compiler optimizes such trivial function bodies very well, so the performance difference compared to direct implementations that do not use operator< is usually hardly noticeable.

Inline functions

We can give the compiler more optimization opportunities by defining the operators (and other small functions) in the date.h header file as inline functions.

// in the header file
class Date
{
  //...
};
inline bool operator<(const Date &d1, const Date &d2)
{  
  return (d1.getYear() <  d2.getYear())
      || (d1.getYear() == d2.getYear() 
                  && d1.getMonth() <  d2.getMonth())
      || (d1.getYear() == d2.getYear() 
                  && d1.getMonth() == d2.getMonth()
                  && d1.getDay()   <  d2.getDay());
}
inline bool operator>(const Date &d1, const Date &d2)  
{ 
  return d2 < d1; 
}
inline bool operator<=(const Date &d1, const Date &d2) 
{ 
  return !(d2 < d1); 
}
inline bool operator>=(const Date &d1, const Date &d2) 
{ 
  return !(d1 < d2); 
}
inline bool operator!=(const Date &d1, const Date &d2) 
{ 
  return  d1 < d2 || d2 < d1; 
}
inline bool operator==(const Date &d1, const Date &d2) 
{ 
  return !(d1 < d2 || d2 < d1);  // not !(d1 == d2): that would recurse forever
}

C++ compilers use a wide variety of optimizations. One of the most powerful is inlining, i.e. the compiler replaces a function call with the body of the called function. Calling a function is not always cheap: arguments have to be passed in registers or on the stack, registers may have to be saved, etc. For some small functions, the code generated for the call is even longer than the called function body itself. In these situations, inlining can lead to a significant performance gain.

When we place a function definition into a header file, the function body is visible to the compiler at every place where we call the function. In this case we must not repeat the function definition in the source file (date.cpp). The inline keyword has two effects:

First, it gives the compiler a hint to inline the function. Inlining is not mandatory: for example, recursion and other specific situations may prevent inlining (or make it partial). Also, when we take the address of an inline function (for instance, to store it in a function pointer), an ordinary copy of the function must be generated as usual, so that it can be called through the pointer.

Second, it relaxes the one-definition rule. A function defined in a header that is included into several compilation units is compiled into each object file independently. For an ordinary function, linking these object files together fails with a multiple definition error. An inline function, however, may be defined in every compilation unit that uses it, provided all the definitions are identical; the linker keeps only one copy, so the function still has a single address in the whole program.

Member functions defined inside the class definition (between its curly braces) are implicitly inline, so we do not write the inline keyword there.

class Date
{
public:
  int getYear() const { return year; }  // implicitly inline
  // ...
};

Financed from the financial support ELTE won from the Higher Education Restructuring Fund of the Hungarian Government.