C Primer
The main thing that C++ adds to C is the concept of objects. Objects aren't just one more feature available in the language; they are its foundational structure, and will be the primary tool you use when creating any project.
This change requires something of a shift in headspace. It's one of those things that, if you don't get it immediately, will become a massive "oh! now I get it!" moment later.
In essence, C++ encourages you to define your own types and write the code within those types to make them behave however you need to. This means you can write code that is entirely self-contained without polluting the rest of the application space with unexpected clutter.
The pattern is that you write a class and put your code and any variables you need into it. Then the outside program makes instances of your classes, called objects. Each object gets its own copy of all the variables it holds. When you're writing the code to implement an object, you naturally only get access to the variables you declare in your class.
As an example:
#include <climits>   // for INT_MAX
#include <iostream>  // for std::cout

class Counter {
private:
int count = 0;
public:
void increment() {
if (count == INT_MAX) {
count = 0;
} else {
count += 1;
}
}
int value() {
return count;
}
};
int main() {
Counter a;
Counter b;
std::cout << "Initial values - a: " << a.value() << ", b: " << b.value() << std::endl;
// Increment a twice
a.increment();
a.increment();
// Increment b once
b.increment();
std::cout << "Final values - a: " << a.value() << ", b: " << b.value() << std::endl;
}
When run, this should output:
Initial values - a: 0, b: 0
Final values - a: 2, b: 1
This small snippet demonstrates quite a bit. First of all, notice that the main function declares 2 instances of the Counter type. They are completely independent, just as if you'd declared 2 integer variables instead. You can have as many instances of your class in your code as memory allows.
Also notice how the code inside the Counter class accesses only the member variables on its class. For example, inside the increment method, we only have access to the member variables of our class (in this case, count), any local variables we declare in the method, and the global variables.
One thing that's not obvious in the above example is that there is no way for the code in the main function to tamper with the counters. If you tried to write a.count = 4; in the main function, it wouldn't compile. This is because the count member of the Counter class is private. private means that a member can only be used by the class that defines it. This can be used to prevent misuse of objects by outside code that might not have the full picture of what needs to be done. The only things that can be done with our counters are incrementing them by 1 and reading their value. You can't reset them, and you can't randomise their values. How their state is changed is defined solely by the methods on the class.
If you wanted to give the main() function access to the count member, change private: to public: in the above snippet. Then main can set the counter however it likes. Notice, though, that we now have no guarantees about what the counter value may be. It might be negative, something that couldn't happen before.
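As a rough sketch of what that change allows, here is a variant of the counter with its state left public. The names OpenCounter and tamper are made up for this illustration and are not part of the original example.

```cpp
#include <climits>

// Hypothetical variant of Counter whose state is public.
struct OpenCounter {
    int count = 0;

    void increment() {
        if (count == INT_MAX) {
            count = 0;
        } else {
            count += 1;
        }
    }

    int value() const {
        return count;
    }
};

// Outside code can now bypass increment() entirely.
int tamper() {
    OpenCounter c;
    c.increment();   // count is 1
    c.count = -7;    // compiles, because count is public
    return c.value();
}
```

Calling tamper() hands back the negative value that the private version made impossible.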
There is another way of declaring a class, using the struct keyword. The only difference between a struct and a class is that everything in a struct is public by default, whereas in a class everything is private until you say otherwise. The following definitions are exactly the same:
class Customer {
public:
std::string name;
std::string address;
};
struct Customer {
std::string name;
std::string address;
};
There is literally no difference between these declarations.
The definitions above don't actually create anything. Declaring a class does not allocate any memory, nor does it run any code. It only defines a blueprint (not to be confused with the C++ template feature) that can be used to create objects later.
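To see the default access rules side by side, here is a small sketch. The type names and the secret member are invented for this illustration, and the two types are named differently only so that both can live in one file.

```cpp
#include <string>

class CustomerAsClass {
    std::string secret;      // private: class members default to private
public:
    std::string name;
};

struct CustomerAsStruct {
    std::string name;        // public: struct members default to public
private:
    std::string secret;
};

std::string describe() {
    CustomerAsClass a;
    CustomerAsStruct b;
    a.name = "Ada";
    b.name = "Grace";
    // a.secret = "x";   // would not compile: secret is private
    // b.secret = "x";   // would not compile: secret is private
    return a.name + " and " + b.name;
}
```

Either keyword can express the same layout and access; the choice mostly signals intent to the reader.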
Classes and struct definitions can be nested. For example:
struct Customer {
class Counter {
int count = 0;
public:
void increment() {
count = (count + 1) % INT_MAX;
}
int value() {
return count;
}
};
std::string name;
std::string address;
Counter orders;
Counter complaints;
void addOrder();
};
As an exercise, let's implement the addOrder method and see what's available to us:
void Customer::addOrder() {
orders.count = 5; // This will fail, we don't have access to private members even if they're declared inside nested classes. How could we get access if we needed it and it made sense?
complaints.increment(); // Yes, we have access to the increment method of our members.
count = 4; // This will fail. We don't have our own count member variable. We can't access the member variables of other structs directly, we need to say which instance we're accessing - orders or complaints. Otherwise there's no way for the compiler to know.
}
Here, our Customer has 2 counters: the number of orders and the number of complaints. It makes sense that these are separate things that would be tracked independently. The code inside addOrder is not intended to make sense as an implementation; it's there to show what we have access to and what we don't.
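For comparison, here is one possible version that does compile. It also sketches one answer to the question in the first comment, using a friend declaration. The resetOrders method is hypothetical, and whether granting friend access is appropriate depends on the design; it is shown only as an option.

```cpp
#include <climits>
#include <string>

struct Customer {
    class Counter {
        int count = 0;
        // Assumption for illustration: let the enclosing struct
        // reach Counter's private members.
        friend struct Customer;
    public:
        void increment() { count = (count + 1) % INT_MAX; }
        int value() const { return count; }
    };

    std::string name;
    std::string address;
    Counter orders;
    Counter complaints;

    void addOrder();
    void resetOrders();   // hypothetical extra method
};

void Customer::addOrder() {
    orders.increment();   // public interface, always allowed
}

void Customer::resetOrders() {
    orders.count = 0;     // allowed only because of the friend declaration
}
```

Without the friend line, resetOrders would fail to compile for the same reason orders.count = 5 does.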
Notice how this method is implemented outside of the class definition. This is the common way to implement methods. Generally the class definition is kept as short as possible to act as a quick reference to what's available in the class. The implementation, which can often be longer, gets broken out. Usually we put class definitions into a .hpp file, and the implementation code goes into a .cpp file. The implementation is still limited to accessing only the members on the current instance of the class.
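As a sketch of that split, the earlier Counter could be laid out as follows. The file names are hypothetical, and both halves are shown in one listing so it can be compiled as a single file.

```cpp
// ---- counter.hpp (hypothetical file name) ----
#include <climits>

class Counter {
    int count = 0;
public:
    void increment();   // declared here, implemented elsewhere
    int value() const;
};

// ---- counter.cpp (hypothetical file name) ----
// In a real project this file would begin with: #include "counter.hpp"

void Counter::increment() {
    count = (count == INT_MAX) ? 0 : count + 1;
}

int Counter::value() const {
    return count;
}
```

Code that only needs to use a Counter includes the header and never has to read the implementation.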
Every time a class references another class, that's a dependency. It is entirely possible to get into a situation where two classes each reference each other. This is a problem, because the C++ compiler will only read your code from top to bottom once. As it goes, it allocates tables of data sizes and offsets into structures. If at any point it can't work out how large something is or where a method is, it will fail.
Let's consider a situation where a widget has a button, and when the button is activated, it sets a variable in the widget. How might we try to model this?
struct Widget {
bool flashing = false;
Button startFlashing;
};
struct Button {
Widget parent;
void onAction() {
parent.flashing = true;
}
};
This approach just won't work. When the compiler comes across struct Widget it tries to calculate how many bytes it needs to allocate when someone creates this widget. The widget contains two things. The first is a bool, and that's fine: a single byte is typically enough to hold it. The second is a Button, but the compiler hasn't seen a Button yet (remember, it only reads from top to bottom, and only once). So it can't work out how big the Widget should be, and fails.
What if we reversed the definitions?
struct Button {
Widget parent;
void onAction() {
parent.flashing = true;
}
};
struct Widget {
bool flashing = false;
Button startFlashing;
};
Well, then we'd get the same situation: the Button needs to contain a Widget, but we don't know how big a Widget is because we haven't seen it. So we don't know how many bytes to allocate for our Button objects, and compilation fails.
There's a deeper and more insidious problem here too. The button actually contains a new instance of a widget. Imagine if we could compile this. When we create our top-level widget, we create a button, because a widget contains a button. Then the button has its own complete Widget inside it, so we create one of those - but a widget has a button inside ... and so on. We don't want our button to have its own widget, we want our button to know about its parent widget.
We can do this with pointers.
struct Button {
Widget *parent;
void onAction() {
parent->flashing = true;
}
};
struct Widget {
bool flashing = false;
Button startFlashing;
};
Notice our button now contains a pointer to a Widget, not an entire widget instance. The compiler will still fail, because as it reads down, it doesn't know what a Widget is. But we're closer. One useful trick C++ gives us is a "forward declaration". The only job of this statement is to tell the compiler "don't worry, it's coming". A forward declaration looks like this:
struct Widget;
Notice there's no clue as to what's in the Widget, what its base class is, or anything else. It just says that there's something called a Widget and we'll find it later.
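A short sketch of what a forward declaration does and does not allow may help here. The helper names remember, hasWidget, and flash are invented for this illustration.

```cpp
struct Widget;                  // forward declaration: an incomplete type

// Fine: pointers to an incomplete type can be stored and passed around.
Widget* remembered = nullptr;
void remember(Widget* w) { remembered = w; }
bool hasWidget() { return remembered != nullptr; }

// Not fine until Widget is fully defined (each line would fail to compile):
// Widget w;                    // size unknown
// sizeof(Widget);              // size unknown
// remembered->flashing = true; // members unknown

struct Widget {                 // the full definition arrives later
    bool flashing = false;
};

// From here on Widget is complete and its members can be used.
void flash() {
    if (remembered != nullptr) {
        remembered->flashing = true;
    }
}
```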
Now we have:
struct Widget;
struct Button {
Widget *parent;
void onAction() {
parent->flashing = true;
}
};
struct Widget {
bool flashing = false;
Button startFlashing;
};
The compiler is now perfectly happy with the pointer to the parent widget. It knows how many bytes to allocate for a pointer: the size doesn't depend on what is being pointed to, and is typically 64 bits on a current desktop machine. The code will now fail to compile at the onAction method. This is because, reading from top to bottom, the compiler knows Widget exists and knows how to create a pointer to it, but now we have to access something inside the object we're pointing to. The compiler doesn't know where the flashing variable is inside the Widget structure because all it has seen is the forward declaration. So it can't output the code needed to say, for example: "go 4 bytes after the address in parent and store the byte value 1 there".
Basically, it hasn't seen the flashing variable defined, so it can't use it. We could try adding flashing to the forward declaration, but then we are back to the situation where Widget was defined first, which doesn't work.
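The two pieces of knowledge involved can be made visible with sizeof and offsetof. This is only a sketch: the id member is a hypothetical addition to push flashing away from offset 0, and the actual numbers depend on the platform and compiler.

```cpp
#include <cstddef>

struct Widget;   // forward declaration only

// A pointer's size is known without seeing the pointee's definition.
constexpr std::size_t pointerSize = sizeof(Widget*);

struct Widget {
    int id = 0;              // hypothetical extra member
    bool flashing = false;
};

// Only now can the compiler compute where flashing lives inside a Widget.
constexpr std::size_t flashingOffset = offsetof(Widget, flashing);
```

Moving the flashingOffset line above the definition of Widget would produce a compile error, which is the same failure onAction runs into.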
What we need to do is break out the implementation of onAction. We can do it like this:
struct Widget;
struct Button {
Widget *parent;
void onAction(); // name the method, give its return type and list all the parameters, but no implementation
};
struct Widget {
bool flashing = false;
Button startFlashing;
};
void Button::onAction() { // Actually implement onAction
parent->flashing = true;
}
Now as we read from top to bottom, we see: Widget exists and will be defined. Button exists, has a pointer to a Widget and a method called onAction which will be defined later. Then we see our Widget, and that it has a variable called flashing and a Button variable called startFlashing. Then we see a function which is for the Button class; that's fine, we've seen that. It is called onAction, which we've already seen, so we can tie those up. It accesses the parent variable, which is declared on the Button class, so that's good. And we're accessing flashing, and we've seen the flashing variable and where it is inside the Widget, so that's all good.
The only problem now is that we've never set the parent pointer. If we ran the code as it stands, the pointer would hold whatever value happened to be in that memory, and we'd try to overwrite something that wasn't actually a Widget. That would probably cause a crash and would almost certainly corrupt memory.
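One defensive option, sketched here as an assumption rather than as part of the original design, is to give the pointer a known starting value and check it before use. Changing onAction to return a bool is also an addition made for this illustration.

```cpp
struct Widget;

struct Button {
    Widget* parent = nullptr;   // known starting value instead of garbage
    bool onAction();            // returns false when there is no parent
};

struct Widget {
    bool flashing = false;
    Button startFlashing;
};

bool Button::onAction() {
    if (parent == nullptr) {
        return false;           // nothing to act on; do not dereference
    }
    parent->flashing = true;
    return true;
}
```

This turns silent memory corruption into a detectable "nothing happened", but it still relies on someone remembering to set parent.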
We need to set the pointer on the Button. Since that means introducing some logic, we need constructors. A constructor is a method that gets run when an object is first created.
struct Widget;
struct Button {
Widget *parent;
void onAction(); // name the method, give its return type and list all the parameters, but no implementation
};
struct Widget {
bool flashing = false;
Button startFlashing;
Widget() {
startFlashing.parent = this; // startFlashing is a Button object, not a pointer, so use '.'
}
};
void Button::onAction() { // Actually implement onAction
parent->flashing = true;
}
This will now work fine. We need to do some work whenever we create a Widget object, so we add a constructor. The constructor sets the parent value of the Button to itself, so now the Button can call back into us.
The above was the cheap, nasty, minimal, but effective way to break this kind of circular dependency. There are ways to make this much better. For example, it's very easy to forget to set the parent of the Button. We can have the compiler give us an error if we forget, by creating a constructor on the Button that needs a Widget pointer passed into it. With that in place, we can't create a Button without passing in a Widget.
struct Widget;
struct Button {
Widget *parent;
Button(Widget* p) {
parent = p;
}
void onAction(); // name the method, give its return type and list all the parameters, but no implementation
};
struct Widget {
bool flashing = false;
Button startFlashing;
Widget() : startFlashing(this) {
}
};
void Button::onAction() { // Actually implement onAction
parent->flashing = true;
}
This is a lot safer, as now we can't forget to set the parent widget. Notice that the form of the constructor for Widget has changed. The part after the colon is a member initializer list, which is the syntax for passing values into the constructors of our member variables.
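The same initializer syntax can be used inside Button itself. The sketch below also adds two things that are assumptions of this illustration: a pressAndCheck helper to exercise the classes, and deleted copy operations, since a copied Widget would otherwise contain a Button whose parent still points at the original.

```cpp
struct Widget;

struct Button {
    Widget* parent;
    // Initialise parent directly rather than assigning in the body.
    explicit Button(Widget* p) : parent(p) {}
    void onAction();
};

struct Widget {
    bool flashing = false;
    Button startFlashing;
    Widget() : startFlashing(this) {}
    // Assumption: copying is disabled so that no copy can keep a
    // parent pointer aimed at the original Widget.
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
};

void Button::onAction() {
    parent->flashing = true;
}

// Hypothetical helper: press the button and report the widget's state.
bool pressAndCheck() {
    Widget w;
    w.startFlashing.onAction();
    return w.flashing;
}
```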
Another problem is that Button knows about Widget explicitly. This limits our button to only interacting with things that are derived from Widget. This might be fine, but if it's not, we'll need another more flexible approach. This would be to pull out the things that are common between Button and Widget into a separate struct and use that. This would solve all our circular dependency issues in one go.
struct FlashingThing {
bool flashing = false;
};
struct Button {
FlashingThing *parent;
Button(FlashingThing* p) {
parent = p;
}
void onAction() { // Actually implement inline, we know everything we need to about flashing things
parent->flashing = true;
}
};
struct Widget {
FlashingThing isFlashing;
Button startFlashing;
Widget() : startFlashing(&isFlashing) {
}
};
This is one way to break the cycle. The Widget has a flashing thing, which it shows to the Button and the Button can use it with impunity. Anything can contain a flashing thing, so we're not limited to dealing with Widgets. Notice there's also another option:
struct FlashingThing {
bool flashing = false;
};
struct Button {
FlashingThing *parent;
Button(FlashingThing* p) {
parent = p;
}
void onAction() { // Actually implement inline, we know everything we need to about flashing things
parent->flashing = true;
}
};
struct Widget : FlashingThing {
Button startFlashing;
Widget() : startFlashing(this) {
}
};
Notice in this case, Widget derives from FlashingThing. We're saying a Widget is a FlashingThing. And we tell our button about us with the this pointer in the constructor.
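To illustrate the flexibility this buys, here is a sketch in which the same Button drives a second type that has nothing to do with Widget. The Lamp type and its switchOn member are hypothetical names added for this example.

```cpp
struct FlashingThing {
    bool flashing = false;
};

struct Button {
    FlashingThing* parent;
    explicit Button(FlashingThing* p) : parent(p) {}
    void onAction() { parent->flashing = true; }
};

struct Widget : FlashingThing {
    Button startFlashing;
    Widget() : startFlashing(this) {}
};

// Hypothetical second type, unrelated to Widget.
struct Lamp : FlashingThing {
    Button switchOn;
    Lamp() : switchOn(this) {}
};
```

Button never mentions Widget or Lamp, so there is no cycle to break and no forward declaration is needed.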