Lessons I should have learned, Episode 2: hiding your data
Let's consider a contrived example where you are representing people and their relationships. We want to represent a person's name, social security number, and address, and keep pointers to their mother and father.
typedef struct Person {
    char *name;
    char *address;
    int ssn;
    struct Person *mother;  /* the typedef name is not visible yet, so use the struct tag */
    struct Person *father;
} Person;
Now, let's say that we have some library which will create a queryable tree for us.
Person *mandy = insertPerson("Mandy", 123331234, "Hut #13", 0, 0);
Person *naughtius = insertPerson("Naughtius", 349830123, "Centurianville #2", 0, 0);
Person *brian = insertPerson("Brian", 593013297, "Hut #13", mandy, naughtius);
This is all well and good. But there's a problem. The user of this library can do something like this:
/* No leading zero on the SSN: 012309821 would be parsed as an (invalid) octal literal. */
Person *mary = insertPerson("Mary", 12309821, "Bethlehem", 0, 0);
brian->mother = mary;
There might be a use case where we want to allow this, but in general it isn't desirable. We really want to keep the fields mother, father, and ssn immutable. We can do this in the following way:
/* FILE: library.h */
typedef struct Person {
    char *name;
    char *address;
} Person;

int getSSN(Person *);
Person *getMother(Person *);
Person *getFather(Person *);
Person *insertPerson(char *, int, char *, Person *, Person *);
Now we have a struct Person which exposes only information that we will allow to be mutable. How can we use this cleanly and effectively? We define the following:
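/* FILE: library.c */
#include "library.h"

typedef struct privatePerson {
    Person p;                      /* must remain the first member */
    int ssn;
    struct privatePerson *mother;  /* the typedef name is not visible yet, so use the struct tag */
    struct privatePerson *father;
} privatePerson;
...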
How does this help us? First, consider the following:
privatePerson *mary = ...;
Person *mom = (Person *) mary;
What will happen here? First, recall how struct layout works. Since Person p is the first member of struct privatePerson, its offset within privatePerson is 0 (C guarantees there is no padding before the first member of a struct). This means that the address of Person p is the same as the address of the struct privatePerson which contains it. So, when we cast to a Person pointer we are, in fact, pointing at a Person object. So, insertPerson(...) will use a struct privatePerson internally but will return a pointer to a struct Person, as will getMother(...).
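The layout claim can be checked directly. Here is a minimal sketch, assuming the two struct definitions from this post are visible in one file; the helper name sameAddress and the sample values are made up for illustration and are not part of the library.

```c
#include <stddef.h>  /* offsetof, NULL */

typedef struct Person {
    char *name;
    char *address;
} Person;

typedef struct privatePerson {
    Person p;                      /* first member, so it sits at offset 0 */
    int ssn;
    struct privatePerson *mother;
    struct privatePerson *father;
} privatePerson;

/* Hypothetical helper: returns 1 if the embedded Person sits at offset 0
   and a cast privatePerson pointer equals the address of that member. */
int sameAddress(void) {
    privatePerson mary = { { "Mary", "Bethlehem" }, 12309821, NULL, NULL };
    Person *mom = (Person *) &mary;
    return offsetof(privatePerson, p) == 0 && mom == &mary.p;
}
```

If Person p were moved further down the struct, the offset would no longer be 0 and the cast would point at the wrong bytes, which is why its position as the first member matters.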
/* FILE: library.c */
...
Person *getMother(Person *p) {
    privatePerson *who = (privatePerson *) p;
    return (Person *) who->mother;
}
This example illustrates how we can also cast back to a privatePerson from a Person. Note that this REQUIRES that the pointer to the struct Person actually points into a struct privatePerson, i.e. it originally came from the library. We can't prevent our user from allocating their own struct Person with malloc. The best we can do is provide factories for construction (insertPerson above) and strongly document that passing a self-allocated Person to the library is unsupported and will lead to undefined behavior.
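To make the factory idea concrete, here is one possible sketch of what could sit behind insertPerson and the accessors. The post never shows these bodies, so everything inside the functions is an assumption: it allocates with malloc, stores the string pointers without copying them, and skips the tree insertion entirely.

```c
#include <stdlib.h>  /* malloc, NULL */

typedef struct Person {
    char *name;
    char *address;
} Person;

typedef struct privatePerson {
    Person p;                      /* must remain the first member */
    int ssn;
    struct privatePerson *mother;
    struct privatePerson *father;
} privatePerson;

/* Assumed body: allocate the private struct, hand back only the public part. */
Person *insertPerson(char *name, int ssn, char *address,
                     Person *mother, Person *father) {
    privatePerson *who = malloc(sizeof *who);
    if (who == NULL)
        return NULL;               /* allocation failed */
    who->p.name = name;            /* pointers are stored, not copied */
    who->p.address = address;
    who->ssn = ssn;
    /* Valid only because every Person we hand out lives inside a privatePerson. */
    who->mother = (privatePerson *) mother;
    who->father = (privatePerson *) father;
    return &who->p;                /* same address as who */
}

int getSSN(Person *p) {
    return ((privatePerson *) p)->ssn;
}

Person *getMother(Person *p) {
    privatePerson *who = (privatePerson *) p;
    return (Person *) who->mother; /* a null mother stays null */
}
```

Because the caller only ever receives a Person pointer, name and address stay writable while ssn, mother, and father are reachable only through the accessors.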
Because struct privatePerson is defined within library.c, it won't be visible to anyone who includes library.h. Just like with the previous post on opaque pointers, this isn't magic. The user can still opt in to violating our model by including their own definition of a privatePerson struct and casting back to it. Hopefully, though, a user who does so will be perfectly aware that they are breaking our API and are writing code that could break in the next minor point release of the library.
February 17th, 2010 - 08:56
Good breakdown of the lesson learnt – important as ever.
February 17th, 2010 - 12:48
Good idea simulating some object-oriented-ness in C, but unfortunately I don’t like such magical constructs since they become a maintainability problem. People rarely read code comments or documentation (at least I don’t) and try to wing their way through things. And I hate it when I have to debug issues in such code where I have to constantly keep a mental picture of what you are doing behind the scenes. I’d rather have simple code that does things simplistically; it’s easier to read and debug. You are providing an abstraction to ensure good coding practice by users of your structure, but unfortunately you are mostly relying on documentation to enforce things.
February 17th, 2010 - 13:21
@Grok2.
I definitely agree about people reading neither documentation nor comments. I often glance over them myself. Perhaps you’re right about this leading to a problem, especially with this contrived example. But consider instead a situation where you are providing some data structure in which you want the user to be able to freely mutate some fields but not others. Usually with such structures, it doesn’t make sense to allocate your own Node objects anyway. You usually get pointers to them via some sort of “insert” function. This is the case I’m aiming toward: one in which the API and the use case lend themselves to the library, rather than the user, being in charge of creating and deleting objects.