Metaclasses #1 – Metamorphosis

Have you ever heard about metaclasses in Python?
The common answer is “no”. Metaclasses are also known as black-magic-voodoo. 99.9% of the time, you won’t need them.

Nevertheless, not needing something won’t stop us from learning how it works, for the sake of curiosity!

Where to start?

First things first, let’s go to [wikipedia].
We can summarize the definition of metaclasses as follows:
An instance is an instance of a class.
A class is an instance of a metaclass.

MyClass = MyMetaclass(...)
my_instance_1 = MyClass(...)
my_instance_2 = MyClass(...)

Getting one step closer

But how can we create metaclasses? And what parameters do they need for creating a class?

Here’s something that we’ve all done at some point – the inverse operation of creating an instance.
If instance = Class(), then we know that type(instance) == Class.
Going one step further: if Class = Metaclass() and instance = Class(), we can also write instance = Metaclass()().
Inverting that gives Class = type(instance) and Metaclass = type(Class).

Let’s see that in action, for some familiar objects:
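A minimal sketch of that check, with a handful of objects picked purely for illustration:

```python
# Walk two steps up from a few familiar objects:
# instance -> class -> metaclass.
for obj in (1, "hello", [1, 2], {"k": "v"}):
    cls = type(obj)        # the class of the instance
    metacls = type(cls)    # the class of the class, i.e. the metaclass
    print(f"{obj!r}: class={cls.__name__}, metaclass={metacls.__name__}")

# Every printed line ends with metaclass=type.
```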

OK. Interesting.
So the metaclass of the builtin classes is type.
The obvious question is: who’s the metaclass of type?
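Asking the interpreter directly takes one line. The variable name below is only for this sketch:

```python
# Ask type for its own type.
meta_of_type = type(type)
print(meta_of_type)           # <class 'type'>
print(meta_of_type is type)   # True
```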


Hmm.

Apart from the confusion of type being its own metaclass, we got ourselves a lead – type is a metaclass!
To the documentation we go!

Type: a two-faced function/class/metaclass/something.

[The docs] show us two different function signatures:
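Both forms, type(object) and type(name, bases, dict), can be tried in a few lines. The class name Point and its dims attribute are made up for this sketch:

```python
# Form 1: type(object) returns the type of an object.
print(type(42))                  # <class 'int'>

# Form 2: type(name, bases, dict) builds a brand new class.
# "Point" and "dims" are hypothetical names used only here.
Point = type("Point", (object,), {"dims": 2})

p = Point()
print(Point.__name__, p.dims)    # Point 2
print(type(p) is Point)          # True
print(type(Point) is type)       # True
```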

The first is the signature we all know and love.
Except...

Into the Side Story – Metamorphosis

(Note: from here on out, we’ll dive into a side story. It is not needed for the understanding of metaclasses. Feel free to go to [part 2], in which we will really talk about metaclasses)

The docs say that type(object) usually returns the same return value as object.__class__. Let’s check it out.
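Here is one way to run that check. The two empty classes are placeholders for this sketch:

```python
class A:
    pass

class B:
    pass

a = A()
print(type(a) is a.__class__)   # True
print(type(a).__name__)         # A

# Rebind the instance to another class by assigning to __class__.
a.__class__ = B
print(type(a).__name__)         # B
print(type(a) is a.__class__)   # True
```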

Interesting.
Apparently, type falls for our __class__ assignment.
(It’s also interesting that __class__ is not read-only)

We can also state this conclusion in another way – using the Tree of Research!
From now on, Tree of Research conclusions will be colored (woohoo!)

Assumptions to delete:
– An instance is bound to its creator class
– The __class__ attribute is read only

Let’s look at that behavior again, this time, with classes that have some methods.
We’re interested in seeing how method resolution works after changing classes.

Metamorphosis – Checking the behavior of methods

First, let’s define 2 classes:

a is a normal object. Nothing interesting so far.

This was just as expected.
Now, let us change its class:

Aha.
It seems like the method resolution is being determined by __class__ – this is why a.a doesn’t exist, while a.b does.
We can also notice this in the AttributeError message, which changed from 'A' to 'B'.
However, __init__ is only called once, when a was initialized. Thus, it is only A.__init__ that was called.
We can further understand it by looking at __dict__.
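The whole experiment from this section can be sketched in one go. The class bodies (the attributes x and y, the methods a and b) are assumptions made for this sketch:

```python
class A:
    def __init__(self):
        self.x = "set by A.__init__"

    def a(self):
        return "A.a"

class B:
    def __init__(self):
        self.y = "set by B.__init__"

    def b(self):
        return "B.b"

a = A()
print(a.a())                 # A.a
try:
    a.b()
except AttributeError as exc:
    msg_before = str(exc)
print(msg_before)            # 'A' object has no attribute 'b'

a.__class__ = B              # change the class; no __init__ runs here
print(a.b())                 # B.b
try:
    a.a()
except AttributeError as exc:
    msg_after = str(exc)
print(msg_after)             # 'B' object has no attribute 'a'

# Only A.__init__ ever ran, so only its attribute is in the instance dict.
print(a.__dict__)            # {'x': 'set by A.__init__'}
```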

If you’re less familiar with what we did here, you can look at it like so:
Imagine an instance as some dict, some namespace.
You can add keys to that namespace by typing my_instance.key = value (this is what’s usually being done in __init__).
This is how we add keys. Getting the value behind keys is a bit different.
First, Python searches for values in the “namespace” of the instance, i.e. in my_instance.__dict__.
If it fails to find the key, then it goes to the class of that instance, and searches there.
If it fails again, it goes through the method resolution order (MRO), which follows the inheritance chain, in order to find that key.
Here’s an example:

This is a simple inheritance structure.
We can further populate b’s attributes like so:

Next, let’s access 4 different keys: ‘a’, ‘b’, ‘c’, and ‘z’:

It all went as expected.
Behind the scenes – accessing ‘c’ only meant accessing b.__dict__.
Accessing ‘b’ meant accessing b.__dict__, failing, and then accessing B.__dict__.
Accessing ‘a’ meant accessing b.__dict__, failing, accessing B.__dict__, failing, then accessing A.__dict__.
Accessing ‘z’ meant failing in the following order: b.__dict__, B.__dict__, A.__dict__ (and finally object.__dict__, the last stop in the chain).
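The four lookups can be reproduced with a sketch like this one. The attribute values are invented for illustration:

```python
class A:
    a = "from A"

class B(A):
    b = "from B"

b = B()
b.c = "from the instance"   # lands in b.__dict__

print(b.__dict__)           # {'c': 'from the instance'}
print(b.c)                  # found in b.__dict__
print(b.b)                  # not in b.__dict__, found in B.__dict__
print(b.a)                  # not in b.__dict__ or B.__dict__, found in A.__dict__

try:
    b.z                     # not found anywhere along the chain
except AttributeError as exc:
    z_error = str(exc)
print(z_error)
```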

Here are B.__dict__ and A.__dict__, in which we can see the keys a and b:

Plus, we can look at the method resolution order like so:
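Both views, sketched with the same two hypothetical classes. The class dicts also hold a few internal entries, so we only test for the keys we care about:

```python
class A:
    a = "from A"

class B(A):
    b = "from B"

# Each class dict only holds what was defined on that class.
print("a" in A.__dict__, "a" in B.__dict__)   # True False
print("b" in B.__dict__, "b" in A.__dict__)   # True False

# The method resolution order lists where lookups continue, in order.
print([cls.__name__ for cls in B.__mro__])    # ['B', 'A', 'object']
```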

Tree of Research interpretation:
If we weren’t aware of the way Python handles attribute access, we could assume either of 2 possibilities:
– When an instance is created, its inner dict is populated with the attributes of all of its predecessors.
– When an instance is created, it only gets a pointer to its parent, thus keeping a linked list of dicts to find attributes in.

In the test we’ve just done, we emphasized the difference between the two options.

Another interpretation could be that the instance only has its own and its parent inner dict, with the other predecessors being pointed to in a linked list fashion.
Let’s check that one out:

Note that here, instead of accessing each attribute manually, I was lazy and used dir, and filtered out internal attributes (those that start with '_')
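A sketch of that check, assuming a three-level chain (the classes and values are made up here). If the instance or its direct class held copies of inherited attributes, they would show up in the dicts printed at the end:

```python
class A:
    a = 1

class B(A):
    b = 2

class C(B):
    c = 3

obj = C()
obj.d = 4

# dir() collects names from the instance and from every class in the chain.
public = [name for name in dir(obj) if not name.startswith("_")]
print(public)           # ['a', 'b', 'c', 'd']

# The instance dict and the class dict each hold only their own entries.
print(obj.__dict__)     # {'d': 4}
own_class_keys = sorted(k for k in C.__dict__ if not k.startswith("_"))
print(own_class_keys)   # ['c']
```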

Tree of Research Conclusion:
– When accessing an attribute on an instance, Python tries several dicts in order: first instance.__dict__, then class.__dict__, then parent_class.__dict__, and so on.

Short Summary #1

So far, we’ve seen that calling type(object) is related to object.__class__.
We’ve seen that the __class__ attribute is not read only, and that by changing it, we were able to fool type.

Then, we talked about attribute resolution, about method resolution order, and about __dict__.

Metamorphosis – Checking the behavior of __class__

So far, we’ve only accessed a “static” value of __class__. How about “dynamic” values?
Let’s see if type actually calls __class__:
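One way to make __class__ “dynamic” is to define it as a property. The classes and the printed message are assumptions for this sketch:

```python
class A:
    pass

class B:
    @property
    def __class__(self):
        # Runs on every b.__class__ access.
        print("__class__ property was called")
        return A

b = B()
print(type(b).__name__)       # B (the property does not run)
print(b.__class__.__name__)   # the property runs, then A is printed
```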

And… the answer is no 😦
type ignored our __class__ definition. It didn’t call it, and it just ignored it when we asked it for type(b).

Tree of Research Update:
– Accessing __class__ is not straightforward, and there may be some magic in the underlying C code.

Hope is not lost (for now, at least), since we still have an idea!
We know that sometimes programmers use isinstance checks instead of type checks.
Let’s see if we can fool isinstance:
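A sketch of the attempt, reusing the idea of a __class__ property (class names are placeholders):

```python
class A:
    pass

class B:
    @property
    def __class__(self):
        return A              # claim to be an A

b = B()
print(isinstance(b, A))       # True
print(isinstance(b, B))       # True
print(type(b) is B)           # True
```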

Whoa. Not only did we fool isinstance and get isinstance(b, A) == True, but the real type still counts too, so b is also reported as an instance of B.

Tree of Research shocking update:
– An instance can have multiple parent classes at the same time? (ignoring inheritance)

But how can it be?

Diving into the C

Put your swimsuit on, grab your mask and fins, and fill a large air cylinder – we’re going to dive deep into the C.

Opening [CPython] is the first step. But now we face the question – what should we ask the code?
How can we get the answer we seek?

First, I started with the regex search \b__class__\b.
(For those who are not familiar with \b, it stands for “word boundary”: it matches the position between a word character and a non-word character, or the start or end of the text.
For example, x\b will match "x", "x(", and "x.", but won’t match "xy".)
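The same behavior can be checked with Python’s re module. The sample strings in the second half are invented for this sketch:

```python
import re

# x\b: an "x" that is followed by a word boundary.
pattern = re.compile(r"x\b")
results = {text: bool(pattern.search(text)) for text in ["x", "x(", "x.", "xy"]}
print(results)   # {'x': True, 'x(': True, 'x.': True, 'xy': False}

# The search used above: __class__ as a whole word.
word = re.compile(r"\b__class__\b")
print(bool(word.search("obj.__class__ = B")))   # True
print(bool(word.search("my__class__name")))     # False
```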

After getting too many results, I filtered out .py files, since I find it unlikely that type and its interaction with __class__ are implemented in Python.

And voilà! A reasonable number of results.
Ruling out results that are straight up irrelevant (such as __class__ appearing in a comment, not in the code), we end up with just 4 files, where the first 2 have names that sound relevant.

C dive – first result – abstract.c

The first result is very short, and points us in a new direction:

Tree of Research – new lead incoming
– What is _Py_IDENTIFIER?

We find the following definition:

Since this is written in C, it is best read bottom up.

The description of the object says:

This structure helps managing static strings. The basic usage goes like this:
Instead of doing

   r = PyObject_CallMethod(o, "foo", "args", ...);

do

   _Py_IDENTIFIER(foo);
   ...
   r = _PyObject_CallMethodId(o, &PyId_foo, "args", ...);

This makes sense. CPython tries to optimize constant string management. And "__class__" sure does sound like a constant string that will be in use.

Thus, we shall expand our search and look for PyId___class__

The same files appear.

Since we’re just wandering, there’s no reason not to follow each match, take a look at some functions in its vicinity, and learn something from it.

And indeed, the first two results, which are the only results from abstract.c, put us just where we wish to be – inside object_isinstance:

Going over the interesting lines, we see that:
In line 2412 we define the _Py_IDENTIFIER of __class__
In line 2413 we make sure that the 2nd argument to isinstance is a class.
Otherwise, we reach the else, and return an error message
(Ignoring the other case of the else)
Line 2416 looks like the important one. It looks like the line that implements icls = inst.__class__.
Below it is the isinstance check, that appears to be implemented using PyType_IsSubtype.

If you’re not used to reading C, that may have been intimidating. Let us summarize the conclusions from this code:

It seems like isinstance starts by checking if the instance is an instance of the class, using a function called PyObject_TypeCheck.
If that returns False, it goes on to do another check – this time accessing instance.__class__ “dynamically”, using _PyObject_LookupAttrId (the Id suffix stands for “identifier”. It merely wraps the access to the string inside the identifier struct).
Now, remember our result from earlier?

This looks like exactly it!
The first check gets the real class of b, which is B.
And when that fails, _PyObject_LookupAttrId is called. It goes through __getattribute__, the same mechanism as a plain b.__class__ access, so it finds A and the second check passes.
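To make the two-step check concrete, here is a rough Python model of it. This is a simplified sketch of the logic described above, not a translation of the C code: the function name is made up, the branch for non-type “classes” is skipped, and membership in __mro__ stands in for PyType_IsSubtype.

```python
def object_isinstance_sketch(inst, cls):
    """Rough Python model of the two checks described above."""
    if not isinstance(cls, type):
        # The C code has another branch for this case; it is skipped here.
        raise TypeError("this sketch only handles real classes")

    real_type = type(inst)                        # like Py_TYPE(inst)
    if cls in real_type.__mro__:                  # like PyObject_TypeCheck
        return True

    claimed = getattr(inst, "__class__", None)    # like _PyObject_LookupAttrId
    if claimed is not None and claimed is not real_type and isinstance(claimed, type):
        return cls in claimed.__mro__             # like PyType_IsSubtype
    return False


class A:
    pass

class B:
    @property
    def __class__(self):
        return A

b = B()
print(object_isinstance_sketch(b, B))     # True  (first check: the real type)
print(object_isinstance_sketch(b, A))     # True  (second check: __class__)
print(object_isinstance_sketch(b, int))   # False (both checks fail)
```

On these inputs the sketch agrees with the builtin isinstance.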

C dive – second result – typeobject.c

Next up, we’ll look at the result from typeobject.c.
However, even before that, since we’re going into typeobject.c, I thought of a shortcut for us to get our answer as to how type(object) works – Let’s go right into the definition of type, and look at its constructor.

Below is a slice of the definition of the type object:

The function that caught my eye is (ternaryfunc)type_call, since we know that we interact with type by calling it, and that it requires 1 or 3 arguments.

And indeed, the first thing that type_call does is handle type(object):

We can see that:

A) I was wrong, and ternaryfunc probably means that it takes type, args, kwargs, not that args is a tuple of length 3.
B) Only type(object) has the behavior of returning the type of the object. A metaclass that inherits from type won’t suffice. (as can be seen in the below screenshot)
C) There are many checks, to make sure that only one argument was sent, and that it was sent by position.
D) The important line is 989, which simply calls Py_TYPE.
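Point B can be tried from Python. Meta is a hypothetical metaclass that adds nothing to type, and the behavior shown is what recent CPython 3 versions do; the exact error message may differ between versions, so the sketch only records it:

```python
class Meta(type):
    """A hypothetical metaclass that adds nothing to type."""

print(type(5))               # <class 'int'>

# The one-argument shortcut applies to type itself, not to subclasses.
try:
    Meta(5)
    outcome = "returned a value"
except TypeError as exc:
    outcome = f"TypeError: {exc}"
print(outcome)
```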

And Py_TYPE returns object->ob_type:

Where, a few lines above this define, lies the PyObject definition:

Thus, we can assume that somehow, the moment we initialize a = A(), we create a new object, and set its ob_type.
This, however, leaves 2 open questions:
A) We need a proof, to see when ob_type is being set
B) We need to see why setting a.__class__ changes type(a).

Tree of Research – Our current state

Additionally, let’s remind ourselves of our research questions:
1) How does type know the type of the instance? (answered)
2) How does isinstance differ, and why could we fool it? (answered)
3) How does the binding between the instance and the class happen? (not answered)

Back to our research, we stopped at ob_type, and we’re now looking for places that change or set this value.

Starting with (A – We need a proof, to see when ob_type is being set), I went for the following regex: "->ob_type\s*="
It led me to the definition of Py_SET_TYPE(ob, type).
Looking for calls to Py_SET_TYPE resulted in:

The first shown result resides in the function _PyObject_Init.
The second resides in object_set_class.
Looks like we’re close to both our answers.
(
reminder – our open questions:
A) We need a proof, to see when ob_type is being set
B) We need to see why setting a.__class__ changes type(a).
)

Starting with the second, we can learn a lot just by reading the function, with a great focus on the initial checks:

A) We see that “set”, when value == NULL, actually behaves as “delete”.
B) Line 4144 tells us that we’re in the right place: assigning something that is not a class, as in obj.__class__ = some_instance, is forbidden.
C) If compatible_for_assignment passes, Py_SET_TYPE is called (lines 4215 & 4219).
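These checks are visible from Python as well. The sketch below collects the errors instead of asserting on their wording, since the messages vary between CPython versions; the classes are placeholders:

```python
class A:
    pass

class B:
    pass

a = A()
errors = {}

# Assigning something that is not a class is rejected.
try:
    a.__class__ = 5
except TypeError as exc:
    errors["non-class"] = str(exc)

# Deleting goes through the same setter and is rejected too.
try:
    del a.__class__
except TypeError as exc:
    errors["delete"] = str(exc)

# A builtin such as int is not compatible with a plain Python instance.
try:
    a.__class__ = int
except TypeError as exc:
    errors["incompatible"] = str(exc)

for case, message in errors.items():
    print(f"{case}: {message}")

# A compatible class is accepted.
a.__class__ = B
print(type(a).__name__)   # B
```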

Cool.
Let’s see where this function is used:

With object_getsets appearing in:

Well, that’s something new, but pretty straightforward.
It appears that in the object class, there’s a list of attributes that have special getters/setters, and in the case of object, only __class__ has a special setter, which updates ob->ob_type.
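That getter/setter pair can be reached from Python through object.__dict__. A small sketch, with placeholder classes:

```python
import types

class A:
    pass

class B:
    pass

# The entry that object_getsets produces for __class__.
descriptor = object.__dict__["__class__"]
print(isinstance(descriptor, types.GetSetDescriptorType))   # True

a = A()
print(descriptor.__get__(a, A) is A)   # True

# Calling the setter directly has the same effect as a.__class__ = B.
descriptor.__set__(a, B)
print(type(a) is B)                    # True
```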

Cool.
So far, we’ve built the following picture:

Tree of Research – How does Python know what’s the class of an instance? – Summary
What’s known:
– Calling type(object) returns ob->ob_type, where ob is PyObject
– Calling isinstance(obj, cls) first checks obj->ob_type (like type(obj)), but then, also accesses the __class__ attribute.
– Setting object.__class__ changes ob->ob_type
– type is the metaclass of the builtin classes (and of type itself)
– the __class__ string is stored using _Py_IDENTIFIER(__class__)

What’s assumed:
– We assume that at the initialization of a PyObject, its ob_type is set.

Cool.
Onto the other match.
We searched for Py_SET_TYPE, and already looked at the lower result. Now let’s look at the upper one:

Well, this result looks right on target.
– The name of the function is _PyObject_Init
– It takes a PyObject and a PyTypeObject
– It calls Py_SET_TYPE
– It calls some function named _Py_NewReference

Let’s not dive inside this function. Rather, we’ll go outside, x-ref by x-ref (x-ref is a term coming from the IDA software. In IDA, pressing “x” gives a list of references to some function. Thus, x-ref is a way of asking “who uses this function?”)

We see that this function is called internally.
Filtering those results out, we see

The first comes from the function PyObject_Init, the second from _PyObject_New.

We find out that both are exported functions:

However, a match that we skipped:

Turned out to be the function PyType_GenericAlloc,
which seems to be used quite often:

In other words, the allocation of many objects calls PyType_GenericAlloc, which calls _PyObject_Init, which calls Py_SET_TYPE.
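A Python-level way to see that the type is set at allocation time, before any __init__ runs, is to allocate without initializing. This sketch assumes that object.__new__ ends up on the allocation path described above, which is our reading of the code rather than something we traced; the class and its ready attribute are made up:

```python
class A:
    def __init__(self):
        self.ready = True

# Allocate only: A.__init__ is not called here.
raw = object.__new__(A)
print(type(raw) is A)     # True, the type is already set
print(raw.__dict__)       # {}

# The normal path allocates and then initializes.
full = A()
print(full.__dict__)      # {'ready': True}
```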

Other occurrences include:

The first is the definition of the function.
The second is deep inside a long function called type_new. Nice.
The third is inside PyType_FromModuleAndSpec. I did not find this result interesting.
The fourth is inside PyTypeObject PyBaseObject_Type.
The fifth is inside PyTypeObject PySuper_Type.

Overall, this sounds like the result to our questions. Phew. Finally.
Let’s wrap up our research with a picture of a tree of our conclusions.

With this, we can conclude our journey:

We started in Python, messing around with setting __class__.
We fooled type, and were fooled by isinstance, which told us that one of our instances is an instance of 2 different classes.

Then we went into C. Overall, we had 3 different leads: __class__, PyId___class__, and Py_SET_TYPE.
The second of these led us to abstract.c / object_isinstance, which told us that isinstance checks both the “inner type” (using PyObject_TypeCheck) and the “dynamic type” (using _PyObject_LookupAttrId).
It also led us to typeobject.c, from which we figured out that ob->ob_type is the main actor in our story.

Using the third lead, we found out that
A) object_set_class is a special function that’s invoked when obj.__class__ = cls is called.
B) PyType_GenericAlloc is a core function for building instances.

Happy and satisfied, with all our questions answered, we still have 2 more questions.
(Of course we have more questions. Did you really think that our curiosity would go to sleep after this session?)

The first is the original question of this post – what are metaclasses, and how do they work?
This will come in a later post.

The second is – looking inside typeobject.c, we reached PyTypeObject PySuper_Type, which is the definition of the type of the super() object.
Nearby, we have

Interesting! How does super work?

That won’t be answered this time, though. This post is long enough already, and part 2 is in the making.
It may come in part 3 (no promises)


This is part 1 out of 6 in the metaclass series.
– Part 1: [Metamorphosis] (about changing the type of an object)
– Part 2: [Metaclasses] (about metaclasses)
– Part 3: [Self-Knowledge] (about passing self to methods)
– Part 4: [Super] (about the super object)
– Part 5: [collections.abc] (about __subclasshook__)
– Part 6: [abc.ABCMeta] (an example of metaclass usage: about abstract methods)