Python Internals: Optimized OR, AND

This one’s a short one.
Let’s talk about optimizing and, or calculations!

What’s there to optimize?

Both and and or are binary operators, meaning that they take 2 inputs.
In addition, each has a special case in which the output can be determined by the first input alone.

In the case of and, we can see that if the first input is false, then the output is surely false.
Likewise, for or, if the first input is true, then the output is surely true.

Let’s see how Python uses this fact:
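Here is a small sketch of that short-circuiting behavior (the noisy helper is mine, just for illustration):

```python
def noisy(value):
    # a tiny helper that reports when it is actually evaluated
    print(f"evaluating {value!r}")
    return value

result_and = noisy(0) and noisy(1)   # prints only "evaluating 0"
result_or = noisy(1) or noisy(0)     # prints only "evaluating 1"
print(result_and, result_or)         # 0 1
```

In both lines, the second noisy call never runs: the first input already determines the result.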

More optimizations

Another trick that Python uses is something that we’ve all grown accustomed to: everything can be converted to bool.
In other words, we can write if 3, if "string", if [1, 2, 3], and Python automatically converts the object to boolean.

Thus, when evaluating and, or, Python doesn’t have to convert to bool at this stage.
Meaning that if we write:
if x and y: print(3)
then Python first calculates x and y, and the result need not be bool.
Only then does it proceed to the if statement, and convert that result to bool.

We can see that by acting with and outside an if statement:
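For example:

```python
x = 3 and 5   # first input is truthy -> the second input is returned
y = 0 and 5   # first input is falsy  -> the first input is returned
z = 3 or 5    # first input is truthy -> the first input is returned
print(x, y, z)   # 5 0 3
```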

Why does 3 and 5 equal 5?
We can understand it using the truth table.
Let’s write the truth table in a slightly different manner:

For and:
If the first input is false, then the output is like the first input
If the first input is true, then the output is determined by the second input

For or:
If the first input is true, then the output is like the first input
If the first input is false, then the output is determined by the second input.

It’s really similar to the usual truth table, only that we don’t convert to bool in the process.

Let’s see that in action

We start by defining some variables of types different than bool:

Then, calculating the and truth table:

And similarly for or:
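A minimal reconstruction of that experiment (the variable names are mine):

```python
a, b = 3, "hi"   # truthy, non-bool values
e, f = 0, ""     # falsy, non-bool values

# and: if the first input is falsy, return it; otherwise return the second
print(a and b)   # 'hi'
print(e and b)   # 0

# or: if the first input is truthy, return it; otherwise return the second
print(a or b)    # 3
print(e or f)    # ''
```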

Nice!

Note that when the second operand is not needed, it is not evaluated at all: if it were a function call, the function wouldn’t even be called.

Some Use-cases

This trick can be seen mainly when initializing arguments.

For example:

This usage is equivalent to:

if not fill:
    fill = ' '
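A hypothetical example of such an initialization (the pad function is mine, not from the source):

```python
def pad(text, fill=None):
    # fall back to a space whenever fill is falsy (None, '', 0, ...)
    fill = fill or ' '
    return f"{fill}{text}{fill}"

print(pad("hi"))        # ' hi '
print(pad("hi", '*'))   # '*hi*'
```

Note that this idiom also replaces an explicitly-passed falsy value such as ''; if that distinction matters, compare against None explicitly.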

Opcode Optimization

Let’s see how these optimizations are implemented in the byte code 🙂
And let’s do this exercise in Python 3.13, currently in pre-release.

We start as usual with defining 2 functions:

And apply or between them:

Onto the byte-code!
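The exact function bodies aren’t reproduced here, but judging by the stack values in the walkthrough below (3 and []), they were presumably along these lines:

```python
import dis

def true():
    return 3      # a truthy, non-bool value

def false():
    return []     # a falsy, non-bool value

def f():
    return true() or false()

print(f())   # 3 -- `or` returns the first operand, since it's truthy
dis.dis(f)   # prints the opcode sequence we analyze next
```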

It is indeed a bit different than previous byte-codes we’ve seen in this blog – the Python team has done an amazing job at optimizing the byte code in versions 3.12 and 3.13!

Let’s digest this one a bit, and then compare it to the generated byte code in 3.11.

Basically, we see elements of what we’d expect:
– Load the function true, and call it
– Some condition (TO_BOOL -> POP_JUMP_IF_TRUE)
– if the condition was false, load false and call it

Digging a bit deeper, we can get the following picture:

  1. LOAD_GLOBAL : loading the function true into the stack
    Current stack: [true]
  2. CALL : calling the function from TOS (Top Of Stack)
    Current stack: [the output of true], which is [3]
  3. COPY : copy that value
    Current stack: [3, 3]
  4. TO_BOOL
    Current stack: [3, True]
  5. POP_JUMP_IF_TRUE : pops the TOS; if the popped value is True, then jump
    Current stack: [3]

    And now we branch.
    If the first item indeed is true, then we’d get to RETURN_VALUE, which would return that first item.
    Otherwise, we’d continue:
  6. POP_TOP
    Current stack: []
  7. LOAD_GLOBAL
    Current stack: [false]
  8. CALL
    Current stack: [the output of false], which is [ [] ]
  9. RETURN_VALUE
    simply returning the value of the 2nd variable, without even looking at it.

We see a similar behavior in Python 3.11.8:

We can see that JUMP_IF_TRUE_OR_POP has been split into multiple opcodes:
COPY
TO_BOOL
POP_JUMP_IF_TRUE
POP_TOP

Nevertheless, the behavior and the concept are identical.

Additionally, we see that and has the exact same behavior, only replacing POP_JUMP_IF_TRUE by POP_JUMP_IF_FALSE:

Nice.

Summary

This one was a shorter one, but I believe that we’ve covered some nice things.

We understood what Python does when evaluating and, or. That is, it returns the first or second input, rather than returning True or False.

Additionally, we saw how that’s implemented in the byte code,
while getting a glimpse at the changes being done to the byte code in the newer versions of Python.

Hope you learned something new 🙂

Python Type Hints – Part 2 : Special Types

After looking at the implementation of the type hint syntax in [the previous post], it is now time to look at some special types that were created especially for type hints!

Starting point

Many functions are able to accept more than a single type as input (for example, an addition helper can take int, float, str, bool, and any class that has __add__).
How can one write the type hint for such a multi-type input?

Using the typing module, one can write Union[int, str].
Later, Python 3.9 introduced the syntax int | str.
Let’s dig into that syntax.

There are 2 questions that come to mind.
A) What’s the type of the int | str object?
B) How can we act with __or__ on a class?

Luckily, both questions will be answered in the same place.
Let us see how we got there:

First lead: the interpreter

The easiest lead we can get is to simply get that type-hint type in the interpreter:

Okay.
Seeing types does not ring good bells.
Why? you ask.
Because I suspect that it will lead us to a circular discovery.

And indeed, in types.py, we find:

So, in the Python level, UnionType is defined as the type of the result of class | class.

Second lead: __or__ is acting on classes

Small confession: In the not-so-long-ago past, I messed around with type hints, and tried to create a new type-hint.
In the process, I found out that there’s a __class_getitem__ method, in addition to __getitem__.
The latter acts on instances of the class, whereas the former acts on the class itself.
(e.g. list[1] will call __class_getitem__, and ( [1] )[0] will call __getitem__).
Thus, I was familiar with the idea of methods acting on classes, rather than on their instances.

So, let us turn our attention to the operator at play: __or__.
The key thing to understand is that the operator acts on int itself.
We can think of it either as if it acts on the class int itself,
or, as if it acts on instances of the class type.

Now, since the union operator should be global in the sense that it can act on any builtin type, it’s safer to assume that it would be implemented in the type object.

Onto PyType_Type

So we jump to typeobject.c, in which we go to the definition of type
(tip: ctrl+f ^PyTypeObject will find definitions of types in that file, since it searches for lines starting with a declaration of a new PyTypeObject object.)

The __or__ function is defined in tp_as_number

Aha! Spot on!

Before we move forward, let’s have a quick review on the Union type.

Union

Basically, a Union type is a tuple of types:

Union’s implementation

The _Py_union_type_or function that we just saw is the implementation of type.__or__ as well as UnionType.__or__.
Thus, its implementation is as follows:

Meaning that it takes 2 type-hints (each may be a class, like int, or a UnionType).
It extracts the size of each type-hint (for a single class, the size is 1).
It then merges the 2 tuples,
and creates a union type out of the combined tuple.

Is instance

A nice thing added with the Union type is the ability to use it in isinstance.
As I understand it, its role is mainly being syntax-sugar.
After all, isinstance(x, int | str) does look nicer compared to isinstance(x, (int, str)).

The implementation is rather simple:

If cls is a union type, then take the tuple in union.__args__, and use that instead.
Neat.

That’s basically what I wanted to show about Union.
The other interesting type I wanted to show is GenericAlias.

Generic Alias

That’s the type of the type-hint list[int].

It has a similar behavior to Union: they both have __args__:

The difference being that GenericAlias also has __origin__:

Where __origin__ is the underlying class that GenericAlias represents.
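For example:

```python
alias = list[int]
print(alias.__args__)    # (int,)
print(alias.__origin__)  # <class 'list'>
print(alias())           # [] -- the call is forwarded to list()
```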
It is used in the constructor – when calling the constructor of a GenericAlias object, it forwards the call to the constructor of __origin__:

For example, we can abuse that in order to show the point, like so:
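A sketch of that abuse, constructing the GenericAlias manually (since sum isn’t subscriptable directly):

```python
import types

# build a generic alias whose __origin__ is the function sum
sum_hint = types.GenericAlias(sum, (int,))
print(sum_hint.__origin__ is sum)   # True
print(sum_hint([1, 2, 3]))          # 6 -- the "constructor" call runs sum
```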

In other words, we created a GenericAlias type, with its __origin__ pointing to the function sum.
Thus, when calling the constructor of the new sum[int] type-hint, we actually call the function sum.
Neat.

Now, unlike Union, which was linked to type.__or__, thus being available to all types, GenericAlias is manually linked only to the following classes:

Well, some of the files have more than 1 object with a __class_getitem__ method, but I think that this list gives the main idea of which type of classes have a “create generic alias” method.

A single example from the list:

Generic Alias Syntax

It should be noted, since it was assumed in the above search, that the constructor Py_GenericAlias is linked to the method __class_getitem__, which is a relatively new function that has been created specifically for type hints.

It is best explained in the following example:
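A toy class of mine that shows both hooks:

```python
class Demo:
    def __getitem__(self, item):
        # called on instances: Demo()[...]
        return ("instance", item)

    def __class_getitem__(cls, item):
        # called on the class itself: Demo[...]
        return ("class", cls.__name__, item)

d = Demo()
print(d[5])            # ('instance', 5)
print(Demo[int, str])  # ('class', 'Demo', (<class 'int'>, <class 'str'>))
```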

As we can see, __getitem__ is being called on instances, and the self it gets is the instance.
Whereas __class_getitem__ is being called on the class itself, and the self it gets is the class.

Additionally, each gets a single argument; when several inputs are passed, that argument is a tuple of them.

Summary

This post was a continuation of the previous part.
In the first part, we went through the syntax of type hints, some use cases, and how the basic syntax is implemented.

In this post, we viewed 2 special types: UnionType and GenericAlias, and took a peek at their implementation.

Hope you learned something new 🙂

Python Type Hints – Part 1 : Type Hints

A type-hint is a kind of comment in Python, which is used by developers to indicate which class they expect certain objects to be instances of.

A simple example can be:

a: int = 5
b: int = 3
c: float = a/b

However, there are a few more subtleties to it.

1 – Type-hints do nothing

From a practical sense, the type-hint is, well, a hint, and is not enforced in any way.
For example, the following code is perfectly legitimate, and works with no problems:

def f(a: int) -> list:
    return a*5

print( f("abc") )
# abcabcabcabcabc

This is an example of a function which expects an int, and tells us that it will return a list, but actually, it can get a string and return a string.

Why does it work?
Because, as the title hints, type-hints do nothing. They are ignored by Python.

Can we prove it?

We can prove it like so:

# When Python first sees a line of code, it parses it into an AST.
# In the AST, we do see that there is an `annotation`.
>>> print( ast.dump( ast.parse("x: int = 'a'"), indent=2) )
Module(
  body=[
    AnnAssign(
      target=Name(id='x', ctx=Store()),
      annotation=Name(id='int', ctx=Load()),
      value=Constant(value='a'),
      simple=1)],
  type_ignores=[])

# If we compile the line of code
#     (meaning that Python will convert it into an AST,
#         and will then convert it into byte-code),
#     then we still see the annotation
>>> dis.dis(compile("x: int = 'a'", "file_name", "exec"))
  0           0 RESUME                   0

  1           2 SETUP_ANNOTATIONS
              4 LOAD_CONST               0 ('a')
              6 STORE_NAME               0 (x)
              8 LOAD_NAME                1 (int)
             10 LOAD_NAME                2 (__annotations__)
             12 LOAD_CONST               1 ('x')
             14 STORE_SUBSCR
             18 LOAD_CONST               2 (None)
             20 RETURN_VALUE

# However, once we place the code in a function,
#     we see that the annotation is gone
>>> def f():
...     x: int = 'a'

>>> dis.dis(f)
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 ('a')
              4 STORE_FAST               0 (x)
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE

What’s that opcode?

What’s that magical STORE_SUBSCR ( __annotations__ ) thing?
Well, it seems that it adds annotations to variables defined in the interpreter:
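We can reproduce that with exec, which runs code at “interpreter level” (i.e. not inside a function):

```python
ns = {}
exec("x: int = 'a'", ns)
print(ns["x"])                 # 'a' -- the annotation isn't enforced
print(ns["__annotations__"])   # {'x': <class 'int'>}
```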

Why is it ignored?

It may seem a bit weird at first that the type hints are somewhat available internally (in the AST, and in code-run-by-the-interpreter-and-not-inside-some-function), while being invisible (not showing in the function), and ignored (there’s no enforcement that x will be int).

But there’s some logic behind that.
Python, as a language, has every object (in the C level) inherit from the same type – everything is PyObject*.
For example, here’s: PyObject_SetItem:

 /* o[key]=v. */
PyAPI_FUNC(int) PyObject_SetItem(PyObject *o, PyObject *key, PyObject *v);

This function signature cannot tell the type of the objects coming in. It simply receives PyObject*.
If the function really wants to know the type of the object, then it can do the C equivalent of
if isinstance(obj, typ): ....
And that’s by design.

However, it is sometimes not totally ignored

When assigning type hints to variables, like shown above, the type-hint is totally ignored by Python.
However, when assigning type-hints to functions, or classes, the type hints are stored somewhere.

>>> def f(a: int, b: float) -> str:
...     return a+b

>>> f.__annotations__
{'a': int, 'b': float, 'return': str}

likewise, for classes:

>>> class A:
...     a: int
...     b: str = 3

>>> A.__annotations__
{'a': int, 'b': str}

>>> a = A()
>>> a.__annotations__
{'a': int, 'b': str}

Other than storing the type-hints in the __annotations__ dict, they are ignored.

While mostly ignored, there are some things that use type-hints

Who’s using type-hints?

There are 2 obvious answers:
The first, and the most important, is: programmers.
Type hints are a way of adding comments to the code, and explaining the code in a short, concise, and readable way. They are great.

(One such example I personally like is when working with physical calculations that have units.
Writing stuff like: earth_radius: KiloMeter = ... or rotation: Radians = np.pi)

The second is: static code analysis.
The 2 most common are PyCharm and mypy.

Yet there are more uses for type hints. Famous examples are pydantic and dataclasses.

How does the dataclasses module use __annotations__?

The first thing CTRL+F found is the following from the _process_class function:

Next, it sets up all the annotations into the fields property:

If you’re curious about the fields property, then here it is:
_FIELDS = '__dataclass_fields__'
setattr(cls, _FIELDS, fields)
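For example:

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int = 0

# the annotations became fields, stored under __dataclass_fields__
print([f.name for f in fields(Point)])   # ['x', 'y']
print(list(Point.__dataclass_fields__))  # ['x', 'y']
print(Point(1))                          # Point(x=1, y=0)
```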

And, lastly, the dataclass creates the __init__ function:

Now, what piqued my interest here is that there’s an option to choose the name of the self argument. And it’s passed as a string.
Answering how it works will also tell us how dataclasses dynamically create a function.

Onto _init_fn

Basically, the whole function fits in a single screenshot, and is pretty self-explanatory.

The interesting thing is that it generates strings!
As a consequence, we can pretty much predict what _create_fn does:

Oh my! an exec in the wild!

Basically, it merely automates writing self.%s = %s,
which is exactly what’s promised.

Yet there’s a nice trick they’re doing – they create a function inside a function.
Line 428 creates our __init__ function
and line 431 creates a different function, whose output is our __init__ function.

This is done so that the newly created function will have access to the local scope (hence passing locals).
Note that it also has access to the same global scope, as passed in exec.
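A minimal sketch of the pattern, an assumed simplification of dataclasses’ _create_fn rather than the actual implementation:

```python
def create_fn(name, args, body_lines, locals_ns):
    # indent the body under the generated def
    body = "\n".join(f"        {line}" for line in body_lines)
    inner = f"    def {name}({args}):\n{body}\n    return {name}"
    # the outer function's parameters become the new function's closure
    outer = f"def __create_fn__({', '.join(locals_ns)}):\n{inner}"
    ns = {}
    exec(outer, None, ns)
    return ns["__create_fn__"](**locals_ns)

# generate an __init__ that closes over a "local" helper, _conv
init = create_fn("__init__", "self, x", ["self.x = _conv(x)"],
                 locals_ns={"_conv": int})
C = type("C", (), {"__init__": init})
print(C("5").x)   # 5
```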

How does the pydantic module use __annotations__?

When pydantic creates a new model object, it calls a function called collect_model_fields.
The fields are exactly the annotations of that class:

Another place where pydantic uses annotations is when generating a schema:

Other worthy mentions

There’s functools.wraps, that now has __annotations__ in its list-of-properties-to-copy-from-wrapped-to-wrapper

There’s inspect, which exposes get_annotations.
It also has _signature_is_functionlike, which checks that obj.__annotations__ is either dict or None.
(for functions, it’s always dict or None. But for classes that behave like functions, we can alter that)

There’s typing, (duh).
It exposes a function called get_type_hints, which not only returns the annotations dict, but also handles strings as annotations.
We’ll talk about strings as annotations later, but I’ll just point out here that
def f(a: int)
and
def f(a: "int")
have the same meaning to us, as developers, and also for static analysis tools.
But, to Python, the former’s __annotations__ dict maps "a" to the type int itself, whereas the latter maps it to the string "int".

Thus, the get_type_hints function handles that:

How does this function work?
A simple exception reveals it all:
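For example:

```python
from typing import get_type_hints

def f(a: "int", b: float): ...

print(get_type_hints(f))   # {'a': <class 'int'>, 'b': <class 'float'>}

def g(a: "NoSuchClass"): ...

err_type = None
try:
    get_type_hints(g)
except NameError as err:
    err_type = type(err).__name__

print(err_type)   # NameError -- the string annotation is eval'ed
```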

typing also exposes a version of NamedTuple with annotations (that is, the programmer has to define a class, inherit from NamedTuple, and put in the type hints themselves.)
All typing does is define the class, and makes it behave like collections.namedtuple,
as well as adding the following magic (if you’re not familiar with metaclasses, I can suggest my own posts about it. Otherwise, the following code will be pure magic, and can be ignored)

How is __annotations__ implemented?

In fact, there’s not much to it. The implementation is rather simple.

For functions, for example, there are but a few results, none of which reveal anything interesting.

Modules appear to have annotations.
That’s just like the example we had above, where a variable we defined in the interpreter created the global variable __annotations__.
Likewise, defining variables in the module’s scope (i.e. not inside functions/classes) will create an __annotations__ variable for the module.

The place where annotations are created is in the byte code.

Creating annotations for functions

In order to generate byte-code that creates annotations, let us define a function, f, whose code creates a function (g) with annotations:

def f():
    def g(a: int):
        return 1
    return g

Looking at the byte code, we see:

Starting from MAKE_FUNCTION, we see the following implementation:

Basically, it takes the top item, that’s code object g (at 18) – this will be the codeobj.
Then, it takes the next top item – this will be the annotations.

What’s that next-top-item?
We see that load a ; load int ; build tuple does exactly that – it builds a tuple (which will be converted to dict) that has the annotations.
Neat.

Side note: Python versions

The above byte-code is generated in Python 3.11.8.
For Python 3.13, we see a slightly different byte code:

Simply put, this splits MAKE_FUNCTION to the actual making of the function, and to setting its attributes. There are minor changes in the implementation, but the main point is that the opcode has been split into two.

Creating annotations for classes

The byte code for creating classes is a bit more complicated (compared to load const codeobj ; make function).
However, we’re going to see that the part relevant to us – creating class annotations – is composed of 2 parts:
first, create the annotations property (this will be done by SETUP_ANNOTATIONS)
second, populate the dictionary with values.

Let’s see it in action:

The second step is in front of us – the STORE_SUBSCR does __annotations__['a'] = int
The first step is inside the SETUP_ANNOTATIONS opcode, which simply creates an empty dict

from __future__ import annotations

In [another post], I expanded more on how __future__ imports work.
I’ll just mention here that using from __future__ import annotations changes the compiler’s behavior so that annotations won’t be parsed to the objects they point to, but rather to the actual string that’s written.
In other words, def f(a: int) won’t have the object int as the annotation, but rather the string "int".

We can see it in the following code:
When the future feature is turned on, the annotation stays the literal string.
When turned off, the object it points to is used.
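We can toggle the flag at compile time and compare, a sketch using compile’s flags argument:

```python
import __future__

src = "def f(a: int): pass"

plain, future = {}, {}
exec(compile(src, "<demo>", "exec"), plain)
exec(compile(src, "<demo>", "exec",
             flags=__future__.annotations.compiler_flag), future)

print(plain["f"].__annotations__)    # {'a': <class 'int'>}
print(future["f"].__annotations__)   # {'a': 'int'}
```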

Summary

It’s that time again – the end of a post.
Is this post too long? too short? too detailed? I’m not sure, but I hope that it was right for you 🙂

In this post, we started looking at Python’s type hints.
At the syntax, and at the no-enforcement of it.
At the use cases, such as dataclasses and pydantic.
And at the implementation of it.

In the next post, we’re going to look at some special types that were created specifically for type hints.

See you next time 🙂

Python Internals: Future features

Have you ever wondered how from __future__ import XXX works?
How can it alter the behavior of Python, but only for some specific file?

Well, look no further, because we’re about to see how it works!

Starting point: __future__.py

Looking at that file, we see some definitions, with no altering-behavior implementations.

Basically, the file can be reduced to:
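Each feature is a small _Feature record; we can poke at one:

```python
import __future__

feat = __future__.annotations
print(feat.optional)        # the release where the feature first appeared
print(feat.mandatory)       # the release where it becomes the default
print(feat.compiler_flag)   # the CO_FUTURE_ANNOTATIONS bit
```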

This is just one example of a feature; the other features look exactly the same.

However, it gave us a lead : let’s find-in-all-files CO_FUTURE_ANNOTATIONS

Next stop: future.c

We find 7 files with CO_FUTURE_ANNOTATIONS, but a file named future.c caught my eye.

And, at the very start of the file, we see:

We can see here 2 things:
The thing relevant to us is that the feature we’re asking for is marked as a flag in the future-features.
The thing less relevant to us is that we see some exceptions, thus guiding us on the intended usage.

For example, we see that there’s a specific set of features, and we see the exception that’s being raised when asking for a non-existing feature.
Additionally, we see where the from __future__ import braces joke is implemented.
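The joke in action:

```python
try:
    compile("from __future__ import braces", "<demo>", "exec")
    msg = None
except SyntaxError as err:
    msg = err.msg

print(msg)   # not a chance
```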

We can also ask what the type of a feature is; the answer is at the top of the function (in the part I collapsed):

Basically, we get a statement, and make sure that it’s a from X import Y kind of statement.
Then, we take all the names that are being imported (the Ys), and we iterate them, treating each one as a feature flag, rather than an object to be imported.

Neat.

What do the other features do?

Looking at the behavior in the switch-case of features, we see the following:

In other words, all but 2 features do nothing.

Well, this follows the documentation:

Meaning that almost all the features are already mandatory, i.e. already implemented as the default – the future is already here!
Only annotations (which, if I recall correctly, is not actually planned to become the default) and barry_as_FLUFL are not yet implemented as default.
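We can verify that from the _Feature records:

```python
import __future__

# long-mandatory features have a mandatory release in the past
print(__future__.division.mandatory)     # (3, 0, 0, 'alpha', 0)
# annotations carries a placeholder, far-future mandatory release
print(__future__.annotations.mandatory)  # e.g. (3, 100, 0, 'alpha', 0)
```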

Who’s Barry?

A quick search lands us in a test, which pretty much explains the joke:

The joke is also explained in [this stackoverflow question]

That’s all, folks

The tl;dr is that importing from __future__ turns on a flag for the compiler, which allows it to alter its behavior.

I hope that the magic of from __future__ import X has been revealed, and that your curiosity has been nourished 🙂