While working with microservices in Python, a common pattern I see is the use of dynamically filled dictionaries as payloads for REST APIs or message queues. To understand what I mean by this, consider the following example:
```python
# src.py
from __future__ import annotations

import json
from typing import Any

import redis  # pip install redis


def get_payload() -> dict[str, Any]:
    """Get the 'zoo' payload containing animal names and attributes."""
    payload = {"name": "awesome_zoo", "animals": []}
    names = ("wolf", "snake", "ostrich")
    attributes = (
        {"family": "Canidae", "genus": "Canis", "is_mammal": True},
        {"family": "Viperidae", "genus": "Boas", "is_mammal": False},
    )
    # zip() stops at the shorter iterable, so "ostrich" is dropped.
    for name, attr in zip(names, attributes):
        payload["animals"].append(  # type: ignore
            {"name": name, "attribute": attr},
        )
    return payload


def save_to_cache(payload: dict[str, Any]) -> None:
    # You'll need to spin up a Redis db before instantiating
    # a connection here.
    r = redis.Redis()
    print("Saving to cache...")
    r.set(f"zoo:{payload['name']}", json.dumps(payload))


if __name__ == "__main__":
    payload = get_payload()
    save_to_cache(payload)
```
Here, the `get_payload` function constructs a payload that the `save_to_cache` function then stores in a Redis DB. The dict returned by `get_payload` is a contrived payload containing the data of an imaginary zoo. To execute the above snippet, you'll need to spin up a Redis database first. You can use Docker to do so. Install and configure Docker on your system and run:
```shell
docker run -d -p 6379:6379 redis:alpine
```
If you run the above snippet after starting the Redis server, it'll run without raising any errors. You can inspect the content saved in Redis with the following command (assuming you've got `redis-cli` and `jq` installed on your system):
```shell
echo "get zoo:awesome_zoo" | redis-cli | jq
```
This will return the following payload to your console:
```json
{
  "name": "awesome_zoo",
  "animals": [
    {
      "name": "wolf",
      "attribute": {
        "family": "Canidae",
        "genus": "Canis",
        "is_mammal": true
      }
    },
    {
      "name": "snake",
      "attribute": {
        "family": "Viperidae",
        "genus": "Boas",
        "is_mammal": false
      }
    }
  ]
}
```
Although this workflow works at runtime, there's a big gotcha here: it's really difficult to picture the shape of the `payload` returned by the `get_payload` function, because the function builds the dictionary dynamically. First, it declares a dictionary with two fields, `name` and `animals`. Here, `name` is a string value that denotes the name of the zoo. The other field, `animals`, is a list containing the names and attributes of the animals in the zoo. Later on, the for-loop fills up the dictionary with nested data structures. This sequence of operations makes it difficult to reify the final shape of the resulting `payload` in your mind.
In this case, you'll have to inspect the content of the Redis cache to fully understand the shape of the data. Writing code this way is effortless, but it makes it really hard for the next person working on the codebase to understand what the payload looks like without tapping into the data storage. There's a better way to declaratively communicate the shape of the payload that doesn't involve writing unmaintainably large docstrings. Here's how you can leverage `TypedDict` and `Annotated` to achieve that:
```python
# src.py
from __future__ import annotations

import json

# On Python < 3.8, import 'TypedDict' from 'typing_extensions'.
# On Python < 3.9, import 'Annotated' from 'typing_extensions'.
from typing import Annotated, TypedDict

import redis  # pip install redis


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


class Animal(TypedDict):
    name: str
    attribute: Attribute


class Zoo(TypedDict):
    name: str
    animals: list[Animal]


def get_payload() -> Zoo:
    """Get the 'zoo' payload containing animal names and attributes."""
    payload: Zoo = {"name": "awesome_zoo", "animals": []}
    names = ("wolf", "snake", "ostrich")
    attributes: tuple[Attribute, ...] = (
        {"family": "Canidae", "genus": "Canis", "is_mammal": True},
        {"family": "Viperidae", "genus": "Boas", "is_mammal": False},
    )
    # No '# type: ignore' needed: Mypy knows payload["animals"] is list[Animal].
    for name, attr in zip(names, attributes):
        payload["animals"].append({"name": name, "attribute": attr})
    return payload


def save_to_cache(payload: Annotated[Zoo, dict]) -> None:
    # You'll need to spin up a Redis db before instantiating
    # a connection here.
    r = redis.Redis()
    print("Saving to cache...")
    r.set(f"zoo:{payload['name']}", json.dumps(payload))


if __name__ == "__main__":
    payload: Zoo = get_payload()
    save_to_cache(payload)
```
Notice how I've used `TypedDict` to declare the nested structure of the payload `Zoo`.
At runtime, instances of typed-dict classes are plain dicts; the class definitions only inform the type checker. Here, `Zoo` contains two fields: `name` and `animals`. The `animals` field is annotated as `list[Animal]`, where `Animal` is another typed-dict. The `Animal` typed-dict houses another typed-dict called `Attribute` that defines various properties of the animal.
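Since the typed-dict classes exist only for the type checker, calling one just builds a regular dict, and nothing validates keys or value types at runtime. Here's a minimal sketch illustrating that, assuming Python 3.9+. It redefines `Attribute`, `Animal`, and `Zoo` so it runs on its own, and `bad_zoo` is a hypothetical payload that Mypy would reject:

```python
from typing import TypedDict


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


class Animal(TypedDict):
    name: str
    attribute: Attribute


class Zoo(TypedDict):
    name: str
    animals: list[Animal]


# Calling a TypedDict class simply returns a regular dict.
zoo = Zoo(name="awesome_zoo", animals=[])
print(type(zoo))  # <class 'dict'>

# isinstance() checks against a TypedDict raise TypeError at runtime.
try:
    isinstance(zoo, Zoo)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Nothing validates value types at runtime either. Mypy would reject this
# call, so the error is silenced here for the sake of the demo.
bad_zoo = Zoo(name=42, animals="not a list")  # type: ignore
print(bad_zoo)  # {'name': 42, 'animals': 'not a list'}
```

In other words, the guarantees come from running Mypy (or another type checker), not from the classes themselves.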
Taking a look at the typed-dict `Zoo` and following its nested structure, the final shape of the payload becomes clear without us having to look for example payloads. Also, Mypy can check whether the payload conforms to the shape of the annotated type. I used `Annotated[Zoo, dict]` for the input parameter of the `save_to_cache` function to communicate to the reader that an instance of the class `Zoo` is a `dict` that conforms to the contract laid out in the type itself. The type `Annotated` can be used to attach arbitrary metadata to a particular type; type checkers like Mypy ignore that metadata and treat `Annotated[Zoo, dict]` as plain `Zoo`.
At runtime, this snippet exhibits the same behavior as the previous one, and Mypy accepts it without errors.
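Because Mypy ignores the metadata in `Annotated[Zoo, dict]`, the extra `dict` is there for the reader and for any runtime tooling that chooses to look it up. As a minimal sketch, assuming Python 3.9+ (where `typing.get_type_hints` accepts `include_extras=True`), here's how you could read it back. `Zoo` is trimmed down and `save_to_cache` is a stub, so no Redis connection is needed:

```python
from typing import Annotated, TypedDict, get_args, get_type_hints


class Zoo(TypedDict):
    name: str
    animals: list  # Trimmed down; see the nested definitions above.


def save_to_cache(payload: Annotated[Zoo, dict]) -> None:
    """Stub standing in for the Redis-backed version above."""


# By default, get_type_hints() strips Annotated and returns the bare type.
plain = get_type_hints(save_to_cache)
print(plain["payload"] is Zoo)  # True

# With include_extras=True, the Annotated wrapper and its metadata survive.
hints = get_type_hints(save_to_cache, include_extras=True)
base, *metadata = get_args(hints["payload"])
print(base is Zoo, metadata)  # True [<class 'dict'>]
```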
Handling missing key-value pairs
By default, the type checker structurally validates the shape of a dict annotated with a `TypedDict` class, and all the key-value pairs expected by the annotation must be present in the dict. You can relax this behavior by specifying totality with `total=False`. This can be helpful for dealing with missing fields without letting go of type safety. Consider this:
```python
from __future__ import annotations

from typing import TypedDict


class Attribute(TypedDict):
    family: str
    genus: str
    is_mammal: bool


animal_attribute: Attribute = {
    "family": "Hominidae",
    "genus": "Homo",
}  # Mypy will complain about the missing 'is_mammal' key.
```
Mypy will complain about the missing key:
```
src.py:12: error: Missing key "is_mammal" for TypedDict "Attribute"
animal_attribute: Attribute = {
^
Found 1 error in 1 file (checked 1 source file)
```
You can relax this behavior like this:
```python
...

class Attribute(TypedDict, total=False):
    family: str
    genus: str
    is_mammal: bool

...
```
Now Mypy will no longer complain about the missing field in the annotated dict. However, it will still disallow arbitrary keys that aren't defined in the `TypedDict`. For example:
```python
...

# Mypy will complain as the key 'species' doesn't exist in the TypedDict.
animal_attribute["species"] = "Sapiens"
...
```
```
src.py:17: error: TypedDict "Attribute" has no key "species"
animal_attribute["species"] = "Sapiens"
^
Found 1 error in 1 file (checked 3 source files)
make: *** [Makefile:134: mypy] Error 1
```
Sweet type safety without being too strict about missing fields!
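`total=False` makes every key in the class optional. If only some keys should be optional, PEP 589 describes splitting the class across an inheritance chain: a total base class holds the required keys and a `total=False` subclass adds the optional ones. The sketch below assumes Python 3.9+, where typed-dict classes expose `__required_keys__` and `__optional_keys__`. `_AttributeBase` and the `key_problems` helper are hypothetical names for illustration; since TypedDict checks are purely static, `key_problems` shows one way you might mimic them at runtime:

```python
from collections.abc import Mapping
from typing import Any, TypedDict


class _AttributeBase(TypedDict):
    # Keys declared in a total (default) TypedDict are required.
    family: str
    genus: str


class Attribute(_AttributeBase, total=False):
    # Keys declared in a total=False subclass are optional.
    is_mammal: bool


print(sorted(Attribute.__required_keys__))  # ['family', 'genus']
print(sorted(Attribute.__optional_keys__))  # ['is_mammal']

# Mypy accepts this even though 'is_mammal' is missing, but it would
# still reject a dict that lacks 'family' or 'genus'.
animal_attribute: Attribute = {"family": "Hominidae", "genus": "Homo"}


def key_problems(data: Mapping[str, object], td: Any) -> list[str]:
    """Hypothetical helper: list missing required keys and unknown keys.

    TypedDict checks only happen statically, so this mimics them at
    runtime using the class's __required_keys__ and __optional_keys__.
    Value types are not checked.
    """
    required = set(td.__required_keys__)
    allowed = required | set(td.__optional_keys__)
    missing = [f"missing: {key}" for key in sorted(required - set(data))]
    unknown = [f"unknown: {key}" for key in sorted(set(data) - allowed)]
    return missing + unknown


print(key_problems(animal_attribute, Attribute))  # []
print(key_problems({"genus": "Homo", "species": "Sapiens"}, Attribute))
# ['missing: family', 'unknown: species']
```

On Python 3.11+, `typing.NotRequired` lets you mark individual keys as optional without the extra base class.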