🌀🌲 Hierarchical Data Structures with OpenAI

As we continue our exploration into data handling with Pydantic and OpenAI's function call, I'd like to delve deeper into an exciting concept: Recursive data structures. Today, I will provide a basic introduction to recursive schemas, demonstrating how we can convert a file system to a graph.

If you want to just look at the code you can check it out here

Why Recursive Pydantic?

Recursive Pydantic unlocks the ability to specify and parse hierarchical graph-like data in a more intuitive and approachable way. Structured extraction traditionally involved flat structures or simple nesting. By taking advantage of Pydantic's recursive definitions and JSON schema capabilities, we can define and work with complex hierarchical data structures with ease.

Single Completion Call, Recursive Results

It's noteworthy that although our function call structure is recursive, we obtain the results with just a single completion call!

Considerations and Caveats

While this approach is powerful, there are a few considerations:

  • Direct usage of a recursive structure isn't currently supported. We need to wrap it in a non-recursive structure. In our case, "DirectoryTree" was used to wrap the recursive node structure.
  • Some prompt engineering is necessary to achieve desired results. Thankfully, with doc strings and the Field.description attribute, we can handle this efficiently.

Why Does This Matter?

The implications of recursive Pydantic schemas are profound. Previously, chaining of Language Models (LLMs) in a deep or wide manner was the norm. However, a tree structure offers much more.

Tree structures are versatile. They can represent file systems or specify computations. Agents often need to perform meaningful actions. A tree structure allows us to represent finite computations with both "map" and "reduce" instructions. This ability to generate instruction sets with varying depth and width within a finite number of steps avoids cyclic behaviors.

In essence, by using a recursive schema, we can transform it into a dynamic arbitrary compute graph that can be dispatched to any execution engine of our choice.

Looking Ahead

In this post, I've not defined a specific computing language. Instead, I've proposed a data structure that can express arbitrary computation graphs. However, I believe that this approach will become increasingly important and popular. We are likely to see the emergence of frameworks that allow efficient planning through compute graphs, which can be dispatched for execution in a single shot.

The combination of OpenAI's function call and Pydantic offers a minimalist, Pythonic way to handle complex, hierarchical data structures. Embrace this approach and transform your data handling experiences!

For more insights and updates, consider following me on Twitter at @jxnlco and starring the repo here.