Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor manual pydantics for scanpy pl agents #255

Open
wants to merge 9 commits into
base: biohackathon3
Choose a base branch
from

Conversation

mengerj
Copy link

@mengerj mengerj commented Dec 12, 2024

Refactored pydantic class definitions in pl modules to match new structure.
#245

@mengerj mengerj marked this pull request as ready for review December 12, 2024 17:13
@slobentanzer
Copy link
Contributor

@mengerj looks good, but it doesn't run; I get a string-matching error from the OpenAI API:

/Users/slobentanzer/GitHub/tmp/biochatter/benchmark/test_api_calling.py::test_python_api_calling[gpt-4o-2024-08-06-test_data_api_calling6] failed: model_name = 'gpt-4o-2024-08-06'
test_data_api_calling = {'case': 'scanpy:pl:scatter:exact_variable_names', 'expected': {'parts_of_query': ['sc.pl.scatter\\(', 'n_genes_by_cou...: 'ca2e911fc981986c9f241d35f3a4aa58', 'input': {'prompt': 'Make a scatter plot of n_genes_by_counts vs total_counts.'}}
conversation = <biochatter.llm_connect.GptConversation object at 0x35ae1d480>
multiple_testing = <function multiple_testing.<locals>.run_multiple_times at 0x35ae5fe20>

    def test_python_api_calling(
        model_name,
        test_data_api_calling,
        conversation,
        multiple_testing,
    ):
        """Test the Python API calling capability."""
        task = f"{inspect.currentframe().f_code.co_name.replace('test_', '')}"
        yaml_data = test_data_api_calling
    
        skip_if_already_run(
            model_name=model_name,
            task=task,
            md5_hash=yaml_data["hash"],
        )
    
        if "scanpy" not in yaml_data["case"] and "anndata" not in yaml_data["case"]:
            pytest.skip(
                "Function to be tested is not a Python API",
            )
    
        def run_test():
            conversation.reset()  # needs to be reset for each test
            if "scanpy:pl" in yaml_data["case"]:
                builder = ScanpyPlQueryBuilder()
            elif "anndata" in yaml_data["case"]:
                builder = AnnDataIOQueryBuilder()
            parameters = builder.parameterise_query(
                question=yaml_data["input"]["prompt"],
                conversation=conversation,
            )
    
            method_call = format_as_python_call(parameters[0])
    
            score = []
            for expected_part in ensure_iterable(
                yaml_data["expected"]["parts_of_query"],
            ):
                if re.search(expected_part, method_call):
                    score.append(True)
                else:
                    score.append(False)
    
            return calculate_bool_vector_score(score)
    
>       mean_score, max, n_iterations = multiple_testing(run_test)

benchmark/test_api_calling.py:121: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
benchmark/conftest.py:337: in run_multiple_times
    score, max = test_func(*args, **kwargs)
benchmark/test_api_calling.py:103: in run_test
    parameters = builder.parameterise_query(
biochatter/api_agent/scanpy_pl.py:427: in parameterise_query
    return runnable.invoke(question)
.venv/lib/python3.10/site-packages/langchain_core/runnables/base.py:2877: in invoke
    input = context.run(step.invoke, input, config, **kwargs)
.venv/lib/python3.10/site-packages/langchain_core/runnables/base.py:5093: in invoke
    return self.bound.invoke(
.venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:277: in invoke
    self.generate_prompt(
.venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:777: in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
.venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:634: in generate
    raise e
.venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:624: in generate
    self._generate_with_cache(
.venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:846: in _generate_with_cache
    result = self._generate(
.venv/lib/python3.10/site-packages/langchain_openai/chat_models/base.py:601: in _generate
    response = self.client.create(**payload)
.venv/lib/python3.10/site-packages/openai/_utils/_utils.py:277: in wrapper
    return func(*args, **kwargs)
.venv/lib/python3.10/site-packages/openai/resources/chat/completions.py:646: in create
    return self._post(
.venv/lib/python3.10/site-packages/openai/_base_client.py:1271: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
.venv/lib/python3.10/site-packages/openai/_base_client.py:942: in request
    return self._request(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <openai.OpenAI object at 0x35ae53c40>

    def _request(
        self,
        *,
        cast_to: Type[ResponseT],
        options: FinalRequestOptions,
        remaining_retries: int | None,
        stream: bool,
        stream_cls: type[_StreamT] | None,
    ) -> ResponseT | _StreamT:
        # create a copy of the options we were given so that if the
        # options are mutated later & we then retry, the retries are
        # given the original options
        input_options = model_copy(options)
    
        cast_to = self._maybe_override_cast_to(cast_to, options)
        options = self._prepare_options(options)
    
        retries = self._remaining_retries(remaining_retries, options)
        request = self._build_request(options)
        self._prepare_request(request)
    
        kwargs: HttpxSendArgs = {}
        if self.custom_auth is not None:
            kwargs["auth"] = self.custom_auth
    
        log.debug("Sending HTTP Request: %s %s", request.method, request.url)
    
        try:
            response = self._client.send(
                request,
                stream=stream or self._should_stream_response_body(request=request),
                **kwargs,
            )
        except httpx.TimeoutException as err:
            log.debug("Encountered httpx.TimeoutException", exc_info=True)
    
            if retries > 0:
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries,
                    stream=stream,
                    stream_cls=stream_cls,
                    response_headers=None,
                )
    
            log.debug("Raising timeout error")
            raise APITimeoutError(request=request) from err
        except Exception as err:
            log.debug("Encountered Exception", exc_info=True)
    
            if retries > 0:
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries,
                    stream=stream,
                    stream_cls=stream_cls,
                    response_headers=None,
                )
    
            log.debug("Raising connection error")
            raise APIConnectionError(request=request) from err
    
        log.debug(
            'HTTP Response: %s %s "%i %s" %s',
            request.method,
            request.url,
            response.status_code,
            response.reason_phrase,
            response.headers,
        )
        log.debug("request_id: %s", response.headers.get("x-request-id"))
    
        try:
            response.raise_for_status()
        except httpx.HTTPStatusError as err:  # thrown on 4xx and 5xx status code
            log.debug("Encountered httpx.HTTPStatusError", exc_info=True)
    
            if retries > 0 and self._should_retry(err.response):
                err.response.close()
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries,
                    err.response.headers,
                    stream=stream,
                    stream_cls=stream_cls,
                )
    
            # If the response is streamed then we need to explicitly read the response
            # to completion before attempting to access the response text.
            if not err.response.is_closed:
                err.response.read()
    
            log.debug("Re-raising status error")
>           raise self._make_status_error_from_response(err.response) from None
E           openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'tools[0].function.name', 'code': 'invalid_value'}}

.venv/lib/python3.10/site-packages/openai/_base_client.py:1046: BadRequestError

@mengerj
Copy link
Author

mengerj commented Dec 13, 2024

Function names cannot contain "." — the OpenAI API requires tool/function names to match the pattern `^[a-zA-Z0-9_-]+$`, so names like `sc.pl.scatter` must be sanitized (e.g. replace "." with "_") before being passed as `tools[].function.name`.

@mengerj mengerj requested a review from slobentanzer December 13, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants