CI Integration #13

Open
tusharmath opened this issue Aug 17, 2024 · 6 comments

Comments

@tusharmath
Contributor

This is a neat library that does what it says. It would benefit immensely from some CI integration.

@jeremychone
Owner

First, thanks for the PR #14.

However, I have some questions about the net value of GitHub Actions for this library.

Context:

  • This library is single-authored, meaning a single person will merge the PRs to the repo.
  • For my personal workflow, I am not sure GitHub CI helps me anywhere. Running cargo test followed by a git merge and push is very efficient, and automating it further might lead to side effects (feel free to let me know if I missed anything).
  • I am always cautious about putting API keys on GitHub. So, if there is no clear and significant value, I tend to bypass this.

Now, where I could see value is for the contributors, so they can check that nothing breaks in their fork/PR branch.

But I do not think that GitHub Actions on the main genai repo, with its API secrets, will run on forks (I might be wrong). If I am wrong, I am even more uncomfortable having any fork/PR branch use the API keys.

I might be missing something obvious. Feel free to let me know.

@tusharmath
Contributor Author

I think it helps the developers contributing quite significantly. It gives them feedback on whether things are working or not, and it can automate releases for you. For example, we would love to see these commits released: v0.1.7...main

@jeremychone
Owner

Thanks for the response.

Yes, I can see the value for PRs, as devs could check that they haven't broken any adapters. When I do a PR to sea-query, for example, it's nice to see all tests pass.

Now, for personal productivity, this won't help me, as I never give my crates.io or other service credentials to third-party servers/services. Moreover, it's automated enough. "cargo test" ... "cargo publish" ... commit/push -wip. A few minutes once per release, which is at most weekly. So, there's no value in automating this more, only cost and risk.

Regarding PR contributions, I see the value. The issue is that I'm not super comfortable (yet) putting my AI provider keys there. Cohere, for example, has a pretty low limit on free access, and it will occasionally break the CI, making it hard for anyone to understand what's going on.

So, I think you have some good points, and I will definitely keep thinking about how I can add this. Thanks a lot for the PR; this is perfect.

Note 1: The commit about webc::Error was wrongly associated with this issue. Completely independent.

Note 2: I should have released the update for webc::Error earlier. It wasn't a process issue; I just didn't realize that when a PR is merged, it's typically nice to do a release soon after, as the PR author probably needs it. I will make sure to change my methodology there.

@InAnYan

InAnYan commented Oct 29, 2024

Regarding API keys: do you contact real APIs while testing genai?

But wouldn't it be simpler to run a mock webserver that implements the specific schema of a specific LLM adapter, and that's it? That way you wouldn't need any keys or remote connections.
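
For illustration, here is a minimal sketch of such a mock endpoint, using only the Rust standard library. The port, the OpenAI-style response body, and the lack of request parsing are all assumptions for the example, not anything from genai's actual test suite.

```rust
// Hypothetical mock "LLM endpoint": answers every request with a canned
// OpenAI-style chat completion body, so tests need no API key or network.
use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:3999")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // Drain (part of) the request; a real mock would parse method, path, and body.
        let mut buf = [0u8; 4096];
        let _ = stream.read(&mut buf)?;
        let body = r#"{"choices":[{"message":{"role":"assistant","content":"mocked"}}]}"#;
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes())?;
    }
    Ok(())
}
```

A test would then point the adapter's base URL at this local address instead of the real provider.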

@InAnYan

InAnYan commented Oct 29, 2024

Regarding CI/CD: I understand that this is only your personal project. But just in case it grows in the future, it would be nice to have a small check for cargo test and cargo clippy. You can also check the formatting (something like the minimal workflow sketched below).
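
As a rough illustration of the kind of minimal check meant here, a hypothetical workflow (the file name, triggers, and steps are assumptions, not an existing workflow in this repo) could look like:

```yaml
# .github/workflows/ci.yml (hypothetical) -- no provider keys needed
name: ci
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo fmt --check
      - run: cargo clippy --all-targets -- -D warnings
      - run: cargo test
```

This assumes the tests that call real providers are excluded or gated (for example with `#[ignore]`), so the run stays key-free.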

@jeremychone
Owner

@InAnYan

I understand your perspective, but I differ a little, especially regarding this particular library.

From my experience, for libraries/applications, especially as they scale, it's better to favor a "real path test" rather than mocks, at least for core functions. Otherwise, we spend time writing and testing the mock logic rather than testing what matters, meaning what can/will break.

It's not an all-or-nothing approach; sometimes mocking parts of the subsystem dependencies can be very beneficial, and mocks can add real value. A good example is cloud applications or libraries that integrate with S3: using MinIO as a mock S3 is a low-cost, high-value solution for those parts. In my other project, devai, I might create a "mock-agent" that would probably be an "echo" one, to test the devai flows.

However, in this particular genai library, the core of the logic happens in the Adapter layer, and those adapters are highly provider-specific, with nuances per provider (OpenAI, Groq, and Ollama have some differences, without even talking about Anthropic and Gemini). Mocking this would defeat the value of the tests.

For example, if we were to go down the "mock provider" path, we would have these two options:

  1. Generic Mock AI Provider - This would be a generic provider, probably with a special generic adapter. This is still quite a bit of work, but now we would test only the generic adapter and not a real provider, making the test value relatively low. The net value is very low, if any.

  2. Mock per Provider (e.g., OpenAI, Anthropic, ...) - To make these mocks meaningful, this would require a huge amount of work, and then these mocks would not keep up with the real implementations anyway. The net value is negative by an order of magnitude on this one.

So, both of these options do not really add any sensible value; they just add quite a bit of cost and actually give a false sense of quality.

Now, on the unit-test side (which I consider more as "in-crate tests"), there are obviously some tests that depend on the AI provider and some that do not. Perhaps a "cargo test ..." invocation that filters out the tests making AI provider calls could be added (one possible shape is sketched below), but this won't add much value.
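
One common way to get such a filter, sketched here with hypothetical test names (this is not genai's actual test layout), is to mark the live-provider tests with `#[ignore]` so a plain `cargo test` skips them:

```rust
#[cfg(test)]
mod tests {
    #[test]
    fn parses_chat_request_offline() {
        // Pure in-crate logic: no network, runs on every `cargo test`.
        assert_eq!(2 + 2, 4);
    }

    #[test]
    #[ignore = "requires a real AI provider key"]
    fn live_openai_chat_roundtrip() {
        // Would call the real provider. Run explicitly with:
        //   cargo test -- --ignored           (only the ignored tests)
        //   cargo test -- --include-ignored   (everything)
    }
}
```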

Interestingly, for devai, I will probably add a mock model, e.g., mock-echo-model, since the core logic of devai is more about the flow than the intricacies of calling a particular AI provider (it uses this genai library for that).

So while there might be different valid approaches, this is built with scale in mind.

I hope this clarifies things a little.
