[ENH] add pipeline configuration/structure #3
Conversation
…) which we know will operate differently based on '_pipeline_type'
Mostly looks good! I will be able to give a better review once I start to implement my own Pipeline, but for now I just have a question about how we hash things.
I also updated my PR to your branch, addressing some of my suggestions here.
Can you also give this PR a better description and update the README.md?
Looks good! Just need the tests to pass.
Good job! I'm going to say approve, but I have a few comments, mostly minor / clean-up related. Go ahead and merge after the clean-up.
Also, can you delete the umls_disease pipeline since it's not implemented? I can do it in another PR, which will help me check the code more thoroughly.
Finally, delete the pipelines folder (i.e. not the ns_pipelines folder).
return predictions, clean_preds

def _load_client(model_name):
    if 'gpt' in model_name:
-        client = OpenAI(api_key=os.getenv('MYOPENAI_API_KEY'))
+        client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
The reason I had it this way is that if the environment variable is named OPENAI_API_KEY, it is picked up automatically by the OpenAI client, so passing it explicitly is not necessary. I wanted to keep the option of not handing that key to OpenAI. Specifically, this is for when you want to use the OpenAI client with another API (such as OpenRouter). So what I would do is add which API key to use as a configuration parameter, and in the production environment name it something other than OPENAI_API_KEY.
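A minimal sketch of that idea, assuming the openai v1 Python client; the `api_key_env` parameter name is hypothetical and not part of this PR:

```python
import os

from openai import OpenAI


def _load_client(model_name, api_key_env="OPENAI_API_KEY"):
    """Build a client; which environment variable holds the key is a config parameter."""
    if "gpt" in model_name:
        # In production the config could name a different variable (e.g. one holding an
        # OpenRouter key), so the client is not tied to OPENAI_API_KEY.
        return OpenAI(api_key=os.getenv(api_key_env))
    # Other model families would get their own clients here.
    raise NotImplementedError(f"no client configured for model: {model_name}")
```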
@@ -100,7 +55,8 @@ def ParticipantDemographics(IndependentPipeline):
     def __init__(
         self,
         extraction_model,
-        prompt_set, inputs=("text",),
+        prompt_set,
+        inputs=("text",),
         input_sources=("pubget", "ace"),
What I would do is add the key as part of the __init__. Later on, we could define a base class OpenAIPipeline that sets up the client for the subclass automatically and knows which parameters to expect.
i.e. the _load_client function could be part of this new parent class and always be called. For now it's fine though; we can cross that bridge later. For now, just make the key a config parameter and rename the value of the key to something else.
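For reference, a rough sketch of what that parent class could look like; `OpenAIPipeline` and its parameter names are hypothetical, and the PR does not define this class:

```python
import os

from openai import OpenAI


class OpenAIPipeline:
    """Hypothetical base class; in the repo it would subclass the existing pipeline base."""

    def __init__(self, extraction_model, api_key_env="OPENAI_API_KEY", **kwargs):
        self.extraction_model = extraction_model
        # _load_client is always called during construction, so subclasses never
        # have to set up the client themselves.
        self._client = self._load_client(api_key_env)

    def _load_client(self, api_key_env):
        # The environment variable holding the key is a config parameter,
        # e.g. it can point at an OpenRouter key in production.
        return OpenAI(api_key=os.getenv(api_key_env))
```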
ns_pipelines/word_count/run.py
_version = "1.0.0"

def __init__(self, inputs=("text",), input_sources=("pubget", "ace"), square_root=False):
    """add any pipeline configuration here (as opposed to runtime arguments like n_cpus or n_cores)"""
Probably only need this comment in the base class.
if 'gpt' in model_name:
    client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

else:
In principle we can run this using other API keys, and hence other model names, so perhaps let's not worry about validation here.
Co-authored-by: Alejandro de la Vega <[email protected]>
Creates backbone of pipeline infrastructure.