
[ENH] add pipeline configuration/structure #3

Merged
merged 44 commits into neurostuff:main on Nov 25, 2024

Conversation

@jdkent (Member) commented Oct 22, 2024

Creates backbone of pipeline infrastructure.

@adelavega (Member) left a comment

Mostly looks good! I will have a more thorough review once I start to implement my own Pipeline, but for now I just have a question about how we hash things.

I also updated my PR to your branch addressing some of my suggestions here.

Resolved (outdated) review threads:
ns_pipelines/dataset.py (2)
ns_pipelines/pipeline.py (1)
@adelavega (Member):

Can you also give this PR a better description and update the README.md?

@jdkent jdkent changed the title add testing [ENH] add pipeline configuration/structure Oct 31, 2024
@adelavega (Member):

Looks good! Just need the tests to pass.

@adelavega (Member) left a comment

Good job! I'm going to approve, but I have a few comments, mostly minor / clean-up related. Go ahead and merge after clean-up.

Also, can you delete the umls_disease pipeline since it's not implemented? I can do it in another PR, which will help me check the code more thoroughly.

Finally, delete the pipelines folder (i.e. not the ns_pipelines folder).


     return predictions, clean_preds


 def _load_client(model_name):
     if 'gpt' in model_name:
-        client = OpenAI(api_key=os.getenv('MYOPENAI_API_KEY'))
+        client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
Member commented:

The reason I had it this way was because if the environment variable is set to OPENAI_API_KEY it will automatically be ingested by OpenAI and thus this is not necessary.

So I wanted to have the option to not pass that key to OpenAI.

Specifically, this is for when you want to use the OpenAI client for another API (such as OpenRouter).

So what I would do is add which API key to use as a configuration parameter, and in the production environment name it something other than OPENAI_API_KEY.

@@ -100,7 +55,8 @@ def ParticipantDemographics(IndependentPipeline):
     def __init__(
         self,
         extraction_model,
-        prompt_set, inputs=("text",),
+        prompt_set,
+        inputs=("text",),
         input_sources=("pubget", "ace"),
Member commented:

What I would do is add the key as part of the __init__.

Later on, we could define a base class OpenAIPipeline that sets up the client for the subclass automatically and would know which parameters to expect.

Member commented:

i.e. the _load_client function could be part of this new parent class and would always be called. For now it's fine though; we can cross that bridge later.

For now just make the key a config parameter and rename the value of the key something else.
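The OpenAIPipeline base class floated in the comments above might look something like the following sketch; `api_key_env` and `client_url` are illustrative parameter names, not identifiers from this repository:

```python
import os


class OpenAIPipeline:
    """Hypothetical base class that owns client setup for its subclasses.

    Subclasses inherit the client configuration instead of each defining
    their own _load_client; parameter names here are assumptions.
    """

    def __init__(self, extraction_model,
                 api_key_env="OPENAI_API_KEY", client_url=None):
        self.extraction_model = extraction_model
        self.api_key_env = api_key_env  # which env var holds the key
        self.client_url = client_url    # e.g. an OpenRouter endpoint
        self._client = None

    def _load_client(self):
        # Lazy import so the base class can be defined without openai installed.
        from openai import OpenAI
        kwargs = {"api_key": os.getenv(self.api_key_env)}
        if self.client_url is not None:
            kwargs["base_url"] = self.client_url
        self._client = OpenAI(**kwargs)
        return self._client
```

A subclass such as a demographics extractor would then only declare its own parameters and call `self._load_client()` when needed.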

    _version = "1.0.0"

    def __init__(self, inputs=("text",), input_sources=("pubget", "ace"), square_root=False):
        """add any pipeline configuration here (as opposed to runtime arguments like n_cpus or n_cores)"""
Member commented:

Suggested change (delete this line):
-        """add any pipeline configuration here (as opposed to runtime arguments like n_cpus or n_cores)"""

Member commented:

Probably only need this comment in the base class.

Resolved (outdated) review threads:
ns_pipelines/word_count/run.py (4)
ns_pipelines/participant_demographics/run.py (2)
    if 'gpt' in model_name:
        client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

    else:
Member commented:

In principle we can run this using other API keys, and hence other model names, so perhaps let's not worry about validation here.

@jdkent jdkent merged commit 830dd60 into neurostuff:main Nov 25, 2024
1 check passed