[WIP] Phi3poc #2301

Open
wants to merge 37 commits into
base: master

Conversation

JessicaXYWang
Contributor

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Briefly describe the changes included in this Pull Request.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following the steps below.
  1. Find the corresponding markdown file for your new feature in the website/docs/documentation folder.
    Make sure you choose the correct class (estimators/transformers) and namespace.
  2. Follow the pattern in the markdown file and add another section for your new API, including pyspark, scala (and potentially .NET) samples.
  3. Make sure the DocTable points to the correct API link.
  4. Navigate to the website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each Python code block to enable auto-tests for Python samples.
  6. Make sure the WebsiteSamplesTests job passes in the pipeline.

@JessicaXYWang
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.41%. Comparing base (b2f4080) to head (0d7aafd).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2301      +/-   ##
==========================================
- Coverage   83.67%   83.41%   -0.27%     
==========================================
  Files         331      331              
  Lines       17177    17177              
  Branches     1526     1526              
==========================================
- Hits        14373    14328      -45     
- Misses       2804     2849      +45     
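
The percentages in the diff follow from the hit and line counts. A minimal sketch of that arithmetic, assuming the report floors values to two decimals (an assumption that reproduces the figures shown; `floor2` is a hypothetical helper, not part of any tool):

```python
import math


def floor2(value):
    """Floor a number to two decimal places (assumed rounding mode)."""
    return math.floor(value * 100) / 100


lines = 17177
base_hits, head_hits = 14373, 14328

# Coverage is hits divided by coverable lines, expressed as a percentage.
base_pct = 100 * base_hits / lines
head_pct = 100 * head_hits / lines

print(floor2(base_pct))             # base coverage
print(floor2(head_pct))             # head coverage
print(floor2(head_pct - base_pct))  # change in coverage
```

Since the line count is unchanged, the drop comes entirely from the 45 lines that moved from hits to misses.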


self.config.update(kwargs)


def camel_to_snake(text):
Collaborator

There might already be one in a library that we can use.
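
For comparison against any existing helper, here is a minimal sketch of what the regex-based version in this PR does. The function body is the one quoted later in this review; the sample inputs are illustrative only.

```python
import re


def camel_to_snake(text):
    # Insert "_" before every uppercase letter except one at the very start,
    # then lowercase the whole string.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", text).lower()


print(camel_to_snake("modelName"))
print(camel_to_snake("useFabricLakehouse"))
# Runs of capitals are split letter by letter, which may differ from
# what a library helper would produce.
print(camel_to_snake("HTTPServer"))
```

The handling of consecutive capitals is the main behavior to check if this is swapped for a library function.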

"output column",
typeConverter=TypeConverters.toString,
)
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters")
Collaborator

Maybe explain the difference between model params and other params (you can just link to other docs if that's easier).

typeConverter=TypeConverters.toString,
)
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters")
modelConfig = Param(Params._dummy(), "modelConfig", "Model configuration")
Collaborator

Maybe explain the difference between model config and other params (you can just link to other docs if that's easier).

useFabricLakehouse = Param(
    Params._dummy(),
    "useFabricLakehouse",
    "Use FabricLakehouse",
Collaborator

@mhamilton723 Jan 2, 2025

If this is for a local cache, then you might be able to make the verbiage generic, like useLocalCache.

    return re.sub(r"(?<!^)(?=[A-Z])", "_", text).lower()


class ComputableObject:
Collaborator

nit: rename to _BroadcastableModel

Comment on lines +220 to +225
if value is not None:
    bc_computable = ComputableObject(value, self.getModelConfig())
    sc = SparkSession.builder.getOrCreate().sparkContext
    self.bcObject = sc.broadcast(bc_computable)
else:
    self.bcObject = None
Collaborator

Don't do this here.

Comment on lines +174 to +179
if self.getCachePath():
    bc_computable = ComputableObject(self.getCachePath(), self.getModelConfig())
    sc = SparkSession.builder.getOrCreate().sparkContext
    self.bcObject = sc.broadcast(bc_computable)
else:
    self.bcObject = None
Collaborator

Do this in transform.
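
One way to read this suggestion is to have the setter only record the value and to defer the broadcast until transform runs. A minimal sketch under that reading follows. `LazyBroadcastSketch`, `set_cache_path`, and `fake_broadcast` are hypothetical names, and the injected `broadcast_fn` merely stands in for `sparkContext.broadcast` so the example runs without a Spark session; it is not the PR's actual class.

```python
class LazyBroadcastSketch:
    """Hypothetical stand-in for the transformer; not the PR's actual class."""

    def __init__(self, broadcast_fn):
        # broadcast_fn plays the role of sparkContext.broadcast (assumption),
        # injected so the sketch runs without Spark.
        self._broadcast_fn = broadcast_fn
        self._cache_path = None
        self._model_config = {}
        self._bc_object = None

    def set_cache_path(self, value):
        # The setter only records the value and drops any stale broadcast.
        self._cache_path = value
        self._bc_object = None
        return self

    def _ensure_broadcast(self):
        # Build and broadcast the payload once, on first use.
        if self._bc_object is None and self._cache_path:
            payload = (self._cache_path, dict(self._model_config))
            self._bc_object = self._broadcast_fn(payload)
        return self._bc_object

    def transform(self, rows):
        bc = self._ensure_broadcast()
        # A real transformer would run the model here; the sketch just tags rows.
        return [(row, bc) for row in rows]


calls = []


def fake_broadcast(payload):
    # Records each call so we can see when broadcasting happens.
    calls.append(payload)
    return payload


t = LazyBroadcastSketch(fake_broadcast).set_cache_path("/hypothetical/shared/cache")
print(len(calls))  # nothing has been broadcast after construction and setting
t.transform(["a", "b"])
t.transform(["c"])
print(len(calls))  # the broadcast happened once, inside transform
```

With this shape, constructing or configuring the object has no Spark side effects, and changing the cache path simply invalidates the previous broadcast.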

modelName = Param(
    Params._dummy(),
    "modelName",
    "model name",
Collaborator

Might want to link to the list of models on Hugging Face.

cachePath = Param(
    Params._dummy(),
    "cachePath",
    "cache path for the model. could be a lakehouse path",
Collaborator

We should mention that this should be a location shared between the workers.

deviceMap = Param(
    Params._dummy(),
    "deviceMap",
    "Specifies a model parameter for the device Map. For GPU usage with models such as Phi 3, set it to 'cuda'.",
Collaborator

No need to mention Phi 3 specifically here.

3 participants