release 0.9.0 #2239

Open
wants to merge 442 commits into
base: master
Changes from 1 commit
Commits
442 commits
134f63e
Remove info logging added for debugging
alaydshah Jun 6, 2024
fd446b0
Fix
alaydshah Jun 6, 2024
2478350
Merge pull request #2146 from FedML-AI/dev/v0.7.0
fedml-alex Jun 6, 2024
19abac1
[CoreEngine] update the version and dependent libs.
fedml-alex Jun 6, 2024
3e39975
Merge pull request #2148 from FedML-AI/alexleung/dev_v070_for_refactor
fedml-alex Jun 6, 2024
27ad2e7
[CoreEngine] remove the deprecated files in the scheduler.
fedml-alex Jun 6, 2024
2c7d434
Merge pull request #2149 from FedML-AI/alexleung/dev_v070_for_refactor
fedml-alex Jun 6, 2024
f487b12
Merge pull request #2140 from FedML-AI/alaydshah/inference_gateway_lo…
alaydshah Jun 6, 2024
493463e
[Deploy] Recursively find the model serving package folder
Raphael-Jin Jun 6, 2024
6d5f62b
Merge branch 'raphael/fix-multi-subfd' of https://github.com/FedML-AI…
fedml-dimitris Jun 6, 2024
b4cb7c5
Making sure the unzipped file is a directory during initial deployment.
fedml-dimitris Jun 6, 2024
5a05310
Merge pull request #2150 from FedML-AI/raphael/fix-multi-subfd
Raphael-Jin Jun 6, 2024
f76d88e
[Deploy] Hot fix grammar.
Raphael-Jin Jun 6, 2024
8247dd2
Merge pull request #2152 from FedML-AI/raphael/hot-fix-grammar
Raphael-Jin Jun 6, 2024
4b11270
Hot fix to support local debugging
alaydshah Jun 6, 2024
2de8c37
Bug fix
alaydshah Jun 7, 2024
343b940
Merge pull request #2153 from FedML-AI/alaydshah/inference_gateway/ho…
Raphael-Jin Jun 7, 2024
38bc898
Adding sequential uploads & download using presigned URL
bhargav191098 Jun 7, 2024
aa62a94
minor comments and some error handling
bhargav191098 Jun 7, 2024
14bae99
[CoreEngine] 1. fixed the issue that the fork method is not support i…
fedml-alex Jun 7, 2024
28ff0f3
[CoreEngine] add the missed import.
fedml-alex Jun 7, 2024
6b33065
Merge pull request #2155 from FedML-AI/alexleung/dev_v070_for_refactor
fedml-alex Jun 7, 2024
c151831
Adding hash set for counting the number of pending requests per endp…
fedml-dimitris Jun 6, 2024
c29cf1d
[Deploy] Unified timeout key.
Raphael-Jin Jun 10, 2024
e667ded
Merge pull request #2151 from FedML-AI/dimitris/fix_pending_requests_…
Raphael-Jin Jun 10, 2024
5214078
Merge pull request #2154 from FedML-AI/bhargav191098/storage_presigne…
alaydshah Jun 10, 2024
c4a8714
[Deploy] Report worker's connectivity when it finished.
Raphael-Jin Jun 11, 2024
ea03b60
[Deploy] Refactor the quick start example, use public ip as default.
Raphael-Jin Jun 11, 2024
af026fb
Merge pull request #2158 from FedML-AI/raphael/refactor-quick-start
alaydshah Jun 11, 2024
31d8e7c
[CoreEngine] Adjust the design of FedML Python Agent to a decentraliz…
fedml-alex Jun 11, 2024
cc90279
Merge pull request #2159 from FedML-AI/dev/v0.7.0
fedml-alex Jun 11, 2024
9c227bb
Merge pull request #2160 from FedML-AI/dev/v0.7.0
fedml-alex Jun 11, 2024
edd148e
[CoreEngine] Use the fork process on the MacOS and linux to avoid the…
fedml-alex Jun 11, 2024
fd5af7e
[CoreEngine] Use the fork process on the MacOS and linux to avoid the…
fedml-alex Jun 11, 2024
2248621
Merge pull request #2162 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 11, 2024
207b5fb
Merge branch 'raphael/unify-connectivity' of https://github.com/FedML…
fedml-dimitris Jun 11, 2024
4a9622c
Adding default http connectivity type constant. Fixing minor typos an…
fedml-dimitris Jun 11, 2024
653fe66
[CoreEngine] make the multiprocess work on windows, linux and mac.
fedml-alex Jun 11, 2024
a7567ee
Merge pull request #2164 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 11, 2024
34fdba0
Merge pull request #2157 from FedML-AI/raphael/unify-connectivity
Raphael-Jin Jun 11, 2024
23d88fc
[Deploy] Remove unnecessary logic.
Raphael-Jin Jun 11, 2024
e0ad9b5
[Deploy] Remove unnecessary logic; Rename readiness check function; F…
Raphael-Jin Jun 11, 2024
64e8c77
[Deploy] Nit
Raphael-Jin Jun 11, 2024
9194f84
[Deploy] Hide unnecessary log.
Raphael-Jin Jun 11, 2024
8530973
Merge pull request #2165 from FedML-AI/raphael/refactor-container-dep…
fedml-dimitris Jun 11, 2024
243be07
[Deploy] Read port info from env.
Raphael-Jin Jun 12, 2024
0b23499
[CoreEngine] make the status center work in the united agents.
fedml-alex Jun 12, 2024
c27edd0
Merge pull request #2166 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 12, 2024
3a03471
[Deploy] Nit.
Raphael-Jin Jun 12, 2024
f0dd29e
[Deploy] Nit.
Raphael-Jin Jun 12, 2024
21a8a4c
[Deploy] Change few more places relate to gateway port.
Raphael-Jin Jun 12, 2024
e7e974d
[Deploy] Write port info into env file.
Raphael-Jin Jun 12, 2024
9c8ce99
[Deploy] Nit.
Raphael-Jin Jun 12, 2024
bec28a6
Merge pull request #2167 from FedML-AI/raphael/hotfix-inference-port
Raphael-Jin Jun 13, 2024
505103f
removing zip from upload
bhargav191098 Jun 14, 2024
03c58a2
changes in the download to support files
bhargav191098 Jun 14, 2024
cb7da70
print statement removal
bhargav191098 Jun 14, 2024
394906e
name issue
bhargav191098 Jun 14, 2024
2170797
\Adding Enum for data type
bhargav191098 Jun 15, 2024
5fb5ed4
adding user_id to bucket path
bhargav191098 Jun 15, 2024
14a0182
Merge pull request #2168 from FedML-AI/bhargav191098/removing_archive
bhargav191098 Jun 15, 2024
a1af615
[CoreEngine] refactor to support to pass the communication manager, s…
fedml-alex Jun 17, 2024
07ae3a9
Merge pull request #2173 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 17, 2024
6e8788c
[CoreEngine] refactor to support to pass the communication manager, s…
fedml-alex Jun 17, 2024
ca16d2a
Merge pull request #2174 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 17, 2024
78e310c
[CoreEngine] stop the status center, message center and other process…
fedml-alex Jun 17, 2024
7233d62
Merge pull request #2176 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 17, 2024
aecafb8
Fix compatibility by limiting numpy latest version.
Raphael-Jin Jun 17, 2024
87e11f7
Merge pull request #2177 from FedML-AI/raphael/fix-compat
Raphael-Jin Jun 17, 2024
1af78e7
[CoreEngine] replace the queue with the managed queue to avoid the mu…
fedml-alex Jun 18, 2024
1cac911
Merge pull request #2178 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 18, 2024
89219fb
Workaround device mapping inconsistency
alaydshah Jun 18, 2024
4ceba31
Merge pull request #2179 from FedML-AI/alaydshah/qualcomm/workaround/…
alaydshah Jun 18, 2024
1d5a05d
[Deploy][Autoscale] Bug fix: continue the for loop if no scale op.
Raphael-Jin Jun 19, 2024
a388915
Merge pull request #2182 from FedML-AI/raphael/fix-deploy
alaydshah Jun 19, 2024
31c57e0
Polishing the autoscaler real test.
fedml-dimitris Jun 19, 2024
4cb53fe
Replacing e_id.
fedml-dimitris Jun 19, 2024
4cc39fb
Merge pull request #2185 from FedML-AI/feature/autoscaler-real-test
fedml-dimitris Jun 19, 2024
1422fa1
[CoreEngine] check the nil pointer and update the numpy version.
fedml-alex Jun 19, 2024
c088de4
Merge pull request #2186 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 19, 2024
158eb9c
[CoreEngine] remove the deprecated action runners.
fedml-alex Jun 19, 2024
86b3db0
Merge pull request #2187 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 19, 2024
c485282
[CoreEngine] remove the unused files.
fedml-alex Jun 19, 2024
6b9cb03
Merge pull request #2188 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 19, 2024
9f996ab
[CoreEngine] when the deploy master reports finished status, we shoul…
fedml-alex Jun 19, 2024
82ca218
Merge pull request #2189 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 19, 2024
f28adea
[CoreEngine] Fix the stuck issue in the deploy master agent.
fedml-alex Jun 19, 2024
6b5e56f
Merge pull request #2190 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 19, 2024
31b7ae0
[Deploy] Hotfix: job runner context lost when logout.
Raphael-Jin Jun 20, 2024
eb0f207
Merge pull request #2191 from FedML-AI/raphael/hotfix-jobrunner
Raphael-Jin Jun 20, 2024
942b223
[ TEST ]: Initialize a GitHub Actions framework for CI tests
Jun 20, 2024
afe4147
[CoreEngine] in order to debug easily for multiprocessing, add the pr…
fedml-alex Jun 20, 2024
c47f527
Merge pull request #2193 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 20, 2024
fd257b8
[CoreEngine] update the dependant libs.
fedml-alex Jun 20, 2024
29a397f
Merge pull request #2194 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 20, 2024
7ccf195
[Deploy] Support arbitrary container image onboarding.
Raphael-Jin Jun 15, 2024
9ca6ecc
[Deploy] Add LoraX and Triton examples; Add url match pattern.
Raphael-Jin Jun 18, 2024
786718b
[Deploy] Support serverless container.
Raphael-Jin Jun 20, 2024
c0f691c
[Deploy] Nit.
Raphael-Jin Jun 20, 2024
67e93e8
Merge pull request #2195 from FedML-AI/raphael/pr/support-arbitrary-i…
Raphael-Jin Jun 20, 2024
7a0963e
[TEST]: add windows runners tests
Jun 21, 2024
4355c35
[doc]: make sure the workflow documents are more readable.
Jun 21, 2024
be60443
[doc]: make sure the workflow documents are more readable.
Jun 21, 2024
b63d960
[Merge]
Jun 21, 2024
d7481be
[CoreEngine] set the name of all monitor processes, remove the redund…
fedml-alex Jun 21, 2024
aa813a0
[CoreEngine] remove the API key.
fedml-alex Jun 21, 2024
11ef2a5
Merge pull request #2197 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 21, 2024
0491bb7
Merge pull request #2192 from Qigemingziba/github_action
fedml-alex Jun 21, 2024
fd038b5
Merge pull request #2199 from FedML-AI/alexleung/dev_v070_for_refactor
fedml-alex Jun 21, 2024
33fb5b4
[Deploy] Pass down the api key to container.
Raphael-Jin Jun 21, 2024
f412a26
[Deploy] Nit.
Raphael-Jin Jun 21, 2024
6ec7379
Merge pull request #2200 from FedML-AI/raphael/pass-api-key
Raphael-Jin Jun 21, 2024
d6c9411
[Deploy] Remove example.
Raphael-Jin Jun 21, 2024
dcc0845
Merge pull request #2201 from FedML-AI/raphael/remove-example
fedml-dimitris Jun 21, 2024
5ae6904
[CoreEngine] make the job stopping feature work.
fedml-alex Jun 25, 2024
0db8666
Merge pull request #2203 from FedML-AI/alexleung/dev_branch_latest
fedml-alex Jun 25, 2024
fa44ccc
[Deploy] Return custom path other than /predict.
Raphael-Jin Jun 25, 2024
bd89be1
[Deploy] Add sqlite backup for get_all_deployment_result_list.
Raphael-Jin Jun 25, 2024
43f99cf
[Deploy] Nit.
Raphael-Jin Jun 25, 2024
766c52a
[Deploy] Nit.
Raphael-Jin Jun 25, 2024
0c29c49
[Deploy] Hot fix hash exist.
Raphael-Jin Jun 26, 2024
9930d2d
Merge pull request #2204 from FedML-AI/raphael/refactor-inf-service
Raphael-Jin Jun 26, 2024
36378f8
[Deploy] Indicate worker connection type through cli and api.
Raphael-Jin Jun 26, 2024
5097ff2
[Deploy] Nit.
Raphael-Jin Jun 26, 2024
72ed9a3
Merge pull request #2205 from FedML-AI/raphael/hot-fix-hash-exist
Raphael-Jin Jun 26, 2024
7193577
Merge pull request #2206 from FedML-AI/raphael/indicate-connection-type
bhargav191098 Jun 26, 2024
a932082
Merge branch 'dev/v0.7.0' into alexleung/dev_v070_for_refactor
fedml-alex Jun 26, 2024
a5bbcd2
Merge pull request #2163 from FedML-AI/alexleung/dev_v070_for_refactor
fedml-alex Jun 28, 2024
084781f
Add logs in occupy_gpu_ids, and funcs in hardware_utils for debugging
alaydshah Jul 2, 2024
37be694
Revert "Adjust the design of FedML Python Agent to a decentralized ar…
Jul 2, 2024
babf08c
Merge pull request #2208 from FedML-AI/revert-2163-alexleung/dev_v070…
Raphael-Jin Jul 2, 2024
62c4bb8
Merge pull request #2207 from FedML-AI/alay_and_raphael/debug/deploym…
alaydshah Jul 2, 2024
f5ad35b
[Deploy] Fix round-robin algorithm; Format code.
Raphael-Jin Jul 8, 2024
c8e5755
[Deploy] Use terminology expose_subdomains.
Raphael-Jin Jul 22, 2024
7d47d27
Merge pull request #2214 from FedML-AI/raphael/change-terminology
Raphael-Jin Jul 24, 2024
281f8c0
Add marketplace_type, price_per_hour as optional login parameters
alaydshah Aug 1, 2024
24e4ce4
Fixes
alaydshah Aug 5, 2024
b692734
Nits
alaydshah Aug 5, 2024
06f70a4
Merge pull request #2210 from FedML-AI/raphael/refac-round-robin
alaydshah Aug 5, 2024
043fa6e
Bugfix
alaydshah Aug 6, 2024
682a6c4
Merge pull request #2216 from FedML-AI/alaydshah/update/provider_login
alaydshah Aug 7, 2024
bbf2493
Adding validation and price range restriction
alaydshah Aug 8, 2024
d9b4b8a
Merge pull request #2217 from FedML-AI/alaydshah/fix/provider_login
alaydshah Aug 8, 2024
a990a60
[Deploy] Automatically mount the workspace to container in the defaul…
Raphael-Jin Aug 12, 2024
dfd8308
[Deploy] Support bootstrap and CMD be indicated together.
Raphael-Jin Aug 12, 2024
1a09b0e
[Deploy] Nit.
Raphael-Jin Aug 12, 2024
7e5f6a1
[Deploy] Add example.
Raphael-Jin Aug 13, 2024
0d75918
Merge pull request #2218 from FedML-AI/raphael/refactor-mount-logic
Raphael-Jin Aug 16, 2024
47efcde
feat: Add name parameter to the bindingEdge method
alaydshah Aug 22, 2024
135c55b
Pass name into login
alaydshah Aug 22, 2024
4cf8066
Fixing grpc and trpc ipconfig from 127.0.0.0 to 0.0.0.0
Sep 4, 2024
be6196e
Merge pull request #2221 from FedML-AI/alaydshah/render/login/name
alaydshah Sep 4, 2024
277f4ca
Remove if condition, add log
alaydshah Sep 5, 2024
c7bfa63
Merge pull request #2222 from FedML-AI/alaydshah/fix/name
alaydshah Sep 5, 2024
17dd2b7
Stringify name
alaydshah Sep 12, 2024
4a198eb
Set name arg required to True
alaydshah Sep 12, 2024
16417d5
Making name optional
alaydshah Sep 12, 2024
03f37b8
Merge pull request #2223 from FedML-AI/alaydshah/bugfix/name
alaydshah Sep 12, 2024
53aead3
add the new certs.
fedml-alex Sep 13, 2024
f46cd1e
update new certs.
fedml-alex Sep 13, 2024
fa11d0b
Merge pull request #2224 from FedML-AI/alexleung/dev_v0700_for_merge
fedml-alex Sep 13, 2024
e046f5b
Fixing grpc compatibility with the fedml.ai platform and simplifying …
Sep 13, 2024
87ae30a
Merge branch 'dev/v0.7.0' into dimitris/grpc_fix
Sep 14, 2024
303f29b
Removing empty line.
Sep 14, 2024
1420898
[CoreEngine] set the cuda visible id into the docker container when t…
fedml-alex Sep 16, 2024
d0826d9
Merge pull request #2226 from FedML-AI/alexleung/dev_v0700_for_merge
fedml-alex Sep 16, 2024
30cfe02
set the gpu ids when training.
fedml-alex Sep 18, 2024
99903b2
Merge pull request #2227 from FedML-AI/alexleung/dev_v0700_for_merge
ASCE1885 Sep 18, 2024
f6b8c44
Merge pull request #2225 from FedML-AI/dimitris/grpc_fix
fedml-alex Sep 20, 2024
16a79d9
Adding simple local env docker client checker.
Oct 16, 2024
f299a8e
Adding more docker client existence checkpoints.
Oct 16, 2024
3349667
Fixing grpc readme file.
Oct 17, 2024
d2484fa
Remove circular dependency.
Oct 17, 2024
a959802
Extending grpc support to also consider docker container ips.
Oct 17, 2024
aa69122
Fixing notation and attribute names in grpc config files.
Oct 17, 2024
c302749
testing with ingress ip.
Oct 18, 2024
292bfb3
Polishing grpc + docker examples.
Oct 18, 2024
c6d4daf
Merge pull request #2229 from FedML-AI/dimitris/grpc_with_docker
fedml-alex Oct 24, 2024
55ff447
Parameterizing deploy host, port.
fedml-dimitris Nov 4, 2024
9fc5b4d
Merge pull request #2230 from FedML-AI/inference_runner_custom_host_port
fedml-alex Nov 6, 2024
a108a8a
[Deploy] Edge Case Handling.
Raphael-Jin Nov 11, 2024
98e084a
Merge pull request #2232 from FedML-AI/raphael/quick-fix-error-catch
fedml-alex Nov 11, 2024
698e95e
[fixbug]
charlieyl Dec 2, 2024
56f6059
Merge pull request #2233 from FedML-AI/charlie/dev/v0.7.0
charlieyl Dec 2, 2024
757e5f0
add log in get_available_gpu_ids[hardware_utils.py]
charlieyl Dec 10, 2024
c5c22c8
add logs
charlieyl Dec 10, 2024
ffa54a3
add logs
charlieyl Dec 10, 2024
589dc47
add logs
charlieyl Dec 10, 2024
40735d9
add logs
charlieyl Dec 10, 2024
90c1191
[bugfix]Enhance GPU management(need compare the readtime availabe gpu…
charlieyl Dec 11, 2024
30e1f70
[debug]add logs
charlieyl Dec 11, 2024
21a374c
[bugfix] Enhance GPU cache management by setting initial available GP…
charlieyl Dec 11, 2024
02b87f4
[bugfix]calculate the difference between realtime_available_gpu_ids a…
charlieyl Dec 11, 2024
9ff4a56
[bugfix]set shm_size to 8G if not specified
charlieyl Dec 11, 2024
fee49a4
Merge pull request #2234 from FedML-AI/dev/v0.7.0
charlieyl Dec 17, 2024
d5831b9
Revert "Merge pull request #2233 from FedML-AI/charlie/dev/v0.7.0"
charlieyl Dec 17, 2024
9fa8499
Merge pull request #2235 from FedML-AI/revert-2234-dev/v0.7.0
charlieyl Dec 17, 2024
2055d68
Merge pull request #2237 from FedML-AI/charlie/dev/v0.7.0
charlieyl Dec 17, 2024
bb31c93
remove debug logs
charlieyl Dec 18, 2024
cb489f8
Merge pull request #2238 from FedML-AI/charlie/dev/v0.7.0
charlieyl Dec 18, 2024
e27b830
check the gpu avaiablity using the random api to adapte the rental gpus.
Dec 20, 2024
181621a
Merge pull request #2240 from FedML-AI/alexleung/dev_v0700_4_sync
charlieyl Dec 20, 2024
77c6906
[bugfix]Adapt log method(in transformers/trainer.py) parameters
charlieyl Dec 20, 2024
038faaf
Merge branch 'dev/v0.7.0' into charlie/dev/v0.7.0
charlieyl Dec 20, 2024
a5708ea
Merge branch 'dev/v0.7.0' into charlie/dev/v0.7.0
charlieyl Dec 20, 2024
74803fe
[bugfix]Add the. zip suffix to the s3 key of the model card
charlieyl Dec 20, 2024
63f2110
Merge pull request #2241 from FedML-AI/charlie/dev/v0.7.0
fedml-alex Dec 20, 2024
93f9760
[update]Upgrade official website address: https://tensoropera.ai , an…
charlieyl Dec 20, 2024
ca5e764
Merge pull request #2242 from FedML-AI/charlie/dev/v0.7.0
fedml-alex Dec 20, 2024
4acb0f0
undo "Welcome to FedML.ai!"
charlieyl Dec 20, 2024
24469d2
Merge pull request #2243 from FedML-AI/charlie/dev/v0.7.0
fedml-alex Dec 20, 2024
07ae5ec
[bugfix]start_job_perf on execute_job_task
charlieyl Dec 25, 2024
f08a1ab
Merge pull request #2244 from FedML-AI/charlie/dev/v0.7.0
charlieyl Dec 25, 2024
46d766a
[logs]add deploy param
charlieyl Dec 28, 2024
62d331d
Merge pull request #2248 from FedML-AI/dev/v0.7.0
fedml-alex Feb 10, 2025
0f1a37e
add the login cli for the service provider named chainopera.
Feb 10, 2025
f98f7c2
Merge pull request #2249 from FedML-AI/alexleung/dev_v0700_4_sync
charlieyl Feb 11, 2025
a912c57
[bugfix] Handle deployment failure by deleting deployed replicas and …
charlieyl Feb 11, 2025
c6bfe20
[refactor] Disable request timeout middleware in device model inference
charlieyl Feb 12, 2025
7134c15
add logs
charlieyl Feb 12, 2025
f7552c4
[logs] Always enable log file
charlieyl Feb 12, 2025
d3b447a
[refactor] Optimize HTTP inference client and log file handling
charlieyl Feb 12, 2025
5910111
[perf] Optimize Uvicorn server configuration for improved inference g…
charlieyl Feb 12, 2025
a728037
[perf] Disable uvloop and httptools in model inference gateway
charlieyl Feb 12, 2025
3e1ae10
[perf] Reduce model inference gateway workers from 10 to 2
charlieyl Feb 12, 2025
d6d67e8
[perf] Optimize HTTP inference client and Uvicorn server configuration
charlieyl Feb 12, 2025
3bc0666
[perf] Remove verbose logging in model inference request handling
charlieyl Feb 13, 2025
ad22de4
[bugfix-combination] Add model configuration details for deployment f…
charlieyl Feb 13, 2025
2528f4f
[bugfix] Update default model configuration parameters for safer depl…
charlieyl Feb 13, 2025
60be0f7
[feature] Add endpoint_name parameter to model deployment method
charlieyl Feb 14, 2025
7853c90
[bugfix] Restore full GPU card selection parameters in NvidiaGPUtil
charlieyl Feb 17, 2025
068ed51
[perf] Reduce job metrics reporting sleep interval to 15 seconds and …
charlieyl Feb 17, 2025
8ac783d
[chore] Bump version to 0.9.6-dev202502181030
charlieyl Feb 18, 2025
b11558b
[feature] Enhance container log retrieval for exited containers
charlieyl Feb 18, 2025
a6702a4
Merge pull request #2250 from FedML-AI/dev/v0.7.0
charlieyl Feb 18, 2025
f376824
[bugfix] Improve lock handling in MLOps logging utilities
charlieyl Feb 19, 2025
cd976a5
[refactor] Simplify lock handling in MLOps logging utilities
charlieyl Feb 19, 2025
add749a
[feature] Add robust database operation error handling decorator
charlieyl Feb 19, 2025
46b8ce7
[upd] Bump version to 0.9.6-dev202502191600
charlieyl Feb 19, 2025
38a930a
[debug] Add logging for endpoint replica information in job monitor
charlieyl Feb 19, 2025
69c0d84
[debug] Re-enable logging for model monitoring metrics
charlieyl Feb 20, 2025
dcfe268
[refactor] Improve lock handling and error management in MLOps loggin…
charlieyl Feb 20, 2025
a26e1b3
[upd] Bump version to 0.9.6-dev202502202000
charlieyl Feb 20, 2025
cdb32e9
[upd] Bump version to 0.9.6-dev202502202000-2
charlieyl Feb 20, 2025
4dd13df
[debug] Remove verbose logging in job monitor and model metrics
charlieyl Feb 20, 2025
9516a2e
[debug] Reorder logging initialization in deployment protocol managers
charlieyl Feb 21, 2025
ef32cfb
Merge pull request #2246 from FedML-AI/charlie/dev/v0.7.0
charlieyl Feb 24, 2025
dac284e
[debug] Remove verbose logging in job monitor
charlieyl Feb 24, 2025
4f05779
Merge pull request #2252 from FedML-AI/charlie/dev/v0.7.0
charlieyl Feb 24, 2025
d5b8150
Release version 0.9.6
charlieyl Feb 25, 2025
[Deploy] Recursively find the model serving package folder
Raphael-Jin committed Jun 6, 2024
commit 493463e3b4002edb929df966cd1dd409b3a60522
python/fedml/computing/scheduler/comm_utils/file_utils.py: 13 additions, 0 deletions (new file)

```diff
@@ -0,0 +1,13 @@
+import os
+
+
+def find_file_inside_folder(folder_path, file_name):
+    """
+    Recursively search for a file inside a folder and its sub-folders.
+    return the full path of the file if found, otherwise return None.
+    """
+    for root, dirs, files in os.walk(folder_path):
+        if file_name in files:
+            return os.path.join(root, file_name)
+
+    return None
```
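A quick way to check how find_file_inside_folder behaves is to run a local copy of it against a throwaway directory tree. This is a sketch rather than a test from the repository: it re-declares the helper instead of importing it from fedml, and the folder names unzip_fedml_run_1 and my_model are made up for illustration.

```python
import os
import tempfile


def find_file_inside_folder(folder_path, file_name):
    """Local copy of the helper above: first match in os.walk order, else None."""
    for root, _dirs, files in os.walk(folder_path):
        if file_name in files:
            return os.path.join(root, file_name)
    return None


# Throwaway tree (hypothetical names): <tmp>/unzip_fedml_run_1/my_model/fedml_model_config.yaml
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "unzip_fedml_run_1", "my_model")
    os.makedirs(nested)
    config_path = os.path.join(nested, "fedml_model_config.yaml")
    with open(config_path, "w"):
        pass  # an empty file is enough; only its name matters here

    found = find_file_inside_folder(tmp, "fedml_model_config.yaml")
    missing = find_file_inside_folder(tmp, "missing.yaml")

print(found == config_path)  # True: the file is found two levels below tmp
print(missing)               # None: no file with that name anywhere in the tree
```

Because os.walk ignores errors by default (its onerror argument is None), calling the helper on a path that does not exist returns None instead of raising.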
```diff
@@ -10,6 +10,7 @@
 import yaml
 from fedml.computing.scheduler.comm_utils.job_utils import JobRunnerUtils
 from fedml.core.mlops import MLOpsRuntimeLog
+from fedml.computing.scheduler.comm_utils import file_utils
 from .device_client_constants import ClientConstants
 from .device_model_cache import FedMLModelCache
 from ..scheduler_core.general_constants import GeneralConstants
```
```diff
@@ -205,7 +206,7 @@ def run_impl(self, run_extend_queue_list, sender_message_center,
         # Check if the package is already downloaded
         unzip_package_path = ""
         if os.path.exists(os.path.join(models_root_dir, parent_fd)):
-            unzip_package_path = self.find_previous_downloaded_pkg(os.path.join(models_root_dir, parent_fd), model_name)
+            unzip_package_path = self.find_previous_downloaded_pkg(os.path.join(models_root_dir, parent_fd))
 
         # Download the package if not found
         if unzip_package_path == "":
```
```diff
@@ -510,30 +511,13 @@ def build_dynamic_constrain_variables(self, run_id, run_config):
         pass
 
     @staticmethod
-    def find_previous_downloaded_pkg(parent_dir: str, model_name: str) -> str:
-        unzip_fd = ""
-        res = ""
-
-        for folder in os.listdir(parent_dir):
-            if folder.startswith("unzip_fedml_run"):
-                unzip_fd = os.path.join(parent_dir, folder)
-                break
-
-        exact_matched = False
-
-        if unzip_fd == "":
-            return res
-
-        for folder in os.listdir(unzip_fd):
-            if folder == model_name:
-                res = os.path.join(unzip_fd, folder)
-                exact_matched = True
-                break
-
-        if not exact_matched:
-            # Use the first folder found
-            for folder in os.listdir(unzip_fd):
-                res = os.path.join(unzip_fd, folder)
-                break
-
-        return res
+    def find_previous_downloaded_pkg(parent_dir: str) -> str:
+        """
+        Find a folder inside parent_dir that contains the fedml_model_config.yaml file.
+        """
+        res = file_utils.find_file_inside_folder(parent_dir, ClientConstants.MODEL_REQUIRED_MODEL_CONFIG_FILE)
+        if res is not None:
+            # return the parent folder of res
+            return os.path.dirname(res)
+        else:
+            return ""
```
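The effect of this change can be seen by running simplified, standalone copies of the old and new lookups on a hypothetical layout in which the unzipped package sits inside an extra wrapper folder. This is a sketch, not the fedml code: MODEL_CONFIG_FILE stands in for ClientConstants.MODEL_REQUIRED_MODEL_CONFIG_FILE and assumes the fedml_model_config.yaml value named in the new docstring, and the folder names (unzip_fedml_run_42, wrapper, my_model) are invented.

```python
import os
import tempfile

# Assumption: value of ClientConstants.MODEL_REQUIRED_MODEL_CONFIG_FILE,
# taken from the docstring of the new find_previous_downloaded_pkg.
MODEL_CONFIG_FILE = "fedml_model_config.yaml"


def old_find_pkg(parent_dir, model_name):
    """Condensed copy of the removed lookup: looks only one level below unzip_fedml_run*."""
    unzip_fd = ""
    for folder in os.listdir(parent_dir):
        if folder.startswith("unzip_fedml_run"):
            unzip_fd = os.path.join(parent_dir, folder)
            break
    if unzip_fd == "":
        return ""
    entries = os.listdir(unzip_fd)
    if model_name in entries:  # exact name match wins
        return os.path.join(unzip_fd, model_name)
    # Otherwise fall back to the first entry listed
    return os.path.join(unzip_fd, entries[0]) if entries else ""


def new_find_pkg(parent_dir):
    """Copy of the new lookup: the folder that actually holds the config file."""
    for root, _dirs, files in os.walk(parent_dir):
        if MODEL_CONFIG_FILE in files:
            return os.path.dirname(os.path.join(root, MODEL_CONFIG_FILE))
    return ""


with tempfile.TemporaryDirectory() as parent:
    # Hypothetical layout: the archive unpacked into an extra "wrapper" folder.
    pkg_dir = os.path.join(parent, "unzip_fedml_run_42", "wrapper", "my_model")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, MODEL_CONFIG_FILE), "w"):
        pass

    old_result = old_find_pkg(parent, "my_model")
    new_result = new_find_pkg(parent)
    expected_old = os.path.join(parent, "unzip_fedml_run_42", "wrapper")

print(old_result == expected_old)  # True: stops at "wrapper", which has no config file
print(new_result == pkg_dir)       # True: lands on the folder holding the config
```

Because the new helper no longer receives model_name, if several config files exist under parent_dir the one returned depends on os.walk's traversal order, which follows the directory listing order and is not guaranteed to be sorted.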