24 Apr 23:13

mhamilton723

866261c

SynapseML v0.11.1

Bug Fixes 🐞

set default values for aadToken & url for internal Synapse (#1918)
ONNX model shape inference cannot handle batch with shape [-1] (#1906)
forgot to add getPValue to python side (#1909)
generate random dir for each test (#1908)
add back diagnosticsInfo for MVAD (#1892)
DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
fix date parsing in FaceSuite test (#1896)
fix Build pipeline (#1904)
Retry OnnxHub call to improve test reliability (#1889)
Normalize line-endings (#1883)
Remove case matching for erased generic type (#1880)
fix bug #1869, DML .setFitIntercept should be set to true (#1876)
Remove extraneous "Foo" type from Py codegen (#1867)
Allow variable size in ONNX inputs (#1851)
Abstain from CodeQL for markdown-only changes (#1865)
fix style
update OpenAIEmbedding internalServiceType
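
The "Retry OnnxHub call" fix above follows a common pattern for flaky network calls in tests. The sketch below shows one way to do it, assuming exponential backoff; the `retry` helper and `flaky_download` stand-in are hypothetical and are not taken from the SynapseML code base.

```python
import time


def retry(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn() until it succeeds or attempts run out, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Hypothetical flaky call standing in for a model-hub download.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "model-bytes"


# Record the delays instead of sleeping so the example runs instantly.
delays = []
result = retry(flaky_download, attempts=5, sleep=delays.append)
```

Injecting `sleep` keeps the helper testable: the call succeeds on the third attempt after two recorded delays.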

Build 🏭

bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
bump webpack from 5.75.0 to 5.76.1 in /website (#1870)

Documentation 📘

Fix installation instruction in the webpage for the build.sbt file (#1921)
note discrete treatment data type (#1905)
add custom chatbot creation to form demo (#1888)
add overview page for simple DNN and fix some typos (#1879)
Fix a typo in installation docs
fix link issue in CONTRIBUTING.md (#1864)
fix a few issues in cognitive service demo (#1861)

Features 🌈

add streaming API for MVAD (#1893)
[DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
support new api version of form recognizer (#1882)
Add a new function to DMLModel, getPValue (#1863)
update default internal endpoint for cog services (#1859)
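
To illustrate what a custom reference distribution means for a distribution balance measure, here is a minimal pure-Python sketch. It compares the observed frequencies of a column against a caller-supplied reference instead of a uniform one. The function name and the two measures chosen (KL divergence and total variation distance) are assumptions for illustration, not the DistributionBalanceMeasure API.

```python
import math
from collections import Counter


def distribution_measures(values, reference):
    """Compare the empirical distribution of `values` to a `reference` dict of probabilities."""
    if not values:
        raise ValueError("values must not be empty")
    if any(q <= 0 for q in reference.values()):
        raise ValueError("reference probabilities must be positive")
    if abs(sum(reference.values()) - 1.0) > 1e-9:
        raise ValueError("reference probabilities must sum to 1")
    counts = Counter(values)
    unknown = set(counts) - set(reference)
    if unknown:
        raise ValueError(f"values missing from reference: {sorted(unknown)}")
    n = len(values)
    observed = {k: counts.get(k, 0) / n for k in reference}
    # KL(observed || reference); categories with zero observed mass contribute 0.
    kl = sum(p * math.log(p / reference[k]) for k, p in observed.items() if p > 0)
    # Total variation distance: half the L1 distance between the two distributions.
    tv = 0.5 * sum(abs(observed[k] - reference[k]) for k in reference)
    return {"kl_divergence": kl, "total_variation": tv}


column = ["a", "a", "b", "c"]
custom = distribution_measures(column, {"a": 0.5, "b": 0.25, "c": 0.25})
uniform = distribution_measures(column, {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3})
```

The same column is perfectly balanced against the custom reference but not against the uniform one, which is the reason to make the reference configurable.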

Maintenance 🔧

bump to v0.11.1 (#1933)
Adding telemetry for the dataset metadata. This one is specially for … (#1917)
fix r tests (#1927)
fix build issues (#1916)
disable test until Synapse is fixed (#1915)
add .bloop to .gitignore (#1897)
clean up old/missed search indexes in SearchWriterSuite (#1901)
Add utility to clean azure search indexes
update website docs to point to correct developer API docs (#1877)
Update pipeline.yaml for Azure Pipelines (#1866)
make sure nightly build has new commit

Changes:

866261c chore: bump to v0.11.1 (#1933)
3c09702 chore: Adding telemetry for the dataset metadata. This one is specially for … (#1917)
0d0d10c feat: add streaming API for MVAD (#1893)
1b71c1d chore: fix r tests (#1927)
0df97ad chore: fix build issues (#1916)
78695fb Update Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb (#1922)
87d5bc5 docs: Fix installation instruction in the webpage for the build.sbt file (#1921)
8320b2b fix: set default values for aadToken & url for internal Synapse (#1918)
4912ae4 chore: disable test until Synapse is fixed (#1915)
469445b fix: ONNX model shape inference cannot handle batch with shape [-1] (#1906)
3fa001e build: bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
f51327e Update LightGBM version to 3.3.5 (#1910)
b1e584e fix: forgot to add getPValue to python side (#1909)
a09a6f7 docs: note discrete treatment data type (#1905)
0fa3f2a fix: generate random dir for each test (#1908)
736c317 fix: add back diagnosticsInfo for MVAD (#1892)
13afff6 fix: DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
7546e7f build: bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
f227f02 fix: fix date parsing in FaceSuite test (#1896)
0f02626 fix: fix Build pipeline (#1904)
ce9fe41 chore: add .bloop to .gitignore (#1897)
7ffa970 chore: clean up old/missed search indexes in SearchWriterSuite (#1901)
9a6cf03 chore: Add utility to clean azure search indexes
52919ce fix: Retry OnnxHub call to improve test reliability (#1889)
979c629 feat: [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
412620a docs: add custom chatbot creation to form demo (#1888)
9f634a6 feat: Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
7657089 fix: Normalize line-endings (#1883)
c156792 feat: support new api version of form recognizer (#1882)
ed842a5 docs: add overview page for simple DNN and fix some typos (#1879)
87e1c78 fix: Remove case matching for erased generic type (#1880)
cd72bc9 build: bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
564d047 fix: fix bug #1869, DML .setFitIntercept should be set to true (#1876)
392dbbf chore: update website docs to point to correct developer API docs (#1877)
129abde build: bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
4d1c560 build: bump webpack from 5.75.0 to 5.76.1 in /website (#1870)
62c79d8 docs: Fix a typo in installation docs
1f63dab feat: Add a new function to DMLModel, getPValue (#1863)
83f8260 fix: Remove extraneous "Foo" type from Py codegen (#1867)
a5bec45 fix: Allow variable size in ONNX inputs (#1851)
23c9b0a chore: Update pipeline.yaml for Azure Pipelines (#1866)
dedcbda docs: fix link issue in CONTRIBUTING.md (#1864)
a7f31d5 fix: Abstain from CodeQL for markdown-only changes (#1865)
a5f38b1 Update DoubleMLEstimator test CI verification (#1862)
a44f917 fix: fix style
cc931af fix: update OpenAIEmbedding internalServiceType
424d586 feat: update default internal endpoint for cog services (#1859)
e4a0e2c docs: fix ...
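
The section headings in these notes line up with the commit prefixes in the list above (`fix:` under Bug Fixes, `feat:` under Features, and so on). The sketch below groups change lines that way. The mapping is inferred from this page rather than from the project's release tooling, and the `group_changes` helper is hypothetical.

```python
import re
from collections import defaultdict

# Prefix-to-section mapping inferred from the headings on this page.
SECTION_BY_TYPE = {
    "fix": "Bug Fixes",
    "build": "Build",
    "docs": "Documentation",
    "feat": "Features",
    "chore": "Maintenance",
    "test": "Testing",
}

# "<7-char sha> [<type>: ]<title>"
LINE = re.compile(r"^(?P<sha>[0-9a-f]{7}) (?:(?P<type>[a-z]+): )?(?P<title>.+)$")


def group_changes(lines):
    """Group commit lines by release-note section; untyped commits go to 'Other'."""
    sections = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything that is not a commit line
        section = SECTION_BY_TYPE.get(match.group("type"), "Other")
        sections[section].append(match.group("title"))
    return dict(sections)


changes = group_changes([
    "866261c chore: bump to v0.11.1 (#1933)",
    "0d0d10c feat: add streaming API for MVAD (#1893)",
    "78695fb Update Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb (#1922)",
    "8320b2b fix: set default values for aadToken & url for internal Synapse (#1918)",
])
```

Commits without a recognized prefix, such as the notebook update, land in "Other", which matches how they are absent from the categorized sections.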

05 Mar 13:37

mhamilton723

v0.11.0

7b23764

SynapseML v0.11.0

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.11.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights


ChatGPT and GPT-4 at Scale: Intelligent chat and embeddings. Simplified Prompting APIs.
Simple Deep Learning: Train custom image and text classifiers with ease.
LightGBM v2: Higher performance, >10x lower memory footprint, same API.
ONNX Model Hub: Embed >150 state-of-the-art deep networks into your pipelines.
Causal Learning: Discover and measure causal treatment effects.
Vowpal Wabbit v2: New second-generation integration.

New Features

General ✨

R Support is no longer Beta! (#1586)
Support for Spark 3.2.3

Open AI 🤖

Add OpenAI Prompt Template support (#1843)
Add Azure OpenAI embedding support (#1832)
Add Azure Active Directory authentication for OpenAI (#1829)
Add Null-value handling for OpenAI models (#1854)
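
As a rough picture of how prompt templating and null handling can fit together, the sketch below fills a template from each row and returns `None` for rows with missing values. It assumes curly-brace placeholders named after columns; the `render_prompts` helper is hypothetical and is not the SynapseML OpenAI API.

```python
from string import Formatter


def render_prompts(template, rows):
    """Fill `{column}` placeholders from each row; rows with missing values yield None."""
    # Collect the placeholder names used in the template (simple names only).
    fields = [name for _, name, _, _ in Formatter().parse(template) if name]
    prompts = []
    for row in rows:
        if any(row.get(field) is None for field in fields):
            prompts.append(None)  # pass the null through instead of sending a broken prompt
        else:
            prompts.append(template.format(**{field: row[field] for field in fields}))
    return prompts


rows = [
    {"category": "fruits", "text": "apple"},
    {"category": None, "text": "carrot"},
]
prompts = render_prompts("List three {category} similar to {text}:", rows)
```

Keeping the output aligned with the input rows, with `None` in place of skipped prompts, lets the results be joined back to the original data.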

Deep Learning 🕸

Remove CNTK functionality and replace with ONNX (#1593)
Add the DeepTextClassifier, a simple API for fine-tuning a wide array of Hugging Face 🤗 text transformers using PyTorch Lightning (#1591)
Add the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Azure Cognitive Services for Big Data 🧠

Add SpeakerEmotionInference transformer to generate emotion annotation tags for emotive reading in SpeechToText (#1691)
Add new AnalyzeText API (#1760)
Support Azure Active Directory (AAD) authentication for the cognitive services (#1778, #1797)
Move different cognitive services into sub packages (#1746)
Add audiobook generation example (#1852)
Add a notebook for advanced cognitive service usage (#1825)
Upgrade MVAD to v1.1 (#1788)
Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
Add word-level timing to SpeechToTextSDK and ConversationTranscription (#1801)
Add the descriptionExcludes parameter to AnalyzeImage (#1590)

Causal Learning 📈

Add the causal DoubleMLEstimator for learning causal treatment effects from data (#1715)
Add a DoubleMLEstimator document and sample notebook (#1730)
Fix DML regression bug, should remove both treatment and outcome columns as feature columns (#1820)
Add TreatmentCol type checking (#1816)
Update test to validate ATE value should be positive (#1821)
Fix issue with missing causal test coverage (#1799)
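
The idea behind double machine learning can be shown with a deliberately simplified sketch: remove the part of the treatment and of the outcome that the confounder explains, then regress the outcome residuals on the treatment residuals. The version below assumes one confounder and linear nuisance models, and it omits the cross-fitting that real estimators use. It is not the DoubleMLEstimator implementation.

```python
def residuals(y, x):
    """Residuals of a least-squares fit of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def dml_effect(outcome, treatment, confounder):
    """Treatment effect from a residual-on-residual regression."""
    outcome_res = residuals(outcome, confounder)
    treatment_res = residuals(treatment, confounder)
    return (
        sum(t * o for t, o in zip(treatment_res, outcome_res))
        / sum(t * t for t in treatment_res)
    )


# Synthetic data: the confounder drives both the treatment and the outcome.
confounder = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
wiggle = [0.3, -0.2, 0.5, -0.4, 0.1, -0.3]  # treatment variation not explained by the confounder
treatment = [0.5 * x + w for x, w in zip(confounder, wiggle)]
outcome = [2.0 * t + 3.0 * x + 1.0 for t, x in zip(treatment, confounder)]

effect = dml_effect(outcome, treatment, confounder)  # recovers the true effect of 2.0 up to rounding
```

Because the outcome here is an exact linear function of the treatment and the confounder, the residual regression recovers the coefficient on the treatment; with noisy data and flexible models the estimate is only approximate.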

LightGBM 🌳

Add LightGBM streaming execution mode for more reliable performance with orders of magnitude less memory. (#1580)
Add maxNumClasses param to LightGBMClassifier for multi-class (#1841)
Added the passThroughArgs feature which allows users to set low level LGBM parameters before they are wrapped in SparkML (#1749)

Vowpal Wabbit 🐇

Vowpal Wabbit v2 (#1579):
- Support Vowpal Wabbit input format using VowpalWabbitGeneric model
- Support additional algorithms & label types (multi-class, cost sensitive one against all): sample notebook
- Progressive validation (aka 1-step ahead) using VowpalWabbitGenericProgressive
- New Contextual Bandit Offline Policy Evaluation Notebook
- Data parallel training independent of cluster size
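
Progressive (1-step ahead) validation scores each example with the model as it stood before that example was learned. The sketch below shows the predict-then-learn loop with a toy online learner that tracks a single running estimate; it is a stand-in for illustration, not Vowpal Wabbit's learner or the SynapseML API.

```python
def progressive_validation(labels, learning_rate=0.5):
    """Predict each label before learning from it; return predictions and mean squared error."""
    if not labels:
        raise ValueError("labels must not be empty")
    estimate = 0.0
    predictions = []
    total_loss = 0.0
    for label in labels:
        predictions.append(estimate)                    # 1. predict with the model seen so far
        total_loss += (label - estimate) ** 2           # 2. score that prediction
        estimate += learning_rate * (label - estimate)  # 3. only then learn from the example
    return predictions, total_loss / len(labels)


preds, mse = progressive_validation([4.0, 4.0, 2.0])
# preds == [0.0, 2.0, 3.0], mse == 7.0
```

Every prediction is out-of-sample by construction, so the loss estimates generalization without a separate held-out set.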

Additional Updates

Bug Fixes 🐞

Support grayscale images in toNDArray (#1592)
Adjust learning rate in VW example notebook (#1853)
Correct copy/paste error in acr cleanup (#1838)
Fix synapse test config, and isolation forest notebook (#1833)
Add spark config to fix ArrayStoreException (#1757)
Fix breeze NoSuchMethodError (#1807)
Fix modelVersion param in TextAnalytics (#1756)
Make logging infrastructure consistent and add logging checks (#1755)
Fix website sidebars and vulnerabilities in packages (#1753)
Remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
Update isolation forest notebook (#1696)
Remove error on invalid columns in DropColumns (#1695)
Fix PyArrow failure in deeplearning test (#1689)
Fix linked service setters on cog service base class (#1685)
KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
Fix flaky translate tests (#1643)
Fix speechToTextSuite serialization Fuzzing failure (#1626)
Fix translator endpoint and update all endpoints for gov regions (#1623)
Fix Binder runtime issues (#1598)
Clean up cluster if Databricks tests pass (#1599)

Contributors

nightscape, svotaw, and 20 other contributors

22 Nov 14:30

mhamilton723

v0.10.2

cd1d2ea

SynapseML v0.10.2

v0.10.2

Bug Fixes 🐞

remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
remove synapse E2E testing exclusion - cyber ml (#1699)
update isolation forest notebook (#1696)
don't throw on invalid columns in DropColumns (#1695)
fix pyarrow failure in deeplearning test (#1689)
fix linked service on cog service base (#1685)
fix Uplift Modelling style
KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
fix flaky translate tests (#1643)
update ubuntu to 20.04 in pipeline (#1624)

Build 🏭

bump actions/checkout from 2 to 3 (#1737)
bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)

Documentation 📘

update developer readme instruction on python env creation (#1693)
fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
improve error msg to make it clearer for users and fix typos (#1662)
simplify data downloading and add mlflow to uplift modelling (#1659)
move magic command forward since it restarts interpreter
remove unused docs and fix links
improve example notebooks
add aisample uplift modelling (#1640)
fix command to launch jupyter notebook (#1649)
add mlflow in ai samples time series forecasting (#1645)
add mlflow logging and loading (#1641)
update spark version in Readme
improve readme overview
add aisample on text classification (#1617)

Features 🌈

add simple deep learning text classifier (#1591)
Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
Deprecate CNTK objects (#1712)
Remove CNTK functionality and replace with ONNX (#1593)
R test generation (#1586)

Maintenance 🔧

bump version to 0.10.2 (#1738)
fix style (#1736)
automate clean-acr with github action workflow (#1735)
autodelete old models (#1729)
Making secrets optional and cached (#1726)
add secret scanning infrastructure (#1724)
Move new ImageFeaturizer to onnx namespace (#1711)
ScalaStyle fixes (#1716)
update scalatest and scalactic (#1706)
remove synapse test exclusions (#1698)
pin az and python versions (#1705)
fix ado integration (#1704)
remove notebooks (#1703)
fix reopen comment action
fix reopen on comment workflow
fix typo in issue reopen yaml
re open github issues after a comment (#1676)
clean up github workflows and add issue label remover (#1674)
turn off failing synapse tests temporarily (#1658)
added synapse-internal to platform detector function (#1651)
publish test jars
improve test coverage (#1631)
Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
clean up TextAnalytics cog service APIs (#1622)

Testing 💚

Additional E2E testing infrastructure (#1727)
Improve ONNXtests reliability (#1713)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

cd1d2ea chore: bump version to 0.10.2 (#1738)
fd78889 build: bump actions/checkout from 2 to 3 (#1737)
c806ba7 chore: fix style (#1736)
e6b5a90 feat: add simple deep learning text classifier (#1591)
1de2d55 chore: automate clean-acr with github action workflow (#1735)
952d1bd clarify date comparisons when deleting old models/groups (#1733)
6ea02bd chore: autodelete old models (#1729)
8b02e1d chore: Making secrets optional and cached (#1726)
c62c6ad test: Additional E2E testing infrastructure (#1727)
aeb2ff7 feat: Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
0b96cc5 chore: add secret scanning infrastructure (#1724)
2a7a67b feat: Deprecate CNTK objects (#1712)
e38e3ad chore: Move new ImageFeaturizer to onnx namespace (#1711)
0ff6802 test: Improve ONNXtests reliability (#1713)
fe4c5d2 chore: ScalaStyle fixes (#1716)
050b541 build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
f2e88fd feat: Remove CNTK functionality and replace with ONNX (#1593)
abdfe19 fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
6a1f994 chore: update scalatest and scalactic (#1706)
144674f chore: remove synapse test exclusions (#1698)
32c654b chore: pin az and python versions (#1705)
c8fba28 chore: fix ado integration (#1704)
92d4095 chore: remove notebooks (#1703)
a953780 fix: remove synapse E2E testing exclusion - cyber ml (#1699)
b257c70 fix: update isolation forest notebook (#1696)
9120b05 using predictionCol for isolation forest (#1686) [ #1060 ]
448f6b7 Remove trident.mlflow APIs. (#1687)
f4af33f fix: don't throw on invalid columns in DropColumns (#1695)
c531bbb docs: update developer readme instruction on python env creation (#1693)
467e651 build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
302831f fix: fix pyarrow failure in deeplearning test (#1689)
e857511 fix: fix linked service on cog service base (#1685)
f29318a build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)
50ac0c8 Update reopen-issue-on-comment.yml
c9278b5 chore: fix reopen comment action
b3a9ba9 chore: fix reopen on comment workflow
9fe273b chore: fix typo in issue reopen yaml
a7c50de chore: re open github issues after a comment (#1676)
8914750 chore: clean up github workflows and add issue label remover (#1674)
965231a docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
4fa7249 docs: improve error msg to make it clearer for users and fix typos (#1...

23 Aug 03:41

mhamilton723

v0.10.1

0f54bc6

v0.10.1

SynapseML v0.10.1

Bug Fixes 🐞

fix speechToTextSuite serializationFuzzing failure (#1626)
fix translator endpoint and update all endpoints for gov regions (#1623)
binder runtime issues (#1598)
clean up cluster if databricks tests pass (#1599)
fix deep-learning test flakiness (#1600)
update dotnetTestBase assembly version (#1601)
fix flaky forms test (#1584)

Build 🏭

bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
bump actions/setup-node from 2 to 3 (#1610)
bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
bump actions/setup-java from 2 to 3 (#1612)
simplify e2e test pipeline with test matrix

Documentation 📘

add aisample notebooks into community folder (#1606)
add aisample time series forecasting (#1614)
fix .NET logo on website (#1604)
improve OpenAI notebook (#1596)
pin mybinder to v0.10.0 to avoid thrashing
add demo into videos on website (#1581)
update installation guidance of v0.10.0 (#1578)
add more .net samples (#1570)
add dotnet installation & example doc (#1567)
Update issue template

Features 🌈

add stale bot for issues (#1602)
Support grayscale images in toNDArray (#1592)
Add the descriptionExcludes parameter to AnalyzeImage (#1590)
Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Maintenance 🔧

bump to v0.10.1 (#1628)
deprecate old Text analytics APIs to prepare for refactoring (#1627)
remove deprecated lime APIs (#1620)
update openai service to the official deployment, and disable test due to outage (#1619)
Auto update GitHub actions with dependabot (#1608)
hotfix binder badge
pin binder version for users (#1607)
Bump spark to 3.2.2
bump spark version
Format welcome message with emojis (#1583)
Add welcome message to new PRs/Issues (#1573)
Add GH workflow to label new/reopened issues (#1571)
update website (#1566)

Testing 💚

stabilize unit tests (#1576)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

0f54bc6 chore: bump to v0.10.1 (#1628)
3d0f3f4 chore: deprecate old Text analytics APIs to prepare for refactor (#1627)
2052e13 chore: remove deprecated lime APIs (#1620)
09213b0 fix: fix speechToTextSuite serializationFuzzing failure (#1626)
9f78bf0 fix: fix translator endpoint and update all endpoints for gov regions (#1623)
7e90d19 docs: add aisample notebooks into community folder (#1606)
ac40e5a chore: update openai service to official, and disable test due to outage (#1619)
f54f7f6 docs: add aisample time series forecasting (#1614)
7b4b0e1 build: bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
43b0d17 build: bump actions/setup-node from 2 to 3 (#1610)
c48a07a build: bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
b1a331c build: bump actions/setup-java from 2 to 3 (#1612)
78e40cb chore: Auto update github actions with dependabot (#1608)
69d2d20 chore: hotfix binder badge
93d7ccf chore: pin binder version for users (#1607)
c7a61ec fix: binder runtime issues (#1598)
c960c06 docs: fix .NET logo on website (#1604)
28a35b4 fix: clean up cluster if databricks tests pass (#1599)
5a28740 fix: fix deep-learning test flakiness (#1600)
adf1a61 fix: update dotnetTestBase assembly version (#1601)
c659b33 feat: add stale bot for issues (#1602)
05a4202 docs: improve OpenAI notebook (#1596)
e019756 feat: Support gray scale images in toNDArray (#1592)
51beaa0 feat: Add the descriptionExcludes parameter to AnalyzeImage (#1590)
b9ac22a docs: pin mybinder to v0.10.0 to avoid thrashing
1808a0f chore: Bump spark to 3.2.2
8e7d453 build: simplify e2e test pipeline with test matrix
8e34c7b chore: bump spark version
44c8ed5 feat: Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
e4f0883 fix: fix flaky forms test (#1584)
7da5f49 chore: Format welcome message with emojis (#1583)
0e6bb35 Serena/update issue template (#1582)
a6a2718 docs: add demo into videos on website (#1581)
7c34fc4 test: stabilize unit tests (#1576)
49f3a58 chore: Add welcome message to new PRs/Issues (#1573)
4868e8b Add back LightGBM library initialization in booster (#1575)
d427b88 docs: update installation guidance of v0.10.0 (#1578)
55a60c9 docs: add more .net samples (#1570)
39fe2d8 chore: Add GH workflow to label new/reopened issues (#1571)
0febe3c docs: add dotnet installation & example doc (#1567)
db95a10 chore: update website (#1566)

This list of changes was auto generated.

18 Jul 02:50

mhamilton723

v0.10.0

e9986fe

v0.10.0

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights


OpenAI Language Models: Embed 175-billion parameter models into your databases with ease.
.NET, C#, and F# Support: Use or train any SynapseML model from .NET.
Full MLFlow Support: Quick and easy MLOps, model management, and autologging.
Live Demos in Browser: Explore the SynapseML library with zero setup.

New Features

General ✨

SynapseML now supports .NET, C#, F#, and other .NET ecosystem languages in addition to Scala, Python, and R. Please see our Setup Guide and LightGBM from .NET example for more details. (#1539, #1156, #1443)
SynapseML is now usable from your browser with zero setup using Binder. Quickly explore our demos in Binder. (#1487, #1493)

Azure Cognitive Services for Big Data 🧠

Added OpenAI GPT-3 Sentence Completion Transformer. Use this feature to embed 175-billion parameter language models into distributed pipelines and databases to solve a variety of general purpose NLP tasks across natural language and code. (#1495, #1541)
Added an example of Sentence Completion with GPT-3 (#1564)
Added support for Form Recognizer V3.0 (#1269)
Improved MVAD usability with async training and better data validation (#1477)
Upgraded the univariate anomaly detection version to v1.1-preview (#1440)
Added a multivariate anomaly detection sample notebook (#1365)
Added a Text to Speech example to cognitive service overview (#1350)
Added opinion mining to TextSentiment Models (#1449)
Fixed Azure Maps schemas (#1553)
Removed modelID param validators in FormRecognizerV3 (#1551)
Fixed form recognizer and form ontology learner issues (#1506)
Fixed setServiceName python method in OpenAI (#1498)
Fixed error in Text Analytics Analyze schema
Improved error handling for MVAD (#1448, #1391)
Removed unused concurrency parameter for MVAD (#1383)
Improved robustness of flood risk notebook by adding polling (#1427)

Responsible AI at Scale 😇

Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
Added a notebook for ICE and PDP feature explainers (#1318)
Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
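
A partial dependence curve can be computed by forcing one feature to each value on a grid for every row and averaging the model's predictions. The sketch below shows that computation in plain Python; `partial_dependence` and the toy `model` are hypothetical and do not reflect the ICETransformer API.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value across all rows."""
    curve = []
    for value in grid:
        # Copy each row with the feature overridden; other features keep their observed values.
        forced = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(row) for row in forced) / len(forced))
    return curve


def model(row):
    # Hypothetical stand-in for a trained model.
    return 2.0 * row["age"] + row["income"]


rows = [{"age": 1.0, "income": 10.0}, {"age": 3.0, "income": 20.0}]
curve = partial_dependence(model, rows, "age", [0.0, 1.0, 2.0])
# curve == [15.0, 17.0, 19.0]
```

The per-row predictions inside the loop are the individual conditional expectation (ICE) values; the partial dependence curve is their average.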

MLFlow 🔃

Added documentation for MLFlow autologging (#1508)
Added documentation on the SynapseML-MLFlow integration (#1428)

LightGBM on Spark 🌳

Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
Added seed parameters to LightGBM (#1387)
Added a method to get LightGBM native model string directly (#1515)
Fixed issue with validation data creation during useSingleDataset mode (#1527)
Fixed multiclass training with initial scores (#1526)
Fixed saving LightGBM model iterations with early stopping (#1497)
Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
Fixed issue that occurred when an empty partition is chosen as the main worker in singleDatasetMode (#1458)
Fixed bug with data repartitioning in LightGBMRanker (#1368)
Fixed outdated docs for useSingleDatasetMode (#1562)
Refactored LightGBM class structure to improve logging and debugging (#1557)

Vowpal Wabbit 🐇

Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
Fixed issues with building quadratic interaction terms (#1460)

Isolation Forests 🌲

Added an Isolation Forest Multivariate Anomaly Detection sample notebook (#1483)

Additional Updates

Maintenance 🔧

Removed unused debugging code (#1546)
Removed Synapse test exclusion for Explanation Dashboard notebook (#1531)
Made python style checks verbose (#1532)
Fixed library checking while installing library on Databricks cluster (#1488)
Upgraded and fixed Dockerfiles (#1472)
Added Developer Docker Image build to pipeline (#1480)
Fixed ADO area path in Issue Linker (#1464)
Fixed master version badge display
Improved Databricks error reporting
Updated azure cli to stop build errors
Fixed SSL handshake flakiness
Added itsdangerous as a dependency to ADB tests (#1412)
Turned on debug for pr to work item workflow
Pointed pr linker to official implementation
Changed GitHub action trigger from pull_request_target to pull_request (#1413)
Fixed issue where unit tests were not executing (#1409)

Contributors

riserrad, svotaw, and 24 other contributors

12 Jan 22:42

mhamilton723

v0.9.5

79d92d3

SynapseML v0.9.5

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.9.5 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights


Geospatial Intelligence: Large-scale map and geocoding operations.
Multivariate Anomaly Detection: Build custom time series anomaly detection systems.
Responsible AI at Scale: Distributed Conditional Expectation and Partial Dependence Analysis.
Text To Speech: Easy-to-use Neural Text to Speech for large datasets.
Healthcare Analytics: Quickly understand entities and relationships in corpora of medical text.

New Features

Geospatial Intelligence 🗺️

Added support for distributed geospatial queries backed by the Azure Maps API
Added the geospatial usage overview (#1339)
Explore how to use the geospatial intelligence services to analyze flood risks. (#1339)
Added the AddressGeocoder transformer to map informal addresses to standardized addresses with latitude and longitude (#1294)
Added the ReverseGeocoder transformer to map latitude and longitude measurements to standardized addresses. (#1339)
Added the CheckPointInPolygon transformer to detect whether latitude and longitude queries lie inside regions of interest (#1339)
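
For intuition about the point-in-polygon check, here is a local ray-casting sketch. The transformer listed above is backed by the Azure Maps API; this version is only an illustration that treats coordinates as planar, ignores the antimeridian, and does not handle points exactly on the boundary. The `point_in_polygon` function is hypothetical.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; `polygon` is a list of (lat, lon) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the line of constant latitude through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line.
            crossing = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing:
                inside = not inside  # each crossing on one side of the point flips the state
    return inside


square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
inside_center = point_in_polygon(5.0, 5.0, square)   # True
outside_east = point_in_polygon(5.0, 15.0, square)   # False
```

An odd number of edge crossings along the ray means the point is inside; an even number means it is outside.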

Azure Cognitive Services for Big Data 🧠

Added the Healthcare Analytics Transformer for extracting medical information, entities, and relationships for text. [Example Usage] (#1329)
Added the FitMultivariateAnomaly estimator for training custom anomaly detection models on DataFrames of multivariate time series data (#1272)
Added example notebook for Multivariate Anomaly Detector
See how to train a custom Multivariate Anomaly detector in the Estimators reference docs (#1323)
Added simplified Text Analytics transformers that support auto-batching (#1329)
Added the TextToSpeech Transformer for transforming Dataframes of text to audio files with neural voice synthesis (#1320)
Added the TextAnalyze transformer to support executing multiple text analytics workloads within a single API call (#1267, #1312)
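
Auto-batching, as mentioned for the simplified Text Analytics transformers, amounts to grouping documents into fixed-size requests and flattening the responses back into row order. The sketch below shows that round trip; the helpers and the uppercase "service" are hypothetical stand-ins, not the SynapseML implementation.

```python
def batch(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def unbatch(batched_results):
    """Flatten per-batch results back into one result per input item."""
    return [result for group in batched_results for result in group]


docs = ["a", "b", "c", "d", "e"]
batches = batch(docs, 2)
# Hypothetical service call: one request per batch, one result per document.
responses = [[doc.upper() for doc in group] for group in batches]
results = unbatch(responses)
```

Five documents become three requests, and the flattened results stay aligned with the input order.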

Responsible AI at Scale 😇

Added Individual Conditional Expectation explanations and Partial Dependence Plots with the ICETransformer. This tool gives detailed explanations of how features in opaque-box models affect the model prediction. (#1284)
Learn about how to use the ICETransformer through an example with the Adult Census dataset

MLFlow 🔃

Add MLFlow support for saving and loading SynapseML models (#1277)

LightGBM on Spark 🌳

Improved LightGBM training performance 4x-10x by setting num_threads to be cores-1 (#1282)
Added the predict_disable_shape_check in LightGBM (#1273)
Reduced temporary file bloat by creating the LightGBM native temp directory lazily (#1326)
Added logging for number of columns and rows when creating datasets, set useSingleDatasetMode=True by default (#1222)

Infrastructure 🏭

SynapseML now installable from Maven Central!
SynapseML now supports Spark v3.2.x

Additional Updates

Bug Fixes 🐞

Allowed FlattenBatch to propagate non-array values (#1286)
Fixed flaky tests (#1342)
Fixed website bugs and migrated docSearch (#1331)
Fixed issue where IsolationForestModel does not properly exchange params with the inner model (#1330)
Corrected the objective param when using fobj (#1292)
Fixed issue where broadcasted sum in breeze 1.0 breaks in Spark 3.2.0 (#1299)
Hotfixes for R test runners (#1283)
fix installation instruction (#1268)
Removing broadcast hint (#1255)
fix install instructions (#1259)

Build 🏭

bump algoliasearch-helper from 3.6.1 to 3.6.2 in /website (#1270)
remove some deps that cause sec issues (#1264)

Documentation 📘

Fixed broken link to CyberML notebook (#1322)
Added website announcement bar (#1263)
Updated and improved readme (#1262)
Removed references to runme in contributing.md
Added support for math expressions in website markdown (#1278)
Corrected Synapse typo in website (#1335)

Maintenance 🔧

Stopped lightGBM tests from timing out (#1315)
Fixed r test flakiness (#1314)
Updated VerifyLightGBMClassifier.scala (#1313)
Update speech SDK test results
Add in missing tests in build (#1300)
Fix flaky build steps (#1298)
Fix website telemetry (#1261)
Add website telemetry (#1260)
Added missing test classes to pipeline

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:


Serena Ruan: Serena is an engineer on the Azure Synapse team in Beijing. In this release, Serena has continued her unbelievable speed of contributions with support for Multivariate Anomaly Detection, MLFlow, and installation from Maven Central. These contributions are just a few of the many projects Serena has contributed since she joined just a few months ago!

Ilya Matiach: Ilya is a prolific engineer on the Azure Machine Learning Boston team working on responsible AI. Ilya contributed LightGBM on Spark and worked tirelessly to improve and support this feature. Ilya has been an active contributor to the SynapseML project for 5 years and has built many of the tools in the library.

Sudhindra Kovalam: Sudhindra is an engineer on the Microsoft Maps team and has contributed intelligent geospatial APIs to SynapseML v0.9.5. Sudhindra developed new ways to automate generation of Spa...

Contributors

nhymxu, martin0258, and 19 other contributors

16 Nov 05:19

mhamilton723

v0.9.4

e6da4d5

SynapseML v0.9.4

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights


General Availability on Synapse: We are ready to help you productionalize on Azure Synapse Analytics.
ONNX on Spark: Distributed and hardware accelerated model inference on Spark.
Responsible AI: Understand opaque-box models, measure dataset biases, Explainable Boosting Machines.
Form Recognition and Translation: Parse PDFs and translate dataframes between over 100 languages.
Reinforcement Learning: Contextual Bandit Reinforcement Learning with Vowpal Wabbit.

New Features

General ✨

Renamed and rebranded! Microsoft ML for Apache Spark is now SynapseML
New modular library sub-packages for standalone install of each major set of features
Support Spark 3.1.2 and Scala 2.12
Support pip install synapseml for Python bindings

ONNX on Spark 🕸

ONNX model inference on Spark (#1152)
Add documentation and notebooks for ONNXModel evaluation (#1164)

Cognitive Services for Big Data 🧠

Added Multilingual Translation APIs (#1108) (Tutorial)
Added FormRecognition APIs (Invoice, IDs, BusinessCards, Layouts, Custom Models) (#1099) (Tutorial)
Added the FormOntologyLearner to extract meaningful "ontologies" of objects from collections of forms
Add notebook to Create a Multilingual Search Engine from Forms
Updated Text Analytics API to V3.1 (#1193)
Add redactedText to PIIV3 (#1247)
Added Personally Identifying Information (PII) identification
Added Read API
Added Conversation Transcription API
Cognitive Services now support data exfiltration protected (DEP) VNETs, allowing for individualized security solutions on Synapse Analytics (Learn More)
Added support for the m4a codec in Speech to Text models
Added predictive maintenance notebook
Added Cognitive Service overview notebook
Added support for linked service authentication in Synapse Analytics
Simple no-code support in Synapse Analytics

Responsible AI at Scale 😇

Added Shapley Additive Explanations (SHAP) for understanding the predictions of opaque-box models (#1077)
New API for Local Interpretable Model-Agnostic Explanations (LIME), which now supports background distributions and text models and has the same API as SHAP (#1077)
Added Measure transformers for Data Balance Analysis (#1218)
Add more notebook samples for documentation (#1043)
Documentation and notebooks for Interpretability on Spark
Introduce Responsible AI section on website (Interpretability + DataBalanceAnalysis) (#1241)
Adding document and notebook for Data Balance Analysis (#1226)
Explainable Boosting Machines for performant and interpretable ML (Private preview on Synapse Analytics only)

Vowpal Wabbit 🐇

Added ContextualBandit reinforcement learning (#896)
Added Vowpal Wabbit Overview Notebook

LightGBM 🌳

Added matrix type parameter and improved logic to automatically infer dataset sparsity (#1052)
Added several parameters related to dart boosting type (#1045)
Added chunk size parameter for copying java data to native (#1041)
Added number of threads parameter (#1055)
Added custom objective function to LightGBM learners (#1054)
Added singleton dataset mode for faster performance and reduced memory usage (#1066)
Add num iteration and start iteration parameters to LightGBM model (#1024)
Added the average precision metric (#1034)
Added overview notebook for LightGBM
Moved to new streaming API for dense data to reduce memory usage
Tuned chunking code for faster performance

Build and Infrastructure Improvements 🏭

New Docusaurus website generation system
E2E Tests on Synapse Analytics (#1014)
Split library into separately installable subprojects (#1073)
Added a unified logging and telemetry system (#1019)
Modernized R wrapper generation
New Automated Python test generation (#998)
New extensible code generation system
New two-tiered security for build secrets
Update Ubuntu version to 18.04
Automated backup of ACR images

Additional Updates

Bug Fixes 🐞

Enable backwards compatibility for mmlspark python namespace imports (#1244)
Fix publishing to maven and pypi (#1242)
Fix broken link to notebook in Data Balance Analysis doc (#1240)
min_data_in_leaf missing from dataset parameters in lightgbm (#1239)
Fix performance issue in interpretability notebooks (#1238)
Fixed cognitive service errors (#1176)
Fixed flaky tests
Rename NERPii to PII
Fixed cog service test flakes
Fixed setLinkedService issues in Synapse (#1177)
Improved LGBM error message for invalid slot names (#1160)
Fixed generated python code (#1121)
Updated notebookUtils class path (#1118)
Fixed LIME NaN weight output (#1117, #1112)
Fixed Guava version issue in Azure Synapse and Databricks (#1103)
Fixed flakiness in spark session stopping
Fixed result parsing for forms
Fixed explainers returning wrong results when targetClassesCol is specified
Fixed CNTKModel issue due to catalyst bug on databricks (#1076)
Fixed null handling in bing image response (#1067)
Avoided strange issue with databricks json parser
Fixed dependency exclusions and build secret querying
Fixed issue in tabular lime sampler (#1058)
Updated Bing search URLs (#1048)
Refactored python wrappers to use common class (#758)
Updated java params patch (#1027)
Added missing returns in new python lightGBM model methods
Stop R binding generation from failing silently
Fixed conversation transcription participant column functionality
Reduce verbosity to...
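
The mmlspark namespace fix listed above (#1244) is about keeping old imports working after the rename to synapseml. The release notes do not say how this was done, so the following is only a minimal Python sketch of one common aliasing technique. The module names oldlib_demo and newlib_demo and the helper alias_module are hypothetical stand-ins, not the real packages or the project's implementation.

```python
import sys
import types


def alias_module(old_name: str, new_name: str) -> None:
    """Make `import old_name` resolve to the already-loaded module `new_name`."""
    # The import statement consults sys.modules first, so registering the
    # same module object under a second key is enough for a simple alias.
    sys.modules[old_name] = sys.modules[new_name]


# Hypothetical stand-in for the renamed package (assumption, not synapseml itself).
new_pkg = types.ModuleType("newlib_demo")
new_pkg.VERSION = "0.9.4"
sys.modules["newlib_demo"] = new_pkg

# Register the legacy name so that older import statements keep working.
alias_module("oldlib_demo", "newlib_demo")

import oldlib_demo  # resolved from sys.modules, no file lookup needed

print("legacy import points at the new module:", oldlib_demo is new_pkg)
```

A real package would more likely ship a small compatibility package that re-exports the new one, since an alias like this only takes effect once the aliasing code itself has been imported.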

03 Nov 03:11

mhamilton723

v0.9.2

81f5f80

SynapseML v0.9.2

v0.9.2

Bug Fixes 🐞

fix publish to central maven (#1233)
fix website (#1234)
fix typo in sbt install
lightgbm default params should not be specified if optional (#1232)
fix website broken links (#1230)
improve azure search writer error message in Array[Array[]] case
update baseUrl and fix static images (#1217)
Fixing flaky unit tests (#1215)
Docker image should install openjdk-8-jre as opposed to default-… (#1211)
Fixing flaky test

Documentation 📘

add explanation dashboard integration example notebook (#1236)
fix links to developer readme and R setup (#1229)

Feat

Build our new website (#1190)

Features 🌈

support direct pip install (#1223)
Measure transformers for Data Balance Analysis (#1218)
Add the FormOntologyLearner

Maintenance 🔧

release synapseml 0.9.2 (#1237)

Performance Improvements 🚀

website enhancement (#1221)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

81f5f80 chore: release synapseml 0.9.2 (#1237)
127c70a docs: add explanation dashboard integration example notebook (#1236)
9b9c2fb fix: fix publish to central maven (#1233)
7059573 fix: fix website (#1234)
d47f014 fix: fix typo in sbt install
336eff5 fix: lightgbm default params should not be specified if optional (#1232)
3d92dd7 feat: support direct pip install (#1223)
2771853 docs: fix links to developer readme and R setup (#1229)
ea91189 fix: fix website broken links (#1230)
bbd8744 perf: website enhancement (#1221)

c5e1742 feat: Measure transformers for Data Balance Analysis (#1218)
73c6a65 fix: improve azure search writer error message in Array[Array[]] case
d8344c5 feat: Add the FormOntologyLearner
2d81b50 fix: update baseUrl and fix static images (#1217)
e23041f fix: Fixing flaky unit tests (#1215)
5d31e3e fix: Docker image should install openjdk-8-jre as opposed to default-… (#1211)
9623b3e Feat: Build our new website (#1190)
3f74133 fix: Fixing flaky test

This list of changes was auto generated.
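
The section headings in these notes appear to mirror the prefixes in the commit list (fix, docs, feat, chore, perf), and the capitalized "Feat:" commit lands under its own "Feat" heading. The generator itself is not shown here, so the following Python sketch is only a guess at a prefix-based grouping that would behave this way. The mapping table, the regular expression, and the function name group_commits are all assumptions.

```python
import re

# Assumed, case-sensitive mapping from commit prefix to section heading.
SECTION_TITLES = {
    "fix": "Bug Fixes 🐞",
    "docs": "Documentation 📘",
    "feat": "Features 🌈",
    "chore": "Maintenance 🔧",
    "perf": "Performance Improvements 🚀",
}

# Expected shape: "<7-char hash> <prefix>: <subject>" (space after the colon).
COMMIT_LINE = re.compile(r"^(?P<sha>[0-9a-f]{7}) (?P<prefix>\w+): (?P<subject>.+)$")


def group_commits(lines):
    """Group commit subjects under a heading chosen by their prefix."""
    sections = {}
    for line in lines:
        match = COMMIT_LINE.match(line.strip())
        if match is None:
            continue  # lines that do not follow the expected shape are skipped
        prefix = match.group("prefix")
        # Unknown prefixes (including different capitalization) become
        # their own heading instead of being merged into a known section.
        title = SECTION_TITLES.get(prefix, prefix)
        sections.setdefault(title, []).append(match.group("subject"))
    return sections


commits = [
    "81f5f80 chore: release synapseml 0.9.2 (#1237)",
    "9b9c2fb fix: fix publish to central maven (#1233)",
    "9623b3e Feat: Build our new website (#1190)",
]

for title, subjects in group_commits(commits).items():
    print(title)
    for subject in subjects:
        print("  " + subject)
```

Under these assumptions, normalizing the prefix with prefix.lower() before the lookup would fold "Feat" into the Features section.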

15 Oct 20:14

mhamilton723

v0.9.1

6b81426

SynapseML v0.9.1

v0.9.1

Bug Fixes 🐞

fix readme badge

Maintenance 🔧

Bump version to 0.9.1

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

6b81426 chore: Bump version to 0.9.1
274b110 fix:fix doc publishing
600bc6e fix: fix readme badge

This list of changes was auto generated.

15 Oct 05:01

mhamilton723

v0.9.0

a6c7fea

SynapseML v0.9.0

v0.9.0

Bug Fixes 🐞

don't crash on fallback storage location (#1183)

Chore

rename mmlspark to synapseml (#1204)

Features 🌈

update versions in README.md (#1205)

Maintenance 🔧

release synapseml 0.9.0 (#1206)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

a6c7fea chore: release synapseml 0.9.0 (#1206)
383cb95 Chore: rename mmlspark to synapseml (#1204)
ecc6868 fix: don't crash on fallback storage location (#1183)
661e3e5 feat: updata versions in README.md (#1205)

This list of changes was auto generated.
