The Oracle Accelerated Data Science (ADS) SDK is maintained by the Oracle Cloud Infrastructure (OCI) Data Science service team. It speeds up common data science activities by providing tools that automate and simplify routine tasks. It also gives data scientists a friendly Pythonic interface to OCI services, including OCI Data Science, Model Catalog, Model Deployment, Jobs, Data Flow, Object Storage, Vault, Big Data Service, Data Catalog, and the Autonomous Database. ADS gives you an interface to manage the life cycle of machine learning models, from data acquisition to model evaluation, interpretation, and deployment.
With ADS you can:
- Read datasets from Oracle Object Storage, Oracle RDBMS (ATP/ADW/On-prem), AWS S3 and other sources into Pandas dataframes.
- Use feature types to characterize your data, create meaningful summary statistics and plots. Use the warning and validation system to test the quality of your data.
- Tune models using hyperparameter optimization with the ADSTuner tool.
- Generate detailed evaluation reports of your model candidates with the ADSEvaluator module.
- Save machine learning models to the OCI Data Science Model Catalog.
- Deploy models as HTTP endpoints with Model Deployment.
- Launch distributed ETL, data processing, and model training jobs in Spark with OCI Data Flow.
- Train machine learning models in OCI Data Science Jobs.
- Manage the life cycle of conda environments through the ads conda command line interface (CLI).
You have various options when installing ADS.
$ python3 -m pip install oracle-ads
The all-optional module installs all optional dependencies.
$ python3 -m pip install oracle-ads[all-optional]
To work with gradient boosting models, install the boosted module. This module includes XGBoost and LightGBM model classes.
$ python3 -m pip install oracle-ads[boosted]
For big data use cases using Oracle Big Data Service (BDS), install the bds module. It includes the following libraries: ibis-framework[impala], hdfs[kerberos], and sqlalchemy.
$ python3 -m pip install oracle-ads[bds]
To work with a broad set of data formats (for example, Excel and Avro), install the data module. It includes the fastavro, openpyxl, pandavro, asteval, datefinder, htmllistparse, and sqlalchemy libraries.
$ python3 -m pip install oracle-ads[data]
To work with geospatial data, install the geo module. It includes geopandas and libraries from the viz module.
$ python3 -m pip install oracle-ads[geo]
Install the notebook module to use ADS within an OCI Data Science service notebook session. This module installs the ipywidgets and ipython libraries.
$ python3 -m pip install oracle-ads[notebook]
To work with ONNX-compatible runtimes and libraries designed to maximize performance and model portability, install the onnx module. It includes the following libraries: onnx, onnxruntime, onnxmltools, skl2onnx, xgboost, lightgbm, and libraries from the viz module.
$ python3 -m pip install oracle-ads[onnx]
For infrastructure tasks, install the opctl module. It includes the following libraries: oci-cli, docker, conda-pack, nbconvert, nbformat, and inflection.
$ python3 -m pip install oracle-ads[opctl]
For hyperparameter optimization tasks, install the optuna module. It includes optuna and libraries from the viz module.
$ python3 -m pip install oracle-ads[optuna]
Install the tensorflow module to include tensorflow and libraries from the viz module.
$ python3 -m pip install oracle-ads[tensorflow]
For text-related tasks, install the text module. It includes the wordcloud and spacy libraries.
$ python3 -m pip install oracle-ads[text]
Install the torch module to include pytorch and libraries from the viz module.
$ python3 -m pip install oracle-ads[torch]
Install the viz module to include libraries for visualization tasks. Some of the key packages are bokeh, folium, seaborn, and related packages.
$ python3 -m pip install oracle-ads[viz]
Note
Multiple extra dependencies can be installed together. For example:
$ python3 -m pip install oracle-ads[notebook,viz,text]
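Some shells, such as zsh, treat square brackets as glob characters, so a requirement with extras may need quoting before it reaches pip. The following is a minimal sketch of composing such a command from a list of extras. The helper names requirement_spec and install_command are hypothetical and are not part of ADS; only the extras syntax itself comes from the commands above.

```python
import shlex


def requirement_spec(extras, package="oracle-ads"):
    """Return a pip requirement such as 'oracle-ads[notebook,viz]'.

    Hypothetical helper for illustration; not part of the ADS SDK.
    """
    if not extras:
        return package
    return f"{package}[{','.join(extras)}]"


def install_command(extras, package="oracle-ads"):
    """Return a shell-safe pip install command for the given extras."""
    # shlex.quote wraps the requirement in single quotes when it contains
    # characters (here the square brackets) that a shell could interpret.
    spec = shlex.quote(requirement_spec(extras, package))
    return f"python3 -m pip install {spec}"


command = install_command(["notebook", "viz", "text"])
print(command)
```

Quoting the requirement is harmless in shells that do not expand brackets, so the quoted form can be used everywhere.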
- Oracle Accelerated Data Science SDK (ADS) Documentation
- OCI Data Science and AI services Examples
- Oracle AI & Data Science Blog
- OCI Documentation
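import ads
import oci
import pandas as pd
from ads.common.auth import default_signer
# Authenticate with an API key from the default OCI config file and profile.
ads.set_auth(auth="api_key", oci_config_location=oci.config.DEFAULT_LOCATION, profile="DEFAULT")
# Replace the placeholders with your own values.
bucket_name = "<bucket_name>"
key = "<key>"
namespace = "<namespace>"
# Read the object into a Pandas dataframe through the oci:// protocol.
df = pd.read_csv(f"oci://{bucket_name}@{namespace}/{key}", storage_options=default_signer())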
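The Object Storage location in the example above follows the pattern oci://bucket@namespace/key. As a small sketch, the URI can be assembled and validated before it is handed to Pandas. The function name object_storage_uri and the sample values are hypothetical and chosen only for illustration; they are not part of ADS.

```python
def object_storage_uri(bucket_name, namespace, key):
    """Build an oci:// URI of the form oci://<bucket>@<namespace>/<key>.

    Hypothetical helper for illustration; not part of the ADS SDK.
    """
    parts = (("bucket_name", bucket_name), ("namespace", namespace), ("key", key))
    for label, value in parts:
        if not value:
            raise ValueError(f"{label} must not be empty")
    # Drop leading slashes so the key does not produce a double slash.
    return f"oci://{bucket_name}@{namespace}/{key.lstrip('/')}"


# Sample values are placeholders, not real resources.
uri = object_storage_uri("my-bucket", "my-namespace", "/data/sales.csv")
print(uri)
```

The resulting string would take the place of the f-string passed to pd.read_csv in the example above.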
This example uses bind variables, which protect the query against SQL injection.
import ads
import pandas as pd
# Replace the placeholders with your own database connection details.
connection_parameters = {
    "user_name": "<user_name>",
    "password": "<password>",
    "service_name": "<tns_name>",
    "wallet_location": "<file_path>",
}
# The :max_rows placeholder is filled from bind_variables, not by string formatting.
df = pd.DataFrame.ads.read_sql(
    """
    SELECT *
    FROM SH.SALES
    WHERE ROWNUM <= :max_rows
    """,
    bind_variables={"max_rows": 100},
    connection_parameters=connection_parameters,
)
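To illustrate why bind variables are safe against SQL injection, here is a minimal sketch that swaps in Python's built-in sqlite3 module for the Oracle database, so it runs without a wallet or credentials. The sales table, its rows, and the function names are made up for the illustration. The point it shows is general: a bound value is passed to the database as data and is never parsed as SQL.

```python
import sqlite3


def top_sales(conn, max_rows):
    """Return the first max_rows rows, with the limit passed as a bind variable."""
    query = "SELECT id, product FROM sales ORDER BY id LIMIT :max_rows"
    return conn.execute(query, {"max_rows": max_rows}).fetchall()


def sales_for_product(conn, product):
    """Return rows whose product equals the bound value exactly."""
    query = "SELECT id, product FROM sales WHERE product = :product ORDER BY id"
    return conn.execute(query, {"product": product}).fetchall()


# In-memory stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT)")
conn.executemany(
    "INSERT INTO sales (id, product) VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (3, "widget")],
)

first_two = top_sales(conn, 2)
# A classic injection attempt is compared as a literal string and matches nothing.
injected = sales_for_product(conn, "widget' OR '1'='1")
print(first_two, injected)
```

Had the query been built with string formatting instead, the injected text would have become part of the SQL statement and matched every row.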
This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide CONTRIBUTING.md.
Find Getting Started instructions for developers in README-development.md.
Consult the security guide SECURITY.md for our responsible security vulnerability disclosure process.
Copyright (c) 2020, 2022 Oracle and/or its affiliates. Licensed under the Universal Permissive License v1.0