
Releases: apache/incubator-datalab

DataLab 2.6.0 Release

30 Nov 10:51

DataLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

New features in v2.6.0

All cloud platforms:

  • Implemented the Images page, where DataLab users can view and manage custom images.
    • Now DataLab users can:
      • View the list of their own and shared custom images;
      • View additional information about a custom image;
      • View sharing/unsharing activity for custom images;
      • Share custom images with selected users or user groups;
      • Stop sharing custom images with selected users or user groups;
      • Terminate custom images.
    • DataLab administrators can grant users permission to:
      • Share their own custom images;
      • Terminate their own custom images;
      • Create notebooks based on their own custom images;
      • Create notebooks based on shared custom images.
  • Updated versions of installed software:
    • Jupyter notebook v.6.4.12;
    • JupyterLab notebook v.3.4.3;
    • Superset notebook v.1.5.1;
    • TensorFlow notebook v.2.9.1;
    • RStudio notebook v.2022.02.2-485;
    • Angular v.11.2.14;
    • Keycloak v.18.0.1.
  • Added the ability to connect a new data platform to a DataLab account.
    • Now DataLab users can connect the MLflow platform (see the sketch below).
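
As a minimal illustration of what connecting the MLflow platform enables, the sketch below logs a run to an MLflow tracking server from a notebook. The tracking URI and experiment name are placeholders, not DataLab-specific values.

```python
# Minimal sketch: log a run to a connected MLflow tracking server.
# The tracking URI and experiment name below are hypothetical placeholders.
import mlflow

mlflow.set_tracking_uri("http://mlflow.example.com:5000")  # hypothetical endpoint
mlflow.set_experiment("datalab-demo")                      # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("model", "baseline")
    mlflow.log_metric("accuracy", 0.92)
```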

AWS:

  • Added a new template JupyterLab with TensorFlow.

Azure:

  • Added support for Data Engine Service (HDInsight) for RStudio, Jupyter & Apache Zeppelin notebooks.

Improvements in v2.6.0

All cloud platforms:

  • Added the ability to start a user's notebook from the administrative panel;
  • Improved filtering by adding a separate “Filter” action button (in this release available only on the Images page; other pages will be updated in upcoming releases);
  • Made minor improvements to the About & Help sections to make the user experience easier and more intuitive.

Bug fixes in v2.6.0

All cloud platforms:

  • Fixed a bug where project creation was allowed after the total quota had been exceeded;
  • Fixed a bug where the Ungit link led to a 502 error for the DeepLearning notebook;
  • Fixed incorrect date & time information for uploaded objects in the bucket browser;
  • Fixed a bug where R packages were not available for installation from the DataLab Web UI for RStudio & Apache Zeppelin notebooks.

AWS:

  • Fixed a bug where EMR creation failed for RStudio & Apache Zeppelin notebooks.

GCP:

  • Fixed a bug where Superset creation failed during configuration;
  • Fixed a bug where Dataproc creation failed for RStudio & Apache Zeppelin notebooks;
  • Fixed a bug where billing data was missing for Compute resources;
  • Fixed a bug where DataLab deployment failed in an existing VPC or subnet.

Known issues in v2.6.0

GCP:

  • SSO is not available for Superset.

Refer to the following link to view other major/minor issues in v2.6.0:

Apache DataLab: Known issues

Known issues caused by cloud provider limitations in v2.6.0

Microsoft Azure:

  • Resource name length should not exceed 80 chars.

GCP:

  • Resource name length should not exceed 64 chars.

NOTE: DataLab has not been tested on Red Hat Enterprise Linux.

DataLab 2.5.1 Release

27 Jan 09:21

DataLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

Improvements in v2.5.1

All cloud platforms:

  • Added GPU type and count filters for easier environment management by administrators;
  • Added an explanation for the open-terminal action on the Audit page.

Bug fixes in v2.5.1

Azure:

  • Fixed a bug where instance creation failed at the devtools installation stage;
  • Fixed a bug where Apache Zeppelin notebook creation failed during shell interpreter configuration;
  • Fixed a bug where the edge node status in the DataLab Web UI was not synced with the Cloud instance status;
  • Fixed a bug where DeepLearning notebook creation failed due to a wrong path to the connector;
  • Fixed a bug where not all billing drop-down values were visible;
  • Fixed minor UI issues that were reproduced only at smaller desktop sizes;
  • Fixed a bug where connections for Jupyter R/Scala kernels were unsuccessful;
  • Fixed a bug where Data Engine creation failed for Jupyter/RStudio/Apache Zeppelin notebooks;
  • Fixed a bug where notebook/Data Engine creation and stopping frequently failed due to a low-level socket error;
  • Fixed a bug where Jupyter/RStudio/DeepLearning notebook creation from an image failed;
  • Fixed a bug where creation of the SSN or any type of notebook did not always succeed on the first attempt.

Azure and GCP:

  • Fixed a bug where DeepLearning notebook creation occasionally failed at the NVIDIA installation stage;
  • Fixed a bug where creation of any type of notebook sometimes failed during disk mounting.

Known issues in v2.5.1

GCP:

  • Superset creation fails during configuration;
  • SSO is not available for Superset.

Microsoft Azure:

  • The Notebook Web terminal does not work for remote endpoints.

Refer to the following link to view other major/minor issues in v2.5.1:

Apache DataLab: Known issues

Known issues caused by cloud provider limitations in v2.5.1

Microsoft Azure:

  • Resource name length should not exceed 80 chars.

GCP:

  • Resource name length should not exceed 64 chars.

NOTE: DataLab has not been tested on Red Hat Enterprise Linux.

DataLab 2.5.0 Release

16 Jul 07:37

DataLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

New features in v2.5.0

All Cloud platforms:

  • Implemented the Configuration page. Now DataLab administrators can view and edit configuration files and restart services;
  • Implemented localization. Now the DataLab UI automatically adapts to a specific locale (e.g., date and time format, currency display, etc.);
  • Updated versions of installed software:
    • Ubuntu v.20.04;
    • Python v.3.x;
    • Pip v.21.0.1;
    • R v.4.1.0;
    • Angular v.10.2.2;
    • Jupyter notebook v.6.1.6;
    • RStudio notebook v.1.4.1103;
    • Apache Zeppelin notebook v.0.9.0;
    • TensorFlow notebook v.2.5.0;
    • Apache Spark v.3.0.1;
    • Ungit v.1.5.15.

AWS:

  • Added support for a new version of Data Engine Service (EMR), v.6.2.0.

GCP:

  • Added support for a new version of Data Engine Service (Dataproc), v.2.0.0-RC22-ubuntu18.

Improvements in v2.5.0

All Cloud platforms:

  • Added DeepLearning notebook creation based on a Cloud-native image;
  • Added support for specific Python versions via virtual environments for all notebooks and compute resources (except Data Engine Service and DeepLearning); see the sketch below.
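
As a sketch of how a per-notebook virtual environment with a specific Python version can be wired up, the snippet below creates a venv for a chosen interpreter and registers it as a Jupyter kernel. The interpreter path, target directory, and kernel name are illustrative assumptions, not DataLab's actual provisioning code.

```python
# Illustrative sketch (not DataLab's provisioning code): create a virtual
# environment for a specific Python interpreter and expose it as a Jupyter kernel.
import subprocess

python_bin = "/usr/bin/python3.8"   # hypothetical interpreter chosen by the user
venv_dir = "/opt/venvs/py38"        # hypothetical target directory

# Create the virtual environment with the requested interpreter version.
subprocess.run([python_bin, "-m", "venv", venv_dir], check=True)

# Install ipykernel into the venv and register it as a named Jupyter kernel.
subprocess.run([f"{venv_dir}/bin/pip", "install", "ipykernel"], check=True)
subprocess.run([f"{venv_dir}/bin/python", "-m", "ipykernel", "install",
                "--user", "--name", "py38", "--display-name", "Python 3.8 (venv)"],
               check=True)
```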

GCP:

  • Added optional notebook/compute creation with GPU for Jupyter notebook, Data Engine Service (Dataproc) and Data Engine (standalone cluster).

Bug fixes in v2.5.0

All Cloud platforms:

  • Fixed a bug where the instance status in the DataLab Web UI was not synced with the Cloud instance status after a provisioning restart;
  • Fixed a bug where Spark executor memory was not allocated according to the notebook instance shape (see the sketch after this list);
  • Fixed a bug where the reminder about notebook stopping continued to appear after the scheduler had been triggered;
  • Fixed a bug where the library status did not change in the DataLab Web UI when an unknown library name was installed.
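
For context on the executor-memory fix above, the sketch below shows one common way executor memory can be derived from an instance's total RAM. The 75% ratio and 1 GB OS reserve are illustrative assumptions, not the exact values DataLab applies; the RAM lookup is Linux-specific.

```python
# Illustrative sketch: derive spark.executor.memory from the notebook instance's RAM.
# The 75% ratio and 1 GB OS reserve are assumptions for illustration only.
import os

def executor_memory_mb(total_ram_mb: int, os_reserve_mb: int = 1024, ratio: float = 0.75) -> int:
    """Return the executor memory to request, in megabytes."""
    return int((total_ram_mb - os_reserve_mb) * ratio)

# Total physical memory in MB (Linux-specific sysconf names).
total_ram_mb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)
print(f"spark.executor.memory={executor_memory_mb(total_ram_mb)}m")
```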

Known issues in v2.5.0

GCP:

  • Superset creation fails during configuration;
  • SSO is not available for Superset.

Microsoft Azure:

  • The Notebook Web terminal does not work for remote endpoints.

Refer to the following link to view other major/minor issues in v2.5.0:

Apache DataLab: Known issues

Known issues caused by cloud provider limitations in v2.5.0

Microsoft Azure:

  • Resource name length should not exceed 80 chars;

GCP:

  • Resource name length should not exceed 64 chars;

NOTE: DataLab has not been tested on GCP with Red Hat Enterprise Linux.

DLab 2.4.0 Release

17 Sep 10:01

DLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

New features in v2.4.0

All Cloud platforms:

  • Implemented a bucket browser. Now users can manage Cloud data sources by accessing Cloud Blob Storage from the DLab Web UI;
  • Added audit support. Now DLab administrators can view the history of all actions;
  • Updated versions of installed software:
    • Ubuntu v.18.04;
    • TensorFlow notebook v.2.1.0;
    • MongoDB v.4.2.

AWS:

  • Added support for new versions of Data Engine Service (EMR), v.5.30.0 and v.6.0.0.

Improvements in v2.4.0

All Cloud platforms:

  • Added support for connecting via Livy and SparkMagic for Jupyter and RStudio notebooks (see the sketch after this list);
  • Added the ability to select multiple resources on the 'Environment management' page to make the user experience easier and more intuitive;
  • Added support for installing a particular version of a library from the DLab Web UI; users can now also update/downgrade libraries via the Web UI;
  • Extended billing functionality by introducing a new entity: monthly project quota(s);
  • Added notifications for cases when a project quota is exceeded;
  • Exposed analytical environment URLs on the DLab administration page.
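
As background on what a Livy connection involves, the sketch below submits a PySpark statement through Livy's REST API using the requests library. The Livy host is a placeholder, and this is not DLab's internal integration code.

```python
# Illustrative sketch: run a PySpark statement through a Livy endpoint via its REST API.
# The Livy host below is a hypothetical placeholder, not a DLab-provided value.
import time
import requests

livy = "http://livy.example.com:8998"
headers = {"Content-Type": "application/json"}

# Open an interactive PySpark session.
session = requests.post(f"{livy}/sessions", json={"kind": "pyspark"}, headers=headers).json()
session_url = f"{livy}/sessions/{session['id']}"

# Wait for the session to become idle, then submit a statement.
while requests.get(session_url, headers=headers).json()["state"] != "idle":
    time.sleep(5)

stmt = requests.post(f"{session_url}/statements",
                     json={"code": "print(sc.version)"}, headers=headers).json()
print(requests.get(f"{session_url}/statements/{stmt['id']}", headers=headers).json())
```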

GCP:

  • Added the ability to create a custom image for a notebook.

Bug fixes in v2.4.0

All Cloud platforms:

  • Fixed a bug where administrative permissions disappeared after endpoint connectivity issues;
  • Fixed a bug where all resources disappeared from the 'List of resources' page after endpoint connectivity issues;
  • Fixed a bug where the administrative role could not be edited for an already existing group;
  • Fixed a bug where the billing report was not populated in Safari;
  • Fixed discrepancies between the detailed billing and the in-grid billing report.

GCP:

  • Fixed a bug where billing was not correctly updated for a period overlapping two calendar years.

Microsoft Azure:

  • Fixed a rare bug where notebooks or the SSN were not always created successfully on the first attempt.

Known issues in v2.4.0

GCP:

  • SSO is not available for Superset.

Microsoft Azure:

  • Notebook creation fails on Red Hat;
  • The Web terminal does not work for notebooks, but only on remote endpoints.

Refer to the following link to view the other major/minor issues in v2.4.0:

Apache DLab: Known issues

Known issues caused by cloud provider limitations in v2.4.0

Microsoft Azure:

  • Resource name length should not exceed 80 chars;
  • TensorFlow templates are not supported for Red Hat Enterprise Linux;
  • Low priority Virtual Machines are not supported yet.

GCP:

  • Resource name length should not exceed 64 chars;
  • NOTE: DLab has not been tested on GCP with Red Hat Enterprise Linux.

DLab 2.3.0 Release

27 Apr 08:09

DLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

New features in v2.3.0

All Cloud platforms:

  • Added support for multi-Cloud orchestration across AWS, Azure, and GCP. Now a single DLab instance can connect to the above Clouds by means of the respective sets of APIs deployed on cloud endpoints;
  • Added a JupyterLab v.0.35.6 template;
  • Updated versions of installed software:
    • Jupyter notebook v.6.0.2;
    • Apache Zeppelin v.0.8.2;
    • RStudio v.1.2.5033;
    • Apache Spark v.2.4.4 for standalone cluster;

AWS:

  • Added support for a new version of Data Engine Service (EMR), v.5.28.0;

GCP:

  • Added support for a new version of Data Engine Service (Dataproc), v.1.4;
  • Added a new Superset v.0.35.1 template;

Improvements in v2.3.0

All Cloud platforms:

  • Grouped project management actions into a single Edit project menu for ease of use;
  • Introduced a new "project admin" role;
  • SSO now also works for Notebooks;
  • Implemented the ability to filter installed libraries;
  • Added the ability to sort by project/user/charges on the 'Billing report' page;
  • Added a test option for remote endpoints;

Bug fixes in v2.3.0

All Cloud platforms:

  • Fixed a bug that required Notebook names to be unique per project across different users, which made it impossible to operate a Notebook with the same name after the first instance was created;
  • Fixed a bug where an administrator could not stop/terminate Notebook/computational resources created by another user;
  • Fixed a bug where the shell interpreter was not showing up for Apache Zeppelin;
  • Fixed a bug where the scheduler by start time was not triggered for Data Engine;
  • Fixed a bug where it was possible to start a Notebook when the project quota had been exceeded;
  • Fixed a bug where the scheduler for stopping was not triggered after total quota depletion;

AWS:

  • Fixed a bug where Notebook images/snapshots were still available after SSN termination;

Microsoft Azure:

  • Fixed a bug where custom image creation from a Notebook failed and deleted the existing Notebook of another user;
  • Fixed a bug where detailed billing was not available;
  • Fixed a bug where Spark reconfiguration failed on Data Engine;
  • Fixed a bug where billing data was not available after using the calendar filter;

Known issues in v2.3.0

GCP:

  • SSO is not available for Superset;

Microsoft Azure:

  • Notebook creation fails on Red Hat;
  • The Web terminal does not work for notebooks, but only on remote endpoints;

Refer to the following link to view the other major/minor issues in v2.3.0:

Apache DLab: known issues

Known issues caused by cloud provider limitations in v2.3.0

Microsoft Azure:

  • Resource name length should not exceed 80 chars;
  • TensorFlow templates are not supported for Red Hat Enterprise Linux;
  • Low priority Virtual Machines are not supported yet;

GCP:

  • Resource name length should not exceed 64 chars;
  • NOTE: DLab has not been tested on GCP with Red Hat Enterprise Linux;

DLab 2.2 Release

25 Nov 14:59

DLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

New features in v2.2

All Cloud platforms:

  • added the concept of Projects to DLab. Now users can unite under Projects and collaborate
  • for ease of use, we've added a web terminal for all DLab Notebooks
  • updated versions of installed software:
    • Angular 8.2.7

GCP:

  • added a billing report to DLab for monitoring Cloud resource usage, including the ability to manage billing quotas
  • updated versions of installed software:
    • Dataproc 1.3

Improvements in v2.2

All Cloud platforms:

  • implemented login via Keycloak to support integration with multiple SAML and OAuth2 identity providers
  • added the DLab version to the Web UI
  • augmented the ‘Environment management’ page
  • added the ability to tag a Notebook from the UI
  • added the ability to terminate computational resources via the scheduler

GCP:

  • added the ability to create a Notebook/Data Engine from an AMI

AWS and GCP:

  • the Ungit tool now allows working with remote repositories over SSH
  • implemented the ability to view the Data Engine Service version in the UI after creation

Bug fixes in v2.2

All Cloud platforms:

  • fixed sparklyr library (R package) installation on RStudio and RStudio with TensorFlow notebooks

GCP:

  • fixed a bug where Data Engine creation failed for the DeepLearning template
  • fixed a bug where Jupyter did not start successfully after Data Engine Service creation (create Jupyter -> create Data Engine -> stop Jupyter -> Jupyter fails)
  • fixed a bug where DeepLearning creation failed

Known issues in v2.2

All Cloud platforms:

  • Notebook names must be unique per project across different users; otherwise it is impossible to operate a Notebook with the same name after the first instance is created

Microsoft Azure:

  • DLab deployment is unavailable if Data Lake is enabled
  • custom image creation from a Notebook fails and deletes the existing Notebook

Refer to the following link to view the other major/minor issues in v2.2:

Apache DLab: known issues

Known issues caused by cloud provider limitations in v2.2

Microsoft Azure:

  • resource name length should not exceed 80 chars
  • TensorFlow templates are not supported for Red Hat Enterprise Linux
  • low priority Virtual Machines are not supported yet

GCP:

  • resource name length should not exceed 64 chars
  • billing data is not available
  • NOTE: DLab has not been tested on GCP for Red Hat Enterprise Linux

DLab 2.1 Release

15 Apr 14:58

DLab is a Self-service, Fail-safe Exploratory Environment for Collaborative Data Science Workflow

New features in v2.1

All Cloud platforms:

  • implemented tuning of Apache Spark standalone cluster and local Spark configurations from the Web UI (except for Apache Zeppelin); see the sketch after this list
  • added a reminder, shown after the user logs in, notifying that the corresponding resources are about to be stopped/terminated
  • implemented an SSN load monitor: CPU, memory, HDD
  • updated versions of installed software:
    • Jupyter 5.7.4
    • RStudio 1.1.463
    • Apache Zeppelin 0.8.0
    • Apache Spark 2.3.2 for standalone cluster
    • Scala 2.12.8
    • CNTK 2.3.1
    • Keras 2.1.6 (except for DeepLearning - 2.0.8)
    • MXNET 1.3.1
    • Theano 1.0.3
    • ungit 1.4.36
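
The tuning exposed through the Web UI maps onto standard Spark properties. The sketch below shows equivalent settings applied programmatically with PySpark; the master URL and the values are illustrative examples only.

```python
# Illustrative sketch: the same Spark properties that the Web UI tuning exposes,
# applied programmatically with PySpark. The master URL and values are examples only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("datalab-tuning-example")
    .master("spark://standalone-master:7077")   # hypothetical standalone cluster master
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "2")
    .config("spark.driver.memory", "2g")
    .getOrCreate()
)
print(spark.sparkContext.getConf().getAll())
```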

AWS:

  • implemented tuning of Data Engine Service from the Web UI (except for Apache Zeppelin)
  • added support for a new version of Data Engine Service (AWS EMR), 5.19.0

MS Azure and AWS:

  • implemented the ability to manage a total billing quota for DLab as well as a billing quota per user

Improvements in v2.1

All Cloud platforms:

  • added the ability to configure instance size/shape (CPU, RAM) from the DLab UI for different user groups
  • added the ability to install Java dependencies from the DLab UI
  • added an alternative way to access analytical notebooks just by clicking on the notebook's direct URL:
    • added LDAP authorization in Squid (users must provide their LDAP credentials when accessing notebooks/Data Engine/Data Engine Service via browser)
  • improved error handling for various scenarios on the UI side
  • added support for installing DLab into two VPCs

MS Azure:

  • it is now possible to install DLab with private IPs only

Bug fixes in v2.1

AWS:

  • fixed pricing retrieval logic to optimize RAM usage on the SSN for small instances

GCP:

  • fixed a bug where DeepLearning creation failed
  • fixed a bug which caused the shared bucket to be deleted when Edge node creation failed for new users

Known issues in v2.1

All Cloud platforms:

  • the remote kernel list for Data Engine is not updated after stopping/starting the Data Engine
  • the following links for Data Engine/Data Engine Service can be opened via tunnel: worker/application ID, application detail UI, event timeline, logs for Data Engine
  • if Apache Zeppelin is created from an AMI with a different instance shape, the Spark memory size stays the same as in the source AMI
  • the sparklyr library (R package) cannot be installed on RStudio and RStudio with TensorFlow notebooks
  • the default Spark configuration for Apache Zeppelin cannot be changed from the DLab UI. Currently this can be done directly through the Apache Zeppelin interpreter menu.
    For more details, please refer to the official Apache Zeppelin documentation: https://zeppelin.apache.org/docs/0.8.0/usage/interpreter/overview.html
  • the shell interpreter for Apache Zeppelin is missing for some instance shapes
  • executor memory is not allocated according to the notebook instance shape for local Spark

AWS:

  • cannot open the master application URL on the resource manager page; the issue is known for Data Engine Service v.5.12.0
  • Java library installation from the DLab UI fails on Data Engine Service when it is installed together with libraries from other groups.

GCP:

  • storage permissions are not differentiated per user via Dataproc permissions (all users have R/W access to other users' buckets)
  • Data Engine Service creation fails after the environment has been recreated
  • it is temporarily not possible to run playbooks using the remote kernel of Data Engine (dependencies issue)
  • Data Engine creation fails for the DeepLearning template
  • Jupyter does not start successfully after Data Engine Service creation (create Jupyter -> create Data Engine -> stop Jupyter -> Jupyter fails)

Microsoft Azure:

  • creation of Zeppelin or RStudio from a custom image fails at the step where cluster kernels are removed
  • starting a Notebook by scheduler does not work when Data Lake is enabled
  • running a playbook on Apache Zeppelin fails because a connection to blob storage via the wasbs protocol cannot be established

Known issues caused by cloud provider limitations in v2.1

Microsoft Azure:

  • resource name length should not exceed 80 chars
  • TensorFlow templates are not supported for Red Hat Enterprise Linux
  • low priority Virtual Machines are not supported yet
  • billing data is occasionally not available for the Notebook secondary disk

GCP:

  • resource name length should not exceed 64 chars
  • billing data is not available
  • NOTE: DLab has not been tested on GCP for Red Hat Enterprise Linux