Give us a star if you appreciate what we do
What is Hopsworks?
Quick Start
Development and Operational ML on Hopsworks
Docs
Who’s behind Hopsworks?
Open-Source
Join the community
Contribute
Hopsworks and its Feature Store form an open-source, data-intensive AI platform used for the development and operation of machine learning models at scale.
bash <(curl -s https://repo.hops.works/installer/latest/hopsworks-installer.sh)
Recommended minimum specification.
The Hopsworks Installer takes roughly 1-2 hrs to complete, depending on your bandwidth.
bash <(curl -s https://repo.hops.works/installer/latest/hopsworks-cloud-installer.sh)
If you have the Azure or GCP CLI utilities installed (on a Linux machine), then the Hopsworks Cloud Installer (hopsworks-cloud-installer.sh) will both provision the VMs and install Hopsworks in one command.
To work with the Hopsworks IDE plugin for IntelliJ/PyCharm, you can install it directly from the plugins menu of the IDE or clone it and follow the README.
mvn install
Maven uses the yeoman-maven-plugin to build both the front-end and the backend. Maven first executes the Gruntfile in the yo directory, then builds the backend in Java. The yeoman-maven-plugin copies the dist folder produced by grunt from the yo directory to the target folder of the backend.
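If you are iterating on the front-end, you can check that grunt produced the dist folder the plugin copies. This is a sketch; the dist path below is an assumption about the repository layout:

mvn install -DskipTests          # skip tests while iterating
ls hopsworks-web/yo/dist         # grunt output copied by yeoman-maven-plugin (assumed path)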
You can also build Hopsworks without the frontend (for Java EE development and testing):
mvn install -P-web
You can develop and run Python, Spark, and Flink applications on Hopsworks - in Jupyter notebooks, as jobs, or even notebooks as jobs. You can build production pipelines with the bundled Airflow, and even run ML training pipelines with GPUs in notebooks on Airflow. You can train models on as many GPUs as are installed in a Hopsworks cluster and easily share them among users.
Hopsworks documentation includes a user guide, an Administrator Guide, and dedicated documentation for the Hopsworks Feature Store.
The Hopsworks REST API is documented with Swagger and hosted on SwaggerHub.
- hopsworks-api - https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-api
- hopsworks-ca - https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-ca
To build and deploy Swagger on your own Hopsworks instance, you can follow the instructions in this guide.
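As a quick sanity check against a running instance, you can call the REST API directly with an API key. This is a sketch only; the host and key below are placeholders, and the project-listing endpoint may differ between Hopsworks versions:

HOPSWORKS_HOST="https://hopsworks.example.com"   # placeholder host
API_KEY="REPLACE_WITH_YOUR_API_KEY"              # placeholder API key
curl -k -H "Authorization: ApiKey $API_KEY" "$HOPSWORKS_HOST/hopsworks-api/api/project"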
Hopsworks started as an open-source collaborative project at KTH Royal Institute of Technology and RISE, and has more recently been taken on by Logical Clocks. Several funding bodies have contributed to its development, including the European Commission (FP7, H2020), EIT, SSF, Vinnova, and Celtic-Next.
Hopsworks is available under the AGPL-V3 license. In plain English this means that you are free to use Hopsworks and even build paid services on it, but if you modify the source code, you should also release your changes and any systems built around it under the AGPL-V3.
- Ask questions and give us feedback in the Hopsworks Community
- Follow us on Twitter
- Check out all our latest product releases
We are building the most complete and modular ML platform on the market, and we count on your support to continuously improve Hopsworks. Feel free to give us suggestions, report bugs, and add features to our library at any time.
We’re the best at what we do and want our community to succeed as well.
Our many thanks to the top contributors of Hopsworks!
To run the integration tests:
cd hopsworks/hopsworks-IT/src/test/ruby/
bundle install
rspec --format html --out ../target/test-report.html
To run a single test:
cd hopsworks/hopsworks-IT/src/test/ruby/
rspec ./spec/session_spec.rb:60
To skip tests that need to run inside a VM:
cd hopsworks/hopsworks-IT/src/test/ruby/
rspec --format html --out ../target/test-report.html --tag ~vm:true
When the tests are done, if LAUNCH_BROWSER is set to true in .env, the test report will open in a browser.
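For example, a minimal .env entry - the file is assumed to live in the ruby test directory used above, and any other variables the suite needs are omitted here:

# hopsworks-IT/src/test/ruby/.env (assumed location)
LAUNCH_BROWSER=true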
- Storage setup - install
- Create partitions on the disks - parted is one possibility (https://www.thegeekdiary.com/how-to-create-a-partition-using-parted-command/); a scripted version of these steps is sketched after the fstab step below
parted /dev/vdb
(parted) mklabel msdos
(parted) mkpart
Partition type? primary/extended? primary
File system type? [ext2]? ext4
Start? 0
End? 1000G
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? I
- Format
mkfs.ext4 /dev/vdb1
- Mount
mkdir /mnt/disk1
mount /dev/vdb1 /mnt/disk1
- Append to /etc/fstab
/dev/vdb1 /srv ext4 defaults 0 0
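The partitioning, formatting, and mounting steps above can also be scripted. This is a minimal sketch, assuming the data disk is /dev/vdb and the mount point is /mnt/disk1 (swap in /srv if that is where you want the data, as in the fstab line above):

DISK=/dev/vdb            # data disk, adjust to your environment
MOUNT_POINT=/mnt/disk1   # or /srv, matching the fstab line above

# msdos label plus one primary ext4 partition spanning the disk;
# 0%/100% avoids the alignment warning seen in the interactive session
parted -s "$DISK" mklabel msdos mkpart primary ext4 0% 100%
mkfs.ext4 "${DISK}1"
mkdir -p "$MOUNT_POINT"
mount "${DISK}1" "$MOUNT_POINT"
echo "${DISK}1 $MOUNT_POINT ext4 defaults 0 0" >> /etc/fstab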
- Generic install setup
sudo yum install -y java-1.8.0-openjdk
sudo yum install -y wget
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Add each of the nodes to the others' /etc/hosts file
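For example, with two hypothetical nodes node1 (10.0.0.11) and node2 (10.0.0.12), you would run the following on node1, and the mirror-image commands on node2:

echo "10.0.0.12 node2" | sudo tee -a /etc/hosts   # make node2 resolvable from node1 (hypothetical IP/name)
ssh-copy-id -i ~/.ssh/id_rsa.pub node2            # distribute the public key generated above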
- Karamel
wget https://repo.hops.works/master/karamel-0.6.tgz
tar xzvf karamel-0.6.tgz
cd karamel-0.6
nohup ./bin/karamel --headless &
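To confirm the headless Karamel process came up, check the process and follow the log written by the nohup command above (no Karamel-specific flags are assumed here):

pgrep -af karamel   # the Karamel JVM should be listed
tail -f nohup.out   # installer output captured by nohup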
- Install/Restart fix
- /etc/hosts - make the hostname resolve to the private IP instead of localhost
- when restarting a VM remember to:
- update the main node's /srv/hops/hadoop/etc/hadoop/yarn_exclude_nodes.xml, then su rmyarn and run /srv/hops/hadoop/bin/yarn rmadmin -refreshNodes
- update the worker IP in the main node's /etc/hosts and run systemctl restart dnsmasq
- shutdown all services
- update the IP in /etc/hosts so the hostname resolves to the private IP instead of localhost
- update the IP in /etc/resolv.conf (you might have to remove the immutable flag first: chattr -i /etc/resolv.conf)
- update the IP in /etc/systemd/system/multi-user.target.wants/consul.service
- update the IP in /etc/dnsmasq.d/default
- update the IP in /srv/hops/kagent/etc/config.ini
systemctl restart dnsmasq
systemctl daemon-reload
- restart all services
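After updating the files above, a quick way to confirm that no stale addresses remain is to grep for the old IP across them; the address below is a placeholder:

OLD_IP=10.0.0.12   # placeholder: the address that changed
grep -n "$OLD_IP" /etc/hosts /etc/resolv.conf \
    /etc/systemd/system/multi-user.target.wants/consul.service \
    /etc/dnsmasq.d/default /srv/hops/kagent/etc/config.ini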
- decommission YARN nodes
- add the node to /srv/hops/hadoop/etc/hadoop/yarn_exclude_nodes.xml
<host><name>heap-worker.novalocal</name></host>
- as user rmyarn run: /srv/hops/hadoop/bin/yarn rmadmin -refreshNodes
- shutdown the decommissioned node
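A sketch of the refresh step plus a way to verify that the node reaches the DECOMMISSIONED state (the hostname is the example from the exclude file above):

su rmyarn -c '/srv/hops/hadoop/bin/yarn rmadmin -refreshNodes'
su rmyarn -c '/srv/hops/hadoop/bin/yarn node -list -all' | grep heap-worker.novalocal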
- changing hostname
- add the new host to the hopsworks.hosts table
- update the hostname in /srv/hops/consul/.bashrc
- update the host-id and hostname in /srv/hops/kagent/etc/config.ini
- update the hosts in /srv/hops/kagent/etc/state_store/crypto_material_state.json
- save /srv/hops/super_crypto and then empty it to regenerate the certs
- export the cert password:
export HOPSIFY_PASSWORD=
- run hopsify for all users:
/srv/hops/kagent/host-certs/hopsify --config /srv/hops/kagent/etc/config.ini x509 --alt-url=https://192.168.1.21:8181/ --username consul
- set ACLs for all users
setfacl -m u:consul:rx /srv/hops/super_crypto/consul
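The two commands above handle a single user; this is a sketch of looping them over several users - the user list, and the assumption that /srv/hops/super_crypto has one directory per user, are guesses you should adjust to your cluster:

for u in consul glassfish hdfs; do   # hypothetical user list
  /srv/hops/kagent/host-certs/hopsify --config /srv/hops/kagent/etc/config.ini x509 \
      --alt-url=https://192.168.1.21:8181/ --username "$u"
  setfacl -m u:"$u":rx "/srv/hops/super_crypto/$u"   # assumes a per-user directory
done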
- remove the data_dir from consul before restarting it, to avoid ID clashes
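To double-check that nothing still references the old hostname, grep for it across the files listed above; the hostname below is a placeholder:

OLD_HOSTNAME=old-worker.novalocal   # placeholder: the hostname being replaced
grep -n "$OLD_HOSTNAME" /srv/hops/consul/.bashrc \
    /srv/hops/kagent/etc/config.ini \
    /srv/hops/kagent/etc/state_store/crypto_material_state.json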