Python Big Data Scientific Computing Kit

This Ansible script deploys a server with a collection of Python Big Data and Scientific Computing tools and libraries, preconfigured to run on a local Spark cluster.

Included packages:

Installation

  1. Set up a server or VM with CentOS 7

  2. Ensure the FQDN is configured correctly. Spark requires the host system's hostname to be resolvable; the quickest fix is to make the hostname resolve to 127.0.0.1 by adding an entry to /etc/hosts:

    127.0.0.1 localhost.localdomain localhost pydatalab.server.local pydatalab
    
  3. Create an Ansible hosts inventory (assuming the server hostname is pydatalab.server.local):

    [master]
    pydatalab.server.local
    
  4. Run the Ansible playbook:

    ansible-playbook -i hosts playbook.yml
    
  5. Jupyter should now be reachable at pydatalab.server.local:8888 (a quick sanity check is sketched just after this list)

  6. The default login for the pydatalab user is pydatalab:pydatalab
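
Once Jupyter is reachable (step 5), a quick way to confirm the Spark setup is to run a small PySpark job from a notebook cell. The snippet below is a minimal sketch only: it assumes the deployed kernel can import pyspark and that a Spark 2.x-style SparkSession is available in local mode, which may not exactly match what the playbook configures.

    # Minimal sanity check for the local Spark setup (run from a Jupyter notebook cell).
    # Assumes pyspark is importable from this kernel; the local[*] master URL is an assumption.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")                   # run Spark on the local machine, all available cores
        .appName("pydatalab-sanity-check")
        .getOrCreate()
    )

    # Sum the numbers 0..99; a working setup prints 4950.
    total = spark.sparkContext.parallelize(range(100)).sum()
    print(total)

    spark.stop()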

Integration with Hortonworks Hadoop

This Ansible script detects whether it is being installed on a Hortonworks Data Platform (HDP) host and, if so, creates a Jupyter kernel with the appropriate environment variables set and configured to use Spark on HDP.
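
As a rough illustration only (not taken from the playbook itself), the HDP-aware kernel can be inspected from a notebook by checking which Spark-related environment variables it exposes. The variable names below (SPARK_HOME, HADOOP_CONF_DIR, PYSPARK_SUBMIT_ARGS) are common for Spark-on-YARN kernels and are assumptions, not a confirmed list of what this script sets.

    # Rough check of the kernel environment on an HDP host (run from a notebook cell).
    # The variable names are typical for Spark-on-YARN kernels; treat them as assumptions.
    import os

    for name in ("SPARK_HOME", "HADOOP_CONF_DIR", "PYSPARK_SUBMIT_ARGS"):
        value = os.environ.get(name)
        print("{}: {}".format(name, value if value else "NOT SET"))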

Supported platforms

This script has been tested on:

  • CentOS 7.1
  • Red Hat Enterprise Linux 7.1