Please see our main README file for more details.
- Repository: https://github.com/ViDA-NYU/data-polygamy
- Programming Languages: Java (for main source code) and Python (for experiments and plots)
- Additional Programming Language info: Java version 1.7.0_45, and Python version 2.7.3
- Compiler Info: Apache Maven 3.3.9
- Packages/Libraries Needed: Dependencies
- Data Repository: Datasets
- Data generators: There are no data generators, but we provide scripts to download and configure all our data and metadata; please see the prepareData.sh script.
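The toolchain versions listed above can be compared against a local installation before building. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `same_release_line` are hypothetical, and treating "same major.minor" as compatible is an assumption rather than a tested guarantee.

```python
import re

# Versions reported in this document, used here as the reference baseline.
EXPECTED = {"java": "1.7.0_45", "python": "2.7.3", "maven": "3.3.9"}


def parse_version(text):
    """Split a version string such as '1.7.0_45' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def same_release_line(found, expected, depth=2):
    """Return True when the first `depth` version components match.

    Assumption: matching major.minor is close enough for a first check.
    """
    return parse_version(found)[:depth] == parse_version(expected)[:depth]


if __name__ == "__main__":
    import platform

    local_python = platform.python_version()
    ok = same_release_line(local_python, EXPECTED["python"])
    print("python {} vs expected {}: {}".format(
        local_python, EXPECTED["python"], "ok" if ok else "differs"))
```

The same comparison can be applied to the version strings printed by `java -version` and `mvn -version`.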
Most of the experiments were executed on a cluster with 20 compute nodes, each running Red Hat Enterprise Linux Server release 6.7 with the following configuration:
- Processor: AMD Opteron(TM) Processor 6272 (4x16 cores) 2.1GHz
- Caches: 3 levels; level 1: 8 x 64KB; level 2: 8 x 2MB; level 3: 2 x 8MB
- Memory: 256GB
- Secondary Storage: IBM DCS3700; 505.086TB capacity across 360 SAS disks
- Network: IBM Networking Operating System RackSwitch G8264 10GbE
The scalability experiment was the only one performed on Amazon Web Services (AWS), and more information about it is available here. We provide scripts to automatically configure the cluster.
- Scripts and how-tos to generate all necessary data or locate datasets: please see the prepareData.sh script and sections 6.2 and 6.3.
- Scripts and how-tos to prepare the software for the system: please see the prepareSoftware.sh script and section 4.
- Scripts and how-tos for all experiments executed for the paper: please see the runExperiments.sh script and section 6.
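The three scripts named above could be chained from a small driver. This is a hedged sketch only. It assumes the scripts sit in a single directory and take no arguments, and it runs them in the order they are listed here; none of this is confirmed by this document, so check the referenced sections before relying on it. The function names `build_commands` and `run_all` are hypothetical.

```python
import subprocess

# Script names come from this document; the order follows the list above
# and is an assumption, not a documented requirement.
STEPS = [
    ("prepare data", "prepareData.sh"),
    ("prepare software", "prepareSoftware.sh"),
    ("run experiments", "runExperiments.sh"),
]


def build_commands(script_dir="."):
    """Return one bash command (as an argument list) per step."""
    return [["bash", "{}/{}".format(script_dir, script)]
            for _, script in STEPS]


def run_all(script_dir=".", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for (label, _), cmd in zip(STEPS, build_commands(script_dir)):
        print("{}: {}".format(label, " ".join(cmd)))
        if not dry_run:
            # check_call raises CalledProcessError on a non-zero exit,
            # so a failed step stops the remaining ones.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the driver only prints the planned commands, which makes it safe to inspect the plan before starting long-running experiments.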