This project is a movie data analysis platform that collects JSON-formatted movie data stored in CSV files, structures it with Spark, and loads it into JanusGraph. JanusGraph's graph database features make it possible to visualize relationships between movies, actors, directors, and other entities. The project is built as a web application with Spring Boot and React.js, which lets users access and query this data.
- Java 17
- Apache Maven : 4.0.0
- Apache Spark
- Scala Version : 2.13
- Spark Version : v4.0.0-preview
- Docker : 24.0.7
- Docker Compose : 1.29.2
- JanusGraph : 1.1.0-20240712-100602.aaf21b1
- Apache Cassandra : 5.0.0
- Elasticsearch : 8.14.3
- Gremlin Visualizer : 1.0
- Apache Tinkerpop
- Gremlin Driver : 3.7.2
- Spring : 3.3.1
- JSON : 20240303
- React.js : 18.3.1
- NPM : 10.8.2
- Node.js : 22.5.1
- Java JDK 17
- Docker
- Docker Compose
- Node.js
git clone https://github.com/alihsan-tsdln/movie-database-data-engineering.git
cd movie-database-data-engineering
mvn clean install
javac *.java
sudo docker-compose up -d
If the images don't exist, Docker will automatically pull them.
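Before running the loader, it can help to confirm that the containers are actually accepting connections. The following is a minimal sketch, not part of the repository: `port_open` and `wait_for_port` are hypothetical helper names, the script relies on bash's `/dev/tcp` redirection, and the ports (9042 for Cassandra, 9200 for Elasticsearch, 8182 for JanusGraph) are taken from the `docker ps` output shown later in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).

# Succeeds if a TCP connection to host:port can be opened.
# Uses bash's built-in /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Polls host:port once per second, up to max_tries attempts (default 30).
wait_for_port() {
  local host="$1" port="$2" max_tries="${3:-30}" i
  for ((i = 1; i <= max_tries; i++)); do
    if port_open "$host" "$port"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (commented out because it needs the containers to be running):
# for port in 9042 9200 8182; do
#   wait_for_port localhost "$port" 60 || echo "port $port is not reachable"
# done
```

An open port only shows that the process is listening. It does not guarantee that Cassandra or JanusGraph has finished initializing, so treat this as a first check rather than a readiness guarantee.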
java Main
// WARNING: Bulk loading does not check the consistency of your data. Make sure your data is consistent before loading it.
You can monitor job execution at http://localhost:4040
java Application
Spring Boot will start the server at http://localhost:8080
npm start
NPM will start React.js at http://localhost:3002. Normally, React tries to start on port 3000, but our Gremlin Visualizer already uses that port.
You can run the queries listed in the drop-down menu.
We started Gremlin Visualizer in Docker. First, we will start the Gremlin Console from the JanusGraph instance running in Docker.
sudo docker ps
The command's output shows the ID of the JanusGraph container.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
595a60cce026 cassandra:3 "docker-entrypoint.s…" 14 minutes ago Up 14 minutes 7000-7001/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 7199/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp jce-cassandra
474275e60438 docker.elastic.co/elasticsearch/elasticsearch:6.6.0 "/usr/local/bin/dock…" 14 minutes ago Up 14 minutes 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp jce-elastic
8fe5588bcd23 janusgraph/janusgraph:latest "docker-entrypoint.s…" 14 minutes ago Up 14 minutes 0.0.0.0:8182->8182/tcp, :::8182->8182/tcp jce-janusgraph
d9540645d532 prabushitha/gremlin-visualizer:latest "docker-entrypoint.s…" 14 minutes ago Up 14 minutes 0.0.0.0:3000-3001->3000-3001/tcp, :::3000-3001->3000-3001/tcp gremlin_visualize
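Instead of copying the ID by hand from the output above, it can be extracted by container name. This is a small sketch under assumptions: `container_id_by_name` is a hypothetical helper, not part of the repository, and it assumes the container is named `jce-janusgraph` as in the sample output.

```shell
# Hypothetical helper (not part of the repository): reads `docker ps` output on
# stdin and prints the ID of the container whose NAMES column equals the argument.
container_id_by_name() {
  # NR > 1 skips the header row; $NF is the last column (NAMES), $1 the container ID.
  awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Example (commented out because it needs a running Docker daemon):
# sudo docker exec -it "$(sudo docker ps | container_id_by_name jce-janusgraph)" bash
```

Docker can also do the lookup itself: `sudo docker ps -q --filter "name=jce-janusgraph"` prints the matching ID, and `docker exec` accepts a container name in place of the ID.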
sudo docker exec -it ${Id of your JanusGraph server} bash
We've reached the server terminal.
bin/gremlin.sh
This starts the Gremlin Console through gremlin.sh.
janusgraph@8fe5588bcd23:/opt/janusgraph$ bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/janusgraph/lib/log4j-slf4j-impl-2.20.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/janusgraph/lib/logback-classic-1.2.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
07:32:30 INFO org.apache.tinkerpop.gremlin.hadoop.jsr223.HadoopGremlinPlugin.getCustomizers - HADOOP_GREMLIN_LIBS is set to: /opt/janusgraph/lib
07:32:30 WARN org.apache.hadoop.util.NativeCodeLoader.<clinit> - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin>
Finally, the Gremlin Visualizer is ready to use at http://localhost:3000. For the host, enter your JanusGraph host. The port is 8182, the default Gremlin Server port.