The Vessel Metrics Service is an application for parsing, validating, and analyzing vessel data from CSV files, and for generating compliance and performance metrics. The service supports speed-difference calculation, fuel-efficiency metrics, outlier detection, and more. The system is built with Spring Boot 3.x and Java 21, leveraging object-oriented principles, error handling, and performance optimization.
- Speed Difference Calculation: Calculate the difference between actual speed over ground and proposed speed over ground for each vessel waypoint (an illustrative sketch follows this list).
- Validation Issues: Identify and report validation issues such as missing values, negative speeds, and outliers.
- Compliance Comparison: Compare two vessels' compliance based on actual and proposed speeds.
- Problematic Waypoints: Identify groups of consecutive waypoints with validation issues and classify problems (e.g., outliers, missing values).
- Data Merging: Retrieve raw and calculated metrics for a specific vessel within a given time frame.
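As an illustration of the first feature, the per-waypoint speed difference is simply actual minus proposed speed over ground. A minimal sketch, assuming a hypothetical waypoint model (the record and field names below are illustrative, not the project's actual code):

```java
// Illustrative waypoint model; field names are assumptions based on the
// feature descriptions above, not the project's actual entity.
public record Waypoint(String vesselCode,
                       double actualSpeedOverGround,
                       double proposedSpeedOverGround) {

    /** Speed difference = actual minus proposed speed over ground. */
    public double speedDifference() {
        return actualSpeedOverGround - proposedSpeedOverGround;
    }
}
```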
- Java 21
- Spring Boot 3.x
- Docker (for PostgreSQL database setup)
- Maven (for dependency management)
- Clone the repository:

  ```bash
  git clone https://github.com/gmitaros/vessel-metrics-service.git
  cd vessel-metrics-service
  ```

- Build the project:

  ```bash
  mvn clean install
  ```

  or, to skip tests for a faster build:

  ```bash
  mvn clean install -DskipTests
  ```
To run the service along with PostgreSQL using Docker and Docker Compose:
- Build the Docker image:

  ```bash
  docker-compose build
  ```

- Start the application:

  ```bash
  docker-compose up
  ```
This will start both the PostgreSQL container and the vessel-metrics-service container.
The database connection settings are configured in the docker-compose.yml file and passed as environment variables to the application:

- `SPRING_DATASOURCE_URL`: The JDBC URL for the PostgreSQL database.
- `SPRING_DATASOURCE_USERNAME`: The database username.
- `SPRING_DATASOURCE_PASSWORD`: The database password.
The service also reads the rest of its configuration from `application.properties` on the classpath.
The application is configured using the `application.properties` file. Below are the key customizable configurations:

- `spring.application.name`: Defines the name of the application (`Vessel Metrics Service`).
- `vessel.metrics.outlier.threshold`: Sets the threshold for detecting outliers in vessel data. Default: `3.0` (an illustrative detection sketch follows this section).
- `spring.datasource.url`: The JDBC URL for the PostgreSQL database.
- `spring.datasource.username`: Database username.
- `spring.datasource.password`: Database password.
- `spring.jpa.hibernate.ddl-auto`: Disables automatic DDL generation by Hibernate (`none`).
- `spring.jpa.show-sql`: Controls whether SQL queries are logged (`false`).
- `spring.jpa.properties.hibernate.jdbc.batch_size`: Sets the batch size for Hibernate inserts (`1000`).
- `spring.datasource.hikari.maximum-pool-size`: Configures the HikariCP connection pool size (`20`).
- `vessel.metrics.csv.path`: Path of the CSV data to load. Default: `/data/vessel_data.csv`.
- `vessel.metrics.csv.load.if.already.have.data`: If enabled (`true`), the app reloads the CSV data into the database even if the database already contains data. Default: `false`.
You can modify these properties in `application.properties`, located in `src/main/resources/`.
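This README does not specify how the outlier threshold is applied; a common interpretation is a z-score cutoff (number of standard deviations from the mean), which fits the default of `3.0`. A minimal sketch under that assumption (the class and method names are hypothetical, not the project's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: assumes vessel.metrics.outlier.threshold is a z-score
// cutoff, which matches the default of 3.0 but is not confirmed by the README.
public final class OutlierDetector {

    private final double threshold;

    public OutlierDetector(double threshold) {
        this.threshold = threshold;
    }

    /** Returns the indices of values whose z-score exceeds the threshold. */
    public List<Integer> findOutliers(double[] values) {
        List<Integer> outliers = new ArrayList<>();
        if (values.length == 0) {
            return outliers;
        }

        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;

        double variance = 0;
        for (double v : values) variance += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(variance / values.length);

        for (int i = 0; i < values.length; i++) {
            if (stdDev > 0 && Math.abs(values[i] - mean) / stdDev > threshold) {
                outliers.add(i);
            }
        }
        return outliers;
    }
}
```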
The application automatically loads vessel data from a CSV file into the database when the service starts, triggered by an `ApplicationReadyEvent`. Data is loaded under the following conditions:

- The database contains no existing vessel data, or
- The `vessel.metrics.csv.load.if.already.have.data` property is set to `true`.

The CSV file is located at the path specified by the `vessel.metrics.csv.path` property. The system parses, validates, and stores the data in batches, followed by an outlier detection process.
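A minimal sketch of how such a startup loader can be wired in Spring Boot (the `VesselDataRepository` and `CsvImportService` collaborators below are hypothetical stand-ins, shown as stub interfaces so the sketch is self-contained, not the project's actual classes):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Hypothetical collaborators, reduced to minimal interfaces for illustration.
interface VesselDataRepository { long count(); }
interface CsvImportService { void importCsv(); }

@Component
public class CsvStartupLoader {

    private final VesselDataRepository repository;
    private final CsvImportService csvImportService;
    private final boolean loadIfDataExists;

    public CsvStartupLoader(
            VesselDataRepository repository,
            CsvImportService csvImportService,
            @Value("${vessel.metrics.csv.load.if.already.have.data:false}") boolean loadIfDataExists) {
        this.repository = repository;
        this.csvImportService = csvImportService;
        this.loadIfDataExists = loadIfDataExists;
    }

    // Runs once the application context is fully started.
    @EventListener(ApplicationReadyEvent.class)
    public void loadCsvOnStartup() {
        if (repository.count() == 0 || loadIfDataExists) {
            // Parse, validate, batch-insert, then run outlier detection.
            csvImportService.importCsv();
        }
    }
}
```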
- The CSV file is placed in the `/data/` directory with the required fields (e.g., `vessel_code`, `datetime`, `latitude`, etc.).
- Thresholds for outlier detection and other validations can be adjusted via properties in the `application.properties` file.
The project includes a Postman collection with all available API requests and their stored responses, making it easy to test and explore the application's endpoints. You can import the collection into Postman and execute requests against the running application to view the expected behavior. The collection is organized by key functionality, providing an efficient way to validate the application's responses during development or testing.
- Comprehensive error handling is implemented across services. Errors like missing vessel data, invalid inputs, and calculation failures are logged and managed using custom exceptions.
- API responses are designed to return meaningful error messages and HTTP status codes (e.g., 404 for missing vessels, 400 for invalid input); a hedged handler sketch follows this list.
- Every major service and controller action is logged, including metrics calculations, data validation, and compliance comparisons.
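A minimal sketch of this pattern (the exception and handler classes below are hypothetical illustrations, not necessarily the project's actual code):

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Hypothetical custom exception for a vessel code with no data in the database.
class VesselNotFoundException extends RuntimeException {
    VesselNotFoundException(String vesselCode) {
        super("No data found for vessel: " + vesselCode);
    }
}

// Maps custom exceptions to the HTTP status codes described above.
@RestControllerAdvice
class ApiExceptionHandler {

    @ExceptionHandler(VesselNotFoundException.class)
    ResponseEntity<String> handleVesselNotFound(VesselNotFoundException ex) {
        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(ex.getMessage()); // 404
    }

    @ExceptionHandler(IllegalArgumentException.class)
    ResponseEntity<String> handleInvalidInput(IllegalArgumentException ex) {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(ex.getMessage()); // 400
    }
}
```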
- The project contains unit tests for critical components, ensuring that speed differences, fuel efficiency, and validation services function correctly.
- To run the tests:

  ```bash
  mvn test
  ```
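A minimal illustration of the kind of unit test involved (the calculation below is a hypothetical stand-in, not the project's actual service):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Illustrative only: a stand-in for the project's speed-difference service.
class SpeedDifferenceTest {

    // Speed difference = actual minus proposed speed over ground.
    static double speedDifference(double actualSog, double proposedSog) {
        return actualSog - proposedSog;
    }

    @Test
    void computesDifferenceBetweenActualAndProposedSpeed() {
        assertEquals(1.5, speedDifference(12.5, 11.0), 1e-9);
    }

    @Test
    void negativeDifferenceWhenVesselIsSlowerThanProposed() {
        assertEquals(-2.0, speedDifference(9.0, 11.0), 1e-9);
    }
}
```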
The project relies on the following external libraries:
- Spring Boot 3.3.x - Core framework.
- PostgreSQL - Relational database.
- Hibernate - ORM for interacting with PostgreSQL.
- Lombok - To reduce boilerplate code.
- Apache Commons CSV - For CSV parsing.
- Flyway - Database migrations.
- `GET /vessels/{vesselCode}/speed-differences` - Returns paginated speed differences for a vessel.
- `GET /vessels/{vesselCode}/validation-issues` - Returns validation issues for a vessel, sorted by frequency.
- `GET /vessels/compare-compliance?vesselCode1=code1&vesselCode2=code2` - Compares compliance between two vessels.
- `GET /vessels/{vesselCode}/data-merge?startDate=YYYY-MM-DD&endDate=YYYY-MM-DD` - Retrieves merged raw and calculated metrics within a specific period.
- `GET /vessels/{vesselCode}/problematic-waypoints?problemType=outlier` - Returns problematic waypoints grouped by validation issues (optional problem type filter).
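As a quick usage illustration, the speed-differences endpoint can be called with Spring's `RestClient` (available since Spring Boot 3.2); the base URL, port, and vessel code below are assumptions, not values taken from this README:

```java
import org.springframework.web.client.RestClient;

public class SpeedDifferencesClientExample {
    public static void main(String[] args) {
        // Assumes the service is reachable on localhost:8080 and that a
        // vessel with code "3001" exists; both are illustrative assumptions.
        RestClient client = RestClient.create("http://localhost:8080");

        String json = client.get()
                .uri("/vessels/{vesselCode}/speed-differences", "3001")
                .retrieve()
                .body(String.class);

        System.out.println(json); // raw paginated JSON response
    }
}
```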
Database Locking for Concurrent CSV Processing:

- Problem: In a multi-pod deployment, concurrent CSV processing may lead to race conditions or data duplication if multiple instances process the CSV simultaneously.
- Potential Solutions:
  - Pessimistic Locking (a hedged sketch follows this list):
    - Use SQL `SELECT ... FOR UPDATE` or `LOCK` to enforce a database-level lock during CSV processing.
    - This ensures that only one process at a time is writing or modifying the table, but it could reduce performance under high load because it blocks access to database rows or tables until the lock is released, forcing other transactions to wait.
    - This can lead to higher contention and delays, especially when multiple processes or threads are waiting to access the same data. The longer the lock is held (e.g., during CSV processing), the longer other operations are stalled. In distributed systems with high concurrency, this can cause bottlenecks, slowing down overall system performance.
  - Distributed Locking (a hedged sketch follows this list):
    - Use external tools like Redis to create distributed locks. This approach is better suited to distributed environments, where multiple instances or pods may attempt to process the same data.
    - Example: Redis `SETNX` (SET if Not eXists) can be used to create a lock, ensuring only one instance can process the CSV file.
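A minimal sketch of the pessimistic-locking option above using Spring Data JPA, where `PESSIMISTIC_WRITE` makes Hibernate issue `SELECT ... FOR UPDATE` (the `CsvImportStatus` entity and repository are hypothetical, introduced only to illustrate the lock):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.LockModeType;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

// Hypothetical entity: a single status row used purely as a lock target.
@Entity
class CsvImportStatus {
    @Id
    Long id;
}

// Acquiring PESSIMISTIC_WRITE makes Hibernate issue SELECT ... FOR UPDATE,
// so a second instance calling lockForImport blocks until the first commits.
// Must be invoked inside a @Transactional method so the lock spans the import.
interface CsvImportLockRepository extends JpaRepository<CsvImportStatus, Long> {

    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @Query("select s from CsvImportStatus s where s.id = :id")
    CsvImportStatus lockForImport(@Param("id") Long id);
}
```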
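And a minimal sketch of the Redis option using Spring Data Redis, where `setIfAbsent` is the template equivalent of `SETNX` (the key name, TTL, and class are illustrative assumptions):

```java
import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class CsvImportLock {

    private static final String LOCK_KEY = "vessel-metrics:csv-import-lock"; // illustrative key
    private final StringRedisTemplate redis;

    public CsvImportLock(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /**
     * Tries to acquire the lock. setIfAbsent maps to Redis "SET key value NX EX",
     * the atomic successor of SETNX; the TTL guards against a crashed holder
     * never releasing the lock.
     */
    public boolean tryAcquire(String instanceId) {
        Boolean acquired = redis.opsForValue()
                .setIfAbsent(LOCK_KEY, instanceId, Duration.ofMinutes(10));
        return Boolean.TRUE.equals(acquired);
    }

    /**
     * Releases the lock only if this instance still holds it. Note: this
     * check-then-delete is not atomic; production code would use a Lua script
     * or a library such as Redisson.
     */
    public void release(String instanceId) {
        if (instanceId.equals(redis.opsForValue().get(LOCK_KEY))) {
            redis.delete(LOCK_KEY);
        }
    }
}
```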