The Vessel Metrics Service is an application for parsing, validating, and analyzing vessel data from CSV files, and for generating compliance and performance metrics. The service supports speed-difference calculation, fuel-efficiency metrics, outlier detection, and more. The system is built with Spring Boot 3.x and Java 21, leveraging object-oriented principles, error handling, and performance optimization.
- Speed Difference Calculation: Calculate the difference between actual speed over ground and proposed speed over ground for each vessel waypoint (an illustrative sketch follows this list).
- Validation Issues: Identify and report validation issues such as missing values, negative speeds, and outliers.
- Compliance Comparison: Compare two vessels' compliance based on actual and proposed speeds.
- Problematic Waypoints: Identify groups of consecutive waypoints with validation issues and classify problems (e.g., outliers, missing values).
- Data Merging: Retrieve raw and calculated metrics for a specific vessel within a given time frame.
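As an illustration of the first feature, the per-waypoint speed difference is simply actual minus proposed speed over ground. A minimal sketch, assuming a hypothetical waypoint model (the record and field names below are illustrative, not the project's actual code):

```java
// Illustrative waypoint model; field names are assumptions based on the
// feature descriptions above, not the project's actual entity.
public record Waypoint(String vesselCode,
                       double actualSpeedOverGround,
                       double proposedSpeedOverGround) {

    /** Speed difference = actual minus proposed speed over ground. */
    public double speedDifference() {
        return actualSpeedOverGround - proposedSpeedOverGround;
    }
}
```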
- Java 21
- Spring Boot 3.x
- Docker (for PostgreSQL database setup)
- Maven (for dependency management)
- Clone the repository:

  ```bash
  git clone https://github.com/gmitaros/vessel-metrics-service.git
  cd vessel-metrics-service
  ```

- Build the project:

  ```bash
  mvn clean install
  ```

  or, to skip tests for a faster build:

  ```bash
  mvn clean install -DskipTests
  ```
To run the service along with PostgreSQL using Docker and Docker Compose:
- Build the Docker image:

  ```bash
  docker-compose build
  ```

- Start the application:

  ```bash
  docker-compose up
  ```
This will start both the PostgreSQL container and the vessel-metrics-service container.
The database connection settings are configured in the docker-compose.yml file and passed as environment variables to the application:

- `SPRING_DATASOURCE_URL`: The JDBC URL for the PostgreSQL database.
- `SPRING_DATASOURCE_USERNAME`: The database username.
- `SPRING_DATASOURCE_PASSWORD`: The database password.
The service also reads the rest of its configuration from `application.properties` on the classpath.
The application is configured using the `application.properties` file. Below are the key customizable configurations:

- `spring.application.name`: Defines the name of the application (`Vessel Metrics Service`).
- `vessel.metrics.outlier.threshold`: Sets the threshold for detecting outliers in vessel data. Default: `3.0` (an illustrative detection sketch follows this section).
- `spring.datasource.url`: The JDBC URL for the PostgreSQL database.
- `spring.datasource.username`: Database username.
- `spring.datasource.password`: Database password.
- `spring.jpa.hibernate.ddl-auto`: Disables automatic DDL generation by Hibernate (`none`).
- `spring.jpa.show-sql`: Controls whether SQL queries are logged (`false`).
- `spring.jpa.properties.hibernate.jdbc.batch_size`: Sets the batch size for Hibernate inserts (`1000`).
- `spring.datasource.hikari.maximum-pool-size`: Configures the HikariCP connection pool size (`20`).
- `vessel.metrics.csv.path`: Path of the CSV data to load. Default: `/data/vessel_data.csv`.
- `vessel.metrics.csv.load.if.already.have.data`: If enabled (`true`), the app reloads the CSV data into the database even if the database already contains data. Default: `false`.
You can modify these properties in `application.properties`, located in `src/main/resources/`.
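This README does not specify how the outlier threshold is applied; a common interpretation is a z-score cutoff (number of standard deviations from the mean), which fits the default of `3.0`. A minimal sketch under that assumption (the class and method names are hypothetical, not the project's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: assumes vessel.metrics.outlier.threshold is a z-score
// cutoff, which matches the default of 3.0 but is not confirmed by the README.
public final class OutlierDetector {

    private final double threshold;

    public OutlierDetector(double threshold) {
        this.threshold = threshold;
    }

    /** Returns the indices of values whose z-score exceeds the threshold. */
    public List<Integer> findOutliers(double[] values) {
        List<Integer> outliers = new ArrayList<>();
        if (values.length == 0) {
            return outliers;
        }

        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;

        double variance = 0;
        for (double v : values) variance += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(variance / values.length);

        for (int i = 0; i < values.length; i++) {
            if (stdDev > 0 && Math.abs(values[i] - mean) / stdDev > threshold) {
                outliers.add(i);
            }
        }
        return outliers;
    }
}
```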
The application automatically loads vessel data from a CSV file into the database when the service starts, triggered by an `ApplicationReadyEvent`. Data is loaded under the following conditions:

- The database contains no existing vessel data, or
- The `vessel.metrics.csv.load.if.already.have.data` property is set to `true`.

The CSV file is located at the path specified by the `vessel.metrics.csv.path` property. The system parses, validates, and stores the data in batches, followed by an outlier detection process.
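A minimal sketch of how such a startup loader can be wired in Spring Boot (the `VesselDataRepository` and `CsvImportService` collaborators below are hypothetical stand-ins, shown as stub interfaces so the sketch is self-contained, not the project's actual classes):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Hypothetical collaborators, reduced to minimal interfaces for illustration.
interface VesselDataRepository { long count(); }
interface CsvImportService { void importCsv(); }

@Component
public class CsvStartupLoader {

    private final VesselDataRepository repository;
    private final CsvImportService csvImportService;
    private final boolean loadIfDataExists;

    public CsvStartupLoader(
            VesselDataRepository repository,
            CsvImportService csvImportService,
            @Value("${vessel.metrics.csv.load.if.already.have.data:false}") boolean loadIfDataExists) {
        this.repository = repository;
        this.csvImportService = csvImportService;
        this.loadIfDataExists = loadIfDataExists;
    }

    // Runs once the application context is fully started.
    @EventListener(ApplicationReadyEvent.class)
    public void loadCsvOnStartup() {
        if (repository.count() == 0 || loadIfDataExists) {
            // Parse, validate, batch-insert, then run outlier detection.
            csvImportService.importCsv();
        }
    }
}
```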
- The CSV file is placed in the `/data/` directory with the required fields (e.g., `vessel_code`, `datetime`, `latitude`, etc.).
- Thresholds for outlier detection and other validations can be adjusted via properties in the `application.properties` file.
The project includes a Postman collection with all available API requests and their stored responses, making it easy to test and explore the application's endpoints. You can import the collection into Postman and execute requests against the running application to view the expected behavior. The collection is organized by key functionality, providing an efficient way to validate the application's responses during development or testing.
- Comprehensive error handling is implemented across services. Errors like missing vessel data, invalid inputs, and calculation failures are logged and managed using custom exceptions.
- API responses are designed to return meaningful error messages and HTTP status codes (e.g., 404 for missing vessels, 400 for invalid input); a hedged handler sketch follows this list.
- Every major service and controller action is logged, including metrics calculations, data validation, and compliance comparisons.
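A minimal sketch of this pattern (the exception and handler classes below are hypothetical illustrations, not necessarily the project's actual code):

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Hypothetical custom exception for a vessel code with no data in the database.
class VesselNotFoundException extends RuntimeException {
    VesselNotFoundException(String vesselCode) {
        super("No data found for vessel: " + vesselCode);
    }
}

// Maps custom exceptions to the HTTP status codes described above.
@RestControllerAdvice
class ApiExceptionHandler {

    @ExceptionHandler(VesselNotFoundException.class)
    ResponseEntity<String> handleVesselNotFound(VesselNotFoundException ex) {
        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(ex.getMessage()); // 404
    }

    @ExceptionHandler(IllegalArgumentException.class)
    ResponseEntity<String> handleInvalidInput(IllegalArgumentException ex) {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(ex.getMessage()); // 400
    }
}
```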
- The project contains unit tests for critical components, ensuring that speed differences, fuel efficiency, and validation services function correctly.
- To run the tests:

  ```bash
  mvn test
  ```
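A minimal illustration of the kind of unit test involved (the calculation below is a hypothetical stand-in, not the project's actual service):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Illustrative only: a stand-in for the project's speed-difference service.
class SpeedDifferenceTest {

    // Speed difference = actual minus proposed speed over ground.
    static double speedDifference(double actualSog, double proposedSog) {
        return actualSog - proposedSog;
    }

    @Test
    void computesDifferenceBetweenActualAndProposedSpeed() {
        assertEquals(1.5, speedDifference(12.5, 11.0), 1e-9);
    }

    @Test
    void negativeDifferenceWhenVesselIsSlowerThanProposed() {
        assertEquals(-2.0, speedDifference(9.0, 11.0), 1e-9);
    }
}
```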
The project relies on the following external libraries:
- Spring Boot 3.3.x - Core framework.
- PostgreSQL - Relational database.
- Hibernate - ORM for interacting with PostgreSQL.
- Lombok - To reduce boilerplate code.
- Apache Commons CSV - For CSV parsing.
- Flyway - Database migrations.
- `GET /vessels/{vesselCode}/speed-differences` - Returns paginated speed differences for a vessel.
- `GET /vessels/{vesselCode}/validation-issues` - Returns validation issues for a vessel, sorted by frequency.
- `GET /vessels/compare-compliance?vesselCode1=code1&vesselCode2=code2` - Compares compliance between two vessels.
- `GET /vessels/{vesselCode}/data-merge?startDate=YYYY-MM-DD&endDate=YYYY-MM-DD` - Retrieves merged raw and calculated metrics within a specific period.
- `GET /vessels/{vesselCode}/problematic-waypoints?problemType=outlier` - Returns problematic waypoints grouped by validation issues (optional problem type filter).
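As a quick usage illustration, the speed-differences endpoint can be called with Spring's `RestClient` (available since Spring Boot 3.2); the base URL, port, and vessel code below are assumptions, not values taken from this README:

```java
import org.springframework.web.client.RestClient;

public class SpeedDifferencesClientExample {
    public static void main(String[] args) {
        // Assumes the service is reachable on localhost:8080 and that a
        // vessel with code "3001" exists; both are illustrative assumptions.
        RestClient client = RestClient.create("http://localhost:8080");

        String json = client.get()
                .uri("/vessels/{vesselCode}/speed-differences", "3001")
                .retrieve()
                .body(String.class);

        System.out.println(json); // raw paginated JSON response
    }
}
```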
Database Locking for Concurrent CSV Processing:

- Problem: In a multi-pod deployment, concurrent CSV processing may lead to race conditions or data duplication if multiple instances process the CSV simultaneously.
- Potential Solutions:
  - Pessimistic Locking (a hedged sketch follows this list):
    - Use SQL `SELECT ... FOR UPDATE` or `LOCK` to enforce a database-level lock during CSV processing.
    - This ensures that only one process at a time is writing or modifying the table, but it could reduce performance under high load because it blocks access to database rows or tables until the lock is released, forcing other transactions to wait.
    - This can lead to higher contention and delays, especially when multiple processes or threads are waiting to access the same data. The longer the lock is held (e.g., during CSV processing), the longer other operations are stalled. In distributed systems with high concurrency, this can cause bottlenecks, slowing down overall system performance.
  - Distributed Locking (a hedged sketch follows this list):
    - Use external tools like Redis to create distributed locks. This approach is better suited to distributed environments, where multiple instances or pods may attempt to process the same data.
    - Example: Redis `SETNX` (SET if Not eXists) can be used to create a lock, ensuring only one instance can process the CSV file.
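A minimal sketch of the pessimistic-locking option above using Spring Data JPA, where `PESSIMISTIC_WRITE` makes Hibernate issue `SELECT ... FOR UPDATE` (the `CsvImportStatus` entity and repository are hypothetical, introduced only to illustrate the lock):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.LockModeType;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

// Hypothetical entity: a single status row used purely as a lock target.
@Entity
class CsvImportStatus {
    @Id
    Long id;
}

// Acquiring PESSIMISTIC_WRITE makes Hibernate issue SELECT ... FOR UPDATE,
// so a second instance calling lockForImport blocks until the first commits.
// Must be invoked inside a @Transactional method so the lock spans the import.
interface CsvImportLockRepository extends JpaRepository<CsvImportStatus, Long> {

    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @Query("select s from CsvImportStatus s where s.id = :id")
    CsvImportStatus lockForImport(@Param("id") Long id);
}
```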
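And a minimal sketch of the Redis option using Spring Data Redis, where `setIfAbsent` is the template equivalent of `SETNX` (the key name, TTL, and class are illustrative assumptions):

```java
import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class CsvImportLock {

    private static final String LOCK_KEY = "vessel-metrics:csv-import-lock"; // illustrative key
    private final StringRedisTemplate redis;

    public CsvImportLock(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /**
     * Tries to acquire the lock. setIfAbsent maps to Redis "SET key value NX EX",
     * the atomic successor of SETNX; the TTL guards against a crashed holder
     * never releasing the lock.
     */
    public boolean tryAcquire(String instanceId) {
        Boolean acquired = redis.opsForValue()
                .setIfAbsent(LOCK_KEY, instanceId, Duration.ofMinutes(10));
        return Boolean.TRUE.equals(acquired);
    }

    /**
     * Releases the lock only if this instance still holds it. Note: this
     * check-then-delete is not atomic; production code would use a Lua script
     * or a library such as Redisson.
     */
    public void release(String instanceId) {
        if (instanceId.equals(redis.opsForValue().get(LOCK_KEY))) {
            redis.delete(LOCK_KEY);
        }
    }
}
```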