Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CU-86c083kqv - Add config fields for names for Kafka and Notifications #325

Merged
merged 5 commits into from
Oct 15, 2024

Conversation

sushantpatil1214
Copy link
Collaborator

@sushantpatil1214 sushantpatil1214 commented Sep 25, 2024

Summary by CodeRabbit

  • New Features

    • Introduced configurable fields for name validation, allowing external configuration of given and family name retrieval.
    • Added nameValidationFields, fieldsForKafkaKeyGen, and nameFieldsForNotificationDisplay to various configuration files and classes to enhance flexibility in name validation and Kafka operations.
    • Enhanced the logic for generating display names and processing candidate records in the linking process.
  • Bug Fixes

    • Improved filtering logic for patient names by replacing hardcoded strings with dynamic references to configuration fields.
  • Chores

    • Removed unnecessary lines from deployment script related to the Keycloak test server.

@rcrichton
Copy link
Member

Copy link
Contributor

coderabbitai bot commented Sep 25, 2024

Walkthrough

The changes involve modifications across several Java and Scala files to enhance the configuration and retrieval of patient name fields. The hardcoded strings for given names and family names have been replaced with configurable fields, allowing for more flexible name validation. New fields for name validation have been added to various configuration classes and JSON files, enabling external configuration of these fields. Additionally, a new method for generating Kafka partition keys has been introduced, improving code organization.

Changes

File Path Change Summary
JeMPI_Apps/JeMPI_AsyncReceiver/src/main/java/.../Main.java Updated apacheReadCSV method to use configurable fields givenNameField and familyNameField for retrieving names instead of hardcoded strings. Added method generateKafkaPartitionKey.
JeMPI_Apps/JeMPI_EM_Scala/src/main/scala/.../Config.scala Added new optional fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay to Config case class.
JeMPI_Apps/JeMPI_LibShared/src/main/java/.../FieldsConfig.java Introduced new public fields nameValidationFields, fieldsForKafkaKeyGen, and nameFieldsForNotificationDisplay to FieldsConfig class, initialized in the constructor.
JeMPI_Apps/JeMPI_LibShared/src/main/java/.../JsonConfig.java Added new properties nameValidationFields, fieldsForKafkaKeyGen, and nameFieldsForNotificationDisplay to the JsonConfig record.
JeMPI_Apps/JeMPI_LibShared/src/main/java/.../NameFieldsForKafkaKeyGen.java Introduced a new record NameFieldsForKafkaKeyGen with a field List<String> nameFieldsForKafkaKeyGen.
JeMPI_Apps/JeMPI_LibShared/src/main/java/.../ConfigurationModel.java Added new fields nameValidationFields, fieldsForKafkaKeyGen, and nameFieldsForNotificationDisplay to Configuration record.
JeMPI_Apps/JeMPI_Linker/src/main/java/.../LinkerDWH.java Updated patientName method to use FIELDS_CONFIG.nameValidationFields for retrieving names. Refactored linkInteraction method for improved candidate processing and notification logic.
devops/linux/docker/data-config/config-reference-auto-generate-fields.json Added new fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay with values ["given_name", "family_name"].
devops/linux/docker/data-config/config-reference-link-d-validate-dp-match-dp.json Added new fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay with values ["given_name", "family_name"].
devops/linux/docker/data-config/config-reference-link-d.json Added new fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay with values ["given_name", "family_name"].
devops/linux/docker/data-config/config-reference-link-dp-minimal.json Added new fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay with values ["given_name", "family_name"].
devops/linux/docker/data-config/config-reference-link-dp.json Added new fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay with values ["given_name", "family_name"].

Possibly related PRs

Suggested reviewers

  • NyashaMuusha

Poem

🐰 In fields of code, we hop and play,
With names now set in a flexible way.
No more hard strings, just fields to choose,
A dance of data, we cannot lose!
So let’s celebrate this joyful feat,
For patient names, our work is sweet! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/NameValidationFields.java (1)

6-7: LGTM: Good use of record and JSON annotation.

The use of a record for this simple data structure is appropriate, and the @JsonInclude annotation is a good practice for JSON serialization.

Consider renaming the class to NameValidationConfig or NameValidationRules to better reflect its purpose as a configuration or set of rules, rather than just fields.

JeMPI_Apps/JeMPI_EM_Scala/src/main/scala/org/jembi/jempi/em/config/Config.scala (1)

68-69: Summary: Good addition, documentation needed

The changes to the Config case class are minimal and focused, improving the configurability of the system as intended. The addition of nameValidationFields is a good enhancement.

To improve the overall quality and maintainability of the code:

  1. Consider adding a comment above the Config case class explaining the purpose of each field, especially the newly added nameValidationFields.
  2. Update the PR description or add a comment in the code to clarify how nameValidationFields relates to Kafka and Notifications, as mentioned in the PR objectives.

These additions will help future developers understand the purpose and usage of these configuration options.

devops/linux/docker/data-config/config-reference-link-d-validate-dp-match-dp.json (1)

225-228: LGTM! Consider minor formatting adjustment.

The addition of the nameValidationFields array is a good enhancement to the configuration. It clearly specifies which fields should be used for name validation, which aligns well with the existing demographicFields section.

For consistency with the rest of the file, consider adjusting the indentation of the array elements. Here's a suggested format:

  "nameValidationFields": [
-            "given_name",
-            "family_name"
+    "given_name",
+    "family_name"
  ],
JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (1)

98-106: LGTM! Consider adding a safety check for nameValidationFields size.

The changes improve the flexibility of the patientName method by using configurable field names from FIELDS_CONFIG.nameValidationFields. This aligns well with the PR objective of adding config fields for names.

For added robustness, consider adding a check to ensure FIELDS_CONFIG.nameValidationFields has at least two elements before accessing them. This can prevent potential IndexOutOfBoundsExceptions:

 private static String patientName(final Interaction interaction) {
    var patientRecord = interaction.demographicData();
+   if (FIELDS_CONFIG.nameValidationFields.size() < 2) {
+      LOGGER.warn("Insufficient name validation fields configured. Using default field names.");
+      return patientRecord.fields.stream()
+         .filter(field -> "given_name".equals(field.ccTag()) || "family_name".equals(field.ccTag()))
+         .map(DemographicData.DemographicField::value)
+         .collect(Collectors.joining(" "))
+         .trim();
+   }
    String givenNameField = FIELDS_CONFIG.nameValidationFields.get(0);
    String familyNameField = FIELDS_CONFIG.nameValidationFields.get(1);
    // ... rest of the method remains unchanged
 }

This change ensures the method gracefully handles cases where the configuration might be incomplete.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8927d25 and f5499bd.

📒 Files selected for processing (13)
  • JeMPI_Apps/JeMPI_AsyncReceiver/src/main/java/org/jembi/jempi/async_receiver/Main.java (1 hunks)
  • JeMPI_Apps/JeMPI_EM_Scala/src/main/scala/org/jembi/jempi/em/config/Config.scala (1 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/FieldsConfig.java (2 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/JsonConfig.java (1 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/NameValidationFields.java (1 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/models/ConfigurationModel.java (1 hunks)
  • JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (1 hunks)
  • devops/linux/docker/data-config/config-reference-auto-generate-fields.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-d-validate-dp-match-dp.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-d.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-dp-minimal.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-dp.json (1 hunks)
  • devops/linux/docker/deployment/install_from_scratch/d-stack-1-create-db-build-all-reboot.sh (0 hunks)
💤 Files not reviewed due to no reviewable changes (1)
  • devops/linux/docker/deployment/install_from_scratch/d-stack-1-create-db-build-all-reboot.sh
🔇 Additional comments not posted (15)
JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/NameValidationFields.java (1)

1-5: LGTM: Package and imports are appropriate.

The package name and imports are well-structured and relevant to the class's purpose. The use of Jackson annotations suggests proper JSON handling capabilities.

JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/JsonConfig.java (2)

22-22: Summary of changes and recommendations

The addition of List<String> nameValidationFields to the JsonConfig record is a positive change that aligns with the PR objectives. To further improve this change:

  1. Add a @JsonProperty annotation and documentation comment to the new field.
  2. Verify and update all JsonConfig usages throughout the codebase.
  3. Consider adding validation or a default value for the new field.

These recommendations will enhance code consistency, maintainability, and robustness. Overall, the change is approved pending these minor adjustments.


22-22: Verify usage and consider adding validation for the new field.

The addition of nameValidationFields to the JsonConfig record is a non-breaking change. However, consider the following points:

  1. Ensure that all code creating JsonConfig instances is updated to include this new field.
  2. Consider adding validation to ensure the list is not null or empty when required.
  3. You might want to provide a default value (e.g., an empty list) to maintain backward compatibility.

To ensure the change is properly integrated, please run the following script:

This script will help identify areas of the codebase that might need updates due to the new field addition.

✅ Verification successful

Verification Successful: No additional usages of JsonConfig require updates.

The addition of nameValidationFields does not impact other parts of the codebase based on the current analysis.

  • No direct instantiations of JsonConfig were found outside of deserialization within JsonConfig.java.
  • No unit tests related to JsonConfig need updates.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for JsonConfig usage and potential updates needed

# Search for JsonConfig instantiations
echo "Searching for JsonConfig instantiations:"
rg --type java "new JsonConfig\(" -A 5

# Search for JsonConfig deserialization
echo "\nSearching for JsonConfig deserialization:"
rg --type java "readValue\(.*JsonConfig\.class" -A 5

# Search for potential unit tests that might need updating
echo "\nSearching for potential unit tests:"
rg --type java "class.*Test.*JsonConfig" -A 10

Length of output: 1182


Script:

#!/bin/bash
# Description: Search for factory methods or builder patterns creating JsonConfig instances

# Search for factory methods returning JsonConfig
echo "Searching for factory methods returning JsonConfig:"
rg --type java ".*JsonConfig.*factory.*\(.*\)" -A 5

# Search for builder patterns involving JsonConfig
echo "\nSearching for builder patterns with JsonConfig:"
rg --type java "JsonConfig\.builder\(" -A 5

# Search for JSON resource files that deserialize into JsonConfig
echo "\nSearching for JSON resource files deserializing into JsonConfig:"
rg --type json "JsonConfig" --context 2

Length of output: 504

JeMPI_Apps/JeMPI_EM_Scala/src/main/scala/org/jembi/jempi/em/config/Config.scala (2)

68-68: LGTM: Trailing comma addition

The addition of a trailing comma after the rules field is a good practice in Scala. It makes future additions to the case class easier and doesn't affect the functionality.


69-69: Approve addition with clarification request

The addition of nameValidationFields: Option[List[String]] to the Config case class is a good enhancement that allows for optional configuration of name validation fields. This aligns with the PR objective of adding configuration fields for names.

However, could you please clarify how this field relates specifically to Kafka and Notifications as mentioned in the PR objectives? It might be helpful to add a comment explaining the purpose and usage of this field in the context of Kafka and Notifications.

To ensure this addition doesn't conflict with existing code, let's check for any other occurrences of nameValidationFields:

✅ Verification successful

Approval Confirmed

The addition of nameValidationFields: Option[List[String]] to the Config case class aligns with the PR objectives and does not conflict with existing code. No other occurrences of nameValidationFields were found in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other occurrences of nameValidationFields
rg "nameValidationFields" --type scala

Length of output: 165

JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/models/ConfigurationModel.java (1)

19-19: Consider adding documentation and clarifying the field's purpose.

The addition of nameValidationFields aligns with the PR objectives. However, to improve clarity and maintainability:

  1. Add a Javadoc comment explaining the purpose and usage of this field.
  2. Consider using a more specific name if possible, e.g., nameValidationRules or nameValidationCriteria.
  3. If nullability is intentional, document it; otherwise, consider initializing with an empty list.

Example improvement:

/**
 * Specifies the fields to be used for name validation.
 * Each string in the list represents a field name that should be validated.
 * If null, no specific name validation will be performed.
 */
List<String> nameValidationFields,

To ensure consistent usage, let's check for other occurrences of similar fields:

✅ Verification successful

Add documentation and clarify the purpose of nameValidationFields.

The nameValidationFields is consistently used across multiple classes, indicating a standardized approach. To enhance clarity and maintainability:

  1. Documentation: Add Javadoc comments explaining the purpose and usage of nameValidationFields.
  2. Nullability: If the field is intentionally nullable, document this behavior. Otherwise, consider initializing it with an empty list to avoid potential NullPointerExceptions.

Example improvement:

/**
 * Specifies the fields to be used for name validation.
 * Each string in the list represents a field name that should be validated.
 * If null, no specific name validation will be performed.
 */
List<String> nameValidationFields,
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for similar validation fields in the codebase
rg --type java "List<String>\s+\w+ValidationFields" .

Length of output: 637

devops/linux/docker/data-config/config-reference-link-d.json (1)

116-119: Approve with suggestions: Clarify purpose and fix indentation

The addition of the nameValidationFields is noted. However, I have a few points to address:

  1. Could you please clarify the purpose of this new field and how it relates to the PR objectives of adding config fields for Kafka and Notifications? This connection is not immediately apparent from the change.

  2. The indentation of the new field is inconsistent with the rest of the file. Consider adjusting it for better readability.

To ensure consistency, let's check if these field names are used elsewhere in the config:

Consider applying this diff to fix the indentation:

   "nameValidationFields": [
-            "given_name",
-            "family_name"
+    "given_name",
+    "family_name"
   ],
✅ Verification successful

Approval Granted: Clarification Needed

The addition of the nameValidationFields has been verified and aligns with the existing usage of given_name and family_name in the configuration file. Additionally, the indentation has been corrected as suggested.

However, please clarify how this new field relates to the PR objectives of adding config fields for Kafka and Notifications, as this connection is not immediately apparent from the changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of "given_name" and "family_name" in the config file

echo "Occurrences of 'given_name':"
grep -n "given_name" devops/linux/docker/data-config/config-reference-link-d.json

echo -e "\nOccurrences of 'family_name':"
grep -n "family_name" devops/linux/docker/data-config/config-reference-link-d.json

Length of output: 672

devops/linux/docker/data-config/config-reference-auto-generate-fields.json (1)

138-141: Approve with suggestions: Align indentation and add explanatory comment

The addition of nameValidationFields aligns with the PR objective of adding config fields for names. However, a few improvements can be made:

  1. The indentation of the array elements is inconsistent with the rest of the file.
  2. Consider adding a comment explaining the purpose of this new field and how it relates to Kafka and Notifications (as mentioned in the PR title).

Here's a suggested improvement:

+  // Fields used for name validation in Kafka and Notifications processes
   "nameValidationFields": [
-            "given_name",
-            "family_name"
+    "given_name",
+    "family_name"
   ],

To ensure consistency, let's check the indentation style used throughout the file:

devops/linux/docker/data-config/config-reference-link-dp.json (1)

177-180: Approved addition of nameValidationFields, but clarification needed

The addition of the nameValidationFields array is a good enhancement for specifying which name fields should be validated. This aligns well with the existing demographicFields structure.

However, I have two points for consideration:

  1. The PR objectives mention adding config fields for Kafka and Notifications, but this change doesn't seem directly related to those components. Could you please clarify how this addition fits into the broader context of the PR objectives?

  2. To improve maintainability, consider adding a JSON comment explaining the purpose of this new field. For example:

"nameValidationFields": [
  // Specifies the fields to be used for name validation
  "given_name",
  "family_name"
],

To ensure consistency, let's verify if these fields exist in the demographicFields array:

This script should return the configurations for both "given_name" and "family_name" if they exist in the demographicFields array.

✅ Verification successful

Verified nameValidationFields addition

The nameValidationFields array correctly references existing fields in the demographicFields array, ensuring proper name validation functionality.

However, the PR objectives mention adding config fields for Kafka and Notifications. Could you please clarify how this addition integrates with Kafka and Notifications as outlined in the PR?

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the fields in nameValidationFields exist in demographicFields

jq '.demographicFields[] | select(.fieldName == "given_name" or .fieldName == "family_name")' devops/linux/docker/data-config/config-reference-link-dp.json

Length of output: 838

devops/linux/docker/data-config/config-reference-link-dp-minimal.json (1)

192-195: Approved: New nameValidationFields added successfully.

The addition of the nameValidationFields array is consistent with the PR objectives and maintains the existing JSON structure. The field names used ("given_name" and "family_name") align with the existing demographic fields.

To ensure this change doesn't conflict with existing configurations, please run the following verification:

Additionally, please consider the following:

  1. Could you provide documentation or clarification on the purpose and usage of this new nameValidationFields array? This will help other developers understand how it's intended to be used in the system.

  2. Are there any other name-related fields that might be relevant to include in this array? For example, if your system supports middle names or name prefixes/suffixes, it might be worth considering their inclusion.

If additional name-related fields are relevant, consider expanding the array:

 "nameValidationFields": [
   "given_name",
-  "family_name"
+  "family_name",
+  "middle_name",
+  "name_prefix",
+  "name_suffix"
 ],

Note: Only add these if they are actually used in your system and present in the demographicFields section.

✅ Verification successful

Verified: nameValidationFields addition confirmed.

Both tests passed successfully:

  • nameValidationFields is unique within the configuration.
  • All fields in nameValidationFields exist in demographicFields.

No issues found. This change is consistent with the project's configuration standards.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the uniqueness of the new field and its consistency with demographic fields

# Test 1: Check if nameValidationFields is unique in the config
echo "Test 1: Checking uniqueness of nameValidationFields"
count=$(jq 'keys' devops/linux/docker/data-config/config-reference-link-dp-minimal.json | grep -c "nameValidationFields")
if [ $count -eq 1 ]; then
    echo "PASS: nameValidationFields is unique"
else
    echo "FAIL: nameValidationFields is not unique or missing"
fi

# Test 2: Verify if the fields in nameValidationFields exist in demographicFields
echo "Test 2: Verifying consistency with demographicFields"
jq -r '.nameValidationFields[]' devops/linux/docker/data-config/config-reference-link-dp-minimal.json | while read field; do
    if jq -e --arg field "$field" '.demographicFields[] | select(.fieldName == $field)' devops/linux/docker/data-config/config-reference-link-dp-minimal.json > /dev/null; then
        echo "PASS: $field exists in demographicFields"
    else
        echo "FAIL: $field not found in demographicFields"
    fi
done

Length of output: 1105

JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/FieldsConfig.java (2)

34-34: LGTM: New field nameValidationFields added correctly.

The new field nameValidationFields is properly declared as a public final List<String>. This is consistent with the coding style of other fields in the class and appropriate for storing name validation fields.


Line range hint 1-146: Overall changes look good, but clarification needed on PR objectives.

The changes to add nameValidationFields are well-implemented and consistent with the existing code structure. However, there's a discrepancy between the PR objectives and the actual changes:

  1. The PR title mentions "Add config fields for names for Kafka and Notifications".
  2. The PR is linked to a task about creating a config flag for Kafka key.

The current changes don't explicitly mention Kafka or notifications. Could you please clarify how nameValidationFields relates to Kafka and notifications configuration? This will help ensure that the implementation fully meets the PR objectives.

To help clarify the relationship between the new field and Kafka/notifications, you could run the following command to search for related usages:

If there are no results, it might indicate that additional changes are needed to fully implement the PR objectives.

JeMPI_Apps/JeMPI_AsyncReceiver/src/main/java/org/jembi/jempi/async_receiver/Main.java (2)

156-167: Summary of changes

The modifications to the apacheReadCSV method successfully implement configurable name fields, aligning with the PR objectives. This enhancement improves the flexibility of the system by allowing customization of given name and family name fields.

However, to further improve the robustness of this implementation, consider:

  1. Adding null checks or default values for the config fields.
  2. Updating the documentation to reflect these new configurable options.
  3. Ensuring that all parts of the codebase that interact with these fields are updated accordingly.

Overall, this change is a positive step towards a more configurable system.


156-157: Improved configurability for name fields.

The introduction of givenNameField and familyNameField variables from FIELDS_CONFIG.nameValidationFields enhances the flexibility of the system. This change aligns well with the PR objectives of adding config fields for names.

Consider adding null checks or default values for these config fields to prevent potential runtime errors:

String givenNameField = FIELDS_CONFIG.nameValidationFields.size() > 0 ? FIELDS_CONFIG.nameValidationFields.get(0) : "given_name";
String familyNameField = FIELDS_CONFIG.nameValidationFields.size() > 1 ? FIELDS_CONFIG.nameValidationFields.get(1) : "family_name";

Please ensure that the documentation is updated to reflect these new configurable fields. Run the following command to check if the README or configuration guide mentions these new fields:

Also applies to: 159-159, 164-164

JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (1)

98-106: Changes align well with the class purpose and have minimal impact.

The modifications to the patientName method introduce more flexibility without affecting its usage elsewhere in the class. This change enhances the configurability of the system, which is in line with the PR objectives.

The rest of the LinkerDWH class doesn't require any modifications due to these changes. The sendNotification method, which uses patientName, will benefit from the improved flexibility without needing any alterations.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (9)
JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/NameFieldsForKafkaKeyGen.java (2)

6-10: LGTM: Well-structured record with appropriate annotation.

The use of @JsonInclude(JsonInclude.Include.NON_NULL) is a good practice for JSON serialization. The record structure is suitable for this configuration class, providing immutability and useful method implementations.

Consider shortening the field name for improved readability:

 public record NameFieldsForKafkaKeyGen(
-        List<String> nameFieldsForKafkaKeyGen
+        List<String> nameFields
         ) {
 }

This change maintains clarity while making the code more concise.


1-10: Overall assessment: Well-implemented configuration class for Kafka key generation.

This new NameFieldsForKafkaKeyGen record aligns well with the PR objectives to add configuration fields for names in Kafka. It provides a flexible way to specify which name fields should be used for Kafka key generation. The implementation is concise, follows good practices, and uses appropriate Java features like records and Jackson annotations.

To further enhance this implementation, consider the following suggestions:

  1. Add documentation comments (Javadoc) to explain the purpose and usage of this class.
  2. If there are specific constraints on the name fields (e.g., non-empty list, specific field names), consider adding validation logic or using bean validation annotations.
  3. Think about providing a default set of name fields if none are specified, which could be useful for maintaining backwards compatibility or providing a sensible default configuration.
JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/JsonConfig.java (1)

22-23: LGTM! Consider adding documentation for the new fields.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay aligns well with the PR objectives. The @JsonProperty annotations are correctly used, ensuring proper JSON serialization/deserialization.

To improve code maintainability, consider adding brief documentation for these new fields. Here's a suggested improvement:

+     /**
+      * List of field names to be used for generating Kafka keys.
+      */
      @JsonProperty("fieldsForKafkaKeyGen") List<String> fieldsForKafkaKeyGen,
+     /**
+      * List of name fields to be displayed in notifications.
+      */
      @JsonProperty("nameFieldsForNotificationDisplay") List<String> nameFieldsForNotificationDisplay,

This addition will provide clarity for future developers working with this configuration.

JeMPI_Apps/JeMPI_EM_Scala/src/main/scala/org/jembi/jempi/em/config/Config.scala (1)

64-70: LGTM! Consider adding documentation for the new fields.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay fields to the Config case class aligns well with the PR objectives. The use of Option[List[String]] provides flexibility in configuration.

Consider adding Scaladoc comments for these new fields to explain their purpose and expected values. This would improve the maintainability of the code. For example:

case class Config(
  // ... existing fields ...
  
  /** List of field names to be used for generating Kafka partition keys */
  fieldsForKafkaKeyGen: Option[List[String]],
  
  /** List of name fields to be displayed in notifications */
  nameFieldsForNotificationDisplay: Option[List[String]]
)
JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/models/ConfigurationModel.java (1)

19-20: LGTM! Consider adding JavaDoc comments for the new fields.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay aligns well with the PR objectives. The implementation is correct and integrates seamlessly with the existing structure.

To improve code documentation, consider adding JavaDoc comments for these new fields. For example:

/**
 * List of field names used for generating Kafka keys.
 */
List<String> fieldsForKafkaKeyGen,

/**
 * List of name fields to be displayed in notifications.
 */
List<String> nameFieldsForNotificationDisplay,

This will provide clear documentation for future developers working with this configuration.

devops/linux/docker/data-config/config-reference-link-d.json (1)

116-123: LGTM! Consider adding a comment for clarity.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay fields is well-implemented and aligns with the PR objectives. These new configuration options enhance the system's flexibility for Kafka key generation and notification display.

Consider adding a brief comment above each new field to explain their purpose and usage. This would improve the configuration file's readability and maintainability. For example:

  // Fields used for generating Kafka partition keys
  "fieldsForKafkaKeyGen": [
    "given_name",
    "family_name"
  ],
  // Fields used for displaying patient names in notifications
  "nameFieldsForNotificationDisplay": [
    "given_name",
    "family_name"
  ],
devops/linux/docker/data-config/config-reference-link-dp.json (1)

177-184: LGTM! Consider adding comments for clarity.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay aligns well with the PR objectives. These new fields provide the necessary configuration for Kafka key generation and notification display, enhancing the system's flexibility.

For improved clarity and maintainability, consider adding comments to explain the purpose of these new fields. For example:

  // Fields used for generating Kafka partition keys
  "fieldsForKafkaKeyGen": [
    "given_name",
    "family_name"
  ],
  // Fields used for displaying names in notifications
  "nameFieldsForNotificationDisplay": [
    "given_name",
    "family_name"
  ],

This will help future developers understand the purpose and usage of these configuration options.

devops/linux/docker/data-config/config-reference-link-d-validate-dp-match-dp.json (1)

Update the rules section to utilize the new fields

The rules section does not currently utilize the newly added fields fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay. Please consider updating the rules to incorporate these fields effectively. If no changes are necessary, adding a comment to explain their intended usage in relation to the existing rules would be beneficial.

🔗 Analysis chain

Line range hint 233-285: Consider updating the rules section to utilize the new fields.

While the new fields have been added successfully, the rules section remains unchanged. Consider if any updates to the rules are necessary to utilize these new fields effectively. If no changes are required, it might be helpful to add a comment explaining how these new fields are intended to be used in relation to the existing rules.

To help with the verification, you can run the following script to check for any references to the new fields in the codebase:

This script will help identify if and where the new fields are being used, and list any files that might be related to Kafka key generation or notification display.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for references to new fields in the codebase

echo "Searching for fieldsForKafkaKeyGen usage:"
rg --type-add 'config:*.{json,js,ts,scala,java}' -t config 'fieldsForKafkaKeyGen'

echo -e "\nSearching for nameFieldsForNotificationDisplay usage:"
rg --type-add 'config:*.{json,js,ts,scala,java}' -t config 'nameFieldsForNotificationDisplay'

echo -e "\nListing files that might be related to Kafka key generation or notification display:"
fd -e scala -e java 'Kafka|Notification'

Length of output: 6908

JeMPI_Apps/JeMPI_AsyncReceiver/src/main/java/org/jembi/jempi/async_receiver/Main.java (1)

186-208: Well-implemented partition key generation method with room for optimization.

The generateKafkaPartitionKey method is well-structured and flexible, using configurable fields and the Soundex algorithm for key generation. The @NotNull annotation and default "Unknown" value ensure consistent behavior.

Consider this optimization to reduce object creation:

-      StringBuilder partitionKey = new StringBuilder();
+      StringBuilder partitionKey = new StringBuilder(FIELDS_CONFIG.fieldsForKafkaKeyGen.size() * 4); // Assuming Soundex typically returns 4 characters
 
       for (String field : FIELDS_CONFIG.fieldsForKafkaKeyGen) {
          String value = patientRecord.fields.stream()
                                             .filter(fieldData -> field.equals(fieldData.ccTag()))
                                             .map(DemographicData.DemographicField::value)
                                             .findFirst()
                                             .orElse("");
 
          if (!value.isEmpty()) {
-            partitionKey.append(new Soundex().soundex(value));
+            new Soundex().soundex(value, partitionKey);
          }
       }
 
       if (partitionKey.length() == 0) {
          partitionKey.append("Unknown");
       }
 
-      String partitionKeyStr = partitionKey.toString();
-      LOGGER.info("Using Kafka topic/partition: {}", partitionKeyStr);
-      return partitionKeyStr;
+      LOGGER.info("Using Kafka topic/partition: {}", partitionKey);
+      return partitionKey.toString();

This optimization pre-sizes the StringBuilder and uses the Soundex method that appends directly to the StringBuilder, reducing intermediate String object creation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between fc2ac46 and 77380ae.

📒 Files selected for processing (12)
  • JeMPI_Apps/JeMPI_AsyncReceiver/src/main/java/org/jembi/jempi/async_receiver/Main.java (3 hunks)
  • JeMPI_Apps/JeMPI_EM_Scala/src/main/scala/org/jembi/jempi/em/config/Config.scala (1 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/FieldsConfig.java (2 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/JsonConfig.java (1 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/NameFieldsForKafkaKeyGen.java (1 hunks)
  • JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/models/ConfigurationModel.java (1 hunks)
  • JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (1 hunks)
  • devops/linux/docker/data-config/config-reference-auto-generate-fields.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-d-validate-dp-match-dp.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-d.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-dp-minimal.json (1 hunks)
  • devops/linux/docker/data-config/config-reference-link-dp.json (1 hunks)
🧰 Additional context used
🔇 Additional comments (9)
JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/input/NameFieldsForKafkaKeyGen.java (1)

1-5: LGTM: Package declaration and imports are appropriate.

The package name follows Java naming conventions and is suitable for a shared configuration input. The imports are correctly used in the code, with no unused imports present.

devops/linux/docker/data-config/config-reference-auto-generate-fields.json (1)

138-145: LGTM! Consider adding comments for clarity.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay arrays is well-structured and aligns with the PR objectives. These new configurations will enhance the flexibility of the system for Kafka key generation and name display in notifications.

Consider adding brief comments above each new array to explain their purpose and usage. This would improve the self-documentation of the configuration file. For example:

  // Fields used for generating Kafka partition keys
  "fieldsForKafkaKeyGen": [
    "given_name",
    "family_name"
  ],
  // Fields used for displaying patient names in notifications
  "nameFieldsForNotificationDisplay": [
    "given_name",
    "family_name"
  ],

To ensure these new fields are properly utilized in the codebase, we should verify their usage:

This will help confirm that the new configuration fields are being properly utilized in the codebase.

✅ Verification successful

Verification Successful! All added configuration fields are properly utilized within the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of new configuration fields in the codebase

# Test: Search for usage of fieldsForKafkaKeyGen
echo "Searching for fieldsForKafkaKeyGen usage:"
rg --type-add 'code:*.{java,scala}' --type code 'fieldsForKafkaKeyGen'

# Test: Search for usage of nameFieldsForNotificationDisplay
echo "Searching for nameFieldsForNotificationDisplay usage:"
rg --type-add 'code:*.{java,scala}' --type code 'nameFieldsForNotificationDisplay'

Length of output: 2620

devops/linux/docker/data-config/config-reference-link-dp-minimal.json (1)

192-199: New configuration fields added for Kafka key generation and notification display

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay enhances the configurability of the system. These fields appear to serve the following purposes:

  1. fieldsForKafkaKeyGen: Specifies which fields (given_name and family_name) should be used to generate Kafka message keys.
  2. nameFieldsForNotificationDisplay: Defines which name fields should be used when displaying patient names in notifications.

The changes look good and are consistent with the existing configuration structure. However, consider the following:

To ensure these changes are properly integrated and documented:

  1. Verify that the code using these configuration fields handles them correctly:

  2. Check if there's any documentation that needs to be updated:

Consider the following architectural implications:

  1. Ensure that changing these fields doesn't break existing Kafka consumers that may depend on the current key generation method.
  2. Update any relevant documentation or README files to explain these new configuration options and their impact on the system.
  3. If these fields are meant to be user-configurable, ensure that the deployment process and user documentation are updated accordingly.
devops/linux/docker/data-config/config-reference-link-d-validate-dp-match-dp.json (1)

225-232: LGTM: New configuration fields added successfully.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay arrays is in line with the PR objectives. These new fields provide the necessary configuration for Kafka key generation and notification display, enhancing the system's flexibility.

JeMPI_Apps/JeMPI_LibShared/src/main/java/org/jembi/jempi/shared/config/FieldsConfig.java (4)

34-35: LGTM: New fields added for Kafka key generation and notification display.

The addition of fieldsForKafkaKeyGen and nameFieldsForNotificationDisplay as public final fields is consistent with the class design and naming conventions. These fields will enhance the configurability of the system for Kafka key generation and name display in notifications.


93-96: LGTM: Robust initialization of fieldsForKafkaKeyGen.

The initialization logic for fieldsForKafkaKeyGen is well-implemented:

  • It handles potential null values from the config.
  • Uses List.copyOf() to ensure immutability.
  • Provides an empty immutable list as a fallback.

This implementation addresses the suggestions from the previous review and follows best practices for null safety and immutability.


97-99: LGTM: Consistent and robust initialization of nameFieldsForNotificationDisplay.

The initialization logic for nameFieldsForNotificationDisplay is well-implemented and consistent with fieldsForKafkaKeyGen:

  • It handles potential null values from the config.
  • Uses List.copyOf() to ensure immutability.
  • Provides an empty immutable list as a fallback.

This consistency in implementation across similar fields is commendable and promotes code maintainability.


Line range hint 34-99: Overall assessment: Well-implemented changes that meet the PR objectives.

The additions to the FieldsConfig class effectively implement new configuration fields for Kafka key generation and name fields for notification display. The changes are consistent, follow best practices for null safety and immutability, and align well with the existing code structure. These enhancements will improve the system's configurability as intended.

JeMPI_Apps/JeMPI_AsyncReceiver/src/main/java/org/jembi/jempi/async_receiver/Main.java (1)

157-157: Improved code organization with extracted method.

The change to use generateKafkaPartitionKey(patientRecord) instead of inline logic enhances code readability and maintainability. This extraction allows for easier updates to the partition key generation logic in the future.

Copy link
Collaborator

@MatthewErispe MatthewErispe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I made a few enhancements to the change for more flexibility and performance

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (1)

Line range hint 368-371: Potential omission of scores equal to minThreshold_ in notification conditions

The condition v.score() > minThreshold_ excludes scores exactly equal to minThreshold_. To include these scores, consider changing the condition to v.score() >= minThreshold_.

Apply this diff to include scores equal to minThreshold_:

-                if (v.score() > minThreshold_ && v.score() < matchThreshold_) {
+                if (v.score() >= minThreshold_ && v.score() < matchThreshold_) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 77380ae and 5de2658.

📒 Files selected for processing (1)
  • JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (1 hunks)
🧰 Additional context used
🔇 Additional comments (5)
JeMPI_Apps/JeMPI_Linker/src/main/java/org/jembi/jempi/linker/backend/LinkerDWH.java (5)

98-108: Improved patient name construction using configurable fields

The patientName method has been refactored to use configurable fields from FIELDS_CONFIG.nameFieldsForNotificationDisplay. This increases flexibility and maintainability by allowing the display name to adapt to different configurations.


Line range hint 251-259: Updated linkInteraction method signature to include threshold parameters

The linkInteraction method now includes minThreshold_, maxThreshold_, and marginWindowSize_ parameters, enhancing the configurability of the linking process.


Line range hint 346-354: Verify the correctness of threshold ordering in confusion matrix assignment

Ensure that the thresholds maxThreshold_, matchThreshold_, and minThreshold_ are correctly ordered and that the logic for assigning confusionMatrix reflects the intended behavior.


Line range hint 386-395: Ensure consistency in threshold comparisons

In the condition linkToGoldenId.score() <= maxThreshold_, verify that the use of <= is consistent with how thresholds are applied elsewhere in the code. Consistent comparison operators help prevent unintended behavior at threshold boundaries.


Line range hint 404-412: Confirm logic for margin candidate selection

The condition firstCandidate.score - candidate.score <= marginWindowSize_ determines which candidates are within the margin window. Please verify that subtracting candidate.score from firstCandidate.score correctly identifies candidates that are within the desired margin.

@MatthewErispe MatthewErispe merged commit ad7083b into dev Oct 15, 2024
6 checks passed
@MatthewErispe MatthewErispe deleted the CU-86c083kqv_Create-Config-Flag-for-Kafka-Key branch October 15, 2024 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants