
chore(server): item import performance enhancement #1319

Merged 2 commits into main from chore/item-import-performance on Nov 22, 2024

Conversation

yk-eukarya (Contributor) commented Nov 21, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced SaveAll methods across multiple components, allowing for batch saving of items and threads.
    • Added NewThreads function for creating multiple thread documents from a list.
    • Enhanced item import functionality with better error handling and performance improvements.
  • Bug Fixes

    • Improved error handling in various methods to ensure appropriate responses for invalid operations.
  • Documentation

    • Updated type aliases and method signatures for clarity and consistency across the codebase.

These updates enhance the overall efficiency and usability of the application, particularly in managing multiple items and threads simultaneously.

yk-eukarya self-assigned this Nov 21, 2024

coderabbitai bot commented Nov 21, 2024

Walkthrough

The pull request introduces several modifications across multiple files, primarily focusing on adding a SaveAll method to various structs and interfaces, enabling batch operations for saving items and threads. The changes also include updates to existing methods to mark unused parameters and streamline context handling. Additionally, there are enhancements in item import logic and identifier usage throughout the codebase, improving consistency and performance. New type aliases and methods are introduced to facilitate better management of item and thread identifiers.

Changes

File Path | Change Summary
server/internal/infrastructure/memory/item.go | Added SaveAll method; marked ctx as unused in multiple methods; updated Item struct methods.
server/internal/infrastructure/memory/memorygit/memory.go | Added SaveAll method to VersionedSyncMap for batch saving.
server/internal/infrastructure/memory/thread.go | Added SaveAll method to Thread struct for batch saving threads.
server/internal/infrastructure/mongo/item.go | Added SaveAll method to Item struct for saving multiple items.
server/internal/infrastructure/mongo/mongodoc/thread.go | Added NewThreads function for creating multiple thread documents.
server/internal/infrastructure/mongo/mongogit/collection.go | Added SaveAll method to Collection struct for saving multiple documents.
server/internal/infrastructure/mongo/thread.go | Added SaveAll method to ThreadRepo for batch saving threads.
server/internal/usecase/interactor/item.go | Refactored Import method for better item handling; improved error handling.
server/internal/usecase/repo/Item.go | Added SaveAll method to Item interface.
server/internal/usecase/repo/thread.go | Added SaveAll method to Thread interface.
server/pkg/item/id.go | Added type alias IDList for id.ItemIDList.
server/pkg/item/list.go | Updated identifier types in methods; added IDs method to extract IDs.
server/pkg/thread/list.go | Added IDs and ToMap methods to List type for threads.

Possibly related PRs

  • refactor(server): geojson and csv export #1214: modifies the item processing functions, which may relate to how this PR handles items, particularly when exporting them to different formats.
  • feat(server): implement public api for assets #1277: introduces functionality for managing assets, which may be relevant to the Item struct and its methods in this PR, especially how items and assets are processed and retrieved.

🐇 "In the code where items dwell,
A new SaveAll casts its spell.
Threads and items, all in line,
Batch saving now, oh how divine!
With identifiers clear and bright,
Our code hops forward, pure delight!" 🐇✨



netlify bot commented Nov 21, 2024

Deploy Preview for reearth-cms canceled.

🔨 Latest commit: 9edd673
🔍 Latest deploy log: https://app.netlify.com/sites/reearth-cms/deploys/673f79483e0fe50008b6aa33


coderabbitai bot left a review


Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (22)
server/internal/usecase/repo/thread.go (1)

18-18: Add documentation for the SaveAll method.

The addition of the SaveAll method is a good performance enhancement for batch operations. Consider adding a documentation comment that specifies:

  • Expected behavior for partial failures
  • Any atomicity guarantees
  • Performance characteristics

Example documentation:

+// SaveAll saves multiple threads in a batch operation.
+// It may provide better performance than individual Save operations.
+// Implementation-specific atomicity guarantees may apply.
 SaveAll(context.Context, thread.List) error
server/pkg/thread/list.go (1)

23-29: LGTM! Excellent performance optimization additions.

The new methods IDs() and ToMap() are well-implemented and provide valuable performance optimizations:

  • IDs() efficiently extracts thread IDs using util.Map
  • ToMap() creates a hash map for O(1) lookups, which can significantly improve performance when searching for threads by ID

These additions align well with the PR's performance enhancement objectives.

Consider using these methods in scenarios where:

  • You need to check thread existence
  • You need to perform bulk operations on threads
  • You need to cross-reference threads with other entities
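
For illustration, here is a minimal, self-contained sketch of the shape these helpers likely take; the stand-in Thread, ID, and List types are assumptions for the example only (the real IDs() reportedly uses util.Map internally):

package main

import "fmt"

// Hypothetical stand-ins for thread.ID, thread.Thread, and thread.List,
// used only to illustrate the new helpers described above.
type ID string

type Thread struct{ id ID }

func (t *Thread) ID() ID { return t.id }

type List []*Thread

// IDs extracts the ID of every thread in the list.
func (l List) IDs() []ID {
	ids := make([]ID, 0, len(l))
	for _, t := range l {
		ids = append(ids, t.ID())
	}
	return ids
}

// ToMap indexes the list by thread ID for O(1) lookups.
func (l List) ToMap() map[ID]*Thread {
	m := make(map[ID]*Thread, len(l))
	for _, t := range l {
		m[t.ID()] = t
	}
	return m
}

func main() {
	l := List{{id: "t1"}, {id: "t2"}}
	fmt.Println(l.IDs())              // [t1 t2]
	fmt.Println(l.ToMap()["t2"].ID()) // t2
}
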
server/pkg/item/list.go (2)

28-32: Consider adding a documentation comment

While the implementation is correct, consider adding a documentation comment to explain the purpose of this method, especially since it's a new addition.

+// IDs returns a list of item IDs from the item list
 func (l List) IDs() IDList {

Line range hint 49-59: Consider optimizing item lookup performance

The current implementation uses a linear search (O(n)) to find items. For better performance, especially with large lists, consider:

  1. Using a map-based lookup (O(1)) similar to ToMap
  2. Or maintaining an internal index map
+// Item returns a versioned item by ID using linear search
 func (l VersionedList) Item(iid ID) Versioned {
     if l == nil {
         return nil
     }
+    // TODO: Consider using map-based lookup for better performance
     for _, versioned := range l {
server/internal/infrastructure/memory/thread.go (4)

38-41: Add context cancellation check

The method accepts a context parameter but discards it. Consider binding it to a name and adding a cancellation check before proceeding with the operation (this also requires importing fmt).

-func (r *Thread) SaveAll(_ context.Context, th thread.List) error {
+func (r *Thread) SaveAll(ctx context.Context, th thread.List) error {
+	if err := ctx.Err(); err != nil {
+		return fmt.Errorf("thread save all: %w", err)
+	}
 	if r.err != nil {
 		return r.err
 	}

43-47: Consider batching the permission checks

The current implementation validates all threads' permissions before saving any. For large lists, this could be memory-intensive and might delay the start of actual saving operations.

Consider processing threads in batches to optimize memory usage and enable partial progress. Example approach:

const batchSize = 1000
for i := 0; i < len(th); i += batchSize {
    end := i + batchSize
    if end > len(th) {
        end = len(th)
    }
    batch := th[i:end]
    
    // Process batch
    for _, t := range batch {
        if !r.f.CanWrite(t.Workspace()) {
            return repo.ErrOperationDenied
        }
    }
    r.data.StoreAll(batch.ToMap())
}

48-50: Improve error handling for StoreAll operation

If the underlying StoreAll operation can fail, surfacing a wrapped error would make debugging easier; if it cannot (as is typical for an in-memory map), this suggestion does not apply.

-	r.data.StoreAll(th.ToMap())
-	return nil
+	if err := r.data.StoreAll(th.ToMap()); err != nil {
+		return fmt.Errorf("failed to store threads: %w", err)
+	}
+	return nil

38-50: Implementation aligns with performance enhancement goals

The new SaveAll method supports batch operations, which is beneficial for item import performance. However, consider the following performance optimizations:

  1. Batch processing for large datasets
  2. Parallel processing for permission checks
  3. Monitoring and metrics for batch operations

These enhancements would further support the PR's performance objectives.

Would you like assistance in implementing any of these optimizations?

server/internal/usecase/repo/Item.go (1)

35-35: LGTM! Well-designed batch operation interface.

The addition of the SaveAll method is a good architectural decision for improving item import performance by enabling batch operations.

Consider these implementation guidelines for concrete types:

  • Ensure atomic transaction handling for the batch operation
  • Implement proper error collection/aggregation
  • Consider adding batch size limits to prevent memory issues
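
As a sketch of the error collection/aggregation point above (assumptions: a hypothetical saveOne helper and Go 1.20+ for errors.Join; this is not the repository's actual code), one option is to keep going after a failure and return a joined error:

package main

import (
	"errors"
	"fmt"
)

// saveOne is a hypothetical per-item save used only for illustration.
func saveOne(id string) error {
	if id == "" {
		return errors.New("empty id")
	}
	return nil
}

// saveAll aggregates failures instead of stopping at the first one.
func saveAll(ids []string) error {
	var errs []error
	for _, id := range ids {
		if err := saveOne(id); err != nil {
			errs = append(errs, fmt.Errorf("save %q: %w", id, err))
		}
	}
	return errors.Join(errs...) // nil when no errors were collected
}

func main() {
	fmt.Println(saveAll([]string{"a", "", "c"}))
}

Whether to fail fast or aggregate is a design choice; aggregation tends to suit imports, where a full report is more useful than only the first error.
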
server/internal/infrastructure/mongo/mongodoc/thread.go (1)

43-43: Add documentation for the new function

Consider adding a documentation comment explaining the function's purpose, parameters, and return values. This will help other developers understand its role in batch processing.

+// NewThreads converts a list of threads into MongoDB documents and their corresponding IDs.
+// It safely handles nil entries by skipping them.
+// Returns a slice of ThreadDocument and a slice of their string IDs.
 func NewThreads(a thread.List) ([]ThreadDocument, []string) {
server/internal/infrastructure/mongo/thread.go (2)

43-54: LGTM: Well-structured batch save implementation

The implementation follows good practices:

  • Early validation of input and permissions
  • Efficient batch processing
  • Consistent with single-item save behavior

Consider adding performance monitoring for large batches to help tune the operation.
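
A minimal sketch of what such monitoring could look like, assuming nothing about the repository's logging setup (saveAllBatch below is a placeholder standing in for the real SaveAll call):

package main

import (
	"log"
	"time"
)

// saveAllBatch is a placeholder for the real batch save; it only simulates work.
func saveAllBatch(n int) error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

func main() {
	const batchSize = 500
	start := time.Now()
	if err := saveAllBatch(batchSize); err != nil {
		log.Fatal(err)
	}
	// Logging the batch size together with the elapsed time makes it easy to
	// spot slow batches and to pick a sensible batch-size limit later.
	log.Printf("SaveAll: items=%d took=%s", batchSize, time.Since(start))
}
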


43-54: Consider adding transaction support

For data consistency, consider wrapping the batch operation in a transaction. This ensures atomicity - either all threads are saved or none are.

Example enhancement:

 func (r *ThreadRepo) SaveAll(ctx context.Context, threads thread.List) error {
     if len(threads) == 0 {
         return nil
     }
     for _, t := range threads {
         if !r.f.CanWrite(t.Workspace()) {
             return repo.ErrOperationDenied
         }
     }
+    return r.client.WithTransaction(ctx, func(sessCtx context.Context) error {
         docs, ids := mongodoc.NewThreads(threads)
-        return r.client.SaveAll(ctx, ids, lo.ToAnySlice(docs))
+        return r.client.SaveAll(sessCtx, ids, lo.ToAnySlice(docs))
+    })
 }
server/internal/infrastructure/memory/memorygit/memory.go (2)

78-88: Implementation looks correct but could be optimized for batch operations.

The method correctly handles input validation and maintains consistency with the existing SaveOne functionality. However, given this is part of a performance enhancement PR, consider these improvements:

  1. The current implementation makes individual SaveOne calls, which might not be optimal for large batches.
  2. Consider adding concurrent processing for better performance with large datasets.

Here's a suggested optimization using goroutines with a worker pool. Note that both the current loop and this sketch index parent[i] directly; if callers can pass a nil parent slice (the memory item repository appears to pass nil), guard for that case before indexing:

 func (m *VersionedSyncMap[K, V]) SaveAll(key []K, value []V, parent []*version.VersionOrRef) {
 	if len(key) != len(value) || (parent != nil && len(key) != len(parent)) {
 		return
 	}
 	if len(key) == 0 {
 		return
 	}
-	for i := 0; i < len(key); i++ {
-		m.SaveOne(key[i], value[i], parent[i])
-	}
+	// Use a worker pool for large datasets
+	if len(key) > 1000 {
+		const workers = 4
+		var wg sync.WaitGroup
+		ch := make(chan int, len(key))
+		
+		// Start workers
+		for w := 0; w < workers; w++ {
+			wg.Add(1)
+			go func() {
+				defer wg.Done()
+				for i := range ch {
+					m.SaveOne(key[i], value[i], parent[i])
+				}
+			}()
+		}
+		
+		// Send work
+		for i := 0; i < len(key); i++ {
+			ch <- i
+		}
+		close(ch)
+		wg.Wait()
+	} else {
+		// Use simple loop for small datasets
+		for i := 0; i < len(key); i++ {
+			m.SaveOne(key[i], value[i], parent[i])
+		}
+	}
 }

79-81: Consider providing more descriptive error handling.

The current implementation silently returns when input slices have inconsistent lengths. Consider adding error returns to help callers understand why the operation failed.

-func (m *VersionedSyncMap[K, V]) SaveAll(key []K, value []V, parent []*version.VersionOrRef) {
+func (m *VersionedSyncMap[K, V]) SaveAll(key []K, value []V, parent []*version.VersionOrRef) error {
 	if len(key) != len(value) || (parent != nil && len(key) != len(parent)) {
-		return
+		return fmt.Errorf("inconsistent input lengths: keys=%d, values=%d, parents=%d", 
+			len(key), len(value), len(parent))
 	}
server/internal/infrastructure/mongo/mongogit/collection.go (1)

110-112: Enhance input validation

The current validation checks for length mismatches but could be more comprehensive:

  1. Check for nil/empty IDs and docs
  2. Validate individual IDs and documents for correctness

Consider adding these validations:

+    if ids == nil || docs == nil {
+        return rerror.ErrInvalidParams
+    }
+    
+    // Validate individual items
+    for i, id := range ids {
+        if id == "" {
+            return rerror.ErrInvalidParams
+        }
+        if docs[i] == nil {
+            return rerror.ErrInvalidParams
+        }
+    }
server/internal/infrastructure/memory/item.go (1)

178-190: Optimize permission checks and consider atomic operations

The implementation has two potential areas for improvement:

  1. Permission checks could be optimized by grouping items by project and checking permissions once per project, rather than per item.
  2. Consider implementing transaction-like behavior to ensure all-or-nothing semantics for batch operations.

Here's a suggested optimization for permission checks:

 func (r *Item) SaveAll(_ context.Context, il item.List) error {
 	if r.err != nil {
 		return r.err
 	}

-	for _, t := range il {
-		if !r.f.CanWrite(t.Project()) {
-			return repo.ErrOperationDenied
-		}
-	}
+	// Group items by project and check permissions once per project
+	projectMap := make(map[id.ProjectID]struct{})
+	for _, t := range il {
+		projectMap[t.Project()] = struct{}{}
+	}
+	for projectID := range projectMap {
+		if !r.f.CanWrite(projectID) {
+			return repo.ErrOperationDenied
+		}
+	}
+
 	r.data.SaveAll(il.IDs(), il, nil)
 	return nil
 }
server/internal/infrastructure/mongo/item.go (3)

237-241: Consider optimizing permission checks for large batches

The current implementation checks permissions sequentially for each item. For large batches, this could be optimized.

Consider this optimization:

-	for _, itm := range items {
-		if !r.f.CanWrite(itm.Project()) {
-			return repo.ErrOperationDenied
-		}
-	}
+	// Group items by project for efficient permission checking
+	projects := make(map[id.ProjectID]struct{})
+	for _, itm := range items {
+		projects[itm.Project()] = struct{}{}
+	}
+	for projectID := range projects {
+		if !r.f.CanWrite(projectID) {
+			return repo.ErrOperationDenied
+		}
+	}

232-232: Add method documentation

Consider adding documentation to describe the method's purpose, parameters, and behavior.

Add a comment block like:

+// SaveAll saves multiple items in a single batch operation.
+// It returns ErrOperationDenied if any item's project is not writable.
+// Returns nil if the input list is empty.
 func (r *Item) SaveAll(ctx context.Context, items item.List) error {

232-244: Consider adding batch size limits

Large batches could impact memory usage. Consider implementing batch size limits for better resource management.

Suggestions:

  1. Add a constant for maximum batch size (e.g., maxBatchSize = 1000)
  2. Split large batches into smaller chunks
  3. Process chunks sequentially or with controlled concurrency

Would you like me to provide a detailed implementation for batch size limiting?
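
One possible shape for such a limit, shown as a standalone sketch rather than the repository's actual code (lo.Chunk from github.com/samber/lo, which this file already imports, offers an equivalent helper):

package main

import "fmt"

const maxBatchSize = 3 // illustrative value; 1000 is the size suggested above

// chunk splits a slice into batches of at most size elements.
func chunk[T any](s []T, size int) [][]T {
	if size <= 0 {
		size = len(s) // guard against a non-positive batch size
	}
	var out [][]T
	for len(s) > 0 {
		n := size
		if n > len(s) {
			n = len(s)
		}
		out = append(out, s[:n])
		s = s[n:]
	}
	return out
}

func main() {
	ids := []string{"a", "b", "c", "d", "e", "f", "g"}
	for _, batch := range chunk(ids, maxBatchSize) {
		// Each batch would go to a single SaveAll call.
		fmt.Println(batch) // [a b c] [d e f] [g]
	}
}
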

server/internal/usecase/interactor/item.go (3)

379-389: Consider checking for empty itemsIds before database call

If itemsIds is empty, calling FindByIDs might result in an unnecessary database query. Consider adding a condition to check if itemsIds is not empty before making the call to optimize performance.

Apply this diff to add the conditional check (oldItems is declared outside the block so the code that uses it later can still see it):

+ var oldItems item.VersionedList
+ if len(itemsIds) > 0 {
+     var err error
-     oldItems, err := i.repos.Item.FindByIDs(ctx, itemsIds, nil)
+     oldItems, err = i.repos.Item.FindByIDs(ctx, itemsIds, nil)
      if err != nil {
          return interfaces.ImportItemsResponse{}, err
      }
+ }

578-583: Consider reusing saved items to avoid refetching

After saving items, refetching them from the database may be unnecessary. Consider reusing the items from itemsToSave to avoid extra database queries and enhance performance.


1051-1053: Address the TODO: Enhance performance using schema package

The TODO comment suggests improving performance by utilizing the schema package. Consider implementing this enhancement to optimize handleGroupFields.

Would you like assistance in implementing this performance improvement or creating a GitHub issue to track it?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 44e60cb and ab13e42.

📒 Files selected for processing (13)
  • server/internal/infrastructure/memory/item.go (5 hunks)
  • server/internal/infrastructure/memory/memorygit/memory.go (1 hunks)
  • server/internal/infrastructure/memory/thread.go (1 hunks)
  • server/internal/infrastructure/mongo/item.go (2 hunks)
  • server/internal/infrastructure/mongo/mongodoc/thread.go (1 hunks)
  • server/internal/infrastructure/mongo/mongogit/collection.go (1 hunks)
  • server/internal/infrastructure/mongo/thread.go (2 hunks)
  • server/internal/usecase/interactor/item.go (8 hunks)
  • server/internal/usecase/repo/Item.go (1 hunks)
  • server/internal/usecase/repo/thread.go (1 hunks)
  • server/pkg/item/id.go (1 hunks)
  • server/pkg/item/list.go (3 hunks)
  • server/pkg/thread/list.go (2 hunks)
🔇 Additional comments (26)
server/pkg/thread/list.go (1)

23-29: Verify usage patterns for optimal performance gains.

Let's verify how these new methods are being utilized in the codebase, particularly in import-related code.

✅ Verification successful


New methods are being utilized effectively across the codebase

The IDs() method is widely used across multiple components (items, assets, schemas, etc.) for efficient ID collection, particularly in:

  • Request processing and item lookups
  • Asset reference handling
  • Authorization checks
  • Integration operations

The ToMap() method is effectively used in:

  • Request processing for version comparison (latestItemsMap)
  • Thread management in memory storage
  • Item version mapping
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage patterns of the new methods and identify potential optimization opportunities

# Check for direct usage of these new methods
echo "Checking direct usage of new methods..."
rg -A 3 "IDs\(\)" --type go
rg -A 3 "ToMap\(\)" --type go

# Look for potential optimization opportunities
echo "Checking for potential optimization opportunities..."
# Find places where we might be doing manual ID extraction or map creation
rg -A 5 "for\s+.*,\s+thread\s+:=\s+range" --type go
rg -A 5 "map\[thread\.ID\]" --type go

Length of output: 10513

server/pkg/item/id.go (2)

9-9: LGTM! The new IDList type alias is well-placed and follows the established pattern.

The addition of IDList type alias is consistent with the file's existing type definitions and will support the batch operations being introduced.


9-9: Verify the usage of IDList type alias

Let's ensure this new type alias is being properly utilized in the batch operations mentioned in the summary.

✅ Verification successful

Type alias IDList is correctly used for batch operations

The type alias IDList = id.ItemIDList is being properly utilized throughout the codebase, particularly in repository interfaces and implementations. The search results show consistent usage in:

  • Repository interfaces: FindByIDs(context.Context, id.ItemIDList, *version.Ref) (item.VersionedList, error)
  • Batch operations: FindAllVersionsByIDs, FindByAssets, etc.
  • Test cases validating the functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the usage of the new IDList type alias in the codebase
# Expected: Find usages in SaveAll methods and other batch operations

# Search for direct usage of IDList
echo "=== Searching for IDList usage ==="
rg "IDList" --type go

# Search for SaveAll methods that might use this type
echo -e "\n=== Searching for SaveAll methods ==="
ast-grep --pattern 'func $$$SaveAll($$$) $$$'

Length of output: 44461

server/pkg/item/list.go (4)

10-14: LGTM! Type change is consistent

The implementation correctly uses lo.Filter and maintains the existing logic while updating to the new type system.


16-20: LGTM! Type change is consistent

The implementation correctly uses lo.Map and maintains the existing logic while updating to the new type system.


22-26: LGTM! Type change is consistent

The implementation correctly uses lo.Find and maintains the existing logic while updating to the new type system.


Line range hint 61-67: LGTM! Efficient implementation with pre-allocated capacity

The implementation correctly creates a map with pre-allocated capacity and maintains the existing logic while updating to the new type system.

server/internal/usecase/repo/Item.go (1)

35-35: Verify implementations of the SaveAll method.

Let's ensure all implementations of the Item interface have been updated with the new method.

✅ Verification successful

Let me correct the regex syntax and try a different approach to find the implementations.


SaveAll method is properly implemented across the codebase

The verification shows that the SaveAll method is correctly implemented in both memory and MongoDB implementations of the Item interface:

  • server/internal/infrastructure/mongo/item.go: Has SaveAll(ctx context.Context, items item.List) error
  • server/internal/infrastructure/memory/item.go: Has SaveAll(_ context.Context, il item.List) error

Both implementations follow the interface signature and handle empty lists appropriately.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all implementations of the Item interface and verify SaveAll method

# Find potential implementation files
echo "=== Searching for Item interface implementations ==="
rg --type go -l "struct.*Item.*{" server/internal/infrastructure/

# Find SaveAll method implementations
echo -e "\n=== Searching for SaveAll implementations ==="
ast-grep --pattern 'func ($RECV *$TYPE) SaveAll(context.Context, item.List) error {
  $$$
}'

Length of output: 478


Script:

#!/bin/bash
# Search for Item interface implementations with fixed regex
echo "=== Searching for Item interface implementations ==="
rg --type go "type \w+Item struct" server/internal/infrastructure/

# Search for SaveAll method implementations
echo -e "\n=== Searching for SaveAll method implementations ==="
rg --type go -A 2 "func \(.*\) SaveAll\(" server/internal/infrastructure/

Length of output: 4254

server/internal/infrastructure/mongo/mongodoc/thread.go (2)

43-55: Implementation looks good with performance considerations in place!

The implementation includes several good performance practices:

  • Pre-allocation of result slices with proper capacity
  • Reuse of existing NewThread function for consistency
  • Proper nil checks to prevent panics
  • Enables batch processing which aligns with the PR's performance enhancement goals
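
For readers who want a concrete picture, here is a rough, self-contained approximation of that conversion pattern; the lowercase stand-in types and the newThread converter are assumptions made for the example and do not mirror the real mongodoc signatures exactly:

package main

import "fmt"

// Hypothetical stand-ins for the real thread and mongodoc types.
type Thread struct{ id string }

func (t *Thread) ID() string { return t.id }

type ThreadDocument struct{ ID string }

// newThread mirrors the role of the existing single-thread converter.
func newThread(t *Thread) (*ThreadDocument, string) {
	return &ThreadDocument{ID: t.ID()}, t.ID()
}

// newThreads pre-allocates both result slices and skips nil entries,
// matching the behaviour called out in the review.
func newThreads(list []*Thread) ([]ThreadDocument, []string) {
	docs := make([]ThreadDocument, 0, len(list))
	ids := make([]string, 0, len(list))
	for _, t := range list {
		if t == nil {
			continue
		}
		d, id := newThread(t)
		docs = append(docs, *d)
		ids = append(ids, id)
	}
	return docs, ids
}

func main() {
	docs, ids := newThreads([]*Thread{{id: "a"}, nil, {id: "b"}})
	fmt.Println(docs, ids) // [{a} {b}] [a b]
}
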

43-55: Verify the integration with item import flow

Let's verify how this batch conversion function is being utilized in the item import flow to ensure it contributes to the performance enhancement.

✅ Verification successful

From analyzing the code, I can see that the NewThreads batch conversion function is being used in the thread repository's SaveAll method, which is called from the item import interactor (server/internal/usecase/interactor/item.go). This shows that the batch conversion function is indeed integrated with the item import flow for performance optimization. The code shows:

  1. The thread repository implements batch saving using SaveAll
  2. The item import interactor uses this batch functionality: if err := i.repos.Thread.SaveAll(ctx, threadsToSave)
  3. The batch conversion function NewThreads properly handles nil checks and efficiently pre-allocates slices

Integration with item import flow verified

The batch conversion function is correctly integrated into the item import flow and contributes to performance optimization by enabling bulk operations during item imports.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of NewThreads in the item import related code
rg -l "NewThreads.*thread\.List" | xargs rg "import.*item"

# Look for any performance-related changes in files that use this function
rg -l "NewThreads.*thread\.List" | xargs rg "batch|bulk|performance"

Length of output: 131


Script:

#!/bin/bash
# First, let's find files that use NewThreads function
rg -l "NewThreads"

# Then, let's look at the actual usage context with surrounding lines
rg -B 5 -A 5 "NewThreads"

# Let's also check for any thread-related batch operations
rg -B 3 -A 3 "thread.*batch|batch.*thread"

# Check for any performance-related comments or documentation
fd README.md | xargs rg -i "performance|batch|bulk"

Length of output: 1899


Script:

#!/bin/bash
# Let's check the mongo thread repository implementation to understand the context
cat server/internal/infrastructure/mongo/thread.go

# Let's also check if there are any recent changes or commits related to batch operations
rg -B 3 -A 3 "SaveAll|BulkWrite|InsertMany"

# Let's look for any item import related code that might be using threads
rg -l "import.*thread" | xargs rg "import.*item"

Length of output: 34158

server/internal/infrastructure/mongo/thread.go (1)

12-12: LGTM: Appropriate use of lo package

The lo package is a good choice for slice operations and type conversions.
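
For context, lo.ToAnySlice (used in the SaveAll implementation quoted earlier in this review) converts a typed slice into the []any that the client-level SaveAll expects; a minimal usage example, assuming only the github.com/samber/lo dependency the PR already adds:

package main

import (
	"fmt"

	"github.com/samber/lo"
)

func main() {
	docs := []string{"a", "b", "c"}
	anys := lo.ToAnySlice(docs) // []any{"a", "b", "c"}
	fmt.Printf("%T %v\n", anys, anys)
}
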

server/internal/infrastructure/memory/memorygit/memory.go (1)

78-88: Verify usage patterns and memory implications.

Since this is part of a performance enhancement PR, let's verify how this method will be used.

✅ Verification successful

Batch operations are well-structured and safely implemented

The verification shows that SaveAll is consistently used across multiple repository implementations (mongo, memory) with proper safety checks:

  • All implementations verify list lengths before processing
  • Used for domain entities: items, threads, models, groups, views, and requests
  • Batch sizes are controlled by the business logic layer, not arbitrary client input
  • Memory implementation uses VersionedSyncMap which has proper validation

The changes to VersionedSyncMap.SaveAll align with the established pattern across the codebase where:

  1. Input validation ensures key-value-parent arrays have matching lengths
  2. Empty lists are handled gracefully
  3. Individual saves are performed in a controlled loop
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage patterns of SaveAll method and potential batch sizes

# Find all calls to SaveAll to understand typical batch sizes
rg -A 5 "SaveAll\(" --type go

# Look for any existing batch processing patterns
ast-grep --pattern 'for $_ := range $_ {
  $$$
  Save($$$)
  $$$
}'

Length of output: 19080

server/internal/infrastructure/mongo/mongogit/collection.go (1)

108-127: 🛠️ Refactor suggestion

Optimize batch operations for better performance

The current implementation iterates through items sequentially, which may not be optimal for large batch operations. Consider the following improvements:

  1. The TODO comment correctly identifies the need for bulk write operations
  2. The method could benefit from concurrent processing for better performance
  3. Error handling could be enhanced to collect all errors instead of stopping at the first one

Here's a suggested implementation using MongoDB's bulk write operations:

 func (c *Collection) SaveAll(ctx context.Context, ids []string, docs []any, parents []*version.VersionOrRef) error {
-    // TODO: optimize to use bulk write
     if len(ids) != len(docs) || (parents != nil && len(ids) != len(parents)) {
         return rerror.ErrInvalidParams
     }
     if len(ids) == 0 {
         return nil
     }
+
+    // Check for archived documents first
+    for _, id := range ids {
+        if archived, err := c.IsArchived(ctx, bson.M{"id": id}); err != nil {
+            return err
+        } else if archived {
+            return version.ErrArchived
+        }
+    }
+
+    // Prepare bulk write models
+    models := make([]mongo.WriteModel, 0, len(ids))
+    for i := 0; i < len(ids); i++ {
+        actualVr := version.Latest.OrVersion()
+        if parents != nil {
+            actualVr = lo.FromPtrOr(parents[i], version.Latest.OrVersion())
+        }
+
+        meta, err := c.meta(ctx, ids[i], actualVr.Ref())
+        if err != nil {
+            return err
+        }
+
+        var refs []version.Ref
+        actualVr.Match(nil, func(r version.Ref) { refs = []version.Ref{r} })
+        newmeta := Meta{
+            ObjectID: primitive.NewObjectIDFromTimestamp(util.Now()),
+            Version:  version.New(),
+            Refs:     refs,
+        }
+
+        if meta == nil {
+            if !actualVr.IsRef(version.Latest) {
+                return rerror.ErrNotFound
+            }
+        } else {
+            newmeta.Parents = []version.Version{meta.Version}
+        }
+
+        model := mongo.NewInsertOneModel().SetDocument(&Document[any]{
+            Data: docs[i],
+            Meta: newmeta,
+        })
+        models = append(models, model)
+    }
+
+    // Execute bulk write
+    if _, err := c.client.Client().BulkWrite(ctx, models); err != nil {
+        return rerror.ErrInternalBy(err)
+    }
+
+    // Update refs
+    for i := 0; i < len(ids); i++ {
+        actualVr := version.Latest.OrVersion()
+        if parents != nil {
+            actualVr = lo.FromPtrOr(parents[i], version.Latest.OrVersion())
+        }
+        if err := version.MatchVersionOrRef(actualVr, nil, func(r version.Ref) error {
+            return c.UpdateRef(ctx, ids[i], r, nil)
+        }); err != nil {
+            return err
+        }
+    }
+
     return nil
 }

Let's verify the potential performance impact:

server/internal/infrastructure/memory/item.go (1)

27-31: LGTM: Clean constructor implementation

The constructor follows Go idioms and properly initializes the required fields.

server/internal/infrastructure/mongo/item.go (2)

16-16: LGTM: Good choice of utility library

The addition of the github.com/samber/lo package is appropriate for handling slice type conversions in batch operations.


232-244: LGTM: Well-structured batch save implementation

The implementation follows good practices with early returns, proper error handling, and batch operations.

server/internal/usecase/interactor/item.go (10)

369-373: LGTM: Project retrieval and error handling

The code correctly retrieves the project using s.Project() and handles errors appropriately.


395-397: LGTM: Initialization for batch operations

Initializing threadsToSave and itemsToSave prepares for efficient batch saving of threads and items.


398-403: LGTM: Struct for tracking item changes

Defining itemChanges and initializing itemsEvent enhances tracking of item updates during the import process.


479-479: LGTM: Append thread to batch save list

Appending the new thread to threadsToSave correctly prepares it for batch saving.


531-531: LGTM: Append metadata item to batch save list

Adding the metadata item to itemsToSave ensures it will be saved along with other items.


548-550: LGTM: Handling group fields with error checking

Group fields are processed using handleGroupFields, and errors are appropriately handled.


559-559: LGTM: Append item to batch save list

Appending the item to itemsToSave prepares it for efficient batch saving.


564-568: LGTM: Recording item changes for event processing

Storing oldFields and action in itemsEvent aids in generating accurate events after import.


570-572: LGTM: Batch saving threads with error handling

Threads are saved in batch using SaveAll, and errors are properly handled.


574-577: LGTM: Batch saving items with error handling

Items are saved in batch using SaveAll, improving performance and ensuring error handling.

Resolved review threads: server/internal/usecase/repo/thread.go (1), server/internal/infrastructure/memory/item.go (1), server/internal/usecase/interactor/item.go (3).
yk-eukarya merged commit c817481 into main on Nov 22, 2024 (21 checks passed).
yk-eukarya deleted the chore/item-import-performance branch on November 22, 2024 at 09:22.