Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(websocket): Add workspace and room context to document paths #957

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kasugamirai
Copy link
Member

@kasugamirai kasugamirai commented Mar 3, 2025

Overview

What I've done

What I haven't done

How I tested

Screenshot

Which point I want you to review particularly

Memo

Summary by CodeRabbit

  • New Features
    • Enhanced WebSocket endpoints now follow a structured URL path that includes both workspace and project identifiers, supporting improved connection specificity.
    • Collaborative editing now leverages workspace context for a more integrated real-time experience.
  • Bug Fixes
    • Strengthened request validation and improved error messages to ensure only correctly formatted connection paths are processed.

Copy link
Contributor

coderabbitai bot commented Mar 3, 2025

Walkthrough

This pull request introduces updates across several components handling WebSocket connections and document storage. A new UUID dependency is added to generate random identifiers. Parameter names in functions have been updated from doc_id to doc_path, and the WebSocket routing is restructured to include a room_id in addition to the document identifier. New functions for extracting and generating Google Cloud Storage paths have been added, and UI components now incorporate an optional workspaceId parameter. These changes adjust type signatures, improve path validation, and streamline identifier handling without altering underlying business logic.

Changes

File(s) Change Summary
server/websocket/Cargo.toml Added dependency: uuid = { version = "1.7.0", features = ["v4"] }.
server/websocket/src/broadcast/pool.rs Renamed parameter from doc_id to doc_path in get_or_create_group and remove_connection methods.
server/websocket/src/server.rs Updated WebSocket route from /{doc_id} to /workspaces/{room_id}/projects/{doc_id}; modified handler signatures to accept a tuple (room_id, doc_id).
server/websocket/src/storage/gcs/mod.rs Added new functions extract_ids_from_path and generate_gcs_path; modified GCS operations to use these functions and improved error handling.
server/websocket/src/ws.rs Changed parameter from doc_id to doc_path in ws_handler; introduced path validation, splitting, and normalization logic.
ui/src/lib/yjs/useYjsSetup.ts
ui/src/routes/workspaces.$workspaceId_.projects_.$projectId.lazy.tsx
Added optional workspaceId parameter; updated parameter destructuring and connection path generation in UI components.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Router
    participant WSHandler
    participant BroadcastPool
    participant GcsStore

    Client->>Router: Sends WebSocket connection request to /workspaces/{room_id}/projects/{doc_id}
    Router->>WSHandler: Extracts (room_id, doc_id) from path
    WSHandler->>BroadcastPool: Call get_or_create_group(normalized_path)
    BroadcastPool-->>WSHandler: Returns BroadcastGroup
    WSHandler->>GcsStore: Invoke document operations using generated GCS path
Loading

Possibly related PRs

Suggested labels

websocket, api

Suggested reviewers

  • pyshx

Poem

I'm a hopping little rabbit,
Coding in the moonlit night.
Paths and IDs all aligned,
New routes shining ever so bright.
Cheers to clean code and a joyful byte!
🐰✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kasugamirai
Copy link
Member Author

wip

Copy link

netlify bot commented Mar 3, 2025

Deploy Preview for reearth-flow ready!

Name Link
🔨 Latest commit 2ee3ddf
🔍 Latest deploy log https://app.netlify.com/sites/reearth-flow/deploys/67c60c1bed09b800089a73af
😎 Deploy Preview https://deploy-preview-957--reearth-flow.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (3)
server/websocket/src/ws.rs (1)

208-214: 🛠️ Refactor suggestion

Rename doc_id parameter to match usage context.

The function accepts a parameter named doc_id but it's actually receiving the normalized_path. This inconsistency could cause confusion for developers.

async fn handle_socket(
    socket: axum::extract::ws::WebSocket,
    bcast: Arc<crate::BroadcastGroup>,
-   doc_id: String,
+   normalized_path: String,
    pool: Arc<BroadcastPool>,
    user_token: Option<String>,
) {
server/websocket/src/storage/gcs/mod.rs (2)

555-555: ⚠️ Potential issue

Update remaining hardcoded PROJECTS_PREFIX reference.

The implementation of GcsRange::next() still uses the hardcoded PROJECTS_PREFIX instead of the dynamic path generation, which will cause key extraction to fail.

impl Iterator for GcsRange {
    type Item = GcsEntry;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current >= self.objects.len() {
            return None;
        }
        let obj = &self.objects[self.current];
        let value = self.values[self.current].clone()?;
        self.current += 1;

+       // Extract the room_id and doc_id from the object name
+       let object_path = &obj.name;
+       let key_part = if let Some(pos) = object_path.rfind('/') {
+           &object_path[(pos + 1)..]
+       } else {
+           &object_path[PROJECTS_PREFIX.len()..]
+       };

        Some(GcsEntry {
-           key: hex::decode(&obj.name[PROJECTS_PREFIX.len()..]).unwrap_or_default(),
+           key: hex::decode(key_part).unwrap_or_default(),
            value,
        })
    }
}

187-265: ⚠️ Potential issue

Update rollback_to method to use the new path scheme.

The rollback_to method still uses the old path construction approach which will be inconsistent with the rest of the implementation.

The rollback_to method should be updated to use the new path extraction and generation functions like the other methods. This is a significant inconsistency that could lead to data integrity issues.

🧹 Nitpick comments (5)
server/websocket/src/ws.rs (2)

133-145: Improve path validation with more robust checks.

The current path validation only checks for minimum length and specific segments, but doesn't validate the content of the IDs themselves. This could potentially allow invalid or malicious input to pass through.

let parts: Vec<&str> = doc_path.split('/').collect();
-if parts.len() < 4 || parts[0] != "workspaces" || parts[2] != "projects" {
+// Validate path structure and ensure IDs are not empty
+if parts.len() < 4 || parts[0] != "workspaces" || parts[2] != "projects" || 
+   parts[1].trim().is_empty() || parts[3].trim().is_empty() {
    return Response::builder()
        .status(400)
        .body(axum::body::Body::from("Invalid path format"))
        .unwrap();
}

181-182: Use a helper function for path normalization.

This path format appears in multiple places in the codebase. Consider creating a dedicated helper function to ensure consistency.

-let normalized_path = format!("workspaces/{}/projects/{}", room_id, doc_id);
+let normalized_path = format_workspace_path(room_id, doc_id);

// Add this function to a utility module:
// fn format_workspace_path(room_id: &str, doc_id: &str) -> String {
//     format!("workspaces/{}/projects/{}", room_id, doc_id)
// }
server/websocket/src/storage/gcs/mod.rs (2)

27-34: Consider adding logging for path extraction failures.

The extract_ids_from_path function silently returns None when path validation fails. Adding debug logging would help diagnose issues during development.

fn extract_ids_from_path(path: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = path.split('/').collect();
    if parts.len() >= 4 && parts[0] == "workspaces" && parts[2] == "projects" {
        Some((parts[1].to_string(), parts[3].to_string()))
    } else {
+       debug!("Invalid path format: {}", path);
        None
    }
}

104-107: Improve error handling with more specific error messages.

The current implementation returns an empty result when path extraction fails. More specific error reporting would help diagnose issues.

let (room_id, doc_id) = match extract_ids_from_path(doc_id) {
    Some(ids) => ids,
-   None => return Ok(Vec::new()),
+   None => {
+       debug!("Failed to extract room_id and doc_id from path: {}", doc_id);
+       return Ok(Vec::new())
+   },
};
server/websocket/src/server.rs (1)

88-93: Consider adding input validation for room_id and doc_id.

While the function correctly extracts and formats the path, it doesn't validate the content of room_id and doc_id before use.

Path((room_id, doc_id)): Path<(String, String)>,

+ // Validate that IDs don't contain invalid characters
+ if room_id.contains(|c: char| !c.is_alphanumeric() && c != '-' && c != '_') || 
+    doc_id.contains(|c: char| !c.is_alphanumeric() && c != '-' && c != '_') {
+     return Response::builder()
+         .status(StatusCode::BAD_REQUEST)
+         .body(Body::from("Invalid ID format"))
+         .unwrap();
+ }

let doc_path = format!("workspaces/{}/projects/{}", room_id, doc_id);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1593105 and 2ee3ddf.

⛔ Files ignored due to path filters (1)
  • server/Cargo.lock is excluded by !**/*.lock, !**/*.lock
📒 Files selected for processing (7)
  • server/websocket/Cargo.toml (1 hunks)
  • server/websocket/src/broadcast/pool.rs (4 hunks)
  • server/websocket/src/server.rs (2 hunks)
  • server/websocket/src/storage/gcs/mod.rs (12 hunks)
  • server/websocket/src/ws.rs (3 hunks)
  • ui/src/lib/yjs/useYjsSetup.ts (3 hunks)
  • ui/src/routes/workspaces.$workspaceId_.projects_.$projectId.lazy.tsx (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci-websocket / ci
🔇 Additional comments (20)
server/websocket/Cargo.toml (1)

60-60: Added UUID dependency for generating unique identifiers

The inclusion of the uuid crate with the v4 feature is a good addition for generating random UUIDs, which will likely be used for handling document or workspace identifiers in the WebSocket context.

ui/src/lib/yjs/useYjsSetup.ts (6)

14-14: Properly added workspaceId parameter to function signature

The addition of the workspaceId parameter to the function signature is a good enhancement that will support the workspace context in document paths.


19-19: Type definition correctly updated for workspaceId

The type definition is appropriately updated to include the optional workspaceId parameter, maintaining consistency with the function signature.


38-38: Updated conditional check to include workspaceId requirement

The conditional check now properly requires workspaceId as a condition for establishing the WebSocket connection, aligning with the new path structure.


45-45: Well-structured document path format

The path format clearly separates the workspace and project contexts, which makes the path more organized and hierarchical. This is a good practice for resource identification.


48-48: Updated WebSocket provider with new docPath parameter

The WebSocket provider now correctly uses the new document path format that includes workspace context.


82-82: Dependency array properly updated to include workspaceId

The dependency array for the useEffect hook has been updated to include workspaceId, ensuring that the effect will re-run if this parameter changes.

ui/src/routes/workspaces.$workspaceId_.projects_.$projectId.lazy.tsx (2)

67-67: Properly destructured workspaceId from useParams

The code now correctly destructures both projectId and workspaceId from the route parameters, with appropriate type annotations.


74-74: Passing workspaceId to useYjsSetup

The workspaceId is now correctly passed to the useYjsSetup function, completing the integration of workspace context in the WebSocket connection setup.

server/websocket/src/broadcast/pool.rs (6)

46-47: Parameter renamed from doc_id to doc_path for clarity

Renaming the parameter from doc_id to doc_path better reflects that this is now a path with workspace and room context, not just a simple identifier. This improves code readability and maintainability.


57-57: Updated document loading to use doc_path parameter

The document loading logic now correctly uses the doc_path parameter, maintaining consistency with the parameter rename.


61-62: Logging statements updated to reference doc_path

Logging statements have been updated to reference the new parameter name, ensuring consistent terminology throughout the codebase.


66-68: Error logging updated to use doc_path parameter

The error and info logging statements now correctly use the doc_path parameter, maintaining consistent terminology.


86-86: BroadcastConfig now uses doc_path for document naming

The BroadcastConfig structure now correctly uses doc_path for the document name, which is consistent with the parameter rename.


113-114: remove_connection method updated to use doc_path parameter

The remove_connection method has been updated to use doc_path instead of doc_id, maintaining consistency with the parameter rename throughout the codebase.

server/websocket/src/ws.rs (1)

230-231:

✅ Verification successful

Ensure this function is used consistently with the new path structure.

At line 230, the pool removes a connection using doc_id which now represents the normalized path. Verify that the remove_connection method is compatible with this change.


🏁 Script executed:

#!/bin/bash
# Check the implementation of remove_connection
rg -A 10 "fn\s+remove_connection" --type rust server/websocket/src/

Length of output: 1095


CONFIRMED: The verification confirms that the remove_connection method (located in server/websocket/src/broadcast/pool.rs) accepts a normalized path (doc_path: &str), which aligns with the use of doc_id on line 230. No changes are required.

server/websocket/src/storage/gcs/mod.rs (1)

22-22:

❌ Incorrect review comment

Add version constraint for the uuid dependency.

The uuid crate is imported but lacks a version constraint, which could lead to unexpected compatibility issues.


🏁 Script executed:

#!/bin/bash
# Check if uuid is specified in Cargo.toml
grep -A 5 "uuid" server/websocket/Cargo.toml

Length of output: 205


UUID Version Constraint Already Specified

After verifying the Cargo.toml file in server/websocket, we found that the uuid dependency is already constrained with:

uuid = { version = "1.7.0", features = ["v4"] }

No changes are required in server/websocket/src/storage/gcs/mod.rs.

Likely an incorrect or invalid review comment.

server/websocket/src/server.rs (3)

67-67: Good use of structured routing patterns.

The updated route clearly reflects the resource hierarchy, which enhances API readability and consistency.


75-78: Updated log message aligns well with the route change.

The updated log message accurately reflects the new route structure, which helps with debugging and documentation.


99-103: Ensure consistent validation between auth and non-auth handlers.

If input validation is added to the auth handler, it should be mirrored in the non-auth handler to maintain consistency.

The implementation for both handlers should be kept in sync. Consider extracting the validation logic to a shared function if needed.

Comment on lines +147 to +148
let room_id = parts[1];
let doc_id = normalize_doc_id(parts[3]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Consider validating extracted IDs.

The extracted IDs are used throughout the system but aren't validated for correctness. Consider implementing validation to ensure they don't contain special characters that might cause issues in paths.


🏁 Script executed:

#!/bin/bash
# Check for other ID validation in the codebase
rg -A 2 -B 2 "validate.*id|id.*validate" --type rust

Length of output: 14372


Action Required: Enforce ID Validation in WebSocket Handler

The extracted IDs in server/websocket/src/ws.rs (lines 147–148) are used throughout the system without explicit validation. While other parts of the codebase (e.g., XML and geometry modules) include validation routines, there’s no evidence of similar checks for the WebSocket IDs. Consider the following:

  • Room ID (parts[1]): Validate that it doesn’t contain unexpected special characters or malformed data that could lead to path issues.
  • Document ID (normalize_doc_id(parts[3])): Confirm that normalize_doc_id not only normalizes but also ensures the ID conforms to an expected safe format. If it doesn’t, introduce additional validation similar to other modules.

Implementing these checks will help guarantee the integrity of IDs across the system.

Comment on lines +36 to +38
fn generate_gcs_path(room_id: &str, doc_id: &str) -> String {
format!("projects/{}/{}/{}", room_id, doc_id, uuid::Uuid::new_v4())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Using UUID for path generation may cause data consistency issues.

Generating a new UUID for each path means that the same logical document will have different paths on each operation. This could lead to data duplication or inconsistent access patterns. Clarify if this is the intended behavior.

Consider modifying the function to use a deterministic path construction:

fn generate_gcs_path(room_id: &str, doc_id: &str) -> String {
-   format!("projects/{}/{}/{}", room_id, doc_id, uuid::Uuid::new_v4())
+   // Use a deterministic path without UUID to ensure consistency
+   format!("projects/{}/{}", room_id, doc_id)
}

If unique files are needed, consider using the key content as part of the path or a hash function.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn generate_gcs_path(room_id: &str, doc_id: &str) -> String {
format!("projects/{}/{}/{}", room_id, doc_id, uuid::Uuid::new_v4())
}
fn generate_gcs_path(room_id: &str, doc_id: &str) -> String {
// Use a deterministic path without UUID to ensure consistency
format!("projects/{}/{}", room_id, doc_id)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant