Use varint length field for last_path encoding to support longer GCP object names #72

Open: r-uehara0219 wants to merge 1 commit into master

Conversation

r-uehara0219

Fixes #71

Overview

This PR updates the last_path encoding in embulk-input-gcs. The previous implementation used a fixed 1-byte length field, which limited paths to 127 bytes. The new implementation uses a varint length field spanning 1–2 bytes to represent the length of the UTF-8 encoded string; two varint bytes can express lengths up to 16,383.
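As a rough illustration of the varint length prefix described above, here is a minimal sketch in Java. It is not the code from this PR: the class and method names (`VarintLengthSketch`, `encodeVarint`, `lengthPrefixed`) are hypothetical, and it covers only the length prefix plus the UTF-8 payload, not any other part of the last_path encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from embulk-input-gcs.
final class VarintLengthSketch {
    // Encodes a non-negative length as a base-128 varint:
    // 7 payload bits per byte, low-order group first,
    // with the high bit set on every byte except the last.
    static byte[] encodeVarint(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Prefixes the UTF-8 bytes of the path with their varint-encoded length.
    static byte[] lengthPrefixed(String path) {
        byte[] utf8 = path.getBytes(StandardCharsets.UTF_8);
        byte[] prefix = encodeVarint(utf8.length);
        byte[] result = new byte[prefix.length + utf8.length];
        System.arraycopy(prefix, 0, result, 0, prefix.length);
        System.arraycopy(utf8, 0, result, prefix.length, utf8.length);
        return result;
    }
}
```

With this scheme a length of 127 still fits in one byte (`0x7F`), 128 becomes the two bytes `0x80 0x01`, and 1024 becomes `0x80 0x08`, so names up to 127 bytes keep a 1-byte prefix and longer names need a second byte.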

Why Is It Necessary?

GCS object names can be up to 1024 bytes long when UTF-8 encoded. The previous implementation's 1-byte length field restricted last_path to 127 bytes, causing errors for longer object names. Switching to a varint length field removes this limitation, so the plugin can support all valid GCS object names.

r-uehara0219 requested a review from a team as a code owner on March 7, 2025, 08:21
Development

Successfully merging this pull request may close these issues.

Support longer GCS object names for last_path encoding