Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace FileSourceParams path with URI #5104

Closed
wants to merge 3 commits into from

Conversation

rdettai
Copy link
Contributor

@rdettai rdettai commented Jun 11, 2024

Description

The FileSourceParams curently contains a path that is converted forth and back to a URI. This PR straitens up things to avoid these conversions.

How was this PR tested?

Describe how you tested this PR.

@rdettai rdettai force-pushed the refactor-file-source branch from 373435a to afd7868 Compare June 11, 2024 07:01
@rdettai rdettai self-assigned this Jun 11, 2024
@rdettai rdettai force-pushed the uri-file-src-params branch from e34d436 to 383850e Compare June 11, 2024 07:52
Copy link

github-actions bot commented Jun 11, 2024

On SSD:

Average search latency is 0.993x that of the reference (lower is better).
Ref run id: 2337, ref commit: 4ade7b5
Link

On GCS:

Average search latency is 0.99x that of the reference (lower is better).
Ref run id: 2339, ref commit: 4ade7b5
Link

assert!(cache_directory_path.read_dir().unwrap().next().is_none());

let does_not_exit_uri = uri_from_path(&test_env.data_dir_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let does_not_exit_uri = uri_from_path(&test_env.data_dir_path)
let does_not_exist_uri = uri_from_path(&test_env.data_dir_path)

}

/// Deserializing as an URI
fn uri_from_str<'de, D>(deserializer: D) -> Result<Option<Uri>, D::Error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uri implements Deserialize, so I don't think we need a custom deserializer for Option<Uri>.

@@ -249,37 +254,33 @@ pub struct FileSourceParams {
#[schema(value_type = String)]
#[serde(skip_serializing_if = "Option::is_none")]
#[serde(default)]
#[serde(deserialize_with = "absolute_filepath_from_str")]
pub filepath: Option<PathBuf>, //< If None read from stdin.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rename this field file_uri and use an alias to maintain backward compatibility.

})
}

pub fn file_from_str<P: AsRef<str>>(filepath: P) -> anyhow::Result<Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary, it's already super easy to go from &str to Uri and then to a file params object.

/// TODO: we might want to replace `PathBuf` with `Uri` directly in
/// `FileSourceParams`
fn absolute_filepath_from_str<'de, D>(deserializer: D) -> Result<Option<PathBuf>, D::Error>
impl FromStr for FileSourceParams {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

parsed_cust_event.uri(),
PathBuf::from("s3://quickwit-test/test.json"),
parsed_cust_event.uri().unwrap(),
Uri::from_str("s3://quickwit-test/test.json").unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Uri implements PartialEq for &str.

@rdettai rdettai force-pushed the refactor-file-source branch 5 times, most recently from 46208bf to fdc98a5 Compare June 12, 2024 20:55
@rdettai rdettai force-pushed the uri-file-src-params branch from ceaaeb3 to 24887c6 Compare June 12, 2024 20:57
@rdettai rdettai force-pushed the refactor-file-source branch from fdc98a5 to 3555d3a Compare June 12, 2024 21:06
@rdettai rdettai force-pushed the uri-file-src-params branch 3 times, most recently from 4b1775f to 6c4f299 Compare June 14, 2024 07:42
@rdettai rdettai force-pushed the refactor-file-source branch from 94f94ee to bea5d82 Compare June 26, 2024 07:56
@rdettai rdettai force-pushed the uri-file-src-params branch from 6c4f299 to cbcbfda Compare June 26, 2024 07:56
@rdettai rdettai force-pushed the refactor-file-source branch from bea5d82 to 6d6fdb1 Compare June 26, 2024 08:38
@rdettai rdettai force-pushed the uri-file-src-params branch from cbcbfda to 0c93920 Compare June 26, 2024 08:38
@rdettai
Copy link
Contributor Author

rdettai commented Jul 4, 2024

Closing this in favor of #5148 that brings significant changes on top of the file source,

@rdettai rdettai closed this Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants