Added use orc column names session #24158

Draft
wants to merge 6 commits into
base: master

malinjawi

Description

This pull request adds a new session property, orc_use_column_names, that toggles whether columns in ORC files are accessed by name. It is analogous to the existing Parquet session property parquet_use_column_names, and lets users enable or disable name-based column access for ORC files in Presto.
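
To illustrate what such a toggle changes, here is a minimal, self-contained Java sketch. It is not code from this PR or from Presto: the class name, the resolve method, and the sample columns are hypothetical, and it assumes that with the property disabled a table column is matched to the file column at the same ordinal position. Real matching rules (for example case sensitivity) are not modeled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Presto.
public class OrcColumnResolutionSketch {
    // Returns the index of the file column that backs a table column,
    // or -1 if the file has no suitable column.
    static int resolve(List<String> fileColumns, String tableColumn, int tableOrdinal, boolean useColumnNames) {
        if (useColumnNames) {
            // Name-based access: look the column up by its name in the file schema.
            return fileColumns.indexOf(tableColumn);
        }
        // Assumed position-based access: use the column's ordinal in the table definition.
        return (tableOrdinal >= 0 && tableOrdinal < fileColumns.size()) ? tableOrdinal : -1;
    }

    public static void main(String[] args) {
        // File was written with columns (id, name, price).
        List<String> fileColumns = Arrays.asList("id", "name", "price");
        // The table declares "name" as its first column (ordinal 0).
        System.out.println(resolve(fileColumns, "name", 0, false)); // position-based lookup
        System.out.println(resolve(fileColumns, "name", 0, true));  // name-based lookup
    }
}
```

In this sketch the two modes disagree whenever the table's column order differs from the file's, which is the kind of situation where a per-session toggle is useful.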

Motivation and Context

Currently, there is no session property to control the use of column names when reading ORC files, whereas Parquet already has one (parquet_use_column_names). Adding the ORC equivalent makes the two file formats consistent and gives users finer control over how ORC files are read.

issue: #24134

Impact

  • New Session Property: orc_use_column_names is added, allowing users to enable or disable the use of ORC column names.
  • Behavior Change: No breaking changes. The property only adds a way to control how ORC files are processed.
  • No Performance Impact: No performance impact is expected, since the change only adds a configuration toggle.
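
The toggle described above can be pictured as a session value that overrides a configured default. The Java sketch below models that pattern with plain maps. It is an assumption-laden illustration, not Presto's session property API: the class, the method, and the use of a string map for session values are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "session value overrides config default"; not Presto code.
public class SessionToggleSketch {
    static final String ORC_USE_COLUMN_NAMES = "orc_use_column_names";

    // Returns the session value if one was set, otherwise the configured default.
    static boolean isOrcUseColumnNames(boolean configDefault, Map<String, String> sessionProperties) {
        String value = sessionProperties.get(ORC_USE_COLUMN_NAMES);
        return value == null ? configDefault : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> session = new HashMap<>();
        // No session override yet: the configured default applies.
        System.out.println(isOrcUseColumnNames(false, session));
        // The user enables the property for this session only.
        session.put(ORC_USE_COLUMN_NAMES, "true");
        System.out.println(isOrcUseColumnNames(false, session));
    }
}
```

In Presto itself, a connector session property is typically set with a statement of the form SET SESSION catalog.property = value, for example SET SESSION hive.orc_use_column_names = true, assuming the Hive catalog is named hive.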

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@malinjawi malinjawi requested a review from a team as a code owner November 27, 2024 09:30
@malinjawi malinjawi requested a review from presto-oss November 27, 2024 09:30

linux-foundation-easycla bot commented Nov 27, 2024

CLA Missing ID CLA Not Signed

@malinjawi
Author

Note: I am not sure whether I need to add tests for this feature extension.

Also, I signed the CLA late, so I am not sure whether I need to update anything further.

@malinjawi malinjawi marked this pull request as draft November 27, 2024 12:10
@nmahadevuni
Member

Hi @malinjawi, let's move the ORC config property to HiveCommonClientConfig.java and the session property to HiveCommonSessionProperties.java, where the corresponding Parquet properties are defined.

Mohammad Linjawi added 4 commits November 28, 2024 19:01
@malinjawi
Author

Hi @nmahadevuni, I've moved the ORC config property to HiveCommonClientConfig.java and the session property to HiveCommonSessionProperties.java, where the corresponding Parquet properties are defined. Please let me know if any further changes are needed. Also, it seems I should move the test cases too?

@steveburnett
Contributor

3 participants