Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: remove single quotes in index names when printing #60251

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

thedataninja1786
Copy link

@thedataninja1786 thedataninja1786 commented Nov 8, 2024

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Whenever making changes to behavior, please add tests.

@rhshadrach rhshadrach added Output-Formatting __repr__ of pandas objects, to_string Bug labels Nov 8, 2024
@jorisvandenbossche jorisvandenbossche changed the title remove single quotes in index names when printing BUG: remove single quotes in index names when printing Nov 8, 2024
@thedataninja1786
Copy link
Author

Thanks for the PR! Whenever making changes to behavior, please add tests.

Tests have been added.

def as_escaped_string(
thing: Any, escape_chars: EscapeChars | None = escape_chars
) -> str:
translate = {"\t": r"\t", "\n": r"\n", "\r": r"\r"}
translate = {"\t": r"\t", "\n": r"\n", "\r": r"\r", "'": ""}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will make the output here:

df = pd.DataFrame({"'a": [1], "b": [2]}).set_index(["'a", "b"])
print(df.index.names)

be ['a', 'b']. That is not correct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the desired behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index names must be accurate, the first being 'a. I would think ['\'a', 'b'].

(["'test\n'", "b"], ['test\n', 'b']) # Remove quotes, preserve newline
]

for input_names, expected_names in test_cases:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use pytest.mark.parametrize instead.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Will need to pass pre-commit.

https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#pre-commit

Also needs a line in the whatsnew for v3.0.0.

(["'test\n'", "b"], ["\'test\n\'", "b"]) # Escape and preserve newline
])
def test_formatted_index_names(input_names, expected_names):
# Create DataFrame with specified index names
Copy link
Member

@rhshadrach rhshadrach Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this comment - it repeats the code verbatim and so is not needed.

In addition, can you start the test with a comment referencing the issue:

# GH#60190

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are there now?

@@ -663,9 +663,9 @@ Interval

Indexing
^^^^^^^^
- Bug in :func:`pprint_thing` with ``quote_strings=True`` failing for strings with embedded single quotes (:issue:`60190`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pprint_thing is an internal function, the whatsnew is for users of pandas. We need to report on the impact users will notice. Something like Bug in printing :class:`Index` names would not escape single quotes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes have been made

@@ -663,9 +663,9 @@ Interval

Indexing
^^^^^^^^
- Bug in :class:`Index` printed names do not escape single quotes (:issue:`60190`)
Copy link
Member

@rhshadrach rhshadrach Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think :class:`Index` printed names is clear, is there an objection to what I suggested in my previous comment? In any case, perhaps this is better.

Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)

I did not realize that this can impact levels previously.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there was not, in any case I will be directly including your current suggestion since it also include levels :) I'll make another PR.

df = pd.DataFrame({name: [1, 2, 3] for name in input_names}).set_index(input_names)
formatted_names = df.index.names

assert formatted_names == expected_names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes on the main branch, so is not testing your changes. You can do str(df.index.names) and compare this to the expected value, e.g. ['\'a', 'b']

Copy link
Author

@thedataninja1786 thedataninja1786 Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue that I didn't realize before, that's why the tests pass. When the tests look like this
image
running the pre-commit checks: pre-commit run --from-ref=upstream/main --to-ref=HEAD --all-files fail, and convert it to the format of the latest PR (i.e. not escaping the single quotes).

Copy link
Member

@rhshadrach rhshadrach Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When comparing to str(df.index.names), you want the expected to be a single string, not a list of strings. But indeed, you need to escape the backslash: ['\\'a', 'b'].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify my previous comment, you cannot compare single entries because with a backslash (this PR) and without a backslash (main) evaluate as equal. However if you convert the entire index.names to a string, they do not.

input_names = ["'a", "b"]
df = pd.DataFrame({name: [1, 2, 3] for name in input_names}).set_index(input_names)

print(str(df.index.names) == "['\\'a', 'b']")  # main
# False

print(str(df.index.names) == "['\\'a', 'b']")  # PR
# True

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I'm with you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pprint_thing(..., quote_strings=True) fails for strings with embedded single-quotes
2 participants