This issue was moved to a discussion.

Fix order of columns after TSDataset.to_pandas() and after TSDataset.__init__() #25

Closed
Mr-Geekman opened this issue Aug 14, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

Mr-Geekman commented Aug 14, 2023

Issue by GooseIt
Wednesday Feb 08, 2023 at 15:55 GMT
Originally opened as tinkoff-ai#1106


🚀 Feature Request

For consistency, it may be useful to give the dataframe returned by TSDataset.to_dataset() and the dataframe df constructed in TSDataset.__init__() the same column order as the dataframe returned by TSDataset.to_flatten().
Currently, both the dataframe returned by TSDataset.to_dataset() and TSDataset.df place "target" among the other features in alphabetical order, whereas the dataframe returned by TSDataset.to_flatten() places "target" after "timestamp" and "segment" and before the other features, which are in alphabetical order.
The order used by TSDataset.to_flatten() makes the "target" values easier to inspect (they are not hidden among many other features) and emphasises the column's special role.

Proposal

I propose the following order of columns:

  • timestamp,
  • segment,
  • target,
  • other columns in alphabetical order.
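The proposed order can be written down as a small pure-Python helper. This is only a sketch: the name proposed_column_order and the LEADING constant are hypothetical and are not part of etna.

```python
from typing import Iterable, List

# Columns that should come first, in this order, when they are present.
LEADING = ("timestamp", "segment", "target")


def proposed_column_order(columns: Iterable[str]) -> List[str]:
    """Return the columns as timestamp, segment, target, then the rest alphabetically."""
    cols = list(columns)
    head = [name for name in LEADING if name in cols]
    tail = sorted(name for name in cols if name not in LEADING)
    return head + tail


example = proposed_column_order(["exog_b", "target", "segment", "exog_a", "timestamp"])
print(example)
```

Columns missing from the input (for example a dataframe without "target") are skipped, so the helper does not fail on them.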

How it can be done for TSDataset.to_dataset():

  1. Find the line df_copy = df_copy.pivot(index="timestamp", columns="segment") in etna.datasets.tsdataset.py
  2. Before it, reorder the columns of df_copy so that "target" precedes the other features, if "target" is present. It should look like feature_columns.remove("target") followed on the next line by df_copy = df_copy[["timestamp", "segment", "target"] + feature_columns]
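The two steps above can be tried on a toy long-format dataframe. This is a sketch under assumptions: the sample data is made up, and feature_columns is assumed to hold every column except "timestamp" and "segment"; the surrounding code of to_dataset() is not reproduced here.

```python
import pandas as pd

# Toy long-format data; column names follow the issue, values are made up.
df_copy = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2023-01-01", "2023-01-02"] * 2),
        "segment": ["a", "a", "b", "b"],
        "exog_b": [1.0, 2.0, 3.0, 4.0],
        "exog_a": [5.0, 6.0, 7.0, 8.0],
        "target": [10.0, 11.0, 20.0, 21.0],
    }
)

# Assumption: feature_columns is everything except the two index-like columns.
feature_columns = sorted(set(df_copy.columns) - {"timestamp", "segment"})

# Put "target" in front of the other features, but only if it is present.
if "target" in feature_columns:
    feature_columns.remove("target")
    df_copy = df_copy[["timestamp", "segment", "target"] + feature_columns]

wide = df_copy.pivot(index="timestamp", columns="segment")

# Feature names in the order they appear in the pivoted columns.
feature_order = list(wide.columns.get_level_values(0).unique())
print(feature_order)
```

Any later alphabetical re-sort of the columns would undo this ordering, so the sketch only covers the pivot step itself.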

How it can be done for TSDataset.__init__():

  1. Find the line df = pd.concat((df, self.df_exog), axis=1).loc[df.index].sort_index(axis=1, level=(0, 1)) in etna.datasets.tsdataset.py
  2. Change it so that "target" comes before the other columns, which remain sorted in alphabetical order.
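One possible way to get this ordering is to replace the plain alphabetical sort with a sort key that ranks "target" first within each segment. The sketch below assumes the columns form a two-level MultiIndex of (segment, feature), which is what sort_index(axis=1, level=(0, 1)) in the quoted line suggests; the helper name sort_columns_target_first is hypothetical.

```python
import pandas as pd


def sort_columns_target_first(df: pd.DataFrame) -> pd.DataFrame:
    """Sort (segment, feature) columns by segment, then "target", then other features."""
    ordered = sorted(
        df.columns,
        # False sorts before True, so "target" leads within each segment.
        key=lambda col: (col[0], col[1] != "target", col[1]),
    )
    new_columns = pd.MultiIndex.from_tuples(ordered, names=df.columns.names)
    return df.reindex(columns=new_columns)


# Toy wide-format frame with made-up values.
columns = pd.MultiIndex.from_product(
    [["a", "b"], ["exog_1", "target", "exog_0"]], names=["segment", "feature"]
)
df = pd.DataFrame([list(range(6))], columns=columns)

result = sort_columns_target_first(df)
print(list(result.columns))
```

Only the column order changes; the values stay attached to their original (segment, feature) labels.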

Test cases

  1. Fix the doctest of TSDataset.to_dataset().
  2. Make sure the current tests pass.
  3. Add tests on the order of columns for both modified methods to etna.tests.test_datasets.test_dataset.py:
  • test_to_dataset_correct_column_order for TSDataset.to_dataset()
  • test_init_with_exog_correct_column_order for TSDataset.__init__() with df_exog is not None
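Both proposed tests need the same check on a wide dataframe, so it could be shared. The sketch below is one way to write it, assuming (segment, feature) MultiIndex columns; the helper names are hypothetical, and the frame here is built by hand instead of through TSDataset.

```python
import pandas as pd


def features_per_segment(df: pd.DataFrame) -> dict:
    """Map each segment to its feature names, in column order."""
    result: dict = {}
    for segment, feature in df.columns:
        result.setdefault(segment, []).append(feature)
    return result


def assert_target_first(df: pd.DataFrame) -> None:
    """Fail unless every segment lists "target" first and the rest alphabetically."""
    for segment, features in features_per_segment(df).items():
        assert features[0] == "target", f"segment {segment!r}: {features}"
        assert features[1:] == sorted(features[1:]), f"segment {segment!r}: {features}"


# Hand-built frames standing in for the output of the modified methods.
good_columns = pd.MultiIndex.from_tuples(
    [("a", "target"), ("a", "exog_0"), ("a", "exog_1")], names=["segment", "feature"]
)
good = pd.DataFrame([[1.0, 2.0, 3.0]], columns=good_columns)
bad = good[[("a", "exog_0"), ("a", "exog_1"), ("a", "target")]]

assert_target_first(good)  # passes silently
```

In the real tests, the frame passed to assert_target_first would be the result of TSDataset.to_dataset() or the df attribute of a constructed TSDataset.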

Additional context

See issue tinkoff-ai#873 for a similar issue concerning TSDataset.to_flatten().

@Mr-Geekman Mr-Geekman added the enhancement New feature or request label Aug 14, 2023
@Mr-Geekman Mr-Geekman moved this to Specification in etna board Aug 15, 2023
@etna-team etna-team locked and limited conversation to collaborators May 30, 2024
@d-a-bunin d-a-bunin converted this issue into discussion #370 May 30, 2024
@github-project-automation github-project-automation bot moved this from Specification to Done in etna board May 30, 2024
