Fix order of columns after TSDataset.to_pandas() and after TSDataset.__init__() #25

Mr-Geekman · 2023-08-14T15:39:30Z

Issue by GooseIt
Wednesday Feb 08, 2023 at 15:55 GMT
Originally opened as tinkoff-ai#1106

🚀 Feature Request

For consistency, it may be useful to give both the dataframe returned by TSDataset.to_dataset() and the dataframe df constructed in TSDataset.__init__() the same column order as the dataframe returned by TSDataset.to_flatten().
Currently, both the dataframe returned by TSDataset.to_dataset() and TSDataset.df place "target" among the other features in alphabetical order, whereas the dataframe returned by TSDataset.to_flatten() places "target" right after "timestamp" and "segment" and before the other features, which follow in alphabetical order.
The order produced by TSDataset.to_flatten() makes the "target" values easier to inspect (they are not hidden among many other features) and emphasises the special role of "target".

Proposal

I propose the following order of columns:

timestamp,
segment,
target,
other columns in alphabetical order.
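The proposed order can be written as a small pandas function. This is a minimal sketch, not etna's implementation: the helper name order_flat_columns is hypothetical, and it assumes a long-format dataframe with "timestamp" and "segment" columns and an optional "target" column.

```python
import pandas as pd


def order_flat_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reorder a long-format frame as timestamp, segment,
    target (only if present), then all other columns alphabetically."""
    leading = ["timestamp", "segment"]
    if "target" in df.columns:
        leading.append("target")
    others = sorted(col for col in df.columns if col not in leading)
    return df[leading + others]


df = pd.DataFrame(
    {
        "regressor_b": [1.0, 2.0],
        "segment": ["a", "a"],
        "target": [10.0, 11.0],
        "timestamp": pd.to_datetime(["2023-01-01", "2023-01-02"]),
        "regressor_a": [0.5, 0.6],
    }
)
print(order_flat_columns(df).columns.tolist())
# ['timestamp', 'segment', 'target', 'regressor_a', 'regressor_b']
```

If "target" is absent, the function simply falls back to timestamp, segment and the alphabetically sorted remainder.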

How it can be done for TSDataset.to_dataset():

Find the line df_copy = df_copy.pivot(index="timestamp", columns="segment") in etna.datasets.tsdataset.py
Before it, reorder the columns of df_copy so that "target" comes before the other features, if "target" is present. It should look like feature_columns.remove("target") followed, on the next line, by df_copy = df_copy[["timestamp", "segment", "target"] + feature_columns]
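Note that DataFrame.pivot groups the resulting columns by feature first, as (feature, segment) pairs, so after swapping levels to (segment, feature) the columns still have to be grouped by segment, and a plain alphabetical sort at that point would put "target" back among the features. The sketch below therefore orders the wide columns directly with a sort key. It is an illustration under assumptions, not etna's code: to_wide_target_first and target_first_key are hypothetical names, and the input is assumed to be a long frame with "timestamp" and "segment" columns.

```python
import pandas as pd


def target_first_key(column: tuple) -> tuple:
    """Sort key for (segment, feature) pairs: group by segment, put "target"
    first within each segment, then the remaining features alphabetically."""
    segment, feature = column
    return (segment, feature != "target", feature)


def to_wide_target_first(df_long: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for TSDataset.to_dataset(): pivot a long frame to
    (segment, feature) columns with "target" first in every segment."""
    df_wide = df_long.pivot(index="timestamp", columns="segment")
    # pivot yields (feature, segment) columns; swap them to (segment, feature)
    df_wide = df_wide.swaplevel(0, 1, axis=1)
    df_wide.columns.names = ["segment", "feature"]
    ordered = sorted(df_wide.columns, key=target_first_key)
    # reindex keeps the values and applies exactly the requested column order
    return df_wide.reindex(columns=pd.MultiIndex.from_tuples(ordered, names=["segment", "feature"]))


df_long = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2023-01-01", "2023-01-02"] * 2),
        "segment": ["a", "a", "b", "b"],
        "regressor_b": [1.0, 2.0, 3.0, 4.0],
        "target": [10.0, 11.0, 12.0, 13.0],
        "regressor_a": [0.1, 0.2, 0.3, 0.4],
    }
)
print(to_wide_target_first(df_long).columns.tolist())
# [('a', 'target'), ('a', 'regressor_a'), ('a', 'regressor_b'), ('b', 'target'), ('b', 'regressor_a'), ('b', 'regressor_b')]
```

Sorting with an explicit key keeps the segment grouping and the alphabetical order of the remaining features, and only treats "target" as a special case.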

How it can be done for TSDataset.__init__():

Find the line df = pd.concat((df, self.df_exog), axis=1).loc[df.index].sort_index(axis=1, level=(0, 1)) in etna.datasets.tsdataset.py
Change it so that, within each segment, "target" comes before the other columns, which remain sorted in alphabetical order.
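The quoted line ends with sort_index(axis=1, level=(0, 1)), which sorts features alphabetically and therefore mixes "target" in with them. A minimal sketch of the alternative keeps the concat and the .loc[df.index] step and replaces the plain sort with a keyed sort. The function name concat_target_first is hypothetical, and it assumes both frames are wide, with a two-level (segment, feature) column MultiIndex.

```python
import pandas as pd


def concat_target_first(df: pd.DataFrame, df_exog: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the concat step in TSDataset.__init__():
    join wide frames, keep only df's timestamps, then order columns so that,
    per segment, "target" comes first and the rest are alphabetical."""
    merged = pd.concat((df, df_exog), axis=1).loc[df.index]
    ordered = sorted(merged.columns, key=lambda col: (col[0], col[1] != "target", col[1]))
    return merged.reindex(columns=pd.MultiIndex.from_tuples(ordered, names=merged.columns.names))


columns_names = ["segment", "feature"]
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=pd.date_range("2023-01-01", periods=3, freq="D"),
    columns=pd.MultiIndex.from_product([["a", "b"], ["target"]], names=columns_names),
)
# exogenous data may extend past the end of df; .loc[df.index] drops the extra rows
df_exog = pd.DataFrame(
    0.0,
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
    columns=pd.MultiIndex.from_product([["a", "b"], ["regressor_b", "regressor_a"]], names=columns_names),
)
print(concat_target_first(df, df_exog).columns.tolist())
# [('a', 'target'), ('a', 'regressor_a'), ('a', 'regressor_b'), ('b', 'target'), ('b', 'regressor_a'), ('b', 'regressor_b')]
```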

Test cases

Fix doctest of TSDataset.to_dataset().
Make sure the existing tests pass.
Add tests of the column order for both modified methods to etna.tests.test_datasets.test_dataset.py:

test_to_dataset_correct_column_order for TSDataset.to_dataset()
test_init_with_exog_correct_column_order for TSDataset.__init__() with df_exog is not None
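Both proposed tests need the same check on a wide dataframe. A possible shared assertion is sketched below; assert_target_first is a hypothetical helper, not an existing etna utility, and it assumes columns are (segment, feature) pairs.

```python
import pandas as pd


def assert_target_first(df: pd.DataFrame) -> None:
    """Hypothetical assertion: columns are grouped by segment in alphabetical
    order, "target" is first in each segment, other features are alphabetical."""
    columns = list(df.columns)
    expected = sorted(columns, key=lambda col: (col[0], col[1] != "target", col[1]))
    assert columns == expected, f"unexpected column order: {columns}"


good = pd.DataFrame(
    columns=pd.MultiIndex.from_tuples(
        [("a", "target"), ("a", "regressor_a"), ("b", "target"), ("b", "regressor_a")],
        names=["segment", "feature"],
    )
)
assert_target_first(good)  # passes silently
```

In test_to_dataset_correct_column_order the helper would be called on the result of TSDataset.to_dataset(), and in test_init_with_exog_correct_column_order on the df attribute of a dataset built with exogenous data.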

Additional context

See issue tinkoff-ai#873 for a similar issue about TSDataset.to_flatten().


Mr-Geekman added the enhancement New feature or request label Aug 14, 2023

Mr-Geekman added this to etna board Aug 15, 2023

Mr-Geekman moved this to Specification in etna board Aug 15, 2023

etna-team locked and limited conversation to collaborators May 30, 2024

d-a-bunin converted this issue into discussion #370 May 30, 2024

github-project-automation bot moved this from Specification to Done in etna board May 30, 2024


This issue was moved to a discussion.
