
Atomic File Creation Problem #1372

Open
KianaLia opened this issue Aug 1, 2022 · 5 comments
Labels
question Further information is requested


@KianaLia

KianaLia commented Aug 1, 2022

Hi guys!
I'm trying to create a valid Atomic dataset file, but I've run into some problems!

My original dataset is a .csv file containing two columns, Bid and NumberOfPages (it's a sample file for testing).
I load this file into a pandas DataFrame and save it as a .txt file with the code below:

np.savetxt(r'/content/drive/MyDrive/goldoon_data/bd.txt', df, header=''.join(f'{col},' for col in df.columns).rstrip())

The result looks like this:
[Screenshot of the resulting file, 2022-08-01 13:06:17]
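A minimal sketch of what the `np.savetxt` call above writes, assuming a small hypothetical DataFrame with the two columns from the post. With NumPy's defaults (`delimiter=' '`, `fmt='%.18e'`, `comments='# '`), the header gets a `# ` prefix and keeps its trailing comma (`rstrip()` only strips whitespace), and the data lines are space-separated numbers in scientific notation:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the post.
df = pd.DataFrame({"Bid": [3, 3, 5], "NumberOfPages": [96.0, 96.0, 144.0]})

buf = io.StringIO()
# Same arguments as the original call, but writing to an in-memory buffer.
np.savetxt(buf, df, header="".join(f"{col}," for col in df.columns).rstrip())

lines = buf.getvalue().splitlines()
for line in lines:
    print(line)
```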

Then I rename the file from rectest.txt to rectest.item in the specified data path.
Next, I try to create a dataset using the following code:
config_dict = { 'field_separator' : ',', 'seq_separator' : ' ', 'neg_sampling' : {'uniform': 1}, 'data_path': '/content/drive/MyDrive/', 'load_col': {'item': ['bid','NumberOfPages']}, 'ITEM_ID_FIELD': 'bid', 'save_dataset': True, 'save_dataloaders': True }
config = Config(model='BPR', dataset = 'rectest', config_dict= config_dict)
dataset = create_dataset(config)

But I get this error:
[Screenshot of the error message, 2022-08-01 13:23:26]

Can you help me with it? Or do you know a better way to make custom Atomic files?

Ethan-TZ self-assigned this on Aug 1, 2022
Ethan-TZ added the "question" (Further information is requested) label on Aug 1, 2022
@Ethan-TZ
Member

Ethan-TZ commented Aug 1, 2022

@KianaLia Hello, thanks for your interest in RecBole!
This error is caused by the file's incorrect format. Please make sure the file is strictly structured.
First, remove the trailing comma (,) at the end of the first line. Second, the fields in the remaining lines should be separated by commas (',').
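A minimal sketch of these two fixes applied to the same kind of call, assuming a hypothetical two-column DataFrame like the one in the post. `','.join(...)` builds the header without a trailing comma and `delimiter=','` separates the fields. Passing `comments=''` (so NumPy does not prefix the header with `# `) and an explicit `fmt` (so values are not written as `%.18e`) are additional assumptions beyond the two points above:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the post.
df = pd.DataFrame({"Bid": [3, 3, 5], "NumberOfPages": [96.0, 96.0, 144.0]})

buf = io.StringIO()
np.savetxt(
    buf,
    df.to_numpy(),
    delimiter=",",                # separate the fields in data lines with commas
    header=",".join(df.columns),  # no trailing comma after the last column name
    comments="",                  # np.savetxt prefixes the header with "# " by default
    fmt=["%d", "%g"],             # integer IDs and compact floats instead of "%.18e"
)
text = buf.getvalue()
print(text)
```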

KianaLia closed this as completed on Aug 1, 2022
KianaLia reopened this on Aug 4, 2022
@KianaLia
Author

KianaLia commented Aug 4, 2022

Hi again!
@chenyuwuxin Can you help me create an Atomic File in the format you described above from a pandas DataFrame?
Here's an example of my dataset:

df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0}, 'bid:token': {0: 3, 1: 3, 2: 5}})

I've shared my tries in the link below:
https://stackoverflow.com/questions/73193618/prevent-newline-rule-to-apply-on-header-np-savetxt

@Ethan-TZ
Copy link
Member

Ethan-TZ commented Aug 5, 2022

@KianaLia For a DataFrame like the one in your example, you can try the following command to create an Atomic File:
df.to_csv('./test.txt', sep='\t', index=False)
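A sketch of this command applied to the example DataFrame above. The `<data_path>/<dataset>/<dataset>.item` folder layout and the dataset name `rectest` are assumptions for illustration, and a temporary directory stands in for the Drive path:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Example DataFrame from the thread, with "name:type" column headers.
df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0},
                   'bid:token': {0: 3, 1: 3, 2: 5}})

# Assumed layout: <data_path>/<dataset>/<dataset>.item, with the hypothetical name "rectest".
data_path = Path(tempfile.mkdtemp())
item_file = data_path / "rectest" / "rectest.item"
item_file.parent.mkdir(parents=True, exist_ok=True)

# Tab-separated fields, no index column; the header stays on the first line.
df.to_csv(item_file, sep="\t", index=False)

lines = item_file.read_text().splitlines()
print(lines)
```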

@KianaLia
Copy link
Author

KianaLia commented Aug 6, 2022

Thanks for the simple solution, @chenyuwuxin!
But when I feed the .txt file into create_dataset(), I get the following error:
[Screenshot of the error message]

Here's my config dict:

config_dict = { 'seq_separator' : '\t', 'neg_sampling' : {'uniform': 1}, 'data_path': '/content/drive/MyDrive/', 'load_col': {'item': ['bid','NumberOfPages']}, 'ITEM_ID_FIELD': 'bid', 'save_dataset': True, 'save_dataloaders': True }

@Sherry-XLL
Member

Hello @KianaLia,

I don't quite understand what the two columns in your original dataset represent. In our framework, an .inter file containing user and item columns must be loaded, and both USER_ID_FIELD and ITEM_ID_FIELD must be specified. Your configuration loads only the item file, so an error is raised.

Please clarify whether your task is a recommendation scenario, and refer to the section on atomic files in our documentation. Thanks for your interest in RecBole!
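A minimal sketch of the layout described above, assuming RecBole reads `<data_path>/<dataset>/<dataset>.inter` and `<dataset>.item` files. The user IDs and item rows below are hypothetical placeholders, not data from this thread, and the RecBole calls from earlier in the thread are left as comments so the file-writing part runs on its own:

```python
import tempfile
from pathlib import Path

import pandas as pd

dataset = "rectest"
data_path = Path(tempfile.mkdtemp())  # stands in for '/content/drive/MyDrive/'
folder = data_path / dataset          # assumed layout: <data_path>/<dataset>/
folder.mkdir(parents=True, exist_ok=True)

# Hypothetical interactions: the user IDs are placeholders.
inter = pd.DataFrame({"user_id:token": ["u1", "u1", "u2"], "bid:token": [3, 5, 3]})
# Hypothetical item features: one row per item ID.
item = pd.DataFrame({"bid:token": [3, 5], "NumberOfPages:float": [96.0, 144.0]})

inter.to_csv(folder / f"{dataset}.inter", sep="\t", index=False)
item.to_csv(folder / f"{dataset}.item", sep="\t", index=False)

config_dict = {
    "data_path": str(data_path),
    "field_separator": "\t",
    "USER_ID_FIELD": "user_id",
    "ITEM_ID_FIELD": "bid",
    "load_col": {"inter": ["user_id", "bid"], "item": ["bid", "NumberOfPages"]},
}

# With RecBole installed, the calls used earlier in the thread would follow:
#   config = Config(model='BPR', dataset=dataset, config_dict=config_dict)
#   dataset_obj = create_dataset(config)
print(sorted(p.name for p in folder.iterdir()))
```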
