
Trying to build a Yelp knowledge graph #1597

Open
emilyjeng opened this issue Dec 21, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@emilyjeng

emilyjeng commented Dec 21, 2022

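Hello! I tried to build a simple knowledge graph for Yelp. In the .kg file, I set head_id:token to item_id:token, relation_id:token to location.shop.location, and tail_id:token to categories:token_seq, as shown below:
[Screenshot of the .kg file]
I also added another relation_id:token:
[Screenshot of the .kg file with the additional relation]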

In the .link file, item_id:token is unchanged and entity_id:token is set to categories:token_seq, as shown below:
[Screenshot of the .link file]
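For reference, RecBole atomic files are tab-separated and begin with a typed header row. A .kg file has the columns head_id:token, relation_id:token and tail_id:token. A .link file has the columns item_id:token and entity_id:token. As I understand the format, the .link file maps each item to the one KG entity that represents it, and the triples in .kg use entity tokens as head and tail.

The sketch below shows one way to lay out item-to-category facts under that reading. It was not suggested in the thread. It makes several assumptions:

- The input is a hypothetical list of (item_id, categories) pairs.
- Categories are space-separated single tokens, matching seq_separator: " ".
- The entity naming `entity_<item_id>` is made up for illustration.
- The relation name `business.category` is also made up for illustration.

```python
import csv
import io

def write_atomic_files(items, kg_out, link_out, relation="business.category"):
    """Sketch: write RecBole-style .kg and .link rows for item -> category facts.

    items: iterable of (item_id, categories); categories is assumed to be a
    space-separated string of single-token category names.
    relation: hypothetical relation name, used only for illustration.
    """
    kg = csv.writer(kg_out, delimiter="\t", lineterminator="\n")
    link = csv.writer(link_out, delimiter="\t", lineterminator="\n")
    kg.writerow(["head_id:token", "relation_id:token", "tail_id:token"])
    link.writerow(["item_id:token", "entity_id:token"])
    for item_id, categories in items:
        # Hypothetical naming: one dedicated entity per item keeps the
        # item -> entity mapping one-to-one.
        entity = f"entity_{item_id}"
        link.writerow([item_id, entity])
        # One triple per category, so every tail is a single token.
        for category in categories.split():
            kg.writerow([entity, relation, category])

kg_buf, link_buf = io.StringIO(), io.StringIO()
write_atomic_files([("b1", "Pizza Italian"), ("b2", "Bars")], kg_buf, link_buf)
print(kg_buf.getvalue())
print(link_buf.getvalue())
```

Giving each item its own entity, rather than reusing a category as the entity_id, means no entity is shared by several items.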

But when I run it, I get the following error:

Traceback (most recent call last):
  File "run_recbole.py", line 48, in <module>
    run_recbole(
  File "/Emily/RecBole-master/recbole/quick_start/quick_start.py", line 69, in run_recbole
    dataset = create_dataset(config)
  File "/Emily/RecBole-master/recbole/data/utils.py", line 70, in create_dataset
    dataset = dataset_class(config)
  File "/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 68, in __init__
    super().__init__(config)
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 108, in __init__
    self._from_scratch()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 120, in _from_scratch
    self._data_processing()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 168, in _data_processing
    self._normalize()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 710, in _normalize
    feat[field] = norm(feat[field].values)
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 698, in norm
    mx, mn = max(arr), min(arr)
ValueError: max() arg is an empty sequence
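The last frames in this traceback show where it fails. Min-max normalization calls the built-in max() and min() on a column's values, and max() raises ValueError when that sequence is empty. Here the column was presumably emptied by the filtering settings.

The sketch below reproduces the step with a hypothetical `min_max_norm` helper. This is not RecBole's code. The empty-column check and the constant-column guard are my own additions, there only to make the failure mode explicit.

```python
def min_max_norm(values):
    """Scale values to [0, 1] (hypothetical helper mirroring the failing step)."""
    values = list(values)
    if not values:
        # Built-in max()/min() would raise ValueError on an empty sequence here.
        raise ValueError("cannot normalize an empty column; check the filtering settings")
    mx, mn = max(values), min(values)
    if mx == mn:
        # Avoid dividing by zero when every value is identical.
        return [0.0] * len(values)
    return [(v - mn) / (mx - mn) for v in values]

print(min_max_norm([2, 4, 6]))  # [0.0, 0.5, 1.0]

try:
    max([])  # the same call that fails at the bottom of the traceback
except ValueError as err:
    print("ValueError:", err)
```

The exact wording of the built-in max() message varies between Python versions, but the exception type is always ValueError.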

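I don't understand how to handle this. Or is my approach to building the knowledge graph wrong?
I can provide the data if you need it to reproduce the problem.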

@emilyjeng emilyjeng added the bug Something isn't working label Dec 21, 2022
@Ethan-TZ
Member

Ethan-TZ commented Dec 24, 2022

@emilyjeng Hi, could you share the configuration file you are running with?

@emilyjeng
Author

emilyjeng commented Dec 26, 2022

Here is the link to the files:
https://drive.google.com/drive/folders/10-2Q8zW3FC_hylvKXorfohL8w0sw1Dm-?usp=sharing

The YAML settings are as follows:
#dataset config
field_separator: "\t"
seq_separator: " "
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
TIME_FIELD: timestamp
NEG_PREFIX: neg_
LABEL_FIELD: label
normalize_all: True # normalization
threshold:
  rating: 4
load_col:
  inter: [user_id, item_id, rating]
  kg: [head_id, relation_id, tail_id]
  link: [item_id, entity_id]

#data filtering for interactions
val_interval:
  rating: "[4,inf)"
unused_col:
  inter: [rating]

user_inter_num_interval: "[10,inf)"
item_inter_num_interval: "[10,inf)"

embedding_size: 64
kg_embedding_size: 64 # (int) The embedding size of relations in knowledge graph.
reg_weights: [1e-2,1e-2] # (list of float) The L2 regularization weights.
#data preprocessing for knowledge graph triples
kg_reverse_r: True
entity_kg_num_interval: "[5,inf)"
relation_kg_num_interval: "[5,inf)"

#training and evaluation
epochs: 500
train_batch_size: 4096
eval_batch_size: 40960000
metrics: ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
valid_metric: Hit@10
train_neg_sample_args:
  distribution: uniform
  sample_num: 1
  dynamic: False

Run with:
python run_recbole.py --model=CKE --dataset=yelp22_us10shop --config_files=test.yaml
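Several of the settings above use interval strings such as "[10,inf)". These include user_inter_num_interval, item_inter_num_interval, entity_kg_num_interval, relation_kg_num_interval and val_interval. A square bracket marks an inclusive bound and a parenthesis marks an exclusive one.

The helper below is a hypothetical sketch of that notation, not RecBole's parser. It shows, for example, that "[5,inf)" keeps only counts of at least 5. The per-entity counts are made up.

```python
def parse_interval(text):
    """Turn an interval string like '[10,inf)' into a membership test (sketch)."""
    text = text.strip()
    left, right = text[0], text[-1]
    if left not in "[(" or right not in "])":
        raise ValueError(f"bad interval: {text!r}")
    lo_str, hi_str = (part.strip() for part in text[1:-1].split(","))
    lo, hi = float(lo_str), float(hi_str)  # float() accepts "inf" and "-inf"

    def contains(x):
        above = x >= lo if left == "[" else x > lo
        below = x <= hi if right == "]" else x < hi
        return above and below

    return contains

keep = parse_interval("[5,inf)")
triple_counts = {"Pizza": 7, "Bars": 2}  # made-up counts per entity
print([e for e, c in triple_counts.items() if keep(c)])  # ['Pizza']
```

With thresholds of 10 interactions and 5 triples, a small hand-built dataset can lose most of its users, items and entities.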

@Ethan-TZ
Member

@emilyjeng Hi, please try setting normalize_all to False.

@emilyjeng
Author

emilyjeng commented Jan 2, 2023

@chenyuwuxin Thanks for the answer! After setting normalize_all to False, I get the following error:

02 Jan 08:41 INFO yelp22_us10shop
The number of users: 1
Average actions of users: nan
The number of items: 1
Average actions of items: nan
The number of inters: 0
The sparsity of the dataset: 100.0%
Remain Fields: ['entity_id', 'user_id', 'item_id', 'head_id', 'relation_id', 'tail_id', 'label']
The number of entities: 1
The number of relations: 2
The number of triples: 0
The number of items that have been linked to KG: 0
02 Jan 08:41 WARNING Field [rating] is not in [inter_feat], which can not be set in unused_col.
Traceback (most recent call last):
  File "run_recbole.py", line 48, in <module>
    run_recbole(
  File "/Emily/RecBole-master/recbole/quick_start/quick_start.py", line 73, in run_recbole
    train_data, valid_data, test_data = data_preparation(config, dataset)
  File "/Emily/RecBole-master/recbole/data/utils.py", line 166, in data_preparation
    train_sampler, valid_sampler, test_sampler = create_samplers(
  File "/Emily/RecBole-master/recbole/data/utils.py", line 297, in create_samplers
    sampler = Sampler(
  File "/Emily/RecBole-master/recbole/sampler/sampler.py", line 227, in __init__
    super().__init__(distribution=distribution, alpha=alpha)
  File "/Emily/RecBole-master/recbole/sampler/sampler.py", line 40, in __init__
    self.used_ids = self.get_used_ids()
  File "/Emily/RecBole-master/recbole/sampler/sampler.py", line 257, in get_used_ids
    raise ValueError(
ValueError: Some users have interacted with all items, which we can not sample negative items for them. Please set user_inter_num_interval to filter those users.,

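However, as the settings above show, I did set user_inter_num_interval. I also noticed that there are far too few users and items. Is my approach to linking the dataset through the kg and link files wrong, or is there some other problem?
See the image below:
[Screenshot]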
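The log above reports 1 user, 1 item and 0 interactions. As far as I know, RecBole's ID counts include a padding token, so most likely every real user and item was filtered out.

Uniform negative sampling needs at least one item that a user has not interacted with. When the item set shrinks to almost nothing, every remaining user fails that requirement. The check below is a generic, hypothetical sketch of that condition, not RecBole's sampler code.

```python
def users_without_negatives(user_items, all_items):
    """Return users who have interacted with every candidate item (sketch).

    Such users have no item left to draw as a negative sample.
    """
    all_items = set(all_items)
    return sorted(u for u, items in user_items.items() if all_items <= set(items))

user_items = {"u1": ["b1"], "u2": ["b1", "b2"]}
print(users_without_negatives(user_items, ["b1", "b2"]))  # ['u2']
print(users_without_negatives(user_items, ["b1"]))        # ['u1', 'u2']
```

The second call shows how shrinking the item catalogue makes the condition fail for every user at once.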

@Ethan-TZ
Member

Ethan-TZ commented Jan 6, 2023

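@emilyjeng Hi, this happens because some users or items in the dataset have too few interactions, so everything they interacted with gets filtered out. You can try lowering user_inter_num_interval and item_inter_num_interval to solve this.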
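Dropping a sparse item can push a user below its threshold, and dropping a user can in turn do the same to an item. Repeating the filter can therefore cascade until nothing is left.

The sketch below assumes the interaction-count filtering is repeated until it stops changing anything, which is what the reply implies. It is an illustration, not RecBole's implementation. It uses minimum counts to stand in for the "[k,inf)" intervals.

```python
from collections import Counter

def filter_by_inter_num(inters, min_user, min_item):
    """Repeatedly drop users/items below a minimum interaction count (sketch).

    inters: list of (user, item) pairs. Loops until a pass removes nothing.
    """
    inters = list(inters)
    while True:
        user_cnt = Counter(u for u, _ in inters)
        item_cnt = Counter(i for _, i in inters)
        kept = [(u, i) for u, i in inters
                if user_cnt[u] >= min_user and item_cnt[i] >= min_item]
        if len(kept) == len(inters):
            return kept
        inters = kept

inters = [("u1", "b1"), ("u1", "b2"), ("u2", "b1"), ("u3", "b2")]
# The first pass keeps only u1's two interactions. That leaves b1 and b2 with
# one interaction each, so the second pass removes everything.
print(filter_by_inter_num(inters, min_user=2, min_item=2))  # []
```
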

@emilyjeng
Author

emilyjeng commented Jan 10, 2023

@chenyuwuxin Hi! I later found that the cause was the entity_kg_num_interval and relation_kg_num_interval thresholds. After lowering them, I got a different error:
Traceback (most recent call last):
  File "run_recbole.py", line 48, in <module>
    run_recbole(
  File "/Emily/RecBole-master/recbole/quick_start/quick_start.py", line 69, in run_recbole
    dataset = create_dataset(config)
  File "/Emily/RecBole-master/recbole/data/utils.py", line 70, in create_dataset
    dataset = dataset_class(config)
  File "/workspace-nfs/JMD220-dev/NLP/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 68, in __init__
    super().__init__(config)
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 108, in __init__
    self._from_scratch()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 120, in _from_scratch
    self._data_processing()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 164, in _data_processing
    self._remap_ID_all()
  File "/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 407, in _remap_ID_all
    self._merge_item_and_entity()
  File "/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 349, in _merge_item_and_entity
    entity_id_map[i] = new_item_token2id[self.entity2item[entity_token[i]]]
KeyError: '_7bSxlQbj51wn5_0DouyKg'
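This error looks like an ID cannot be found.

Is it that entity_id cannot be a string?
Here is my link file (columns item_id:token and entity_id:token):
[Screenshot of the .link file]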
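Token fields in RecBole atomic files are string tokens, so, as far as I know, a string entity_id should not be a problem on its own.

Both dictionary lookups on the failing line can raise, so one mapping has a token that the other lacks. The thread does not diagnose this, so the following are only assumptions about two plausible causes:

- A .link row points at an item that was filtered out of the interactions.
- One entity is linked to several items, which is what happens when a category is used as entity_id.

The hypothetical pre-flight check below looks for both. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict

def read_pairs(lines):
    """Read a two-column tab-separated atomic file, skipping its typed header."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # e.g. "item_id:token<TAB>entity_id:token"
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

def check_link(link_rows, inter_items):
    """Report linked items absent from interactions and entities shared by items."""
    inter_items = set(inter_items)
    missing_items = sorted({item for item, _ in link_rows if item not in inter_items})
    items_per_entity = defaultdict(set)
    for item, entity in link_rows:
        items_per_entity[entity].add(item)
    shared_entities = sorted(e for e, items in items_per_entity.items() if len(items) > 1)
    return missing_items, shared_entities

sample = io.StringIO("item_id:token\tentity_id:token\nb1\tPizza\nb2\tPizza\nb3\tBars\n")
missing, shared = check_link(read_pairs(sample), inter_items=["b1", "b2"])
print(missing, shared)  # ['b3'] ['Pizza']
```

In practice, inter_items would be the item IDs that remain after filtering.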
