When attempting to configure my model with TabularSequenceFeatures.from_schema(), I encounter an error, which I suspect is related to the setup of the item embedding table. Could anyone point out what I might be doing wrong?
Details
I'm working in the PyTorch 23.12 Docker image, and most of my code is adapted from the End-to-end session-based recommendation notebook and the Model Architectures page.
Here is my nvt code:
import pyarrow.parquet as pq
import nvtabular as nvt
from merlin.schema import Tags

# Load dataset
df = pq.read_table('/workspace/scriptie/data/processed/processedAndTruncated.parquet').to_pandas()
df['priceCategory'] = df['priceCategory'].astype(str)
df = df.rename(columns={'accommodationId': 'item_id'})

# Categorify categorical features
categ_feats = ['engagementType', 'periodId', 'country', 'item_id', 'aquaFun', 'adultOnly', 'forKids',
               'priceCategory']
categorify_op = categ_feats >> nvt.ops.Categorify()
userId = ['userId']
userId_op = userId >> nvt.ops.Categorify() >> nvt.ops.TagAsUserID()

# Define Groupby Workflow
groupby_feats = userId_op + categ_feats + ['engagementCountLog', 'itemRecencyLog', 'dateHoursLog', 'dayOfYearSin', 'dayOfYearCos']

# Step 2: Define groupby operation to create list columns
groupby_features = groupby_feats >> nvt.ops.Groupby(
    groupby_cols=['userId'],
    sort_cols=['dateHoursLog'],
    aggs={
        'item_id': ['list', 'count'],
        'engagementType': ['list'],
        'periodId': ['list'],
        'country': ['list'],
        'aquaFun': ['list'],
        'adultOnly': ['list'],
        'forKids': ['list'],
        'priceCategory': ['list'],
        'dateHoursLog': ['list'],
        'itemRecencyLog': ['list'],
        'engagementCountLog': ['list'],
        'dayOfYearSin': ['list'],
        'dayOfYearCos': ['list']
    },
    name_sep='-'
)

# Adding metadata ops
metadata_features = groupby_features >> nvt.ops.AddMetadata(tags=['LIST'])
tagged_item_id = groupby_features['item_id-list'] >> nvt.ops.TagAsItemID() >> nvt.ops.AddMetadata(tags=['ITEM_ID', 'ITEM', 'CATEGORICAL'])
cont_op = groupby_features['dateHoursLog-list', 'itemRecencyLog-list', 'engagementCountLog-list', 'dayOfYearSin-list', 'dayOfYearCos-list'] >> nvt.ops.AddMetadata(tags=[Tags.CONTINUOUS])
categ_op = groupby_features['engagementType-list', 'periodId-list', 'country-list', 'item_id-list', 'aquaFun-list', 'adultOnly-list', 'forKids-list', 'priceCategory-list', 'item_id-count'] >> nvt.ops.AddMetadata(tags=['CATEGORICAL'])

# add any other workflows
renamendUserId = groupby_features['userId'] >> nvt.ops.Rename(name='user_id')
selected_features = metadata_features + cont_op + categ_op + tagged_item_id

# Filter out sessions with length 1
MINIMUM_SESSION_LENGTH = 2
final_workflow_ops = selected_features >> nvt.ops.Filter(f=lambda df: df["item_id-count"] >= MINIMUM_SESSION_LENGTH)

# Create and apply the workflow
workflow = nvt.Workflow(final_workflow_ops)

# Apply the combined workflow in a single fit_transform call
dataset = nvt.Dataset(df)
workflow.fit(dataset)
transformed_dataset = workflow.transform(dataset)

# Save the transformed dataset with metadata to parquet
transformed_dataset.to_parquet("/workspace/scriptie/data/processed/processed_with_metadata_nvt")
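To make the column names I expect from the Groupby step concrete, here is a minimal plain-Python sketch of the same idea: group by userId, order by dateHoursLog, collect list columns named with the '-' separator, and drop sessions shorter than the minimum length. It is a stand-in for illustration only, not NVTabular code; the toy rows and the group_sessions helper are made up for this example.

```python
from collections import defaultdict

# Toy interaction log; the values are invented for illustration.
rows = [
    {"userId": 1, "item_id": 10, "dateHoursLog": 0.3},
    {"userId": 1, "item_id": 11, "dateHoursLog": 0.1},
    {"userId": 1, "item_id": 12, "dateHoursLog": 0.2},
    {"userId": 2, "item_id": 10, "dateHoursLog": 0.5},
    {"userId": 3, "item_id": 13, "dateHoursLog": 0.2},
    {"userId": 3, "item_id": 11, "dateHoursLog": 0.1},
]

NAME_SEP = "-"
MINIMUM_SESSION_LENGTH = 2


def group_sessions(rows, groupby_col, sort_col, list_cols, name_sep=NAME_SEP):
    """Hypothetical helper: one output row per group, with list columns."""
    grouped = defaultdict(list)
    # Sort by group key, then by the sort column, so each list is time-ordered.
    for row in sorted(rows, key=lambda r: (r[groupby_col], r[sort_col])):
        grouped[row[groupby_col]].append(row)

    sessions = []
    for key, members in grouped.items():
        session = {groupby_col: key}
        for col in list_cols:
            session[f"{col}{name_sep}list"] = [m[col] for m in members]
        session[f"item_id{name_sep}count"] = len(members)
        sessions.append(session)
    return sessions


sessions = group_sessions(rows, "userId", "dateHoursLog", ["item_id", "dateHoursLog"])

# Keep only sessions with at least MINIMUM_SESSION_LENGTH interactions.
sessions = [s for s in sessions if s["item_id-count"] >= MINIMUM_SESSION_LENGTH]

for s in sessions:
    print(s)
```

In this sketch the aggregated columns are named 'item_id-list' and 'item_id-count', which are the names I reference in the tagging and filter steps above.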
And here is my current model:
Running my model code raises KeyError: 'item_id-list'. Here is the complete error message:
Executing inputs.item_embedding_table raises the same KeyError as above.