Self-Training Graph Semantic Embeddings

Code and dataset for the journal paper Enhancing Social Media User Semantic Embedding through Graph-Aware Contrastive Self-Supervised Learning (IEEE Access 2024).

Datasets

Self-Training

Once the dataset is downloaded, save it inside the datasets directory under the name TwitterNeighbours. You can use a custom name if you prefer, but in that case you will need to update the src.config.data.DataConfig configuration accordingly.

Once ready, you can start training with the following command:

$ CUDA_VISIBLE_DEVICES=... python train.py [CONFIGS]

Produce graph semantic embeddings

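After loading your graph data into a torch_geometric.data.Data object (graph_data below), compute the embeddings as follows:

from src.model import samGAT
# graph_data.x holds the node features, graph_data.edge_index the graph connectivity
embs = samGAT(x=graph_data.x, edge_index=graph_data.edge_index.contiguous())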

where graph_data.x holds the initial representation of each node (for social media users, this could be the average text embedding of the user's posts).
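
As an illustration of the inputs described above, here is a minimal sketch of how the node features and the edge index could be assembled. The helper names (average_embedding, build_graph_inputs), the user ids, and the toy 2-dimensional embeddings are all hypothetical and not part of this repository. The sketch uses plain Python lists so that it runs without any dependency.

```python
from typing import Dict, List, Tuple


def average_embedding(post_embeddings: List[List[float]]) -> List[float]:
    """Element-wise mean of a user's post embeddings."""
    dim = len(post_embeddings[0])
    count = len(post_embeddings)
    return [sum(vec[i] for vec in post_embeddings) / count for i in range(dim)]


def build_graph_inputs(
    user_posts: Dict[str, List[List[float]]],
    connections: List[Tuple[str, str]],
) -> Tuple[List[List[float]], List[List[int]], Dict[str, int]]:
    """Return node features, a COO edge index ([2, num_edges]) and the id-to-row map."""
    users = sorted(user_posts)  # fixed order so that row indices are reproducible
    index = {user: i for i, user in enumerate(users)}
    # One feature row per user: the average of that user's post embeddings.
    x = [average_embedding(user_posts[user]) for user in users]
    # First row holds source nodes, second row holds target nodes.
    sources = [index[src] for src, _ in connections]
    targets = [index[dst] for _, dst in connections]
    return x, [sources, targets], index


# Toy, made-up data: three users with 2-dimensional post embeddings.
user_posts = {
    "alice": [[1.0, 0.0], [0.0, 1.0]],
    "bob": [[2.0, 2.0]],
    "carol": [[0.0, 4.0], [2.0, 0.0]],
}
connections = [("alice", "bob"), ("bob", "carol")]
x, edge_index, index = build_graph_inputs(user_posts, connections)
```

The resulting lists can then be converted with torch.tensor(x, dtype=torch.float) and torch.tensor(edge_index, dtype=torch.long) and passed to Data(x=..., edge_index=...). Whether edges should be added in both directions depends on how the graph was built for training, which this sketch does not assume.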

Archetypes for hateful users few-shot

The folder data/AppraieEval/archetypes contains the archetypes for hateful users. It holds 4 files:

  • hate_archetypes_neighbours.json: the connections (neighbours) of each archetype user;
  • hate_archetypes_and_neighs_tweet_embeddings.json: the average post embeddings of each archetype and of its connected users;
  • hate_archetypes_initial_embeddings.json: the initial embeddings of the archetype users only;
  • hate_archetypes_graph_embeddings: the graph embeddings of the archetypes (computed with samGAT).

As mentioned in the paper, the graph semantic embeddings of the archetypes can be used to estimate a hatefulness score for a new user, as long as the same graph model is used to embed both the archetypes and the new user.
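
One simple way to turn the archetype embeddings into such a score is sketched below. It is an assumption for illustration, not the paper's exact procedure: the score is the mean cosine similarity between the new user's graph embedding and each archetype's graph embedding. The function names, the dictionary layout (archetype id mapped to a list of floats), and the toy vectors are hypothetical. The actual structure of the JSON files may differ.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hatefulness_score(
    user_embedding: List[float],
    archetype_embeddings: Dict[str, List[float]],
) -> float:
    """Mean cosine similarity between a user and all archetypes (assumed scoring rule)."""
    sims = [
        cosine_similarity(user_embedding, emb)
        for emb in archetype_embeddings.values()
    ]
    return sum(sims) / len(sims)


# Toy, made-up graph embeddings; real ones would come from the same graph model.
archetypes = {"archetype_a": [1.0, 0.0], "archetype_b": [0.0, 1.0]}
close_user = [1.0, 1.0]
far_user = [-1.0, -1.0]

close_score = hatefulness_score(close_user, archetypes)
far_score = hatefulness_score(far_user, archetypes)
```

A higher score means the user lies closer to the archetypes in embedding space. Taking the maximum similarity instead of the mean is an equally plausible choice when the archetypes are heterogeneous.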

Citing this work