Graph neural network library for career trajectory definition


bda82/graphnn

Library for graph neural networks [graph-nn]

Short description

This library contains tools for working with graph neural networks, as well as auxiliary modules and algorithms that together allow you to create, train and use models, layers and datasets that work with data in a graph representation.

The library is under active development with the ultimate goal of solving predictive analytics tasks in the field of social network analysis and building career paths for university students and graduates, as well as for companies interested in developing their employees and recruiting staff.

To that end, in addition to basic graph neural network models, examples, and tools for building derived solutions, the library already includes a link parser for the VKontakte social network and the HeadHunter job board, as well as algorithms for finding the shortest path in a weighted graph with different types of edges and vertices.

All this together gives researchers and developers the basis for creating their own solutions in the field of graph neural networks for solving complex social and technical problems.

Repository composition

Datasets

The library defines datasets through inheritance from a base class, which is set in the corresponding part of the dataset module. It also provides specific dataset implementations (social networks) for examples and tests (in particular, the Cora dataset for the citation analysis examples), an example dataset for industrial job-search applications, SfeduDataset, and a special dataset for loading data from a graph database, ArangoDataset.

Loaders

To load datasets from the server, the library implements a dedicated Loader that defines the single loading mode used by the other graph neural network components and by several examples. Additionally, it provides the BatchLoader for batch loading and the DisjointLoader for disjoint loading.
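As an illustration of disjoint loading, the sketch below merges several small graphs into one graph with a block-diagonal adjacency matrix and records which graph each node came from. It assumes that "disjoint" carries its usual meaning in graph learning; `disjoint_union` and its arguments are hypothetical names and do not describe the DisjointLoader API.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def disjoint_union(
    graphs: List[Tuple[Matrix, Matrix]],
) -> Tuple[Matrix, Matrix, List[int]]:
    """Merge (x, a) pairs into one graph with a block-diagonal adjacency.

    Hypothetical helper, not part of the library. Returns the stacked node
    features, the block-diagonal adjacency matrix, and a list that maps
    every node to the index of the graph it came from.
    """
    total = sum(len(x) for x, _ in graphs)
    x_all: Matrix = []
    a_all: Matrix = [[0.0] * total for _ in range(total)]
    graph_index: List[int] = []

    offset = 0
    for g, (x, a) in enumerate(graphs):
        n = len(x)
        x_all.extend(list(row) for row in x)
        graph_index.extend([g] * n)
        # Copy this graph's adjacency into its own diagonal block.
        for i in range(n):
            for j in range(n):
                a_all[offset + i][offset + j] = a[i][j]
        offset += n
    return x_all, a_all, graph_index


g1 = ([[1.0], [2.0]], [[0.0, 1.0], [1.0, 0.0]])  # two connected nodes
g2 = ([[3.0]], [[0.0]])                           # one isolated node
x, a, idx = disjoint_union([g1, g2])
```

Because no edges cross the diagonal blocks, the merged graph can be processed in one pass while `idx` still allows per-graph pooling afterwards.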

Graph

The core data structure of the library is the base class Graph, which is a container for graph data. The container works with the following parameters:

  • x: to represent the features of nodes,
  • a: to represent the adjacency matrix,
  • e: to represent the attributes of the edges of the graph,
  • y: to represent the node labels or the graph labels.
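The four parameters above can be pictured with a minimal container. This is only a sketch: `SimpleGraph`, its properties, and the plain nested-list representation are assumptions made for illustration and are not the library's Graph class.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


@dataclass
class SimpleGraph:
    """Hypothetical stand-in for a graph container (not the library's Graph)."""

    x: Optional[Matrix] = None     # node features, one row per node
    a: Optional[Matrix] = None     # adjacency matrix, n_nodes x n_nodes
    e: Optional[Matrix] = None     # edge attributes, one row per edge
    y: Optional[List[int]] = None  # node labels or graph labels

    @property
    def n_nodes(self) -> int:
        """Number of nodes, taken from x if present, otherwise from a."""
        if self.x is not None:
            return len(self.x)
        if self.a is not None:
            return len(self.a)
        return 0

    @property
    def n_edges(self) -> int:
        """Number of non-zero adjacency entries (each direction counted)."""
        if self.a is None:
            return 0
        return sum(1 for row in self.a for value in row if value != 0)


# A path graph 0 - 1 - 2 with one feature and one label per node.
graph = SimpleGraph(
    x=[[0.1], [0.2], [0.3]],
    a=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    y=[0, 1, 0],
)
n_nodes, n_edges = graph.n_nodes, graph.n_edges
```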

Additionally, the Bellman-Ford shortest-path algorithm is implemented in two classes: one for the original algorithm and one for a modified version.
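For reference, the classic Bellman-Ford algorithm can be sketched as below. This shows the textbook version only; the function name and the edge-list input are assumptions, and the sketch does not reproduce the library's classes or its modified variant.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (source, target, weight)


def bellman_ford(
    nodes: Iterable[Hashable], edges: List[Edge], source: Hashable
) -> Dict[Hashable, float]:
    """Return the shortest distance from source to every node.

    Raises ValueError if a negative-weight cycle is reachable from source.
    """
    nodes = list(nodes)
    dist = {node: float("inf") for node in nodes}
    dist[source] = 0.0

    # Relax every edge up to |V| - 1 times; stop early once nothing changes.
    for _ in range(len(nodes) - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            break

    # One extra pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


edges = [("A", "B", 4.0), ("A", "C", 1.0), ("C", "B", 2.0),
         ("B", "D", 1.0), ("C", "D", 5.0)]
distances = bellman_ford("ABCD", edges, "A")
```

Unlike Dijkstra's algorithm, this approach tolerates negative edge weights, at the cost of O(|V|·|E|) running time.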

Neural network layers

The following neural network layers were created for the main work of the library:

Message passing

Information propagation over the graph is implemented through a base class for message passing in a graph neural network (used for the GraphSage algorithm).
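One propagation step in the GraphSage style can be sketched as follows: each node averages its neighbours' features and concatenates the result with its own. The function name and the dictionary-based neighbour list are assumptions; the learned weight matrix and non-linearity that normally follow are left out, and nothing here reflects the library's base class.

```python
from typing import Dict, List


def mean_aggregate_step(
    features: List[List[float]], neighbours: Dict[int, List[int]]
) -> List[List[float]]:
    """Concatenate each node's features with the mean of its neighbours'.

    Hypothetical helper for illustration. Nodes without neighbours get a
    zero vector as their aggregated message.
    """
    out = []
    for node, own in enumerate(features):
        nbrs = neighbours.get(node, [])
        dim = len(own)
        if nbrs:
            # Message + aggregate: element-wise mean over the neighbours.
            agg = [sum(features[n][k] for n in nbrs) / len(nbrs)
                   for k in range(dim)]
        else:
            agg = [0.0] * dim
        # Update: keep the node's own features next to the aggregated ones.
        out.append(list(own) + agg)
    return out


features = [[1.0], [3.0], [5.0]]
neighbours = {0: [1, 2], 1: [0], 2: [0]}
updated = mean_aggregate_step(features, neighbours)
```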

Models

The main model is a convolutional neural network that extends the TensorFlow/Keras model. A special industry model, SfeduModel, is also provided.

Scatter models

Scatter models are implemented for the basic message passing function, Generic Message Passing, and are also available as a sub-library:

Transformations

Transformations are defined by the base class LayerPreprocess, which applies the preprocessing function of a convolutional layer to the adjacency matrix.
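A common example of such adjacency preprocessing is the symmetric normalization used by graph convolutional layers, D^(-1/2) (A + I) D^(-1/2). The sketch below assumes this particular normalization purely for illustration; `gcn_normalize` is a hypothetical function, and which preprocessing a given layer of the library actually uses is not stated here.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_normalize(a: Matrix) -> Matrix:
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    n = len(a)
    # Add self-loops so every node keeps its own features.
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degrees = [sum(row) for row in a_hat]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    # Scale row i and column j by the inverse square roots of the degrees.
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


normalized = gcn_normalize([[0.0, 1.0], [1.0, 0.0]])
```

For the two-node graph above, every entry of the result is approximately 0.5, since both nodes have degree 2 once self-loops are added.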

Utilities

The library includes a number of utilities and auxiliary functions:

Configuration, parameters and settings

The library configuration is defined by a set of files in the config directory.

The main groups of named parameters are:

  • aggregation methods,
  • properties and attributes,
  • application constants,
  • data types,
  • datasets,
  • folders,
  • named functions,
  • initializers,
  • models,
  • names,
  • links.

How to use

The library can be set up as follows.

Set up environment variables

cp .env.dist .env

Create virtual environment

virtualenv -p <path_to_python> venv
source venv/bin/activate

Install packages

pip install -r requirements.txt

or

make install

If you change any packages, you can freeze the dependency list with the command

pip freeze > requirements.txt

or

make freeze

Additional tools

HH crawler

A generator of vacancy/keyword datasets from HH.ru.

It is a collection of simple scripts that crawl vacancies from the HH.ru site via its API and generate a CSV file from fields such as name, description and key skills.

The generated CSV file has the following format:

"$name1 & $description1","key skills1"
"$name2 & $description2","key skills2"
"$name3 & $description3","key skills3"
...

The scripts were tested on Python 3.10 but should work on earlier versions too.
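The format above can be produced with Python's standard csv module, as in the following sketch. The dictionary keys (`name`, `description`, `key_skills`), the comma separator between skills, and the function name are assumptions made for illustration and are not taken from the crawler scripts.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def vacancies_to_csv(vacancies: Iterable[Dict], out: TextIO) -> None:
    """Write one quoted row per vacancy: "<name> & <description>","<skills>".

    Hypothetical helper; field names and the skill separator are assumed.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for vacancy in vacancies:
        text = f"{vacancy['name']} & {vacancy['description']}"
        skills = ", ".join(vacancy["key_skills"])
        writer.writerow([text, skills])


buffer = io.StringIO()
vacancies_to_csv(
    [{"name": "Data Scientist",
      "description": "Build models",
      "key_skills": ["Python", "SQL"]}],
    buffer,
)
csv_text = buffer.getvalue()
```

Quoting every field keeps commas and line breaks inside descriptions from breaking the two-column layout.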

Get pages

Change the text field in download.py to your own query:

text = 'NAME:Data science'

Then run the script:

cd ./gns/crawlers/hh
python download.py

This script downloads the results from the API and saves them to the ./docs/pagination folder in JSON format.
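The download step can be pictured roughly as follows. This is a sketch and not the contents of download.py: the endpoint URL, the `page` and `per_page` query parameters, the User-Agent value, and the one-file-per-page naming are all assumptions that should be checked against the HH.ru API documentation.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

API_URL = "https://api.hh.ru/vacancies"  # assumed vacancy search endpoint


def build_page_url(text: str, page: int, per_page: int = 100) -> str:
    """Build the request URL for one page of search results."""
    query = urllib.parse.urlencode(
        {"text": text, "page": page, "per_page": per_page}
    )
    return f"{API_URL}?{query}"


def save_page(text: str, page: int, folder: Path) -> Path:
    """Fetch one result page and store it as <page>.json (needs network)."""
    folder.mkdir(parents=True, exist_ok=True)
    request = urllib.request.Request(
        build_page_url(text, page),
        headers={"User-Agent": "graphnn-example/0.1"},  # hypothetical value
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    target = folder / f"{page}.json"
    target.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return target


# Only the URL is built here; save_page() is not called, so no request is sent.
url = build_page_url("NAME:Data science", page=0)
```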

Get details about vacancies

The next step is to download extended details about the vacancies:

python parse.py

The script calls the API and saves the responses to the ./docs/vacancies folder.

Generate CSV

python generate.py

The result is saved to the ./docs/csv folder.

VK API crawler

How to use

cd ./gns/crawlers/vk
python main.py <vk_nickname_or_id>

Makefile

A Makefile is provided to automate some tasks. Available commands:

  • install: install packages.
  • freeze: freeze package versions.
  • clear: clear the cache.
  • serve: package maintenance:
    • linting,
    • automatic formatting,
    • sorting of imports,
    • type checking.
  • test: run tests.

Examples

Examples are provided in the examples directory:

  • Test example for the Cora dataset (analysis of a citation graph of scientific publications).
  • Test example for the Cora dataset with the Chebyshev convolutional layer (analysis of a citation graph of scientific publications).
  • Simple test example for the Cora dataset (analysis of a citation graph of scientific publications).
  • Examples of finding the shortest distance on a graph with the Bellman-Ford algorithm and the modified Bellman-Ford algorithm.
  • Industry example for vacancy search.
