This library provides tools for working with graph neural networks, together with auxiliary modules and algorithms that allow you to create, train, and use models, layers, and datasets operating on graph-structured data.
The library is under active development. Its ultimate goal is to solve predictive-analytics tasks in social network analysis and in building career paths, both for university students and graduates and for companies interested in developing their employees and recruiting staff.
To this end, in addition to basic graph neural network models and examples and tools for creating derived solutions, the library already includes a link parser for the VKontakte social network and the HeadHunter job board, as well as algorithms for finding the shortest path in a weighted graph with different types of edges and vertices.
Together, this gives researchers and developers a foundation for creating their own graph-neural-network solutions to complex social and technical problems.
The library contains definitions for working with datasets built by inheriting from a base class, which is defined in the corresponding part of the dataset module. The library also defines its own dataset implementations (social networks) for examples and tests (in particular, the Cora dataset for examples analyzing the citation graph of social media messages), as well as an example dataset for industrial use in job search, SfeduDataset, and a special dataset for loading data from the ArangoDB graph database, ArangoDataset.
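For illustration, a custom dataset would be derived from that base class roughly as follows. This is a minimal sketch: the import paths and the read hook are assumptions modeled on similar GNN libraries, not the library's confirmed API.

```python
import numpy as np

# Assumed import paths and base-class API, modeled on similar GNN libraries;
# only the existence of a dataset base class is stated in the text above.
from gns.dataset import Dataset
from gns.graph import Graph


class ToyCitationDataset(Dataset):
    """A toy dataset that yields a single random graph."""

    def read(self):
        n_nodes, n_features = 10, 4
        x = np.random.rand(n_nodes, n_features)          # node features
        a = np.random.randint(0, 2, (n_nodes, n_nodes))  # adjacency matrix
        y = np.random.randint(0, 3, n_nodes)             # node labels
        return [Graph(x=x, a=a, y=y)]
```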
To download datasets from a server, a special Loader was implemented, and a single-mode data-loading regime was defined for the other elements of a graph neural network and several examples. Additionally, a BatchLoader for batch-mode loading and a DisjointLoader for disjoint-mode loading were added.
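A sketch of how the loaders might be used follows; only the names BatchLoader and DisjointLoader come from the text above, while the import path, constructor arguments, and the load() generator are assumptions.

```python
# Hedged sketch: constructor signatures and the load() generator are
# assumptions, not confirmed API.
from gns.loaders import BatchLoader, DisjointLoader

dataset = ToyCitationDataset()  # the toy dataset sketched earlier

batch_loader = BatchLoader(dataset, batch_size=8)        # batch mode
disjoint_loader = DisjointLoader(dataset, batch_size=8)  # disjoint mode

inputs, target = next(iter(batch_loader.load()))         # one batch at a time
```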
The main work of a graph neural network is driven by the base class Graph, which is a container for data. The container works with the following parameters (see the construction sketch after the list):
- x: the features of the nodes,
- a: the adjacency matrix,
- e: the attributes of the edges of the graph,
- y: the labels of the nodes or of the graph.
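For example, a four-node graph could be wrapped in the container like this. The keyword names follow the parameter list above; the import path is an assumption.

```python
import numpy as np
from scipy import sparse

from gns.graph import Graph  # assumed import path

x = np.random.rand(4, 3)               # node features, shape (n_nodes, n_features)
a = sparse.csr_matrix(np.eye(4, k=1))  # adjacency matrix, shape (n_nodes, n_nodes)
e = np.random.rand(3, 2)               # edge attributes, one row per edge
y = np.array([0, 1, 0, 1])             # node labels

graph = Graph(x=x, a=a, e=e, y=y)
```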
Additionally, the Bellman-Ford shortest-distance algorithm is implemented, represented by a corresponding class for the original algorithm and one for a modified version.
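For reference, the classic form of the algorithm relaxes every edge |V| - 1 times; a generic sketch (not the library's classes):

```python
def bellman_ford(n_vertices, edges, source):
    """Classic Bellman-Ford: shortest distances from `source`.

    `edges` is a list of (u, v, weight) triples; raises on negative cycles.
    """
    INF = float("inf")
    dist = [INF] * n_vertices
    dist[source] = 0
    for _ in range(n_vertices - 1):  # relax all edges |V| - 1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:            # one more pass detects negative cycles
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist

# Example: shortest paths from vertex 0
print(bellman_ford(3, [(0, 1, 4), (0, 2, 7), (1, 2, 2)], 0))  # [0, 4, 6]
```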
The following neural network layers were created for the core functionality of the library (a composition sketch follows the list):
- A Chebyshev convolutional layer for graph neural networks.
- The main (base) class for graph-neural-network convolutional layers.
- A convolutional layer for graph neural networks.
- A special GraphConv layer with a trainable skip connection.
- The main (base) layer class for GlobalPool.
- GlobalSum, an implementation of the GlobalPool base layer class.
- The main layer implementing the GraphSAGE algorithm.
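Such layers are typically composed like ordinary Keras layers. A hedged sketch: only the name GraphConv comes from the list above; the import path, the global-pooling class name, and the constructor signatures are assumptions.

```python
import tensorflow as tf

# Assumed import path and signatures; GlobalSumPool is an assumed name for
# the global sum implementation of the GlobalPool base class.
from gns.layers import GraphConv, GlobalSumPool

n_features, n_classes = 4, 3

x_in = tf.keras.Input(shape=(None, n_features))  # node features per graph
a_in = tf.keras.Input(shape=(None, None))        # adjacency matrix per graph

h = GraphConv(32, activation="relu")([x_in, a_in])  # graph convolution
h = GlobalSumPool()(h)                              # graph-level readout
out = tf.keras.layers.Dense(n_classes, activation="softmax")(h)

model = tf.keras.Model(inputs=[x_in, a_in], outputs=out)
```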
To propagate information through a graph neural network, a base class for transmitting messages was implemented (used by the GraphSAGE algorithm).
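The pattern behind such a base class can be illustrated in plain TensorFlow: gather per-edge messages from source nodes, then aggregate them at target nodes. This is a generic sketch of the message/aggregate scheme, not the library's class.

```python
import tensorflow as tf

def mean_message_passing(x, edge_src, edge_dst, n_nodes):
    """One GraphSAGE-style step: every node averages its in-neighbors' features."""
    messages = tf.gather(x, edge_src)  # message step: read source-node features
    # aggregation step: mean of incoming messages per target node
    # (nodes with no incoming edges receive zeros)
    return tf.math.unsorted_segment_mean(messages, edge_dst, n_nodes)

x = tf.constant([[1.0], [2.0], [4.0]])
src = tf.constant([0, 2])  # edges 0 -> 1 and 2 -> 1
dst = tf.constant([1, 1])
print(mean_message_passing(x, src, dst, 3))  # node 1 receives (1 + 4) / 2 = 2.5
```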
A main graph convolutional network model was also created, extending the TensorFlow/Keras model, together with a special industry model, SfeduModel.
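Because the model extends the Keras Model, training should follow the usual Keras workflow. A sketch under that assumption, reusing the model and loader sketched above; the load() and steps_per_epoch members of the loader are assumptions.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
)

# load() and steps_per_epoch are assumed loader members, not confirmed API.
model.fit(
    batch_loader.load(),
    steps_per_epoch=batch_loader.steps_per_epoch,
    epochs=20,
)
```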
For the basic Generic Message Passing function, a sub-library of scatter operations was also implemented (illustrated after the list):
- scatter_max: Takes the element-wise maximum of messages.
- scatter_mean: Averages messages.
- scatter_min: Takes the element-wise minimum of messages.
- scatter_prod: Multiplies messages.
- scatter_sum: Sums messages.
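The semantics of these reductions can be illustrated with TensorFlow's segment operations, which compute the same per-node aggregations (a sketch of the behavior, not the library's implementation):

```python
import tensorflow as tf

messages = tf.constant([[1.0], [2.0], [3.0], [4.0]])  # one message per edge
targets = tf.constant([0, 0, 1, 1])                   # target node of each message
n_nodes = 2

tf.math.unsorted_segment_sum(messages, targets, n_nodes)   # scatter_sum:  [[3.], [7.]]
tf.math.unsorted_segment_mean(messages, targets, n_nodes)  # scatter_mean: [[1.5], [3.5]]
tf.math.unsorted_segment_max(messages, targets, n_nodes)   # scatter_max:  [[2.], [4.]]
tf.math.unsorted_segment_min(messages, targets, n_nodes)   # scatter_min:  [[1.], [3.]]
tf.math.unsorted_segment_prod(messages, targets, n_nodes)  # scatter_prod: [[2.], [12.]]
```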
The transformation base class LayerPreprocess is also defined; it applies the preprocessing function of a convolutional layer to the adjacency matrix.
The library also provides a number of utilities and auxiliary functions (the adjacency normalization used by several of them is illustrated after the list):
- add_self_loops: Adds self-loops to a given adjacency matrix.
- batch_generator: Iterates over data for a given number of epochs, yielding one batch at a time as a Python generator.
- chebyshev_filter: Implements the Chebyshev filter for a given adjacency matrix.
- chebyshev_polynomial: Computes Chebyshev polynomials of X up to order k.
- check_dtypes: Checks the data types of a dataset.
- check_dtypes_decorator: Decorator for automatic type checking.
- collate_labels_disjoint: Collates the given list of labels for disjoint mode.
- degree_power: Computes the degree matrix of a given adjacency matrix raised to a power.
- deserialize_kwarg: Deserializes a keyword argument.
- deserialize_scatter: Deserializes a scatter function.
- dot_production: Computes the product a @ b for a and b of the same rank (both rank 2 or both rank 3).
- gcn_filter: Computes the GCN filter for a given graph adjacency matrix.
- get_spec: Returns a specification (description or metadata) for a tensorflow.Tensor.
- idx_to_mask: Builds a boolean mask from indices.
- load_binary_file: Loads a value from a file serialized with the pickle module.
- mask_to_float_weights: Converts a bit mask into float weights for averaging losses across network nodes.
- mask_to_simple_weights: Converts a bit mask into simple weights for averaging losses across network nodes.
- dot_production_in_mixed_mode: Computes the equivalent of tf.einsum('ij,bjk->bik', a, b).
- dot_production_modal: Computes the matrix product of a and b.
- normalized_adjacency_matrix: Normalizes a given adjacency matrix.
- normalized_laplacian: Computes the normalized Laplacian of a given adjacency matrix.
- preprocess_features: Preprocesses node features.
- read_file: Reads a file.
- rescale_laplacian: Rescales the Laplacian eigenvalues to [-1, 1].
- reshape: Reshapes a tensor to the given shape, automatically handling sparsity.
- serialize_kwarg: Serializes a keyword argument.
- serialize_scatter: Serializes a scatter function.
- shuffle_inplace: Shuffles data in place via np.random.shuffle.
- sparse_matrices_to_sparse_tensors: Converts SciPy sparse matrices into sparse tensors.
- sparse_matrix_to_sparse_tensor: Converts a SciPy sparse matrix into a sparse tensor.
- convert_node_objects_to_disjoint: Converts lists of node objects, adjacency matrices, and edge objects into disjoint mode.
- to_tensorflow_signature: Converts a dataset signature to a TensorFlow signature.
- transpose: Transposes a, automatically handling sparsity via overloaded TensorFlow functions.
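For instance, the symmetric normalization behind functions such as gcn_filter and normalized_adjacency_matrix is conventionally D^(-1/2)(A + I)D^(-1/2); a NumPy sketch of that standard formula (the library's exact variants may differ):

```python
import numpy as np

def gcn_style_filter(a):
    """Symmetric normalization with self-loops: D^(-1/2) (A + I) D^(-1/2)."""
    a_hat = a + np.eye(a.shape[0])          # add self-loops
    d = a_hat.sum(axis=1)                   # node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^(-1/2)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(gcn_style_filter(a))
```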
Library configuration is split across a number of files in the config directory. The main groups of named parameters are:
- aggregation methods,
- properties and attributes,
- application constants,
- data types,
- datasets,
- folders,
- named functions,
- initializers,
- models,
- names,
- links.
The library can be used as follows.
Set up environment variables:
cp .env.dist .env
Create a virtual environment:
virtualenv -p <path_to_python> venv
source venv/bin/activate
Install packages:
pip install -r requirements.txt
or
make install
If you change any packages, you can freeze the new set with:
pip freeze > requirements.txt
or
make freeze
The library also defines a vacancies/keywords dataset generator for HH.ru: a collection of simple scripts that crawl vacancies from the HH.ru site via its API and generate a CSV file from fields such as name, description, and key skills.
It produces a CSV file in the following format (a small reading example follows):
"$name1 & $description1","key skills1"
"$name2 & $description2","key skills2"
"$name3 & $description3","key skills3"
...
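Once generated, the file can be consumed like any two-column CSV, for example (a sketch; the exact file name inside ./docs/csv is an assumption):

```python
import csv

# Hypothetical file name; the generation step below writes to the ./docs/csv folder.
with open("./docs/csv/result.csv", newline="", encoding="utf-8") as f:
    for name_and_description, key_skills in csv.reader(f):
        print(name_and_description[:40], "->", key_skills)
```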
The scripts were tested on Python 3.10 but should work on earlier versions too.
Change the text field in download.py to your own query:
text = 'NAME:Data science'
Then run the script:
cd ./gns/crawlers/hh
python download.py
This script will download results from the API and save them to the ./docs/pagination folder in JSON format.
Next, download extended details about the vacancies:
python parse.py
The script will call the API and save the responses to the ./docs/vacancies folder.
Finally, generate the CSV file:
python generate.py
The result will be saved to the ./docs/csv folder.
To crawl data from the VKontakte social network:
cd ./gns/crawlers/vk
python main.py <vk_nickname_or_id>
A Makefile is provided to automate some tasks. Available commands:
- install: Installs packages.
- freeze: Freezes (pins) package versions.
- clear: Clears the cache.
- serve: Package maintenance:
  - linting,
  - automatic formatting,
  - import sorting,
  - type checking.
- test: Runs the tests.
Examples are provided in the examples directory:
- A test example for the Cora dataset (analysis of the citation graph of social network messages).
- A test case for the Cora dataset with the Chebyshev convolutional layer (analysis of the citation graph of social network messages).
- A simple test case for the Cora dataset (analysis of the citation graph of social network messages).
- Examples of finding the shortest distance on a graph with the Bellman-Ford algorithm and the modified Bellman-Ford algorithm.
- An industry example for vacancy search.