This library provides tools for working with graph neural networks, together with auxiliary modules and algorithms that allow you to create, train, and use models, layers, and datasets operating on graph-structured data.
The library is under active development. Its ultimate goal is to solve predictive-analytics tasks in social network analysis and in building career paths, both for university students and graduates and for companies interested in developing their employees and recruiting staff.
To this end, in addition to basic graph neural network models and examples and tools for creating derived solutions, the library already includes a link parser for the VKontakte social network and the HeadHunter job board, as well as algorithms for finding the shortest path in a weighted graph with different types of edges and vertices.
Together, this gives researchers and developers a foundation for creating their own graph-neural-network solutions to complex social and technical problems.
The library contains definitions for working with datasets built by inheriting from a base class, which is defined in the corresponding part of the dataset module. The library also defines its own dataset implementations (social networks) for examples and tests (in particular, the Cora dataset for examples analyzing the citation graph of social media messages), as well as an example dataset for industrial use in job search, SfeduDataset, and a special dataset for loading data from the ArangoDB graph database, ArangoDataset.
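For illustration, a custom dataset would be derived from that base class roughly as follows. This is a minimal sketch: the import paths and the read hook are assumptions modeled on similar GNN libraries, not the library's confirmed API.

```python
import numpy as np

# Assumed import paths and base-class API, modeled on similar GNN libraries;
# only the existence of a dataset base class is stated in the text above.
from gns.dataset import Dataset
from gns.graph import Graph


class ToyCitationDataset(Dataset):
    """A toy dataset that yields a single random graph."""

    def read(self):
        n_nodes, n_features = 10, 4
        x = np.random.rand(n_nodes, n_features)          # node features
        a = np.random.randint(0, 2, (n_nodes, n_nodes))  # adjacency matrix
        y = np.random.randint(0, 3, n_nodes)             # node labels
        return [Graph(x=x, a=a, y=y)]
```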
To download datasets from a server, a special Loader was implemented, and a single-mode data-loading regime was defined for the other elements of a graph neural network and several examples. Additionally, a BatchLoader for batch-mode loading and a DisjointLoader for disjoint-mode loading were added.
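A sketch of how the loaders might be used follows; only the names BatchLoader and DisjointLoader come from the text above, while the import path, constructor arguments, and the load() generator are assumptions.

```python
# Hedged sketch: constructor signatures and the load() generator are
# assumptions, not confirmed API.
from gns.loaders import BatchLoader, DisjointLoader

dataset = ToyCitationDataset()  # the toy dataset sketched earlier

batch_loader = BatchLoader(dataset, batch_size=8)        # batch mode
disjoint_loader = DisjointLoader(dataset, batch_size=8)  # disjoint mode

inputs, target = next(iter(batch_loader.load()))         # one batch at a time
```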
The main work of a graph neural network is driven by the base class Graph, which is a container for data. The container works with the following parameters (see the construction sketch after the list):
- x: the features of the nodes,
- a: the adjacency matrix,
- e: the attributes of the edges of the graph,
- y: the labels of the nodes or of the graph.
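For example, a four-node graph could be wrapped in the container like this. The keyword names follow the parameter list above; the import path is an assumption.

```python
import numpy as np
from scipy import sparse

from gns.graph import Graph  # assumed import path

x = np.random.rand(4, 3)               # node features, shape (n_nodes, n_features)
a = sparse.csr_matrix(np.eye(4, k=1))  # adjacency matrix, shape (n_nodes, n_nodes)
e = np.random.rand(3, 2)               # edge attributes, one row per edge
y = np.array([0, 1, 0, 1])             # node labels

graph = Graph(x=x, a=a, e=e, y=y)
```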
Additionally, the Bellman-Ford shortest-distance algorithm is implemented, represented by a corresponding class for the original algorithm and one for a modified version.
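For reference, the classic form of the algorithm relaxes every edge |V| - 1 times; a generic sketch (not the library's classes):

```python
def bellman_ford(n_vertices, edges, source):
    """Classic Bellman-Ford: shortest distances from `source`.

    `edges` is a list of (u, v, weight) triples; raises on negative cycles.
    """
    INF = float("inf")
    dist = [INF] * n_vertices
    dist[source] = 0
    for _ in range(n_vertices - 1):  # relax all edges |V| - 1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:            # one more pass detects negative cycles
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist

# Example: shortest paths from vertex 0
print(bellman_ford(3, [(0, 1, 4), (0, 2, 7), (1, 2, 2)], 0))  # [0, 4, 6]
```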
The following neural network layers were created for the core functionality of the library (a composition sketch follows the list):
- A Chebyshev convolutional layer for graph neural networks.
- The main (base) class for graph-neural-network convolutional layers.
- A convolutional layer for graph neural networks.
- A special GraphConv layer with a trainable skip connection.
- The main (base) layer class for GlobalPool.
- GlobalSum, an implementation of the GlobalPool base layer class.
- The main layer implementing the GraphSAGE algorithm.
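Such layers are typically composed like ordinary Keras layers. A hedged sketch: only the name GraphConv comes from the list above; the import path, the global-pooling class name, and the constructor signatures are assumptions.

```python
import tensorflow as tf

# Assumed import path and signatures; GlobalSumPool is an assumed name for
# the global sum implementation of the GlobalPool base class.
from gns.layers import GraphConv, GlobalSumPool

n_features, n_classes = 4, 3

x_in = tf.keras.Input(shape=(None, n_features))  # node features per graph
a_in = tf.keras.Input(shape=(None, None))        # adjacency matrix per graph

h = GraphConv(32, activation="relu")([x_in, a_in])  # graph convolution
h = GlobalSumPool()(h)                              # graph-level readout
out = tf.keras.layers.Dense(n_classes, activation="softmax")(h)

model = tf.keras.Model(inputs=[x_in, a_in], outputs=out)
```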
To propagate information through a graph neural network, a base class for transmitting messages was implemented (used by the GraphSAGE algorithm).
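The pattern behind such a base class can be illustrated in plain TensorFlow: gather per-edge messages from source nodes, then aggregate them at target nodes. This is a generic sketch of the message/aggregate scheme, not the library's class.

```python
import tensorflow as tf

def mean_message_passing(x, edge_src, edge_dst, n_nodes):
    """One GraphSAGE-style step: every node averages its in-neighbors' features."""
    messages = tf.gather(x, edge_src)  # message step: read source-node features
    # aggregation step: mean of incoming messages per target node
    # (nodes with no incoming edges receive zeros)
    return tf.math.unsorted_segment_mean(messages, edge_dst, n_nodes)

x = tf.constant([[1.0], [2.0], [4.0]])
src = tf.constant([0, 2])  # edges 0 -> 1 and 2 -> 1
dst = tf.constant([1, 1])
print(mean_message_passing(x, src, dst, 3))  # node 1 receives (1 + 4) / 2 = 2.5
```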
A main graph convolutional network model was also created, extending the TensorFlow/Keras model, together with a special industry model, SfeduModel.
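Because the model extends the Keras Model, training should follow the usual Keras workflow. A sketch under that assumption, reusing the model and loader sketched above; the load() and steps_per_epoch members of the loader are assumptions.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
)

# load() and steps_per_epoch are assumed loader members, not confirmed API.
model.fit(
    batch_loader.load(),
    steps_per_epoch=batch_loader.steps_per_epoch,
    epochs=20,
)
```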
For the basic Generic Message Passing function, a sub-library of scatter operations was also implemented (illustrated after the list):
- scatter_max: Takes the element-wise maximum of messages.
- scatter_mean: Averages messages.
- scatter_min: Takes the element-wise minimum of messages.
- scatter_prod: Multiplies messages.
- scatter_sum: Sums messages.
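The semantics of these reductions can be illustrated with TensorFlow's segment operations, which compute the same per-node aggregations (a sketch of the behavior, not the library's implementation):

```python
import tensorflow as tf

messages = tf.constant([[1.0], [2.0], [3.0], [4.0]])  # one message per edge
targets = tf.constant([0, 0, 1, 1])                   # target node of each message
n_nodes = 2

tf.math.unsorted_segment_sum(messages, targets, n_nodes)   # scatter_sum:  [[3.], [7.]]
tf.math.unsorted_segment_mean(messages, targets, n_nodes)  # scatter_mean: [[1.5], [3.5]]
tf.math.unsorted_segment_max(messages, targets, n_nodes)   # scatter_max:  [[2.], [4.]]
tf.math.unsorted_segment_min(messages, targets, n_nodes)   # scatter_min:  [[1.], [3.]]
tf.math.unsorted_segment_prod(messages, targets, n_nodes)  # scatter_prod: [[2.], [12.]]
```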
The transformation base class LayerPreprocess is also defined; it applies the preprocessing function of a convolutional layer to the adjacency matrix.
The library also provides a number of utilities and auxiliary functions (the adjacency normalization used by several of them is illustrated after the list):
- add_self_loops: Adds self-loops to a given adjacency matrix.
- batch_generator: Iterates over data for a given number of epochs, yielding one batch at a time as a Python generator.
- chebyshev_filter: Implements the Chebyshev filter for a given adjacency matrix.
- chebyshev_polynomial: Computes Chebyshev polynomials of X up to order k.
- check_dtypes: Checks the data types of a dataset.
- check_dtypes_decorator: Decorator for automatic type checking.
- collate_labels_disjoint: Collates the given list of labels for disjoint mode.
- degree_power: Computes the degree matrix of a given adjacency matrix raised to a power.
- deserialize_kwarg: Deserializes a keyword argument.
- deserialize_scatter: Deserializes a scatter function.
- dot_production: Computes the product a @ b for a and b of the same rank (both rank 2 or both rank 3).
- gcn_filter: Computes the GCN filter for a given graph adjacency matrix.
- get_spec: Returns a specification (description or metadata) for a tensorflow.Tensor.
- idx_to_mask: Builds a boolean mask from indices.
- load_binary_file: Loads a value from a file serialized with the pickle module.
- mask_to_float_weights: Converts a bit mask into float weights for averaging losses across network nodes.
- mask_to_simple_weights: Converts a bit mask into simple weights for averaging losses across network nodes.
- dot_production_in_mixed_mode: Computes the equivalent of tf.einsum('ij,bjk->bik', a, b).
- dot_production_modal: Computes the matrix product of a and b.
- normalized_adjacency_matrix: Normalizes a given adjacency matrix.
- normalized_laplacian: Computes the normalized Laplacian of a given adjacency matrix.
- preprocess_features: Preprocesses node features.
- read_file: Reads a file.
- rescale_laplacian: Rescales the Laplacian eigenvalues to [-1, 1].
- reshape: Reshapes a tensor to the given shape, automatically handling sparsity.
- serialize_kwarg: Serializes a keyword argument.
- serialize_scatter: Serializes a scatter function.
- shuffle_inplace: Shuffles data in place via np.random.shuffle.
- sparse_matrices_to_sparse_tensors: Converts SciPy sparse matrices into sparse tensors.
- sparse_matrix_to_sparse_tensor: Converts a SciPy sparse matrix into a sparse tensor.
- convert_node_objects_to_disjoint: Converts lists of node objects, adjacency matrices, and edge objects into disjoint mode.
- to_tensorflow_signature: Converts a dataset signature to a TensorFlow signature.
- transpose: Transposes a, automatically handling sparsity via overloaded TensorFlow functions.
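For instance, the symmetric normalization behind functions such as gcn_filter and normalized_adjacency_matrix is conventionally D^(-1/2)(A + I)D^(-1/2); a NumPy sketch of that standard formula (the library's exact variants may differ):

```python
import numpy as np

def gcn_style_filter(a):
    """Symmetric normalization with self-loops: D^(-1/2) (A + I) D^(-1/2)."""
    a_hat = a + np.eye(a.shape[0])          # add self-loops
    d = a_hat.sum(axis=1)                   # node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^(-1/2)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(gcn_style_filter(a))
```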
Library configuration is split across a number of files in the config directory. The main groups of named parameters are:
- aggregation methods,
- properties and attributes,
- application constants,
- data types,
- datasets,
- folders,
- named functions,
- initializers,
- models,
- names,
- links.
The library can be used as follows.
Set up environment variables:
cp .env.dist .env
Create a virtual environment:
virtualenv -p <path_to_python> venv
source venv/bin/activate
Install packages:
pip install -r requirements.txt
or
make install
If you change any packages, you can freeze the new set with:
pip freeze > requirements.txt
or
make freeze
The library also defines a vacancies/keywords dataset generator for HH.ru: a collection of simple scripts that crawl vacancies from the HH.ru site via its API and generate a CSV file from fields such as name, description, and key skills.
It produces a CSV file in the following format (a small reading example follows):
"$name1 & $description1","key skills1"
"$name2 & $description2","key skills2"
"$name3 & $description3","key skills3"
...
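Once generated, the file can be consumed like any two-column CSV, for example (a sketch; the exact file name inside ./docs/csv is an assumption):

```python
import csv

# Hypothetical file name; the generation step below writes to the ./docs/csv folder.
with open("./docs/csv/result.csv", newline="", encoding="utf-8") as f:
    for name_and_description, key_skills in csv.reader(f):
        print(name_and_description[:40], "->", key_skills)
```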
The scripts were tested on Python 3.10 but should work on earlier versions too.
Change the text field in download.py to your own query:
text = 'NAME:Data science'
Then run the script:
cd ./gns/crawlers/hh
python download.py
This script will download results from the API and save them to the ./docs/pagination folder in JSON format.
Next, download extended details about the vacancies:
python parse.py
The script will call the API and save the responses to the ./docs/vacancies folder.
Finally, generate the CSV file:
python generate.py
The result will be saved to the ./docs/csv folder.
To crawl data from the VKontakte social network:
cd ./gns/crawlers/vk
python main.py <vk_nickname_or_id>
A Makefile is provided to automate some tasks. Available commands:
- install: Installs packages.
- freeze: Freezes (pins) package versions.
- clear: Clears the cache.
- serve: Package maintenance:
  - linting,
  - automatic formatting,
  - import sorting,
  - type checking.
- test: Runs the tests.
Examples are provided in the examples directory:
- A test example for the Cora dataset (analysis of the citation graph of social network messages).
- A test case for the Cora dataset with the Chebyshev convolutional layer (analysis of the citation graph of social network messages).
- A simple test case for the Cora dataset (analysis of the citation graph of social network messages).
- Examples of finding the shortest distance on a graph with the Bellman-Ford algorithm and the modified Bellman-Ford algorithm.
- An industry example for vacancy search.