See parent repository for more information on the dynamic-org-chart project

dynamic-org-chart-scripts

Some preprocessing scripts to prepare data for the Dynamic Org Chart project.

This repository runs a scheduled job that fetches the Government of Canada Employee Contact Information (GEDS) dataset, which is made available under the Open Government License - Canada. The data produced by the scripts in this repository are used by a REST API, which can be found [here](ADD LINK), and by an instance of Elasticsearch. These services then provide data to a front end that shows a searchable, interactive organizational chart, which can be found [here](ADD LINK).

TODO:

  1. Make the org chart write "children" keys instead of "_children" keys by default, so that the front end does not need to unhide and rehide nodes during d3 indexing.
  2. Create separate Elasticsearch indices for French/English fields (they will likely need to be searched differently).
  3. Check how the organization name column is compared with the organization path column; some typos/casing/spacing errors cause a failure to identify a search path.
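For the third item, normalizing both columns before comparing would absorb most casing and spacing discrepancies. A minimal sketch, assuming a simple lowercase-and-collapse-whitespace rule (the `normalize` helper is hypothetical and not part of the current scripts):

```python
import re

def normalize(name: str) -> str:
    """Normalize an organization name for comparison (hypothetical helper):
    strip surrounding whitespace, lowercase, and collapse internal runs of
    whitespace to a single space."""
    name = name.strip().lower()
    return re.sub(r"\s+", " ", name)

# Comparing an organization-name value against a segment of the
# organization-path column is then tolerant of casing/spacing typos.
print(normalize("  Strategic and  Service Policy Branch "))
```

Punctuation differences would still defeat this comparison and may need additional rules.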

Start-up instructions

  1. Pull the Elasticsearch Docker image and run Elasticsearch as a Docker container.

```shell
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.1
docker run -p 9200:9200 -p 9300:9300 -e "http.cors.enabled=true" -e "http.cors.allow-origin=*" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.5.1
```

  2. Clone the dynamic-org-chart-scripts repository to a folder on your computer.

```shell
git clone https://github.com/DSD-ESDC-EDSC/dynamic-org-chart-scripts
```

  3. In a new terminal, create and activate the virtual environment in the root of the dynamic-org-chart-scripts folder.

```shell
conda env create -f environment.yml
conda activate ./venv
```

  4. Start the main script by running the following command from within the virtual environment in the project root:

```shell
python start.py
```

Data

The data produced by the scripts in this repository can be broken into two types: flat and hierarchical. Flat data are written into the SQL tables described below.

SQL Tables

employees

| key | column name | type | description |
| --- | ----------- | ---- | ----------- |
| PK | employee_id | INTEGER | |
| | last_name | TEXT | |
| | first_name | TEXT | |
| | job_title_en | TEXT | |
| | job_title_fr | TEXT | |
| | phone_number | TEXT | |
| | email | TEXT | |
| | address_en | TEXT | |
| | address_fr | TEXT | |
| | province_en | TEXT | |
| | province_fr | TEXT | |
| | city_en | TEXT | |
| | city_fr | TEXT | |
| | postal_code | TEXT | |
| FK | org_id | INTEGER | |
| FK | dept_id | INTEGER | |

organizations

| key | column name | type | description |
| --- | ----------- | ---- | ----------- |
| PK | org_id | INTEGER | |
| | org_name_en | TEXT | |
| | org_name_fr | TEXT | |
| FK | dept_id | INTEGER | |
| | org_chart_path | TEXT | An array (serialized to string) describing the tree traversal required to arrive at the organization in the org chart. |

departments

| key | column name | type | description |
| --- | ----------- | ---- | ----------- |
| PK | dept_id | INTEGER | |
| | department_en | TEXT | |
| | department_fr | TEXT | |
| | org_chart_en | TEXT | A JSON object (serialized to string) describing the English org chart for the department. |
| | org_chart_fr | TEXT | A JSON object (serialized to string) describing the French org chart for the department. |
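The three tables above can be sketched as a relational schema. The snippet below uses the stdlib sqlite3 module purely for illustration; the actual scripts may create the tables differently (e.g. via SQLAlchemy, listed under Resources), so the exact constraints are assumptions based on the documented columns:

```python
import sqlite3

# In-memory database just to exercise the schema; the real scripts write
# to a persistent database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    department_en TEXT,
    department_fr TEXT,
    org_chart_en TEXT,   -- serialized JSON org chart (English)
    org_chart_fr TEXT    -- serialized JSON org chart (French)
);
CREATE TABLE organizations (
    org_id INTEGER PRIMARY KEY,
    org_name_en TEXT,
    org_name_fr TEXT,
    dept_id INTEGER REFERENCES departments(dept_id),
    org_chart_path TEXT  -- serialized array of tree-traversal steps
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    last_name TEXT, first_name TEXT,
    job_title_en TEXT, job_title_fr TEXT,
    phone_number TEXT, email TEXT,
    address_en TEXT, address_fr TEXT,
    province_en TEXT, province_fr TEXT,
    city_en TEXT, city_fr TEXT,
    postal_code TEXT,
    org_id INTEGER REFERENCES organizations(org_id),
    dept_id INTEGER REFERENCES departments(dept_id)
);
""")
```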

Hierarchical Data

Using a column from the csv extracted from GEDS ("Organization Structure (EN/FR)"), it is possible to derive a hierarchical structure representing the organization of the Government of Canada, as described in GEDS. The format of the data is as follows:

```json
{
  "name": "Employment and Social Development Canada",
  "_children": [
    {
      "name": "Deputy Minister of Employment and Social Development Canada",
      "_children": [
        {
          "name": "Strategic and Service Policy Branch",
          "_children": [
            {
              ...
            }
          ]
        }
      ]
    }
  ]
}
```

Each node has a name and a _children property. name is the name of the organizational unit being described, and _children is an array of nodes that are direct descendants of the current node. Because the structure is recursive, it can be searched for specific business units. Since a linear search through the tree is slow, the path to each organization (starting at the root of its tree) is computed offline when the database tables are created and stored in the org_chart_path column of the organizations table. This way, as soon as an organization name or person is searched for (e.g. using Elasticsearch), the path to that organization or person is immediately available, without requiring a linear search through the tree at request time.
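The offline path computation described above amounts to a depth-first search that records the node names from the root down to each organization. A minimal sketch of the idea (the function name and return format are assumptions, not the project's actual code):

```python
def find_path(node, target_name, path=None):
    """Depth-first search for target_name in a {"name", "_children"} tree.
    Returns the list of node names from the root to the match, or None."""
    path = (path or []) + [node["name"]]
    if node["name"] == target_name:
        return path
    for child in node.get("_children", []):
        found = find_path(child, target_name, path)
        if found:
            return found
    return None

# Toy tree mirroring the JSON structure shown above.
tree = {
    "name": "Employment and Social Development Canada",
    "_children": [
        {"name": "Deputy Minister of Employment and Social Development Canada",
         "_children": [
             {"name": "Strategic and Service Policy Branch", "_children": []}
         ]}
    ],
}
print(find_path(tree, "Strategic and Service Policy Branch"))
```

Serializing each such path to a string (e.g. with json.dumps) gives the kind of value stored in org_chart_path.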

Folder Organization

data

For illustrative purposes, a small csv with fake data is included to simulate what raw data on employees of an organization might look like.

schedule

config

Configurations are stored here using *.cfg files, and parsed with Python's configparser library.
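For illustration, reading such a file with configparser might look like the following; the section and option names here are invented for the example and are not the project's actual keys:

```python
import configparser

# Stand-in for the contents of a *.cfg file in the config folder.
cfg_text = """
[elasticsearch]
host = localhost
port = 9200
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)  # config.read("path/to/file.cfg") for real files
print(config["elasticsearch"]["host"], config.getint("elasticsearch", "port"))
```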

main

The code in this folder runs a scheduled job using Python's apscheduler library. In practice, this job will likely be run every 24 hours to pick up the latest posting of GEDS data.

Every function is called from main. The call graphs below outline which functions are called, and in which order, in development vs. production. The main difference between the two is that the production graph will not fetch the GEDS data from the URL every time it is run; instead, it stores a local copy as a csv file and uses that, to avoid unnecessarily downloading the same file many times while testing during development.

Dev Module Dependency Graph (call graph image)

Production Module Dependency Graph (call graph image)

test

Elasticsearch

TODO: need to test that the org chart paths generated agree with the paths to nodes in the org chart tree structure.

This repository assumes there exists an instance of Elasticsearch to which it can upload data via Elasticsearch's bulk API. It makes use of the Python client for Elasticsearch.

Note: Elasticsearch 6.0.0+ removed support for multiple mapping types. As an alternative to multiple mapping types, this repository uses a single index for each type of document. In this case, the two types are employee and organization. As such, there are two Elasticsearch indices named employee and organization.
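The bulk upload amounts to shaping each record into an action dict for the client's bulk helper. The sketch below only builds the actions, so it runs without a cluster; the field names and the surrounding setup are assumptions, and in the real scripts the result would be passed to elasticsearch.helpers.bulk along with a client instance:

```python
def to_bulk_actions(docs, index):
    """Yield action dicts in the shape expected by elasticsearch.helpers.bulk.
    With Elasticsearch 6.x (no multiple mapping types), each index holds one
    document type, so "_doc" is used as the type for every action."""
    for doc in docs:
        yield {"_index": index, "_type": "_doc", "_source": doc}

# Hypothetical employee records destined for the "employee" index.
employees = [{"first_name": "Jane", "last_name": "Doe"}]
actions = list(to_bulk_actions(employees, "employee"))
# Real usage (requires a running cluster):
#   from elasticsearch import Elasticsearch
#   from elasticsearch.helpers import bulk
#   bulk(Elasticsearch(), actions)
print(actions[0]["_index"])
```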

Set up Elasticsearch with Docker

If you have Docker installed on your system, you can get up and running with Elasticsearch in only a few steps. These steps are outlined briefly below; see here for more information.

  1. Pull the Elasticsearch image from the Elastic Docker registry.

```shell
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.1
```

  2. Start a single-node Elasticsearch cluster on your local machine (the host defaults to localhost). Note that the arguments -e "http.cors.enabled=true" -e "http.cors.allow-origin=*" should only be used in development.

```shell
docker run -p 9200:9200 -p 9300:9300 -e "http.cors.enabled=true" -e "http.cors.allow-origin=*" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.5.1
```

Note: the JVM's minimum/maximum heap size is controlled by the -Xms and -Xmx flags; when running under Docker, these can be set through an environment variable, e.g. -e ES_JAVA_OPTS="-Xms512m -Xmx512m".

Postman Collection

TODO: create this and document it

Resources

  1. SQLAlchemy guide, chapter 1
  2. SQLAlchemy schema documentation
  3. pandas with SQLAlchemy
  4. Python Elasticsearch client tutorial
  5. Python Elasticsearch client documentation
  6. Python Elasticsearch client bulk helpers
  7. Stack Overflow post on using Elasticsearch with human names

License

Unless otherwise noted, program source code of this project is covered under Crown Copyright, Government of Canada, and is distributed under the MIT License.

The Canada wordmark and related graphics associated with this distribution are protected under trademark law and copyright law. No permission is granted to use them outside the parameters of the Government of Canada's corporate identity program. For more information, see Federal identity requirements.

Attribution

This project would not be possible without the availability and use of open source software. Acknowledgement and attribution to the open source tools used, along with their corresponding open licenses (where one was found), can be found in the ATTRIBUTION.md file in this repository. Users are advised to consult original sources for official information, especially if they plan on re-distributing all or parts of these code artifacts.

How to Contribute

Instructions for how to contribute can be found in the CONTRIBUTING.md file.
