Networks of Names extracts relationships between people and organizations and makes them explorable visually through an interactive interface.
Networks of Names is a web application based on the Play Framework, which is required to run it. Typically, Play applications are started on an own port using the Netty server included in Play. For persistence, the application depends on a working installation of Postgre-SQL. The application is developed in Scala, but all necessary dependencies are resolved by the Play framework. You should make sure to have Java 7 runtime installed.
To install and start Networks of Names, you need to perform the following steps:
-
Download and extract
non-vis.tar.gz
to a location not directly accessible through a running webserver. -
Most of the dependencies are handled automatically by SBT when Play compiles the sources. However, some dependencies are not available from Maven/SBT repositories. To obtain them, change into the project's root directory and execute
./download-libs.bsh
. This will download and extract ELKI (version 0.5.5) and Java-ML (version 0.1.7). You can alternatively download the respective jars manually and place them into the project'slib/
directory. -
In the user's home folder, create the following folder structure to be used by Networks of Names (back- and frontend). Although some of the folders are created as they are required, the application generally assumes them to be present.
netsofnames/ config/ In/ Out/ server/ tmp/
The folders are used for different use cases by the preprocessor and the visual interactive system:
config
contains configuration files (they can generally be placed anywhere, but should not be publicly available),In
is meant to hold primary corpus data, while preprocessor output can be saved to (and imported from) folders inOut
. Inserver
, the visual interactive system caches calculation results. Scripts required for preprocessor execution are unpacked from the jar-file intotmp
. -
Copy
conf/application.conf
from the project directory to$HOME/conf/application.conf
(or any other filename).Edit the file to configure your instance. Specifically, you need to configure database access information and credentials by setting
db.default.url
,db.default.user
, anddb.default.password
.Other instance properties can also be configured here. Properties in the
application
namespace are parameters to Play, while properties in thenon
namespace configure Networks of Names. You can enablenon.dev
to run the application in development mode (making development actions, such as database modification, available) andnon.log
to enable action logging in the application.You can create different versions of the conf-file to be able to switch easily between different parametrizations when starting the application.
-
Make sure you have Play installed (Networks of Names was developed using Play version 2.1.2), following the installation instructions in the documentation.
Build the project by changing into the project directory and executing
play update clean compile
This will download all dependencies and compile source files.
-
Make sure a Postgre-SQL database is installed and running, with database name and credentials from the conf-file being correct.
-
To start the project, change into the project directory and execute
play -Dconfig.file=/path/to/your/configuration/file start
This will start the application on port 9000. To change the port, pass
-Dhttp.port=8080
(or any other port) toplay
in addition to theDconfig.file
property (all parameters need to be passed beforestart
, otherwise Play will ignore them). In general, all other possibilities to deploy a Play application should also work. For that, see the Play documentation.Starting the application with no data imported (and tables not present in the database) will result in instant termination. See the following section for data import.
The preprocessor extracts entities and relationships from (German) newspaper sentences (or any natural-language sentences) and produces the data format required by the visual interactive system. Once preprocessed data is present, it can be imported to the database and used for visual exploration.
The preprocessor is split into two parts to allow easier development without having to restart (and reload) the complete system: One part loads the named entity recognizer and makes entity extraction available as a service. The other part contains the preprocessor logic, including the corpus interface and input/output implementations.
The implementation depends on Java 7 (earlier versions will not work) and a UNIX system for the execution of sort
and
sed
commands.
To execute preprocessing, perform the following steps:
-
Download and extract
non-ner-server.tar.gz
andnon-preprocessor.tar.gz
. Both archives contain an executable jar-file, another jar with sources, and a lib subdirectory with dependencies. -
Start the NER server by running:
java -Xms512M -Xmx2G -jar non-ner-server.jar
The server will register a local service and load the language model. This can take some time. To start the server in the background,
nohup
can be used:nohup java -Xms512M -Xmx2G -jar non-ner-server.jar netsofnames/In/70M/ &> non-ner-server.out &
-
Once the NER server has loaded the language model, run the preprocessor like this (or using
nohup
in analogy to the NER server):java -Xms512M -Xmx2G -jar non-preprocessor.jar -input /path/to/corpus/files
The folder given as
-input
should contain one or several corpora in German from the Leipzig Corpora Collection (although other languages and sources are theoretically possible, both possibilities are not implemented).
To import data, run the visual interactive system in development mode (by setting non.dev=true
in the conf-file),
navigate to
http://localhost:9000/database/import
(replacing the host and port to match the actual location).
The application will show subfolders of $HOME/netsofnames/Out
for selection, if they contain all relevant files, namely
entities.tsv
, relationships.tsv
, sentences.tsv
, sources.tsv
, relationships_to_sentences.tsv
, and
sentences_to_sources.tsv
. Those files should contain a relational representation of the data in tab-separated format
(as produced by the preprocessor) and will be imported into the database into tables corresponding to the filenames.
Check the option "Clean" to perform data cleaning after import. This is not strictly required, but highly recommended to remove junk data and increase performance.
Importing and cleaning data extracted from large corpora may take a long time.
Networks of Names uses additional tables for creating tags, automatic classification, and storing actions logs. To (re)create those tables, navigate to
http://localhost:9000/database/create/tags
http://localhost:9000/database/create/logs
For details on other administration features, see app/controllers/Administration.scala
.
Networks of Names is made available under the Apache 2.0 License. The project depends on a number of third-party libraries that are subject to their own licenses.
Preprocessor dependencies are packed into the jar
, unmodified, and subject to their respective licenses. As for October
9th 2013, the dependency versions and respective licenses are as follows:
- Scala (version 2.10)
- BSD-style license
- ScalaTest (version 2.0.M6-SNAP8)
- Apache License, Version 2.0
- JodaTime (version 2.1)
- Apache License, Version 2.0
- Stanford NLP (version 2.10)
- GPL v2
Server-side dependencies of the visual interactive system are not distributed with the project, but obtained by Play
during compilation or downloaded by the user (manually or through download-libs.bsh
in the project's root folder). For
details on the dependencies, see project/target/Build.scala
.
Several JavaScript frameworks are distributed with the project for use by the frontend. As for October 9th 2013, the dependency versions and respective licenses are as follows:
- Twitter Bootstrap (version 2.3.2)
- Apache License, Version 2.0
- D3.js (version 3.3.6?)
- BSD
- bootstrap-datepicker
- Apache License, Version 2.0
- FileSaver (from 2013-01-23)
- MIT/X11 License
- hoverIntent (version r7)
- MIT License
- jQuery (version 1.9.0)
- MIT License