Potential things to address from comments, if we haven't already #8

codydunne commented Aug 16, 2024

I removed all these old comments from the paper and left them here as notes for us.

background.qmd

This work stemmed from our need to find datasets tailored to testing graph layout algorithms on graphs with specific features.
While many graph databases exist across various archives, we specifically sought datasets used for algorithms in the graph drawing and network visualization literature, including information about how they have been transformed and used.
For instance, some layered graph drawing papers mention imposing layers on non-layered graphs for their evaluations [cite].
Consequently, evaluations that use different implementations of the layering may operate on different graphs despite starting from the same underlying dataset.
Similarly, we wanted to find real-world datasets for case studies, for which a synthetic layering of non-layered graphs might not be faithful.
While our work focuses on layered graphs, similar datasets are used in many graph layout domains, like edge bundling [cite], crossing number reduction [], and graph partitioning problems [cite].
Therefore, our work focuses on providing a graph benchmark collection that categorizes datasets by how they organize their graphs and emphasizes their features (e.g., a range of sizes for testing scalability).
We aim to facilitate researchers' choice of benchmarks that reflect real use cases or allow comparisons to other algorithms in their respective fields.
We also summarize the graphs to help users get an overview of each dataset before downloading it.
This summary includes some analysis providing overarching quantitative graph information, such as node and edge count distributions, which is relevant for problems involving graph sparsity.
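
As a concrete illustration of the kind of per-dataset summary we have in mind, here is a minimal sketch that computes node and edge counts, density, and a degree distribution. It assumes Python with networkx and an edge-list file; the function and file names are illustrative only, not our actual analysis pipeline.

```python
# Minimal sketch of a per-dataset overview; networkx and the file
# name are assumptions for illustration only.
import networkx as nx
from collections import Counter

def summarize(path: str) -> dict:
    """Compute overview statistics for a graph stored as an edge list."""
    G = nx.read_edgelist(path)
    degrees = [d for _, d in G.degree()]
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),  # 2m / (n(n-1)) for undirected graphs
        "degree_distribution": Counter(degrees),  # degree -> node count
    }

print(summarize("rome_graph_001.edgelist"))  # hypothetical file name
```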

Beyond finding relevant datasets for comparison and consistency with real-world scenarios, another priority that motivated this work is the reproducibility of past and future research.
A dataset that was used in an evaluation but is now inaccessible greatly hinders the evaluation's reproducibility.
In the worst case, the evaluation becomes impossible to reproduce and, as such, much less meaningful.
In this context, it is also worth mentioning that in recent years, several initiatives have aimed to encourage replicability in research.
One such example is the Graphics Replicability Stamp [https://www.replicabilitystamp.org/], which endorses the replicability of the results presented in a paper through an additional review process.
Similar initiatives include the ACM artifact badges [https://www.acm.org/publications/policies/artifact-review-and-badging-current] and the SIGMOD availability and reproducibility initiative, which goes one step further and publishes full reports commenting on how reproducible a paper is.
Because our work aims to maintain a useful repository of graph datasets, we encourage anyone who wants to correct, add to, or replace information to contact us. Authors are also welcome to submit a pull request to the following GitHub repository [link-git-hub].
Similarly, we host our work on the Open Science Framework (OSF), which contains a snapshot of the data and the code for formatting, collecting, and re-creating data when applicable.

Related Work {#sec-related-work}

While the list of network datasets could be vastly expanded, this paper limits itself to listing only the datasets that are explicitly cited in our corpus of 196 papers.

In section [Large collections], we discuss these and several others in more detail and offer links to their sites. We also provide examples from our survey of how those graphs are used to evaluate graph layout algorithms.

We also discuss some of these in further detail in section [Sec: Uniform].
The Graph Drawing organization hosts three primary datasets widely used in the field: the AT&T graphs, the Rome graphs, and randomly generated directed acyclic graphs [cite].

However, while the collections above offer insights into the properties and features of their graphs (see KONECT), those insights are not explicitly tailored to graph layout algorithms.
Several other efforts have been made to compile similar benchmarks within graph drawing and other graph problems.

Although we include a discussion of the content of these collections, we believe they serve a different purpose than the one we present above.

Similarly, Bachmeier et al. proposed the Open Graph Archive in 2011 as an effort to create a graph database that categorizes, analyzes, and visualizes graphs uploaded by the community [cite].
Their work included a web-based interface that allows graphs to be exported in several formats.
They also included graphs from various large collections like the SuiteSparse Matrix Collection (referred to by its former name, the University of Florida Sparse Matrix Collection, in their paper).
While this work is valuable, the project is discontinued, and the URLs to the site are broken as of this writing.
Hence, we rely on OSF for the long-term preservation of the graph collection we assemble, independent of the health of our proposed web interface.

Kennedy et al. highlight the importance of tools to help researchers choose appropriate benchmarks in network science and graph drawing [].
To that end, they proposed The Graph Landscape, a visual analysis system with several views and graph metrics for comparing graphs from various collections like the Rome library.
While our work does not focus on the same extensive set of metrics, our collection still provides visual examples of graph usage and summary statistics per dataset.
Their visual system and prototype could, conceptually, be used with the collections we compile.
A major emphasis of our work is providing graphs in several file formats to facilitate their use with other tools and give researchers greater ease of use, as sketched below.
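
As one example of what this enables, a graph distributed in a single format can be re-exported to several others with off-the-shelf tooling. The sketch below assumes Python with networkx and hypothetical file names; it is not necessarily the conversion code behind our repository.

```python
# Re-export one graph in several common formats; file names are hypothetical.
import networkx as nx

G = nx.read_edgelist("example.edgelist")  # plain edge-list source
nx.write_graphml(G, "example.graphml")    # GraphML, common in graph drawing tools
nx.write_gml(G, "example.gml")            # GML, readable by tools like Gephi
nx.write_adjlist(G, "example.adjlist")    # simple adjacency-list format
```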

Our intention with this paper is to create a network repository of graph collections tailored explicitly to evaluating graph layout algorithms. We do this by linking collections to graph features present in the datasets (dynamic, layered, containing clusters, etc.) and by providing analysis and examples of usage in previous research to aid in discovering relevant datasets.

datasets_in_use.qmd

There is a lot here, but I don't know everything I need to in order to decide what to cut... @cwilson22 ?

discussion.qmd

Right at the beginning:

::: {.callout-note title="Neural Network" collapse=true appearance="minimal"}

//| echo: false

make_sparkline("Neural_Network")


<div id="named-list-Neural_Network" data-bs-spy="scroll"  data-bs-target="db-nav-list" data-bs-offset="20" tabindex="0"></div>

::: 

::: {.callout-note title="Tobler's Flow Mapper" collapse=true appearance="minimal"}

```{ojs}
//| echo: false

make_sparkline("Toblers_FlowMapper")

:::

At the end:

We hope to help both current and future researchers benefit from a rich repository of data, standardizing the evaluation process for graph layout algorithms, fostering innovation and development within the field, and contributing to the advancement of graph drawing and visualization research.
With this work, we hope to underscore the importance of accessible, well-documented benchmark datasets in driving scientific progress and to highlight our commitment to enhancing the integrity and reliability of computational evaluations in the Graph Drawing community.
