CSVExport exports data do not follow the column header #1002

cric96 · 2021-12-26T23:03:50Z

Bug overview

CSVExport accepts a List of Export. Each of them can extract some data from the current state of Alchemist simulation.
To do that, Export implementations need to define columnNames and extractData. In particular, extractData is a Map that should link each column name with a value.
The problem here is that there is a mismatch in the file exported from CSVExport from column names and their value.

Example

Suppose you have implemented an Export with this column name:

val columnNames: List<String> = (1 to 5).map(i => s"temperature-$i").asJava // scala language

And then you have implemented the extractData method returning a Map for each temperature:

def extractData[T](....) = {
   val temperature1 = "temperature-1" -> 10
   val rest = (2 to 5).map("i => s"temperature-$i" -> 0)
   val allData = temperature1 :: rest
   Map(allData:_*).asJava
}

I excepted that the CSV that use this kind of export is structured as:

temperature-1 temperature-2 temperature-3 temperature-4 temperature-5
10 0 0 0 0

But, the file produced is something like:

temperature-1 temperature-2 temperature-3 temperature-4 temperature-5
0 0 0 0 10

How to reproduce

Here there is an MWE where the problem is shown.
So:
1- clone the repository
2- run ./gradlew runAllBatch
3- look at data/ to verify that the export file is effectively wrong

Possible problem

CSVExport prints the first line using columnNames but then it takes the extractData result directly without following the same
order of the columnNames value:

column names part

dataExtractors.flatMap {
    it.columnNames
}.forEach {
    outputPrintStream.print(it)
    outputPrintStream.print(" ")
}

extract data part

dataExtractors.forEach {
    it.extractData(environment, reaction, time, step).values.forEach { value ->
        outputPrintStream.print(value)
        outputPrintStream.print(' ')
    }
}

Possible solution

Mapping the column names with the output produced by the extractor.

val data = extractor.extractData(environment, reaction, time, step)
extractor.columnNames.map { data[it] }.filterNotNull().forEach {
    value -> outputPrintStream.print(value)
    outputPrintStream.print(' ')
}

Probably it is not the best solution in Kotlin so far, but it is just to underline the problem.

The text was updated successfully, but these errors were encountered:

DanySK · 2021-12-26T23:45:29Z

Can you transform the MWE into a test? Once it is in the test suite, I think I can fix it quickly, but I'd like to prevent further regressions as much as possible.

DanySK · 2021-12-27T00:12:31Z

Labels do not necessarily align with actual data (you might extract one value per node, and the number of nodes may vary during a simulation), so the proposed solution won't work. I think that the culprit here is the transition between Scala's and Java's maps, which kills iteration order.

If you force a LinkedHashMap and add the elements in order, does the exporter work?
If so, I'm considering emitting a warning in case the provided map contains more than one element and is not a SortedMap or a LinkedHashMap - and updating the documentation of Exporter accordingly, specifying that in some context the iteration order may be relevant to some exporters.

I checked why the bug did not appear with Kotlin exporters, apparently, they use LinkedHashMap by default in transformations.

>>> (1..1000).map { it to "ciao" }.toMap()::class.java
res0: java.lang.Class<out kotlin.collections.Map<kotlin.Int, kotlin.String>> = class java.util.LinkedHashMap

cric96 · 2021-12-27T08:53:30Z

Labels do not necessarily align with actual data (you might extract one value per node, and the number of nodes may vary during a simulation), so the proposed solution won't work. I think that the culprit here is the transition between Scala's and Java's maps, which kills iteration order.

Ah, right, I hadn't thought of that.

If you force a LinkedHashMap and add the elements in order, does the exporter work?

Yes (In particular I used TreeMap, but it makes no difference). It sounds to me like a workaround, so I did not mention that.

If so, I'm considering emitting a warning in case the provided map contains more than one element and is not a SortedMap or a LinkedHashMap - and updating the documentation of Exporter accordingly, specifying that in some context the iteration order may be relevant to some exporters.

It could be a good idea! BTW scala-java converters are quite "tricky". When I call the method asJava it wraps the current map with an internal type representation MapWrapper, so I think that the warning does not help in that case. But a hint in the documentation could help!

I checked why the bug did not appear with Kotlin exporters, apparently, they use LinkedHashMap by default in transformations.
>>> (1..1000).map { it to "ciao" }.toMap()::class.java
res0: java.lang.Class<out kotlin.collections.Map<kotlin.Int, kotlin.String>> = class java.util.LinkedHashMap

Interesting, Scala instead uses HashMap by default (when you pass more than four values, otherwise the factory returns a specialized implementation of Map like Map1, Map2, Map3, etc.).

DanySK · 2021-12-27T11:15:29Z

A reasonable way to face this is the following:

if the set of keys matches the column names, use the workaround you proposed
if it does not, check that the map is a singleton map, a SortedMap, or a LinkedHashMap
if so, iterate over the map as we do now
if not, raise a warning

I don't see why the warning should not arrive when you use Scala: in the end, it won't find any supported type. But yes, the documentation needs to be updated anyway.

cric96 · 2021-12-27T12:11:47Z

Yes, this solution seems reasonable to me.

Another solution came up in my mind: what do you think if we enforce correct extractData and columnNames order directly in a Extractor interface extension?

For instance, we can define another interface that enforces SortedMap as the return for extractData method:

interface SortedExtractor<E: Any> : Extractor<E> {
    override fun <T> extractData(....): SortedMap<String, E>
    val columnNames: List<String>
}

Or something like that.
We can also check here if the key set follows the columnNames. Otherwise we can throw an Exception.
I do know if it makes any sense at all.

DanySK · 2021-12-27T12:22:23Z

It does, I do not want to force SortedMap as the usual way to have iteration-predictable maps is via LinkedHashMap which is not a SortedMap.

fixes #1002

DanySK · 2021-12-27T13:21:06Z

#1004 contains a fix for this issue. Still, I'd like to see a test before merging, so if you could create a PR with a regression test, I'd be happy to merge and release the fix. Ideally, create a test in a branch of yours and open two PRs: one with master as base, and one with fix/1002 as base. I expect the former to fail and the latter to succeed.

…verifies the column alignment problem of CSVExport

fixes #1002

* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport * style(loading): solve detekt debts

fixes #1002

* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport * style(loading): solve detekt debts

## [12.1.6](12.1.5...12.1.6) (2021-12-28) ### Bug Fixes * require predictable iteration order for extractors in CSVExporter, fixes [#1002](#1002) ([838236a](838236a))

DanySK · 2021-12-28T15:05:13Z

🎉 This issue has been resolved in version 12.1.6 🎉

The release is available on:

12.1.6
GitHub release

Your semantic-release bot 📦🚀

DanySK added the bug label Dec 26, 2021

DanySK added a commit that referenced this issue Dec 27, 2021

fix: require predictable iteration order for extractors in CSVExporter,

8c99cf3

fixes #1002

cric96 added a commit to cric96/Alchemist that referenced this issue Dec 27, 2021

test(loading): add a test for the issue AlchemistSimulator#1002 that …

aea72a1

…verifies the column alignment problem of CSVExport

DanySK added a commit that referenced this issue Dec 28, 2021

fix: require predictable iteration order for extractors in CSVExporter,

b511a1e

fixes #1002

DanySK added a commit that referenced this issue Dec 28, 2021

fix: require predictable iteration order for extractors in CSVExporter,

37988f6

fixes #1002

DanySK pushed a commit that referenced this issue Dec 28, 2021

test(loading): add regression test for #1002 (#1005)

6745e7b

* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport * style(loading): solve detekt debts

DanySK added a commit that referenced this issue Dec 28, 2021

fix: require predictable iteration order for extractors in CSVExporter,

2d505c1

fixes #1002

DanySK pushed a commit that referenced this issue Dec 28, 2021

test(loading): add regression test for #1002 (#1005)

a63f368

* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport * style(loading): solve detekt debts

DanySK closed this as completed in 838236a Dec 28, 2021

DanySK pushed a commit that referenced this issue Dec 28, 2021

test(loading): add regression test for #1002 (#1005)

4125dd9

* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport * style(loading): solve detekt debts

DanySK pushed a commit that referenced this issue Dec 28, 2021

chore(release): 12.1.6 [skip ci]

b13f4db

## [12.1.6](12.1.5...12.1.6) (2021-12-28) ### Bug Fixes * require predictable iteration order for extractors in CSVExporter, fixes [#1002](#1002) ([838236a](838236a))

DanySK added the released label Dec 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSVExport exports data do not follow the column header #1002

CSVExport exports data do not follow the column header #1002

cric96 commented Dec 26, 2021

DanySK commented Dec 26, 2021

DanySK commented Dec 27, 2021

cric96 commented Dec 27, 2021

DanySK commented Dec 27, 2021 •

edited

Loading

cric96 commented Dec 27, 2021

DanySK commented Dec 27, 2021

DanySK commented Dec 27, 2021

DanySK commented Dec 28, 2021

CSVExport exports data do not follow the column header #1002

CSVExport exports data do not follow the column header #1002

Comments

cric96 commented Dec 26, 2021

Bug overview

Example

How to reproduce

Possible problem

Possible solution

DanySK commented Dec 26, 2021

DanySK commented Dec 27, 2021

cric96 commented Dec 27, 2021

DanySK commented Dec 27, 2021 • edited Loading

cric96 commented Dec 27, 2021

DanySK commented Dec 27, 2021

DanySK commented Dec 27, 2021

DanySK commented Dec 28, 2021

DanySK commented Dec 27, 2021 •

edited

Loading