Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSVExport exports data do not follow the column header #1002

Closed
cric96 opened this issue Dec 26, 2021 · 8 comments
Closed

CSVExport exports data do not follow the column header #1002

cric96 opened this issue Dec 26, 2021 · 8 comments

Comments

@cric96
Copy link
Contributor

cric96 commented Dec 26, 2021

Bug overview

CSVExport accepts a List of Export. Each of them can extract some data from the current state of Alchemist simulation.
To do that, Export implementations need to define columnNames and extractData. In particular, extractData is a Map that should link each column name with a value.
The problem here is that there is a mismatch in the file exported from CSVExport from column names and their value.

Example

Suppose you have implemented an Export with this column name:

val columnNames: List<String> = (1 to 5).map(i => s"temperature-$i").asJava // scala language

And then you have implemented the extractData method returning a Map for each temperature:

def extractData[T](....) = {
   val temperature1 = "temperature-1" -> 10
   val rest = (2 to 5).map("i => s"temperature-$i" -> 0)
   val allData = temperature1 :: rest
   Map(allData:_*).asJava
}

I excepted that the CSV that use this kind of export is structured as:

temperature-1 temperature-2 temperature-3 temperature-4 temperature-5
10 0 0 0 0

But, the file produced is something like:

temperature-1 temperature-2 temperature-3 temperature-4 temperature-5
0 0 0 0 10

How to reproduce

Here there is an MWE where the problem is shown.
So:
1- clone the repository
2- run ./gradlew runAllBatch
3- look at data/ to verify that the export file is effectively wrong

Possible problem

CSVExport prints the first line using columnNames but then it takes the extractData result directly without following the same
order of the columnNames value:

column names part

dataExtractors.flatMap {
    it.columnNames
}.forEach {
    outputPrintStream.print(it)
    outputPrintStream.print(" ")
}

extract data part

dataExtractors.forEach {
    it.extractData(environment, reaction, time, step).values.forEach { value ->
        outputPrintStream.print(value)
        outputPrintStream.print(' ')
    }
}

Possible solution

Mapping the column names with the output produced by the extractor.

val data = extractor.extractData(environment, reaction, time, step)
extractor.columnNames.map { data[it] }.filterNotNull().forEach {
    value -> outputPrintStream.print(value)
    outputPrintStream.print(' ')
}

Probably it is not the best solution in Kotlin so far, but it is just to underline the problem.

@DanySK
Copy link
Member

DanySK commented Dec 26, 2021

Can you transform the MWE into a test? Once it is in the test suite, I think I can fix it quickly, but I'd like to prevent further regressions as much as possible.

@DanySK DanySK added the bug label Dec 26, 2021
@DanySK
Copy link
Member

DanySK commented Dec 27, 2021

Labels do not necessarily align with actual data (you might extract one value per node, and the number of nodes may vary during a simulation), so the proposed solution won't work. I think that the culprit here is the transition between Scala's and Java's maps, which kills iteration order.

If you force a LinkedHashMap and add the elements in order, does the exporter work?
If so, I'm considering emitting a warning in case the provided map contains more than one element and is not a SortedMap or a LinkedHashMap - and updating the documentation of Exporter accordingly, specifying that in some context the iteration order may be relevant to some exporters.

I checked why the bug did not appear with Kotlin exporters, apparently, they use LinkedHashMap by default in transformations.

>>> (1..1000).map { it to "ciao" }.toMap()::class.java
res0: java.lang.Class<out kotlin.collections.Map<kotlin.Int, kotlin.String>> = class java.util.LinkedHashMap

@cric96
Copy link
Contributor Author

cric96 commented Dec 27, 2021

Labels do not necessarily align with actual data (you might extract one value per node, and the number of nodes may vary during a simulation), so the proposed solution won't work. I think that the culprit here is the transition between Scala's and Java's maps, which kills iteration order.

Ah, right, I hadn't thought of that.

If you force a LinkedHashMap and add the elements in order, does the exporter work?

Yes (In particular I used TreeMap, but it makes no difference). It sounds to me like a workaround, so I did not mention that.

If so, I'm considering emitting a warning in case the provided map contains more than one element and is not a SortedMap or a LinkedHashMap - and updating the documentation of Exporter accordingly, specifying that in some context the iteration order may be relevant to some exporters.

It could be a good idea! BTW scala-java converters are quite "tricky". When I call the method asJava it wraps the current map with an internal type representation MapWrapper, so I think that the warning does not help in that case. But a hint in the documentation could help!

I checked why the bug did not appear with Kotlin exporters, apparently, they use LinkedHashMap by default in transformations.

>>> (1..1000).map { it to "ciao" }.toMap()::class.java
res0: java.lang.Class<out kotlin.collections.Map<kotlin.Int, kotlin.String>> = class java.util.LinkedHashMap

Interesting, Scala instead uses HashMap by default (when you pass more than four values, otherwise the factory returns a specialized implementation of Map like Map1, Map2, Map3, etc.).

@DanySK
Copy link
Member

DanySK commented Dec 27, 2021

A reasonable way to face this is the following:

  1. if the set of keys matches the column names, use the workaround you proposed
  2. if it does not, check that the map is a singleton map, a SortedMap, or a LinkedHashMap
  3. if so, iterate over the map as we do now
  4. if not, raise a warning

I don't see why the warning should not arrive when you use Scala: in the end, it won't find any supported type. But yes, the documentation needs to be updated anyway.

@cric96
Copy link
Contributor Author

cric96 commented Dec 27, 2021

Yes, this solution seems reasonable to me.

Another solution came up in my mind: what do you think if we enforce correct extractData and columnNames order directly in a Extractor interface extension?

For instance, we can define another interface that enforces SortedMap as the return for extractData method:

interface SortedExtractor<E: Any> : Extractor<E> {
    override fun <T> extractData(....): SortedMap<String, E>
    val columnNames: List<String>
}

Or something like that.
We can also check here if the key set follows the columnNames. Otherwise we can throw an Exception.
I do know if it makes any sense at all.

@DanySK
Copy link
Member

DanySK commented Dec 27, 2021

It does, I do not want to force SortedMap as the usual way to have iteration-predictable maps is via LinkedHashMap which is not a SortedMap.

@DanySK
Copy link
Member

DanySK commented Dec 27, 2021

#1004 contains a fix for this issue. Still, I'd like to see a test before merging, so if you could create a PR with a regression test, I'd be happy to merge and release the fix. Ideally, create a test in a branch of yours and open two PRs: one with master as base, and one with fix/1002 as base. I expect the former to fail and the latter to succeed.

cric96 added a commit to cric96/Alchemist that referenced this issue Dec 27, 2021
…verifies the column alignment problem of CSVExport
DanySK pushed a commit that referenced this issue Dec 28, 2021
* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport

* style(loading): solve detekt debts
DanySK pushed a commit that referenced this issue Dec 28, 2021
* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport

* style(loading): solve detekt debts
@DanySK DanySK closed this as completed in 838236a Dec 28, 2021
DanySK pushed a commit that referenced this issue Dec 28, 2021
* test(loading): add a test for the issue #1002 that verifies the column alignment problem of CSVExport

* style(loading): solve detekt debts
DanySK pushed a commit that referenced this issue Dec 28, 2021
## [12.1.6](12.1.5...12.1.6) (2021-12-28)

### Bug Fixes

* require predictable iteration order for extractors in CSVExporter, fixes [#1002](#1002) ([838236a](838236a))
@DanySK
Copy link
Member

DanySK commented Dec 28, 2021

🎉 This issue has been resolved in version 12.1.6 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants