[#228] Implemented processing of xls files without transformation to csv #285
base: main
XLS and CSV files are now processed using adapters. I will fix the HTML processing soon.
Processing HTML tables works again, though it still relies on the inelegant conversion to CSV.
HTML tables are now processed directly, without conversion. However, I'm not sure this won't introduce more bugs: HTML doesn't store information about every cell in each row, so I have to use a workaround to remember where merged cells should appear.
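For illustration only, here is a minimal sketch of one way such merged-cell tracking could work. It is not the code from this PR: `SpanGrid`, `Cell`, and `expand` are hypothetical names, and the sketch assumes each cell's text, rowspan, and colspan have already been read from the HTML. It keeps an occupancy map so that a cell spanning several rows is remembered when later rows, which omit it, are read.

```java
import java.util.List;

/** Hypothetical sketch: expands HTML-style cells with rowspan/colspan into a full grid. */
class SpanGrid {

    /** A cell as it appears in the HTML source: text plus its span attributes. */
    static final class Cell {
        final String text;
        final int rowspan;
        final int colspan;

        Cell(String text, int rowspan, int colspan) {
            this.text = text;
            this.rowspan = rowspan;
            this.colspan = colspan;
        }
    }

    /**
     * Returns a rectangular grid in which every position covered by a merged
     * cell repeats that cell's text. Positions never covered stay null.
     */
    static String[][] expand(List<List<Cell>> rows, int columnCount) {
        String[][] grid = new String[rows.size()][columnCount];
        boolean[][] occupied = new boolean[rows.size()][columnCount];

        for (int r = 0; r < rows.size(); r++) {
            int c = 0;
            for (Cell cell : rows.get(r)) {
                // Skip positions already claimed by a rowspan from an earlier row.
                while (c < columnCount && occupied[r][c]) {
                    c++;
                }
                if (c >= columnCount) {
                    break; // assumption: cells beyond the declared width are ignored
                }
                // Mark every position the cell covers, clipped to the grid bounds.
                for (int dr = 0; dr < cell.rowspan && r + dr < rows.size(); dr++) {
                    for (int dc = 0; dc < cell.colspan && c + dc < columnCount; dc++) {
                        grid[r + dr][c + dc] = cell.text;
                        occupied[r + dr][c + dc] = true;
                    }
                }
                c += cell.colspan;
            }
        }
        return grid;
    }
}
```

With a first row of `A` (rowspan 2) and `B`, and a second row containing only `C`, the second row of the grid comes out as `A`, `C`, because the first column is already marked as occupied.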
See my suggestions.
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/FileReaderAdapter.java
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java
}

@Override
public String[] getHeader() throws IOException {
This should support skipHeader = true.
...s-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/XLSFileReaderAdapter.java
logMissingQuoteError();
return getExecutionContext(inputModel, outputModel);
}
fileReaderAdapter.initialise(new ByteArrayInputStream(sourceResource.getContent()), sourceResourceFormat, processTableAtIndex);
Previously, it was implemented like this:
ICsvListReader listReader = getCsvListReader(csvPreference);
if (listReader == null) {
logMissingQuoteError();
return getExecutionContext(inputModel, outputModel);
}
but there is no reason to have
if (listReader == null) {
logMissingQuoteError();
return getExecutionContext(inputModel, outputModel);
}
outside of the getCsvListReader() method.
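One possible reading of this suggestion, sketched with JDK classes only: the method that creates the reader also detects and logs the missing quote character, so the caller has no separate null check. All names here (`ReaderSetup`, `openReader`, `countLines`, the `LOG` list standing in for a logger) are hypothetical and are not taken from the module.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: the factory method owns both the check and the logging. */
class ReaderSetup {

    /** Stand-in for a real logger, so the sketch stays self-contained. */
    static final List<String> LOG = new ArrayList<>();

    /** Returns a reader, or an empty Optional (after logging) if no quote character is set. */
    static Optional<BufferedReader> openReader(String content, Character quoteChar) {
        if (quoteChar == null) {
            LOG.add("Quote character is missing");
            return Optional.empty();
        }
        return Optional.of(new BufferedReader(new StringReader(content)));
    }

    /** Caller: no null check, only a fallback value when the reader is absent. */
    static int countLines(String content, Character quoteChar) {
        return openReader(content, quoteChar)
                .map(reader -> (int) reader.lines().count())
                .orElse(0);
    }
}
```

The caller still needs a fallback result, but the decision about what counts as an error, and how it is reported, sits in one place.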

tableSchema.adjustProperties(hasInputSchema, outputColumns, sourceResource.getUri());
tableSchema.setColumnsSet(new HashSet<>(outputColumns));
tableSchema.adjustProperties(hasInputSchema, outputColumns, sourceResource.getUri());
We lost the implementation of listReader.close().
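A minimal sketch of one way to make sure the reader is always closed, assuming the adapter can implement `AutoCloseable` and be used in a try-with-resources block. `ClosingRowReader`, `nextRow`, and `readAll` are hypothetical names, and the comma split is a simplification rather than real CSV parsing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a row reader that releases its underlying reader on close(). */
class ClosingRowReader implements AutoCloseable {
    private final BufferedReader reader;
    private boolean closed;

    ClosingRowReader(Reader source) {
        this.reader = new BufferedReader(source);
    }

    /** Returns the next row split on commas (simplified), or null at end of input. */
    String[] nextRow() throws IOException {
        String line = reader.readLine();
        return line == null ? null : line.split(",", -1);
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        reader.close();
    }

    /** Reads all rows; try-with-resources closes the reader even if reading fails. */
    static List<String[]> readAll(ClosingRowReader rows) {
        List<String[]> result = new ArrayList<>();
        try (ClosingRowReader r = rows) {
            for (String[] row = r.nextRow(); row != null; row = r.nextRow()) {
                result.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Because close() runs when the try block exits, whether normally or through an exception, an explicit close call cannot get lost in a later refactoring.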
1f22c86 to 940f458
940f458 to d53c74c
Resolves #228.
.xls files are now processed separately, without conversion to .csv. This is preliminary code; I'll work on reducing the amount of duplicate code by using adapters and refactoring it.