[#228] Implemented processing of xls files without transformation to csv #285

MSattrtand · 2024-12-12T11:53:21Z

Resolves #228.

.xls files are now processed separately, without conversion to .csv. This is preliminary code; I'll refactor it and use adapters to reduce the amount of duplicated code.

MSattrtand · 2024-12-15T20:46:22Z

XLS and CSV files are now processed using adapters. I will fix the HTML processing soon.

MSattrtand · 2024-12-15T23:53:38Z

Processing HTML tables works again, though it still relies on the inelegant conversion to CSV.

MSattrtand · 2024-12-18T00:09:49Z

HTML tables are now processed directly, without conversion. However, I'm not sure this won't introduce more bugs: HTML doesn't store every cell in every row (positions covered by a merged cell are omitted), so I have to use a workaround that remembers where merged cells belong in each row.
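To make the merged-cell concern concrete, here is a minimal sketch of one way to carry rowspan information from row to row. It is not the code in this PR: `HtmlCell` and `SpanExpander` are hypothetical names, the rows are assumed to be already parsed into cells, and repeating the merged cell's text in every covered position is an assumption about the desired output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one <td>/<th>: its text plus its rowspan/colspan attributes.
final class HtmlCell {
    final String text;
    final int rowspan;
    final int colspan;

    HtmlCell(String text, int rowspan, int colspan) {
        this.text = text;
        this.rowspan = rowspan;
        this.colspan = colspan;
    }
}

final class SpanExpander {

    // Expands rows that omit spanned cells into full rows with one entry per column.
    // Assumes a well-formed table (no overlapping spans).
    static List<List<String>> expand(List<List<HtmlCell>> rows) {
        // For each column: text of a cell that still spans into upcoming rows,
        // and how many further rows it covers.
        Map<Integer, String> carriedText = new HashMap<>();
        Map<Integer, Integer> rowsLeft = new HashMap<>();
        List<List<String>> grid = new ArrayList<>();

        for (List<HtmlCell> row : rows) {
            List<String> out = new ArrayList<>();
            int col = 0;   // current column in the expanded grid
            int next = 0;  // index of the next explicit cell in this row
            while (next < row.size() || rowsLeft.getOrDefault(col, 0) > 0) {
                int left = rowsLeft.getOrDefault(col, 0);
                if (left > 0) {
                    // This position is covered by a cell merged from an earlier row.
                    out.add(carriedText.get(col));
                    if (left == 1) {
                        rowsLeft.remove(col);
                        carriedText.remove(col);
                    } else {
                        rowsLeft.put(col, left - 1);
                    }
                    col++;
                    continue;
                }
                HtmlCell cell = row.get(next++);
                for (int i = 0; i < cell.colspan; i++) {
                    out.add(cell.text);
                    if (cell.rowspan > 1) {
                        // Remember the cell so later rows can fill this column.
                        carriedText.put(col, cell.text);
                        rowsLeft.put(col, cell.rowspan - 1);
                    }
                    col++;
                }
            }
            grid.add(out);
        }
        return grid;
    }
}
```

With a first row of `A` (rowspan 2) and `B`, and a second row containing only `C`, the sketch yields `[A, B]` and `[A, C]`, so every row ends up with the same number of columns.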

blcham

See my suggestions.

s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/FileReaderAdapter.java

s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java

blcham · 2024-12-18T12:02:47Z

...s-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/XLSFileReaderAdapter.java

+    }
+
+    @Override
+    public String[] getHeader() throws IOException {


This should support skipHeader = true.

...s-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/XLSFileReaderAdapter.java

blcham · 2024-12-18T12:18:12Z

s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java

-                logMissingQuoteError();
-                return getExecutionContext(inputModel, outputModel);
-            }
+            fileReaderAdapter.initialise(new ByteArrayInputStream(sourceResource.getContent()), sourceResourceFormat, processTableAtIndex);


Previously, it was implemented like this:

    ICsvListReader listReader = getCsvListReader(csvPreference);
    if (listReader == null) {
        logMissingQuoteError();
        return getExecutionContext(inputModel, outputModel);
    }

but there is no reason to keep the null check

    if (listReader == null) {
        logMissingQuoteError();
        return getExecutionContext(inputModel, outputModel);
    }

outside of the getCsvListReader() method.

blcham · 2024-12-18T12:30:05Z

s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java

-
-        tableSchema.adjustProperties(hasInputSchema, outputColumns, sourceResource.getUri());
-        tableSchema.setColumnsSet(new HashSet<>(outputColumns));
+            tableSchema.adjustProperties(hasInputSchema, outputColumns, sourceResource.getUri());


We lost the call to listReader.close().
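One way to make the close call hard to lose again is to let the adapter be `AutoCloseable` and consume it in a try-with-resources block. The sketch below only illustrates that idea: `RowReader`, `InMemoryRowReader`, and `RowCounter` are hypothetical names, and the real `FileReaderAdapter` API may look different.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the adapter; close() releases the underlying reader.
interface RowReader extends AutoCloseable {
    List<String> nextRow(); // returns null when no rows remain

    @Override
    void close();
}

// In-memory implementation used only to demonstrate the pattern.
final class InMemoryRowReader implements RowReader {
    private final Iterator<List<String>> rows;
    private boolean closed;

    InMemoryRowReader(List<List<String>> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public List<String> nextRow() {
        return rows.hasNext() ? rows.next() : null;
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class RowCounter {
    // try-with-resources closes the reader on normal exit and on exceptions.
    static int countRows(RowReader reader) {
        try (RowReader r = reader) {
            int count = 0;
            while (r.nextRow() != null) {
                count++;
            }
            return count;
        }
    }
}
```

Here `close()` runs even if row processing throws, which a trailing explicit `close()` call would not guarantee.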

blcham requested changes Dec 18, 2024


MSattrtand force-pushed the 228-tabular-data-processing branch from 1f22c86 to 940f458 Compare December 22, 2024 00:56

MSattrtand closed this Dec 22, 2024

MSattrtand force-pushed the 228-tabular-data-processing branch from 940f458 to d53c74c Compare December 22, 2024 00:59

[#228] HTML and XLS files are now being processed directly

17cbbf8

MSattrtand reopened this Dec 22, 2024
