-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for Excelx (xlsx) Exporter #64
base: master
Are you sure you want to change the base?
Changes from all commits
13e71b9
aafde94
1035f6c
01fc93f
a6746c4
f478470
2f2bd6b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,6 +80,7 @@ def write(path) | |
contents.each { |content| csv << content } | ||
csv.close | ||
end | ||
true | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is it necessary? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought that this could act as a consistent means of checking whether the "writing into a file" actually took place. Else, the It's not really necessary, but I thought this would be better. Let me know if this can be reverted. |
||
end | ||
|
||
private | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
require 'daru/io/exporters/base' | ||
|
||
module Daru | ||
module IO | ||
module Exporters | ||
# Excelx Exporter Class, that extends `to_excelx_string` and `write_excelx` methods to | ||
# `Daru::DataFrame` instance variables | ||
class Excelx < Base | ||
Daru::DataFrame.register_io_module :to_excelx_string, self | ||
Daru::DataFrame.register_io_module :write_excelx, self | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Apart from standalone xlsx exporter, doesn't it make sense for the excel exporter to redirect here incase of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not 100% sure. Yep, it looks humane, but if/when you'll add formatting support for Excel exporters -- conventions on formatting are pretty different in two versions of Excel files and corresponding libraries. |
||
|
||
# Initializes an Excelx Exporter instance. | ||
# | ||
# @param dataframe [Daru::DataFrame] A dataframe to export. Supports even dataframes | ||
# with multi-index. | ||
# @param sheet [String] A sheet name, to export the dataframe into. Defaults to | ||
# 'Sheet0'. | ||
# @param header [Boolean] Defaults to true. When set to false or nil, | ||
# headers are not written. | ||
# @param data [Boolean] Defaults to true. When set to false or nil, | ||
# data values are not written. | ||
# @param index [Boolean] Defaults to true. When set to false or nil, | ||
# index values are not written | ||
# | ||
# @example Initializing an Excel Exporter instance | ||
# df = Daru::DataFrame.new([[1,2],[3,4]], order: [:a, :b]) | ||
# | ||
# #=> #<Daru::DataFrame(2x2)> | ||
# # a b | ||
# # 0 1 3 | ||
# # 1 2 4 | ||
# | ||
# instance = Daru::IO::Exporters::Excelx.new(df) | ||
def initialize(dataframe, sheet: 'Sheet0', header: true, data: true, index: true) | ||
optional_gem 'rubyXL' | ||
|
||
super(dataframe) | ||
@data = data | ||
@index = index | ||
@sheet = sheet | ||
@header = header | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm, do we have this lot of boolean vars in other importers too? I believe that one hash |
||
end | ||
|
||
# Exports an Excelx Exporter instance to a file-writable String. | ||
# | ||
# @return [String] A file-writable string | ||
# | ||
# @example Getting a file-writable string from Excelx Exporter instance | ||
# instance.to_s | ||
# | ||
# #=> "PK\u0003\u0004\u0014\u0000\u0000\u0000\b\u0000X\xA5YK\u0018\x87\xFC\u0017..." | ||
def to_s | ||
super(file_extension: '.xlsx') | ||
end | ||
|
||
# Exports an Excelx Exporter instance to an xlsx file. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe it doesn't export the exporter, it "exports the dataframe", or "performs the export(er)". (If it is standard phrase for all exporters, this notice is related to all of them). |
||
# | ||
# @param path [String] Path of excelx file where the dataframe is to be saved | ||
# | ||
# @example Writing an Excelx Exporter instance to an xlsx file | ||
# instance.write('filename.xlsx') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't believe that such a simple example brings any clarity. Just remove it? |
||
def write(path) | ||
@workbook = RubyXL::Workbook.new | ||
@sheet = @workbook.add_worksheet(@sheet) | ||
process_offsets | ||
|
||
write_row(@header ? 0 : 1, fetch_headers) | ||
|
||
@dataframe.each_row_with_index.with_index do |(row, idx), i| | ||
write_row(@row_offset+i, fetch_index(idx) + fetch_data(row)) | ||
end | ||
|
||
@workbook.write(path) | ||
true | ||
end | ||
|
||
private | ||
|
||
def process_offsets | ||
@row_offset = @header ? 1 : 0 | ||
@col_offset = 0 unless @index | ||
@col_offset ||= @dataframe.index.is_a?(Daru::MultiIndex) ? @dataframe.index.width : 1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this or similar statement repeated in other places in codebase (calculating of index witdth)? Maybe move it to some utility module, or to base exporter? |
||
end | ||
|
||
def fetch_headers | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is "fetch" a good name? "Fetch" implies some external source to be used, while this method in fact "formats" or "renders" its own instance variables. |
||
formatting([' '] * @col_offset + @dataframe.vectors.map(&:to_s), @header) | ||
end | ||
|
||
def fetch_index(idx) | ||
formatting(idx, @index) | ||
end | ||
|
||
def fetch_data(row) | ||
formatting(row, @data) | ||
end | ||
|
||
def formatting(idx, format) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not the best name, as we'll add "real" Excel formatting once. Something like just # or "to_strings"?..
def to_a(object, render = true)
return [] unless render
Array(object).map(&:to_s)
end BTW, have you checked that |
||
return [] unless format | ||
|
||
case idx | ||
when Daru::Vector, Daru::MultiIndex, Array then idx.map(&:to_s) | ||
else [idx.to_s] | ||
end | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't |
||
end | ||
|
||
def write_row(row_index, row_array) | ||
row_array.each_with_index do |element, col_index| | ||
@sheet.insert_cell(row_index, col_index, element.to_s) | ||
end | ||
end | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
RSpec.describe Daru::IO::Exporters::Excelx do | ||
include_context 'exporter setup' | ||
|
||
let(:filename) { ['test_write', '.xlsx'] } | ||
let(:content) { Roo::Excelx.new(tempfile.path).sheet('Sheet0').to_a } | ||
|
||
before { described_class.new(df, **opts).write(tempfile.path) } | ||
|
||
context 'writes to excelx worksheet without index' do | ||
subject { Daru::DataFrame.rows(content[1..-1].map { |x| x.map { |y| convert(y) } }, order: content[0]) } | ||
|
||
let(:opts) { {index: false} } | ||
|
||
it_behaves_like 'exact daru dataframe', | ||
ncols: 4, | ||
nrows: 5, | ||
order: %w[a b c d], | ||
data: [ | ||
[1,2,3,4,5], | ||
[11,22,33,44,55], | ||
['a', 'g', 4, 5,'addadf'], | ||
['', 23, 4,'a','ff'] | ||
] | ||
end | ||
|
||
context 'writes to excelx worksheet with multi-index' do | ||
subject { content.map { |x| x.map { |y| convert(y) } } } | ||
|
||
let(:df) do | ||
Daru::DataFrame.new( | ||
[[1,2],[3,4]], | ||
order: %i[x y], | ||
index: [%i[a b c], %i[d e f]] | ||
) | ||
end | ||
|
||
it { is_expected.to eq([[' ', ' ', ' ', 'x', 'y'], ['a', 'b', 'c', 1, 3], ['d', 'e', 'f', 2, 4]]) } | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to think this part of README becames too long and repetitive. Just as an idea for the future:
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For generating files like
{FORMAT}_Importer.md
, does an ERB template work? Or is there a better option?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think. In my head, it (probably) does the following:
Formats.md
(one, not document-per-formatter, it is tiresome to browse)._preface.md
).Can be implemented with just Ruby code, or small ERB template, which is the simplest.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zverok - Acknowledged. Let us take this discussion to #71 please? 😄