Adds support for Excelx (xlsx) Exporter #64

Open
wants to merge 7 commits into
base: master
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -39,7 +39,7 @@ Please proceed with a Pull Request only after you're assigned. It'd be sad if yo

4. Add any gem dependencies required for the Format Importer to the `:optional` group of the Gemfile.

- 5. Add code and YARD documentation to `lib/daru/io/importers/format.rb`, consistent with other IO modules.
+ 5. Add code and YARD documentation to `lib/daru/io/importers/format.rb`, consistent with other IO modules. Update the `README.md` if required.

6. Add tests to `spec/daru/io/importers/format_spec.rb`. Add any `.format` files required for importer in `spec/fixtures/format/` directory.

1 change: 1 addition & 0 deletions Gemfile
@@ -12,6 +12,7 @@ group :optional do
gem 'redis'
gem 'roo', '~> 2.7.0'
gem 'rsruby'
gem 'rubyXL'
gem 'snappy'
gem 'spreadsheet', '~> 1.1.1'
gem 'sqlite3'
24 changes: 23 additions & 1 deletion README.md
@@ -20,7 +20,7 @@ While supporting various IO modules, daru-io also provides an easier way of addi

- [Installation](#installation)
- *[Importers](#importers): [ActiveRecord](#activerecord-importer), [Avro](#avro-importer), [CSV](#csv-importer), [Excel](#excel-importer), [Excelx](#excelx-importer), [HTML](#html-importer), [JSON](#json-importer), [Mongo](#mongo-importer), [Plaintext](#plaintext-importer), [RData](#rdata-importer), [RDS](#rds-importer), [Redis](#redis-importer), [SQL](#sql-importer)*
- - *[Exporters](#exporters): [Avro](#avro-exporter), [CSV](#csv-exporter), [Excel](#excel-exporter), [JSON](#json-exporter), [RData](#rdata-exporter), [RDS](#rds-exporter), [SQL](#sql-exporter)*
+ - *[Exporters](#exporters): [Avro](#avro-exporter), [CSV](#csv-exporter), [Excel](#excel-exporter), [Excelx](#excelx-exporter), [JSON](#json-exporter), [RData](#rdata-exporter), [RDS](#rds-exporter), [SQL](#sql-exporter)*
- [Creating your own IO modules](#creating-your-own-io-modules)
- [Contributing](#contributing)
- [License](#license)
@@ -458,6 +458,28 @@ Exports a **Daru::DataFrame** into a **.xls** file.
df.write_excel('path/to/file.xls', header: {color: :red, weight: :bold}, data: {color: :blue }, index: false)
```

### Excelx Exporter

[(Go to Table of Contents)](#table-of-contents)

Exports a **Daru::DataFrame** into a **.xlsx** file.

- **Docs**: [rubydoc.info](http://www.rubydoc.info/github/athityakumar/daru-io/master/Daru/IO/Exporters/Excelx)
- **Gem Dependencies**: `rubyXL` gem
- **Usage**:
```ruby
#! Partially require just Excelx Exporter
require 'daru/io/exporters/excelx'

#! Usage from Daru::IO
string = Daru::IO::Exporters::Excelx.new(df, index: false).to_s
Daru::IO::Exporters::Excelx.new(df, index: false).write('path/to/file.xlsx')

#! Usage from Daru::DataFrame
string = df.to_excelx_string(index: false)
df.write_excelx('path/to/file.xlsx', index: false)
```

Collaborator

I am starting to think this part of the README has become too long and repetitive. Just as an idea for the future:

  1. Leave the 2-3 most "tasty" examples in the README + link to...
  2. Formats.md with regular structure...
  3. Which is built automatically, gathering the comments from top of all exporters/importers files, with similar format.

WDYT?

Member Author

For generating files like {FORMAT}_Importer.md, does an ERB template work? Or is there a better option?

Collaborator

Let's think. In my head, it (probably) does the following:

  • takes all the importers' (exporters') code;
  • extracts, say, the top-of-the-file comment with explanations;
  • joins them together with pretty headers;
  • stores the result in Formats.md (one file, not a document per formatter, which is tiresome to browse);
    • probably generates a TOC at the top of the file;
    • probably adds some "preface" at the top of the file (taken from some _preface.md).

It can be implemented with just Ruby code or with a small ERB template, which is the simplest option.

WDYT?
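
The steps listed above could be sketched roughly as follows. This is only an illustration, not the project's implementation: `top_comment`, `render_formats` and `FORMATS_TEMPLATE` are hypothetical names, and the sketch assumes each file's explanatory comment starts on its very first line.

```ruby
require 'erb'
require 'tmpdir'

# Hypothetical helper: returns the leading run of `#` comment lines in a file,
# with the comment markers stripped. Assumes the comment starts on line 1.
def top_comment(path)
  File.readlines(path, chomp: true)
      .take_while { |line| line.start_with?('#') }
      .map { |line| line.sub(/\A#\s?/, '') }
      .join("\n")
end

# Hypothetical template: a table of contents, then one section per file.
FORMATS_TEMPLATE = <<~TEMPLATE
  # Formats

  <% sections.each_key do |name| -%>
  - [<%= name %>](#<%= name.downcase %>)
  <% end -%>
  <% sections.each do |name, body| -%>

  ## <%= name %>

  <%= body %>
  <% end -%>
TEMPLATE

# Builds the combined document from a list of source file paths.
def render_formats(paths)
  sections = paths.sort.to_h do |path|
    [File.basename(path, '.rb').capitalize, top_comment(path)]
  end
  ERB.new(FORMATS_TEMPLATE, trim_mode: '-').result_with_hash(sections: sections)
end

# Demo against throwaway files, so nothing depends on the real repository.
formats_md = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'csv.rb'), "# Exports a dataframe to CSV.\nmodule Demo; end\n")
  File.write(File.join(dir, 'json.rb'), "# Exports a dataframe to JSON.\nmodule Demo; end\n")
  render_formats(Dir[File.join(dir, '*.rb')])
end
puts formats_md
```

A preface file could be prepended to the rendered string in the same way; that part is left out to keep the sketch short.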

Member Author

@zverok - Acknowledged. Let us take this discussion to #71 please? 😄

### JSON Exporter

[(Go to Table of Contents)](#table-of-contents)
1 change: 1 addition & 0 deletions daru-io.gemspec
@@ -33,5 +33,6 @@ Gem::Specification.new do |spec|
spec.add_development_dependency 'simplecov'
spec.add_development_dependency 'webmock'
spec.add_development_dependency 'yard'

spec.add_development_dependency 'guard-rspec' if RUBY_VERSION >= '2.2.5'
end
1 change: 1 addition & 0 deletions lib/daru/io/exporters/avro.rb
@@ -78,6 +78,7 @@ def write(path)
@writer.close

File.open(path, 'w') { |file| file.write(@buffer.string) }
true
end

private
4 changes: 2 additions & 2 deletions lib/daru/io/exporters/base.rb
@@ -41,8 +41,8 @@ def initialize(dataframe)
#
# instance = Daru::IO::Exporters::Format.new(opts)
# instance.to_s #! same as df.to_format_string(opts)
- def to_s
- tempfile = Tempfile.new('tempfile')
+ def to_s(file_extension: '')
+ tempfile = Tempfile.new(['filename', file_extension])
path = tempfile.path
write(path)

1 change: 1 addition & 0 deletions lib/daru/io/exporters/csv.rb
@@ -80,6 +80,7 @@ def write(path)
contents.each { |content| csv << content }
csv.close
end
true
Collaborator

Why is it necessary?

Member Author

I thought this could act as a consistent way of checking whether the write to the file actually took place. Otherwise, File.write(...) just returns the number of bytes written, which may not mean much to the user/developer compared to a consistent (constant) true.

It's not really necessary, but I thought this would be better. Let me know if this can be reverted.
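
To illustrate the difference being discussed, here is a small sketch. `write_report` is a hypothetical stand-in for an exporter's `write` method, not code from this PR.

```ruby
require 'tempfile'

# Hypothetical stand-in for an exporter's #write method.
def write_report(path, text)
  File.write(path, text) # return value: number of bytes written
  true                   # constant return value, independent of the format
end

text = "h\u00E9llo" # 5 characters, 6 bytes in UTF-8
bytes_written = nil
result = nil

Tempfile.create('demo') do |file|
  bytes_written = File.write(file.path, text)
  result = write_report(file.path, text)
end

puts "File.write returned #{bytes_written}, write_report returned #{result}"
```

The byte count varies with the content and encoding, while the explicit `true` stays the same for every exporter.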

end

private
3 changes: 2 additions & 1 deletion lib/daru/io/exporters/excel.rb
@@ -62,7 +62,7 @@ def initialize(dataframe, header: true, data: true, index: true)
# @return [String] A file-writable string
#
# @example Getting a file-writable string from Excel Exporter instance
- # simple_instance.to_s #! same as df.to_avro_string(schema)
+ # simple_instance.to_s
#
# #=> "\xD0\xCF\u0011\u0871\u001A\xE1\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000..."
#
@@ -92,6 +92,7 @@ def write(path)
end

@book.write(path)
true
end

private
114 changes: 114 additions & 0 deletions lib/daru/io/exporters/excelx.rb
@@ -0,0 +1,114 @@
require 'daru/io/exporters/base'

module Daru
module IO
module Exporters
# Excelx Exporter Class, that extends `to_excelx_string` and `write_excelx` methods to
# `Daru::DataFrame` instance variables
class Excelx < Base
Daru::DataFrame.register_io_module :to_excelx_string, self
Daru::DataFrame.register_io_module :write_excelx, self
Member Author

Apart from the standalone xlsx exporter, wouldn't it make sense for the excel exporter to redirect here in case of a .xlsx filename in the excel exporter's write method? For such a redirect to be possible, though, both would need to take similar arguments. That is, the xlsx exporter would have to support formatting like the excel exporter does.

Collaborator

Not 100% sure. Yep, it looks humane, but if/when you add formatting support for the Excel exporters, the formatting conventions are pretty different in the two versions of Excel files and their corresponding libraries.
In fact, I believe that in 2017 support for the Excel 1999-2003 format (xls) should be in "maintenance" mode, while the xlsx exporter should be shiny and friendly.
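
For reference, the redirect being discussed could look something like the sketch below. `EXPORTER_BY_EXTENSION` and `exporter_name_for` are hypothetical names, and the sketch only picks an exporter by file extension; it deliberately ignores the differing option sets, which is the concern raised above.

```ruby
# Hypothetical lookup table from file extension to exporter class name.
EXPORTER_BY_EXTENSION = {
  '.xls'  => :Excel,
  '.xlsx' => :Excelx
}.freeze

# Returns the exporter name for a path, or raises for unknown extensions.
def exporter_name_for(path)
  EXPORTER_BY_EXTENSION.fetch(File.extname(path).downcase) do |ext|
    raise ArgumentError, "no exporter registered for #{ext.inspect}"
  end
end

puts exporter_name_for('path/to/file.xls')
puts exporter_name_for('path/to/FILE.XLSX')
```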


# Initializes an Excelx Exporter instance.
#
# @param dataframe [Daru::DataFrame] A dataframe to export. Supports even dataframes
# with multi-index.
# @param sheet [String] A sheet name, to export the dataframe into. Defaults to
# 'Sheet0'.
# @param header [Boolean] Defaults to true. When set to false or nil,
# headers are not written.
# @param data [Boolean] Defaults to true. When set to false or nil,
# data values are not written.
# @param index [Boolean] Defaults to true. When set to false or nil,
# index values are not written
#
# @example Initializing an Excelx Exporter instance
# df = Daru::DataFrame.new([[1,2],[3,4]], order: [:a, :b])
#
# #=> #<Daru::DataFrame(2x2)>
# # a b
# # 0 1 3
# # 1 2 4
#
# instance = Daru::IO::Exporters::Excelx.new(df)
def initialize(dataframe, sheet: 'Sheet0', header: true, data: true, index: true)
optional_gem 'rubyXL'

super(dataframe)
@data = data
@index = index
@sheet = sheet
@header = header
Collaborator

Hm, do we have this many boolean vars in other importers too? I believe that one hash @render_parts = {data: data, header: header, index: index} could be more effective and would allow us to generalize a lot. In addition, looking at the var names, you could wrongly conclude that, say, @index contains the dataframe's index rather than a "should we render the index" boolean.
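
A minimal sketch of that suggestion, assuming a single hash replaces the three booleans. `RenderOptions` and `render?` are hypothetical names used only for illustration; the real exporter would keep its other responsibilities.

```ruby
# Hypothetical stand-in for the exporter; only the option handling is shown.
class RenderOptions
  PARTS = %i[header data index].freeze

  def initialize(header: true, data: true, index: true)
    @render_parts = {header: header, data: data, index: index}
  end

  # True when the given part of the dataframe should be written.
  def render?(part)
    raise ArgumentError, "unknown part #{part.inspect}" unless PARTS.include?(part)

    @render_parts[part] ? true : false
  end
end

options = RenderOptions.new(index: false)
RenderOptions::PARTS.each { |part| puts "#{part}: #{options.render?(part)}" }
```

With this shape, the name makes it clear that the stored values are rendering flags and not the dataframe's own index or data.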

end

# Exports an Excelx Exporter instance to a file-writable String.
#
# @return [String] A file-writable string
#
# @example Getting a file-writable string from Excelx Exporter instance
# instance.to_s
#
# #=> "PK\u0003\u0004\u0014\u0000\u0000\u0000\b\u0000X\xA5YK\u0018\x87\xFC\u0017..."
def to_s
super(file_extension: '.xlsx')
end

# Exports an Excelx Exporter instance to an xlsx file.
Collaborator

I believe it doesn't export the exporter, it "exports the dataframe", or "performs the export(er)".

(If this is the standard phrase for all exporters, this note applies to all of them.)

#
# @param path [String] Path of excelx file where the dataframe is to be saved
#
# @example Writing an Excelx Exporter instance to an xlsx file
# instance.write('filename.xlsx')
Collaborator

I don't believe that such a simple example brings any clarity. Just remove it?

def write(path)
@workbook = RubyXL::Workbook.new
@sheet = @workbook.add_worksheet(@sheet)
process_offsets

write_row(@header ? 0 : 1, fetch_headers)

@dataframe.each_row_with_index.with_index do |(row, idx), i|
write_row(@row_offset+i, fetch_index(idx) + fetch_data(row))
end

@workbook.write(path)
true
end

private

def process_offsets
@row_offset = @header ? 1 : 0
@col_offset = 0 unless @index
@col_offset ||= @dataframe.index.is_a?(Daru::MultiIndex) ? @dataframe.index.width : 1
Collaborator

Is this or a similar statement repeated in other places in the codebase (calculating the index width)? Maybe move it to some utility module, or to the base exporter?
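
One possible shape for such a shared helper is sketched below. `index_width` is a hypothetical name, and the duck-typed check assumes that only multi-level indexes respond to `width`; that assumption would need checking against daru itself.

```ruby
# Hypothetical helper: number of columns an index occupies when written out.
# Assumes multi-level indexes respond to #width and single-level ones do not.
def index_width(index)
  index.respond_to?(:width) ? index.width : 1
end

# Stand-ins so the sketch runs without daru installed.
FakeMultiIndex = Struct.new(:width)
FakeIndex = Struct.new(:size)

puts index_width(FakeMultiIndex.new(3))
puts index_width(FakeIndex.new(5))
```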

end

def fetch_headers
Collaborator

Is "fetch" a good name? "Fetch" implies some external source to be used, while this method in fact "formats" or "renders" its own instance variables.

formatting([' '] * @col_offset + @dataframe.vectors.map(&:to_s), @header)
end

def fetch_index(idx)
formatting(idx, @index)
end

def fetch_data(row)
formatting(row, @data)
end

def formatting(idx, format)
Collaborator

Not the best name, as we'll add "real" Excel formatting at some point. Something like just to_a?.. Like this:

# or "to_strings"?..
def to_a(object, render = true)
  return [] unless render
  Array(object).map(&:to_s)
end

BTW, have you checked that to_s is good enough? For example, wouldn't numbers or dates be formatted as strings, instead of appropriate Excel column types?
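
The type concern could be addressed along these lines. `cell_value` is a hypothetical helper; whether the spreadsheet library then writes typed cells for these values is an assumption to verify against its documentation.

```ruby
require 'date'

# Hypothetical conversion: keep numbers and dates as they are so that the
# spreadsheet library can choose a matching cell type; stringify the rest.
def cell_value(object)
  case object
  when Numeric, Date, Time then object
  when nil then ''
  else object.to_s
  end
end

p cell_value(42)
p cell_value(:name)
p cell_value(nil)
```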

return [] unless format

case idx
when Daru::Vector, Daru::MultiIndex, Array then idx.map(&:to_s)
else [idx.to_s]
end
Collaborator

Wouldn't Array(idx).map(&:to_s) work instead of this case?..
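
A quick comparison on plain Ruby values, as a sketch only. `to_strings` and `to_strings_with_case` are hypothetical names; whether `Array()` behaves the same for `Daru::Vector` and `Daru::MultiIndex` depends on those classes exposing `to_a` or `to_ary`, which is not checked here.

```ruby
# Hypothetical replacement for the `case` statement.
def to_strings(object)
  Array(object).map(&:to_s)
end

# The existing branch logic, restricted to plain Ruby types for comparison.
def to_strings_with_case(object)
  case object
  when Array then object.map(&:to_s)
  else [object.to_s]
  end
end

p to_strings([1, :b])
p to_strings(5)
p to_strings(nil)
p to_strings_with_case(nil)
```

One difference worth noting: `Array(nil)` is an empty array, so a `nil` value yields no cell at all, whereas the `case` version yields one empty string.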

end

def write_row(row_index, row_array)
row_array.each_with_index do |element, col_index|
@sheet.insert_cell(row_index, col_index, element.to_s)
end
end
end
end
end
end
1 change: 1 addition & 0 deletions lib/daru/io/exporters/json.rb
@@ -487,6 +487,7 @@ def write(path)
File.open(path, 'w') do |file|
file.write(::JSON.send(@pretty ? :pretty_generate : :generate, to))
end
true
end

private
1 change: 1 addition & 0 deletions lib/daru/io/exporters/r_data.rb
@@ -59,6 +59,7 @@ def write(path)
end.flatten
@statements << "save(#{@options.keys.map(&:to_s).join(', ')}, file='#{path}')"
@statements.each { |statement| @instance.eval_R(statement) }
true
end
end
end
1 change: 1 addition & 0 deletions lib/daru/io/exporters/rds.rb
@@ -53,6 +53,7 @@ def write(path)
@statements = process_statements(@r_variable, @dataframe)
@statements << "saveRDS(#{@r_variable}, file='#{path}')"
@statements.each { |statement| @instance.eval_R(statement) }
true
end

private
4 changes: 2 additions & 2 deletions lib/daru/io/importers/excel.rb
@@ -10,9 +10,9 @@ class Excel < Base
Daru::DataFrame.register_io_module :read_excel do |*args, &io_block|
if args.first.end_with?('.xlsx')
require 'daru/io/importers/excelx'
- Daru::IO::Importers::Excelx.new(*args[1..-1], &io_block).read(*args[0])
+ Daru::IO::Importers::Excelx.new.read(*args[0]).call(*args[1..-1], &io_block)
else
- Daru::IO::Importers::Excel.new(*args[1..-1], &io_block).read(*args[0])
+ Daru::IO::Importers::Excel.new.read(*args[0]).call(*args[1..-1], &io_block)
end
end

32 changes: 17 additions & 15 deletions lib/daru/io/link.rb
@@ -25,20 +25,22 @@ class << self
#
# #### Exporters
#
- # | `Daru::DataFrame` instance method | `Daru::IO::Exporters` class |
- # | --------------------------------- | -----------------------------------|
- # | `Daru::DataFrame.to_avro_string` | {Daru::IO::Exporters::Avro#to_s} |
- # | `Daru::DataFrame.write_avro` | {Daru::IO::Exporters::Avro#write} |
- # | `Daru::DataFrame.to_csv_string` | {Daru::IO::Exporters::CSV#to_s} |
- # | `Daru::DataFrame.write_csv` | {Daru::IO::Exporters::CSV#write} |
- # | `Daru::DataFrame.to_excel_string` | {Daru::IO::Exporters::Excel#to_s} |
- # | `Daru::DataFrame.write_excel` | {Daru::IO::Exporters::Excel#write} |
- # | `Daru::DataFrame.to_json` | {Daru::IO::Exporters::JSON#to} |
- # | `Daru::DataFrame.to_json_string` | {Daru::IO::Exporters::JSON#to_s} |
- # | `Daru::DataFrame.write_json` | {Daru::IO::Exporters::JSON#write} |
- # | `Daru::DataFrame.to_rds_string` | {Daru::IO::Exporters::RDS#to_s} |
- # | `Daru::DataFrame.write_rds` | {Daru::IO::Exporters::RDS#write} |
- # | `Daru::DataFrame.to_sql` | {Daru::IO::Exporters::SQL#to} |
+ # | `Daru::DataFrame` instance method | `Daru::IO::Exporters` class |
+ # | -----------------------------------| ------------------------------------|
+ # | `Daru::DataFrame.to_avro_string` | {Daru::IO::Exporters::Avro#to_s} |
+ # | `Daru::DataFrame.write_avro` | {Daru::IO::Exporters::Avro#write} |
+ # | `Daru::DataFrame.to_csv_string` | {Daru::IO::Exporters::CSV#to_s} |
+ # | `Daru::DataFrame.write_csv` | {Daru::IO::Exporters::CSV#write} |
+ # | `Daru::DataFrame.to_excel_string` | {Daru::IO::Exporters::Excel#to_s} |
+ # | `Daru::DataFrame.write_excel` | {Daru::IO::Exporters::Excel#write} |
+ # | `Daru::DataFrame.to_excelx_string` | {Daru::IO::Exporters::Excelx#to_s} |
+ # | `Daru::DataFrame.write_excelx` | {Daru::IO::Exporters::Excelx#write} |
+ # | `Daru::DataFrame.to_json` | {Daru::IO::Exporters::JSON#to} |
+ # | `Daru::DataFrame.to_json_string` | {Daru::IO::Exporters::JSON#to_s} |
+ # | `Daru::DataFrame.write_json` | {Daru::IO::Exporters::JSON#write} |
+ # | `Daru::DataFrame.to_rds_string` | {Daru::IO::Exporters::RDS#to_s} |
+ # | `Daru::DataFrame.write_rds` | {Daru::IO::Exporters::RDS#write} |
+ # | `Daru::DataFrame.to_sql` | {Daru::IO::Exporters::SQL#to} |
#
# @param function [Symbol] Function name to be monkey-patched into +Daru::DataFrame+
# @param instance [Class] The Daru-IO class to be linked to monkey-patched function
@@ -62,7 +64,7 @@ def register_exporter(function, instance)
case function.to_s
when /\Ato_.*_string\Z/ then instance.new(self, *args, &io_block).to_s
when /\Ato_/ then instance.new(self, *args, &io_block).to
- when /Awrite_/ then instance.new(self, *args[1..-1], &io_block).write(*args[0])
+ when /\Awrite_/ then instance.new(self, *args[1..-1], &io_block).write(*args[0])
end
end
end
2 changes: 1 addition & 1 deletion spec/daru/io/exporters/csv_spec.rb
@@ -8,7 +8,7 @@
before { described_class.new(df, opts).write(tempfile.path) }

context 'writes DataFrame to a CSV file' do
- subject { Daru::DataFrame.rows content[1..-1].map { |x| x.map { |y| convert(y) } }, order: content[0] }
+ subject { Daru::DataFrame.rows(content[1..-1].map { |x| x.map { |y| convert(y) } }, order: content[0]) }

let(:opts) { {} }
let(:content) { CSV.read(tempfile.path) }
2 changes: 1 addition & 1 deletion spec/daru/io/exporters/excel_spec.rb
@@ -2,7 +2,7 @@
include_context 'exporter setup'

let(:filename) { 'test_write.xls' }
- let(:content) { Spreadsheet.open tempfile.path }
+ let(:content) { Spreadsheet.open(tempfile.path) }
let(:opts) { {header: {color: :blue}, data: {color: :red}, index: {color: :green}} }

before { described_class.new(df, **opts).write(tempfile.path) }
39 changes: 39 additions & 0 deletions spec/daru/io/exporters/excelx_spec.rb
@@ -0,0 +1,39 @@
RSpec.describe Daru::IO::Exporters::Excelx do
include_context 'exporter setup'

let(:filename) { ['test_write', '.xlsx'] }
let(:content) { Roo::Excelx.new(tempfile.path).sheet('Sheet0').to_a }

before { described_class.new(df, **opts).write(tempfile.path) }

context 'writes to excelx worksheet without index' do
subject { Daru::DataFrame.rows(content[1..-1].map { |x| x.map { |y| convert(y) } }, order: content[0]) }

let(:opts) { {index: false} }

it_behaves_like 'exact daru dataframe',
ncols: 4,
nrows: 5,
order: %w[a b c d],
data: [
[1,2,3,4,5],
[11,22,33,44,55],
['a', 'g', 4, 5,'addadf'],
['', 23, 4,'a','ff']
]
end

context 'writes to excelx worksheet with multi-index' do
subject { content.map { |x| x.map { |y| convert(y) } } }

let(:df) do
Daru::DataFrame.new(
[[1,2],[3,4]],
order: %i[x y],
index: [%i[a b c], %i[d e f]]
)
end

it { is_expected.to eq([[' ', ' ', ' ', 'x', 'y'], ['a', 'b', 'c', 1, 3], ['d', 'e', 'f', 2, 4]]) }
end
end
1 change: 1 addition & 0 deletions spec/spec_helper.rb
@@ -17,6 +17,7 @@
require 'dbd/SQLite3'
require 'active_record'
require 'redis'
require 'roo'
require 'dbi'
require 'jsonpath'
require 'nokogiri'