All file original_name values report encoding as ASCII-8BIT #5670

Open
conorom opened this issue Jun 8, 2022 · 0 comments

Descriptive summary

This affects Hyrax versions 2.9.5 through 3.4.1 (and the main branch).

From my investigations in heliotrope, the change seemed to appear when Ruby was updated from 2.5.x to 2.7.x, so the Ruby version may be more relevant to the problem than the Hyrax version. Hyrax was not updated in that commit, but some underlying gems were (faraday?).

I'm still very much scratching my head about where in the stack this happens; my best guess is some interaction between Faraday and the Ruby version. However it occurs, it does affect how Hyrax code interacts with this original_name value, especially when the value is used to set a metadata field in Fedora, such as conditionally here (I'll write another ticket for that). That path eventually hits an "encode" in rdf here, which will bow out with an Encoding::UndefinedConversionError if the string contains any non-ASCII bytes, because its encoding is reported as ASCII-8BIT.

See this pithy comment explaining this in a former similar issue in rdf.
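
To illustrate the failure mode described above, here is a minimal plain-Ruby sketch with no Hyrax, ActiveFedora or rdf involved. It assumes the "encode" in rdf behaves like String#encode to UTF-8; the filename is made up for the example.

```ruby
# A UTF-8 filename containing non-ASCII characters (invented for this sketch).
name = "r\u00E9sum\u00E9.txt"

# String#b returns a copy tagged as ASCII-8BIT (binary); the bytes are unchanged.
# This mimics what original_name looks like after the save round trip.
binary_name = name.b

# Transcoding from ASCII-8BIT to UTF-8 fails on any byte >= 0x80.
error = begin
  binary_name.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# A pure-ASCII name survives the same call, which is why 'blah.txt' looks harmless.
ascii_ok = "blah.txt".b.encode(Encoding::UTF_8)

# Re-tagging (rather than transcoding) recovers the original string,
# provided the bytes really were UTF-8 to begin with.
relabelled = binary_name.dup.force_encoding(Encoding::UTF_8)
```

So the problem only surfaces for filenames with characters outside ASCII, which probably explains why it can go unnoticed for a long time.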

Would it be acceptable to force the encoding to UTF-8 here in AF, if no better solution can be found? This seems to be the code that provides the value. [edit 20230117 - tried this in dev and it didn't work]
Or maybe that method should be wrapped/overridden in Hyrax so that the encoding can be forced in Hyrax itself.
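
As a rough sketch of what "forcing the encoding in Hyrax itself" could look like, the helper below re-tags a binary string as UTF-8 and checks that the bytes are valid. The name utf8_name is hypothetical and not part of Hyrax or ActiveFedora, and I have not tested this against a real FileSet; it only shows the string handling.

```ruby
# Hypothetical helper, NOT an existing Hyrax or ActiveFedora method.
# Returns a UTF-8 tagged copy of +value+, leaving nil and UTF-8 strings alone.
def utf8_name(value)
  return value if value.nil? || value.encoding == Encoding::UTF_8

  # Re-tag the same bytes as UTF-8 without transcoding them.
  candidate = value.dup.force_encoding(Encoding::UTF_8)

  # If the bytes are not valid UTF-8, replace the bad ones with "?"
  # instead of letting a later encode call raise.
  candidate.valid_encoding? ? candidate : candidate.scrub("?")
end

utf8_name("r\u00E9sum\u00E9.txt".b).encoding # UTF-8, same bytes as the input
```

The valid_encoding? check is there because force_encoding never fails on its own; it would happily tag arbitrary bytes as UTF-8.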

FileSet.first.files.first.method(:original_name).source_location
=> ["/Users/conorom/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/active-fedora-13.2.7/lib/active_fedora/file/attributes.rb", 12]

Rationale

It didn't use to be this way, which I just verified by downgrading heliotrope to Ruby 2.5.8.

Inspiration for this testing was drawn from another encoding issue, #1089

Expected behavior (when I downgrade to Hyrax 2.9.5 on Ruby 2.5.8)

f = ActiveFedora::File.new
f.content = 'asdf' # needed to save the file
f.original_name = 'blah.txt'
f.original_name.encoding # => #<Encoding:UTF-8>

f.save # => true
f.original_name # => "blah.txt"
f.original_name.encoding # => #<Encoding:UTF-8>

Actual behavior (main branch on Ruby 2.7.4)

f = ActiveFedora::File.new
f.content = 'asdf' # needed to save the file
f.original_name = 'blah.txt'
f.original_name.encoding # => #<Encoding:UTF-8>

f.save # => true
f.original_name # => "blah.txt"
f.original_name.encoding # => #<Encoding:ASCII-8BIT>

Steps to reproduce the behavior

Follow the steps outlined above, or just call FileSet.first.original_file.original_name.encoding on your newer and older Hyrax installs.
On the newer setup it is now always Encoding:ASCII-8BIT.

Related work

#5671
