This relates to Hyrax versions 2.9.5 through 3.4.1 (and the main branch).
From my investigations in heliotrope, it seemed to happen when Ruby was updated from 2.5.x to 2.7.x, so that might be more relevant to the problem. Hyrax did not update in that commit, but some underlying gems did (faraday?).
I'm still very much scratching my head about where in the stack this happens; my best guess is some interaction between Faraday and the Ruby version. However it occurs, it does affect how Hyrax code interacts with this original_name value, especially when the value is used to set a metadata field in Fedora, such as conditionally here (I'll write another ticket for that). That path eventually hits an "encode" in rdf here, which will bow out with an Encoding::UndefinedConversionError should the string contain any non-ASCII bytes, because the string is tagged as ASCII-8BIT.
See this pithy comment explaining this in a former similar issue in rdf.
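To make the failure mode concrete, here is a minimal plain-Ruby sketch that needs no Hyrax or ActiveFedora. It assumes the "encode" in rdf behaves roughly like String#encode targeting UTF-8, and the filename is made up for illustration.

```ruby
# A filename containing non-ASCII characters, re-tagged as binary (ASCII-8BIT)
# to mimic the value that comes back after save. The name itself is made up.
name = "r\u00E9sum\u00E9.txt".b

begin
  # Transcoding from ASCII-8BIT to UTF-8 fails on any byte >= 0x80.
  name.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  error_message = e.message
end

# A pure-ASCII name transcodes without error, which is why a value like
# 'blah.txt' can hide the problem.
ascii_ok = "blah.txt".b.encode(Encoding::UTF_8)
```

So the ASCII-8BIT tag is harmless for plain ASCII filenames and only surfaces once a filename contains something like an accented character.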
Would it be acceptable to force the encoding to UTF-8 here in AF, if no better solution can be found? This seems to be the code that provides the value. [edit 20230117 - tried this in dev and it didn't work]
Or maybe that method should be wrapped/overridden in Hyrax so that the encoding can be forced in Hyrax itself.
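A rough sketch of what forcing the encoding on the Hyrax side could look like, in plain Ruby. The helper name utf8_or_original is hypothetical and is not part of Hyrax or ActiveFedora; where such a wrapper would actually be hooked in is left open, and this does not explain why the attempt inside AF noted above did not work.

```ruby
# Hypothetical helper (not a Hyrax or ActiveFedora API): re-tag a binary
# string as UTF-8 when its bytes form valid UTF-8, otherwise return it as-is.
def utf8_or_original(value)
  return value unless value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT

  # force_encoding only changes the encoding tag; the bytes are untouched.
  candidate = value.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : value
end

# Simulate the ASCII-8BIT value seen after save (filename is made up).
binary_name = "r\u00E9sum\u00E9.txt".b
fixed = utf8_or_original(binary_name)
```

Checking valid_encoding? before accepting the re-tagged string avoids mislabeling genuinely binary data as UTF-8.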
It didn't use to be this way, which I just verified by downgrading heliotrope to Ruby 2.5.8.
Inspiration for this testing was drawn from another encoding issue, #1089.
Expected behavior (when I downgrade to 2.9.5 on Ruby 2.5.8)
f = ActiveFedora::File.new
f.content = 'asdf' # needed to save the file
f.original_name = 'blah.txt'
f.original_name.encoding # => #<Encoding:UTF-8>
f.save # => true
f.original_name # => "blah.txt"
f.original_name.encoding # => #<Encoding:UTF-8>
Actual behavior (main branch on Ruby 2.7.4)
f = ActiveFedora::File.new
f.content = 'asdf' # needed to save the file
f.original_name = 'blah.txt'
f.original_name.encoding # => #<Encoding:UTF-8>
f.save # => true
f.original_name # => "blah.txt"
f.original_name.encoding # => #<Encoding:ASCII-8BIT>
Steps to reproduce the behavior
Follow the steps outlined above, or just call FileSet.first.original_file.original_name.encoding on your newer/older Hyrax installs.
It's always Encoding:ASCII-8BIT now.
Related work
#5671