-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing a string method that works for iterators and generators #57180
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -100,6 +100,7 @@ Base.startswith(io::IO, prefix::AbstractString) = startswith(io, String(prefix)) | |||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
function endswith(a::Union{String, SubString{String}}, | ||||||||||||||||||||||||||||||
b::Union{String, SubString{String}}) | ||||||||||||||||||||||||||||||
cub = ncodeunits(b) | ||||||||||||||||||||||||||||||
astart = ncodeunits(a) - ncodeunits(b) + 1 | ||||||||||||||||||||||||||||||
if astart < 1 | ||||||||||||||||||||||||||||||
false | ||||||||||||||||||||||||||||||
|
@@ -1274,3 +1275,33 @@ function Base.rest(s::AbstractString, st...) | |||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||
return String(take!(io)) | ||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||
""" | ||||||||||||||||||||||||||||||
The `String` constructor is enhanced to accept iterators/generator objects. | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
### Method Details: | ||||||||||||||||||||||||||||||
- **String(x::AbstractIterator)** | ||||||||||||||||||||||||||||||
- Converts an iterator into a string. | ||||||||||||||||||||||||||||||
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator. | ||||||||||||||||||||||||||||||
- Ensures that the result is a valid string representation composed solely of characters (`Char`). | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
### Examples | ||||||||||||||||||||||||||||||
```jldoctest | ||||||||||||||||||||||||||||||
julia> String(Iterators.map(c -> c+1, "Hello, world")) | ||||||||||||||||||||||||||||||
"Ifmmp-!xpsme" # Generates a string by incrementing ASCII values of each character. | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
julia> String(Iterators.take("Hello, world", 5)) | ||||||||||||||||||||||||||||||
"Hello" # Takes the first 5 characters of the string and converts it to a string. | ||||||||||||||||||||||||||||||
""" | ||||||||||||||||||||||||||||||
String(x) = _string_iterator(x, IteratorSize(x)) | ||||||||||||||||||||||||||||||
_string_iterator(x, ::IsInfinite) = throw(MethodError(String, (x,))) | ||||||||||||||||||||||||||||||
_string_iterator(x, ::IteratorSize) = begin | ||||||||||||||||||||||||||||||
try | ||||||||||||||||||||||||||||||
collected = collect(Char, x) | ||||||||||||||||||||||||||||||
if ndims(collected) != 1 | ||||||||||||||||||||||||||||||
throw(MethodError(String, (x,))) | ||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||
return String(collected) | ||||||||||||||||||||||||||||||
catch e | ||||||||||||||||||||||||||||||
throw(MethodError(String, (x,))) | ||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||
end | ||||||||||||||||||||||||||||||
Comment on lines
+1295
to
+1307
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I still very strongly dislike this implementation for the number of surprise errors it has, where things apparently fail for no reason. The point about each element needing to be a Char is reasonable though. I see the existing methods all type-assert the content is either UInt8 or Char. That unfortunately is not consistent with your attempted definitions here however. Your version can still be a one-liner (below), but note that your version does not like pizza (or other unicode). This is what it should give: julia> String(codeunits("🍕"))
"🍕"
julia> String("🍕")
"🍕" This is a one-liner equivalent to yours:
Suggested change
This is what yours gives:
Eww, that seems like gross pizza 😬 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just make this a concise one-liner, which is more efficient than your version too (since it needs to make fewer intermediate copies):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But wait, do we want this? This has completely different semantics from the existing
String
function. For example, with this change we get:To me this seems like the wrong meaning of the type constructor
String
. This should only be used to convert things that are already a little bit 'string-like' and should not be conflated withprint
, which this implementation suggests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense. This seems more like
write
then:The print behavior was from looking at the implementation of
String(::AbstractVector{<:AbstractChar})
, but that probably also can also be callingwrite
, since it is a case whereprint
is defined to be a call towrite
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good idea, Examples
Char(0x2ee8)
andString([0x2ee8])
should be consistent. Also the dependency on Litte/Big-Endiness is not expected.I think the prototype should be like
... String(collect(Char, x))
as I mentioned before.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, so the problem with making just one liners is the error statements end up being loose for cyclic iterators, or even integers in some cases. So sticking with the
... String(collect(Char, x))
prototype.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to only works for ASCII examples though. For utf8 bytes, the existing String methods already define that those results are not consistent. For your example, I think the existing method that behaves like that is called
transcode
, though it would be hard to generalize to this (since it doesn't in general iterate the contents of the source but rather decode it)