Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing a string method that works for iterators and generators #57180

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions base/strings/util.jl
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ Base.startswith(io::IO, prefix::AbstractString) = startswith(io, String(prefix))

function endswith(a::Union{String, SubString{String}},
b::Union{String, SubString{String}})
cub = ncodeunits(b)
astart = ncodeunits(a) - ncodeunits(b) + 1
if astart < 1
false
Expand Down Expand Up @@ -1274,3 +1275,33 @@ function Base.rest(s::AbstractString, st...)
end
return String(take!(io))
end
"""
The `String` constructor is enhanced to accept iterators/generator objects.

### Method Details:
- **String(x::AbstractIterator)**
- Converts an iterator into a string.
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator.
- Ensures that the result is a valid string representation composed solely of characters (`Char`).
Comment on lines +1282 to +1285
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just make this a concise one-liner, which is more efficient than your version too (since it needs to make fewer intermediate copies):

String(x) = sprint(join, x)
Suggested change
- **String(x::AbstractIterator)**
- Converts an iterator into a string.
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator.
- Ensures that the result is a valid string representation composed solely of characters (`Char`).
- **String(any_iterable)**
- prints an iterable object into a string using `join`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wait, do we want this? This has completely different semantics from the existing String function. For example, with this change we get:

julia> String(Int8[101, 102])
"101102"

julia> String(UInt8[101, 102])
"ef"

To me this seems like the wrong meaning of the type constructor String. This should only be used to convert things that are already a little bit 'string-like' and should not be conflated with print, which this implementation suggests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. This seems more like write then:

String(x) = sprint(io -> foreach(c -> write(io, c), x))

The print behavior was from looking at the implementation of String(::AbstractVector{<:AbstractChar}), but that probably also can also be calling write, since it is a case where print is defined to be a call to write.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea, Examples Char(0x2ee8) and String([0x2ee8]) should be consistent. Also the dependency on Litte/Big-Endiness is not expected.
I think the prototype should be like ... String(collect(Char, x)) as I mentioned before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, so the problem with making just one liners is the error statements end up being loose for cyclic iterators, or even integers in some cases. So sticking with the ... String(collect(Char, x)) prototype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be consistent

That seems to only works for ASCII examples though. For utf8 bytes, the existing String methods already define that those results are not consistent. For your example, I think the existing method that behaves like that is called transcode, though it would be hard to generalize to this (since it doesn't in general iterate the contents of the source but rather decode it)


### Examples
```jldoctest
julia> String(Iterators.map(c -> c+1, "Hello, world"))
"Ifmmp-!xpsme" # Generates a string by incrementing ASCII values of each character.

julia> String(Iterators.take("Hello, world", 5))
"Hello" # Takes the first 5 characters of the string and converts it to a string.
"""
String(x) = _string_iterator(x, IteratorSize(x))
_string_iterator(x, ::IsInfinite) = throw(MethodError(String, (x,)))
_string_iterator(x, ::IteratorSize) = begin
try
collected = collect(Char, x)
if ndims(collected) != 1
throw(MethodError(String, (x,)))
end
return String(collected)
catch e
throw(MethodError(String, (x,)))
end
end
Comment on lines +1295 to +1307
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still very strongly dislike this implementation for the number of surprise errors it has, where things apparently fail for no reason. The point about each element needing to be a Char is reasonable though. I see the existing methods all type-assert the content is either UInt8 or Char. That unfortunately is not consistent with your attempted definitions here however. Your version can still be a one-liner (below), but note that your version does not like pizza (or other unicode). This is what it should give:

julia> String(codeunits("🍕"))
"🍕"

julia> String("🍕")
"🍕"

This is a one-liner equivalent to yours:

Suggested change
String(x) = _string_iterator(x, IteratorSize(x))
_string_iterator(x, ::IsInfinite) = throw(MethodError(String, (x,)))
_string_iterator(x, ::IteratorSize) = begin
try
collected = collect(Char, x)
if ndims(collected) != 1
throw(MethodError(String, (x,)))
end
return String(collected)
catch e
throw(MethodError(String, (x,)))
end
end
String(x) = sprint(io -> foreach(c -> write(io, Char(c)::Char), x))

This is what yours gives:

julia> S(codeunits("🍕"))
"ð\u9f\u8d\u95"

Eww, that seems like gross pizza 😬
(knowyourmemes: #3721)

37 changes: 37 additions & 0 deletions test/strings/util.jl
Original file line number Diff line number Diff line change
Expand Up @@ -791,3 +791,40 @@ end
@test endswith(A, split(B, ' ')[end])
@test endswith(A, 'g')
end
@testset "String Iterator Tests" begin
for S in (String, SubStr, Test.GenericString)
@test String(Iterators.map(c -> c+1, "abc")) == "bcd"
@test String(Iterators.take("hello world", 5)) == "hello"
@test String(Iterators.filter(c -> c != ' ', "hello world")) == "helloworld"
@test String(Iterators.drop("hello world", 6)) == "world"
@test String(Iterators.map(c -> 'a', "hello")) == "aaaaa"
@test String(Iterators.map(c -> c == ' ' ? ' ' : 'A', "hello world")) == "AAAAA AAAAA"
@test String(Iterators.map(c -> '😊', "hello")) == "😊😊😊😊😊"
@test String(Iterators.take("", 3)) == ""
@test String(Iterators.drop("", 3)) == ""
@test_throws MethodError String(Iterators.filter(c -> c > 128, "hello"))
@test_throws MethodError String(Iterators.cycle("abc"))
@test_throws MethodError String(19)
@test_throws MethodError String(3.14)
@test String(Iterators.map(c -> Char(c), "abc")) == "abc"
@test String(Iterators.flatten(Iterators.map(c -> c, ["hello", "world"]))) == "helloworld"
@test String(Iterators.flatten(Iterators.map(c -> "hello", 1:3))) == "hellohellohello"
@test String(Iterators.filter(c -> false, "hello")) == ""
@test String(Iterators.map(c -> ' ', "hello world")) == " "
@test String(Iterators.map(c -> '😊', "hi there")) == "😊😊😊😊😊😊😊😊"
@test String("ÄÖÜ⻨") == "ÄÖÜ⻨"
@test String(Iterators.map(c -> 'A', "ÄÖÜ⻨")) == "AAAA"
@test String(Iterators.map(c -> c == 'Ö' ? 'O' : c, "ÄÖÜ⻨")) == "ÄOÜ⻨"
@test String(Iterators.take("ÄÖÜ⻨", 2)) == "ÄÖ"
@test String(Iterators.drop("ÄÖÜ⻨", 1)) == "ÖÜ⻨"
end
@test_throws MethodError String(Iterators.cycle("abc"))
@test String("valid string") == "valid string"
@test String("test") == "test"
@test String(Iterators.flatten(Iterators.map(c -> "abc", 1:3))) == "abcabcabc"
@test String(Iterators.flatten(Iterators.map(c -> "😊", 1:2))) == "😊😊"
@test String(Iterators.filter(c -> false, "hello")) == ""
@test String(Iterators.take("hello", 0)) == ""
@test String(Iterators.take("a"^1000, 500)) == "a"^500
@test String(Iterators.map(c -> 'a', "a"^100000)) == "a"^100000
Priynsh marked this conversation as resolved.
Show resolved Hide resolved
end