-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support alternative textual representations #42
Comments
I can see two options here. Quantify over the string typeThat is, data TextDetails = Chr {-# UNPACK #-} !Char -- ^ A single Char fragment
| forall r. RuneSequence r => Str r -- ^ A whole string fragment Under this scheme Use laziness to defer projecting to a StringThat is, data TextDetails = Chr {-# UNPACK #-} !Char -- ^ A single Char fragment
| Str Int String -- ^ A whole String fragment
str :: RuneSequence r => r -> TextDetails
str x = Str (len x) (unpack x) Under this scheme |
Hey Ben, thanks for reading my rumination up until the end!
I would have loved to do this, but this looks similar to the approach I have tried already (I have omitted this, but I have indeed tried both the
and I cannot generate this manually due to the
Ah, this is interesting instead! If I've got you you are saying, in other terms, that we might still be able to get some speedup thanks to the deliberate use of a more efficient I think that looks plausible! I would be interested to see the memory tradeoff we will have to make here (although |
@adinapoli Great work! Thanks for such a nice write up :) Sorry for the delay in replying here. I'm currently on holiday for a few weeks and travelling so sporadic time for "work" stuff. Let's step back a moment. I think there are two ways to think about this at a high-level before we bring in the mechanisms that Haskell gives us. Approach 1: Abstract the base storage fromWe need some way to store actual strings and characters. Right now, we use two concrete types for that,
I'd be curious to know once we had a benchmark, how well Then, the approach (which is what you've tried so far) is to move away from I've in the past played with using a typeclass to abstract the storage, but didn't continue down this pathway after Duncan objected (and two Simon's agreed) on the grounds that we already have a way to somewhat abstract this through the Typeclasses are obviously as you saw expensive since they push outwards to the clients of the pretty library and greatly complicate the interface.
Approach 2: Single, efficient storage typeCan we keep a single concrete storage type but move to one that is more efficient than So, what is the interface a fast string type needs to support for pretty to work well? And how do users of the pretty library construct these efficiently too? E.g., it's no good if we just push the performance problem one layer up! :) |
Hey David, thanks for the great feedback(s)! Getting quite late over here so I beg you a pardon if the synapses won't work as they should. Let's start with the motivation, as you asked for it in my PR: The TL;DR is that I have embarked in an ambitious mission in trying to improve GHC compilation times. After fiddling a bit with a I think you raised very good points, so I'm going to split this reply in 2 sections; one comments benchmarking (and the need for it?), the other the two approaches you describe. On benchmarking
I think by now it's cropping up quite the legitimate question "Why is String slow?" and "Why do we need to improve this?". The short answer is that I'm simply assuming (perhaps erroneously) that the benchmarking was done (or at least some kind of analysis) by people way smarter than me and that worked on GHC for a long time. I'm definitely on board in writing criterion benches for On the different abstraction approachesBoth have advantages and disadvantages (everything is a trade-off, right? hehe).
|
Hey @dterei !
As the title suggests, this is yet another attempt to generalise
pretty
to support alternative textual representations, such that it would pave the way for #1 .Now, I know that a lot has been written in the past, but I'm giving it another go mainly as a means for me to write down an experience report of two days of tinkering which led nowhere.
Introduction
The bigger goal is to replace GHC's copy of
pretty
with this library, but to the best of my knowledge there are two main blockers:GHC uses some ad-hoc textual representations for performance reasons. This doesn't go along well with
pretty
's user-facing API, which exposes combinators liketext
(see below).There are a couple of commits by @thomie which have been reverted due to some allocation regression in the compiler.
I think 2. is orthogonal to the scope of this ticket (but certainly relevant to the bigger goal), so I'm going to focus on 1. only.
In this ticket, @bgamari summarise the issue quite neatly:
Starting from this ticket, I have started exploring different options, but none of my designs seemed to lead me somewhere. Therefore I'm seeking help from you fine hackers about how to proceed. Hereby follows a couple of failed attempts with a streams of thoughts of what made me pick certain decisions over something else. I'm relying on memory here, so please do not hold anything against me if some of these ramblings are incomplete or imprecise 😉
Attempt 1: Start top-down from the combinators
I started by doing exactly what Ben suggested; I created a type class like this (naming is hard, I picked this name just to avoid confusion with other existing textual types):
This of course allows us to generalise things like
text
like the following:This won't compile though as it calls internally
textBeside_
which relies onTextDetails
and the latter "leaks" its internals, by which I mean theStr
constructor, which expects aString
. One could argue I could then useunpack
to write the following:But this left a sour taste in my mouth:
unpack
can be costly, especially if we do lots of it. I really don't want to make the performance of the library any worse;It really doesn't solve the crux of the problem: If we compare GHC's
TextDetails
, we will see it's defined like this:In order to have a chance of unifying the two libraries, one possibility I saw was to abstract away
TextDetails
, which led me to the second bikeshed:Attempt 2: Polymorphic
AnnotDetails
I asked myself "Can I abstract away TextDetails altogether, so that user code can plug its own TextDetails"? In brief, write the following:
I have one big problem with this and with all its variations: it will make the
r
parameter to bubble up in theDoc
type, to the point which it will becomeDoc r a
. You might argue this could read like "This is a Doc node annotated with an annotation of type a and an inner textual representation of type r". This though introduces a massive breaking change in the API (no good) and I suspect we'll still need to be rely on type classes anyway (likeRuneSequence
or similar) as most of the combinators are using the constructors ofTextDetails
anyway. So, up to the next.Attempt 3: Add a new constructor to TextDetails
In brief, write the following:
This might reconcile the two worlds, but it has several problems as well:
r
like solution 2, so it's a no-goI have tried to remove the need for the extra type param with a
RankNType
annotation like this:Although this might work (but I doubt) I couldn't use it as TextDetails needs to derive
Show
,Eq
andGeneric
. The first two can be derived viaStandaloneDeriving
, but the third cannot, and we cannot write it manually by hand due to theSafe
constraint we have on the package. Argh!At this point, I felt like I have reached an impasse. Thanks for reading up to this point, and hopefully somebody can chime in and tell me if there is a simple solution to this riddle 😉
Alfredo
The text was updated successfully, but these errors were encountered: