Skip to content

effectful sequences of bytes; an alternative no-lazy-io implementation of Data.ByteString.Lazy

License

Notifications You must be signed in to change notification settings

michaelt/streaming-bytestring

Repository files navigation

bytestring-streaming

This package depends on the streaming library

          copy 200M file    divide it on lines, 
                            adding '!' to each 
                            
lazy      0m0.813s          0m8.597s
streaming 0m0.783s          0m9.664s
pipes     0m0.771s          0m49.176s
conduit	  0m1.068s          2m25.437s

This library is modeled as far as possible on the internal structure of Data.ByteString.Lazy. There are two changes: a chunk may be delayed by a monadic step, and the sucession of steps has a 'return' value:

data ByteString m r =
  Empty r
  | Chunk {-#UNPACK #-} !S.ByteString (ByteString m r)
  | Go (m (ByteString m r ))

unlike

data ByteString = 
  Empty 
  | Chunk {-#UNPACK #-} !S.ByteString ByteString

That's it.


Another module is planned that would correspond more closely to Pipes.Bytestring than to Data.ByteString.Lazy.
Producer ByteString m r as it is treated in pipes-bytestring as the ByteString m r type is here. The result is much faster, at least with preliminary tests. The modules integrating attoparsec and aeson are simple replicas of k0001's pipes-attoparsec and pipes-aeson. Also included is a replica of pipes-http.

It is possible that streaming-bytestring is conceptually clearer than pipes-bytestring as well - and clearer than the approach taken by conduit and io-streams. All of these are forced to integrate the conception of an amorphous succession of bytes that may be chunked anywhere - the direct result of, say, fromHandle, sourceFile and the like - and a succession of 'semantically' distinct bytestrings of interest under a single concept.


Strange as it may seem, it is arguable that the general Producer, Source, and InputStream concepts from these libraries ought not to hold ByteStrings except as conceptually separate units, e.g. the lines of a document taken as strict bytestrings, where that is legitimate. An InputStream ByteString is like an InputStream Int; a Conduit.Source m ByteString has the same type as a Source m Int; a Pipes.Producer ByteString m r has the same type as a Producer Int m r. These types are suited to the general stream transformations these libraries make possible.

We can see the strangeness in the io-streams lines

lines :: InputStream ByteString -> IO (InputStream ByteString)

and the conduit linesUnboundedAscii

linesUnboundedAscii :: (Monad m) => Conduit ByteString m ByteString

(specializing slightly). In either case, what enters on the left will be a succession of anyhow-chunked bytes; what exits on the right will be a succession of significant individual things of type ByteString.

What we find in IOStreams.lines and linesUnlimitedAscii are comparable to what we would have if bytestring defined

lines :: L.ByteString -> [S.ByteString]

or more absurdly

lines :: L.ByteString -> L.ByteString 

and exposed methods for inspecting the hitherto secret chunks contained in lazy bytestrings.

The model employed by the present package is a little different. First, the primitive lines concept is just

lines :: ByteString m r -> Stream (ByteString m) m r

as in pipes-bytestring; this corresponds precisely to

lines :: ByteString -> [ByteString]

as it appears in Data.ByteString.Lazy -- the elements of the list (stream) are themselves lazy bytestrings.

But pipes-bytestring attempts to mean by Producer ByteString m r what we express by ByteString m r - the undifferentiated byte stream. But (we are provisionally suggesting) that isn't what Producer ByteString m r means, and this is part of the reason why pipes-bytestring is difficult for people to grasp. The user frequently proposes to inspect and work with individual lines with Pipes themselves and thus needs

produceLines :: Producer ByteString m r -> Producer ByteString m r
produceLines = folds B.concat B.empty id . view Pipes.ByteString.lines

Here we would instead write a

produceLines :: ByteString m r -> Stream (Of ByteString) m r

which is transparently related to the type of lines itself

lines :: ByteString m r -> Stream (ByteString m) m r

The distinctive type of produceLines clearly express the transition from the world of amorphously chunked bytestreams to the world of significant individual values, in this case individual strict bytestrings.

About

effectful sequences of bytes; an alternative no-lazy-io implementation of Data.ByteString.Lazy

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published