-
Notifications
You must be signed in to change notification settings - Fork 1.1k
#13943: format, add a new pp_with_size
function
#13947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, interface and implementation. I made a few comments, but this is a definite improvement.
let with_next_size state f s = match state.pp_stream_state with | ||
| Next_size n -> set_stream_state state Normal; n | ||
| Newline | Normal -> f state s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say that I'm super fond of this sort of side-effecting function that also returns a value, but not a big deal for a narrowly-scoped internal function.
On the other hand, it is short, has only 3 uses, and I think I would find the call sites easier to read if this function were just inlined.
|
||
(* Format a slice *) | ||
let format_pp_substring state size ~pos ~len source = | ||
state.pp_space_left <- state.pp_space_left - size; | ||
pp_output_substring state ~pos ~len source; | ||
state.pp_is_new_line <- false | ||
set_stream_state state Normal | ||
|
||
(* Format a string by its length, if not empty *) | ||
let format_string state s = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is perhaps a question of whether pp_with_size
should interact with the strings that can be printed by pp_print_custom_break
. The current state of the PR implementation does not adapt the call to pp_string_width
on the next line to with_next_size
. Users might think that a call to pp_with_size
followed immediately by a call to pp_print_custom_break
would cause the "fits" string to be considered as it it had the size indicated by pp_with_size
. To support that, the implementation of pp_print_custom_break
would need to adapt pp_string_width state before
(line 729/749) to use with_next_size
, and I think that is it. Maybe that is not desired though, and pp_with_size
could just be documented as not applying to strings printed by pp_print_custom_break
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying the with_size
modifier to the fits
part of the breaks makes sense to me. We could choose to affect the whole fitting text but it is probably more regular to affect only the before
part to keep the strict separation between the out_space
and out_string
lower-level instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this, for pp_print_custom_break ppf ~fits:(s1, n, s2) ~breaks:(s3, m, s4)
, the sizes of s1
n
spaces and s2
are all relevant in the case that no break is inserted at the break hint. In case a break is inserted at the break hint, then the sizes of m
spaces and s4
are relevant. The size of s3
is not since it is emitted before the line break no matter what its size is.
So if user code is using strings that need size-correction, it would be consistent to support specifying the size of fits
, either piece by piece or as a whole, and the size of the m
and s4
parts of breaks
, again either piece by piece or as a whole. It would also be consistent to not support specifying the size of any of these strings. I don't think that the @<N>
mechanism can be used to support the general case, so I worry that using it to support some incomplete portion is just a step down a path leading to a dead end. If it becomes necessary to support specifying the size of these strings, it seems that we will need to add a size for each of the strings, essentially storing arguments to print_as
instead of print_string
.
So I think that the right approach is to not have pp_with_size
interact with the pp_print_custom_break
strings at all.
(** [pp_with_size ppf w] sets the width of the next element to be printed to | ||
[w]. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help to clarify that pp_with_size
does not affect the apparent size of the next element if there are any intervening box opens, box closes, or break hints that break. I think that to understand this function requires users to think not in terms of a tree of boxes, but to consider its fringe as a linear sequence of "elements" (that end up being strings). It is not clear to me that this will be obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that some clarification is required, what about
[pp_with_size ppf w] sets the width of the next element to be printed on the current line to [w], this does not affect the width of boxes nor the width of elements printed after a linebreak.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that still leaves the possibility for people to expect that @<10>@[%s
will affect the size of the string given for the %s
. So perhaps:
[pp_with_size ppf w] sets the width of the next element to be printed on the current line to [w]; this does not have any effect if there are any line breaks, tabulation breaks, or any boxes are opened or closed before the next element.
|
||
let () = test "foo@ @<1000>bar" | ||
let () = test "foo@ @<1000>%s" "bar" | ||
let () = test "foo@ @<1000>%a" Format.pp_print_string "bar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to have another test case similar to this one, but such as
t () = test "foo@ @<1000>%a" (fun ppf -> Format.fprintf ppf "@[bar@]")
to illustrate that using boxes will defeat with_size
.
My understanding of the current documentation of format is that @ should be followed by a "conversion specification" (the %a, %d, %s, %t stuff), and that the whole printing of that conversion will be shown "as if it was of size ...". If I understand your comments correctly, this works well for "leaf" conversions like %d, %s, but not for "composed" or "nested" conversions like %a or %t that may include arbitrary boxes. You propose to change the specific so that it behaves differently on "nested" conversions like %a and %t. Your proposal is more or less to act on the leftmost "leaf" node of the underlying format. Another option would be to simply reject those cases, with a failure when applying @ on non-leaf formats. (We could probably even make it a static failure if we want; adding a dynamic failure sounds a bit risky and then we would rather suggest to ignore it altogether.) Finally, one may wonder whether the "full" feature could be supported. (For example we could begin a "Start_with_size" constructor before and a End_with_size token after the conversion, and hope to deal with that correctly in the printer.) My understanding from a distance is that it's not clear what it would mean several tokens together have to take a certain size (how to spread the size among the list of tokens?). Maybe an attempt would be as follows:
|
Sherlocode finds uses of I think it would be interesting to look at the usage of @<1>%a to understand whether the behavior you propose fits the intent of the author of the code. If it does not, I would be in favor of rejecting the combination altogether, instead of silently doing something unintended. (I'm not particularly pushing for the hard route of implementing the feature in its full generality.) |
(I only recall using |
@gasche , beware that Sherlocode is dropping non-ascii characters, thus some of @c-cube's code is incorrectly displayed. @c-cube, speaking of which, correcting by hand the width of unicode characters will not be be needed anymore with 5.4 where Enumerating all use cases on Sherlodoc:
Thus there is one and half case where the behavior is improved, none where it is degraded. |
This is diverging from the current PR, but one alternative to
would be to add support for specifying the size of boxes. In this case |
Although use cases expecting a modularspace font may still want to specify e.g. |
I have a bad uninformed gut feeling about this PR. I agree that the behavior is slightly better than the current behavior wrt. %a or %t, but it comes with a change to the specification that I think makes the specification worse. I consider the following futures:
(Also: the code examples I found in the wild gave me the impression that the user code was written with the current specification in mind, and not with the newly proposed specification in mind.) |
This PR proposes to implement the
@<n>
format specifier as app_with_size
function onFormat.formatter
rather than as a rewriting rule on the stream of formatting instructions.This makes the behaviour of the various format string interpreters (
sprintf
,fprintf
,Format_doc.Doc.msg
) more uniform in term of the interaction between@<n>
and%a
and%t
(see #13943 for an example).With this PR, the semantics for
pp_with_size
is that it affects the size of the next token printed with thepp_print_string
primitive, in the absence of new lines, box opening or closing, or tabulation breaks.