Page MenuHomePhabricator

fix: include adjacent blank lines in ranges to be fixed
Needs ReviewPublic

Authored by msuozzo on Sep 18 2020, 3:31 PM.


Group Reviewers

When you remove a statement in python and the two adjacent lines are blank,
this can create lint errors due to improper spacing. I'm sure this is also the
case with other whitespace-aware languages and file formats. The current fix
command skips all removal diffs and so doesn't trigger the auto-formatting of
that whitespace.

Net Cost:

  • Two extra line checks for all diffs and fix ranges that could be 1-2 lines longer.
  • One extra fix range generated for each pure-removal diff.

Net Benefit:

  • Whitespace-aware languages are able to format and resolve whitespace errors.

Diff Detail

rHG Mercurial
No Linters Available
No Unit Test Coverage

Event Timeline

msuozzo created this revision.Sep 18 2020, 3:31 PM

I'm not sure how general the "blank lines" heuristic actually is. I think this should be an opt-in feature based on a per-formatter config variable. You can start with an entry in "FIXER_ATTRS" and a check around line 685. Whatever we end up with, it needs to be documented in the module docstring (which is the source for hg help -e fix)

I don't have a list of examples handy, but I wouldn't be surprised if this produces bad results in relatively common cases. This tends to be subjective, but we don't want to ship a foot gun :)

Your ":deletion" suggestion is related to the possibility of supporting "zero-length byte offset ranges" or "zero-length line ranges" in addition to "line ranges." In general, I would recommend avoiding the temptation to complicate the interface between and formatter tools. Most of the formatters I've had to integrate with did not support "formatting a deletion" anyway. There are some important counter examples like clang-format.

Your ":deletion" suggestion is related to the possibility of supporting "zero-length byte offset ranges" or "zero-length line ranges" in addition to "line ranges." In general, I would recommend avoiding the temptation to complicate the interface between and formatter tools. Most of the formatters I've had to integrate with did not support "formatting a deletion" anyway. There are some important counter examples like clang-format.

Turns out that was my own comment about :deletion. I see this change as a workaround for formatters that don't explicitly support formatting around deleted sections, regardless of whether we ever implement a way to integrate with formatters that do support it.

I oppose enabling this as default behavior. A plausible way this could go wrong is if formatting an unchanged blank line also implicitly formats an unchanged block of code on the other side. I don't think we can assume that tools will never do this. I do think it would be surprising in a bad way for users, even if we technically don't make any guarantees about the minimal-ness of the diffs used to determine incremental formatting. Not formatting deletions also sucks, but it's less destructive.

msuozzo updated this revision to Diff 22764.Sep 22 2020, 11:48 PM

Alright so as discussed, this is now a per-fixer config option which I called "adjacentblanks" (not married to it). There's also the new issue of having to plumb the flag down through the necessary layers of abstraction. I figured doing so with a fixer instance would be the least complex way but I'm not sure how idiomatic that is in hg's codebase.

hooper added inline comments.Sep 25 2020, 7:09 PM

I don't think anyone has put a lot of effort into profiling this code, but I'm worried this might increase peak memory use by a significant factor. With large files, this can put users at increased risk of memory exhaustion, swapping, etc. Architectural changes to prevent that are out of scope for this change.

Computing a set of indices for blank lines might be more efficient.


Are we really looking for blank lines, or whitespace-only lines? Does it need to be configurable? I'm worried this behavior is overly specific to your use case.


You might be able to get away with "printf" here, and a newline in :linerange. Not sure about portability, but I've been told to use printf instead of echo before.

msuozzo marked an inline comment as done.Sep 29 2020, 6:43 PM
msuozzo updated this revision to Diff 22939.
msuozzo marked 2 inline comments as done.Sep 29 2020, 6:45 PM
msuozzo added inline comments.

I'd done this because this is passed to mdiff so I'd assumed that mdiff was already paying the price for storing this construct.

Again, I'd assumed this was more efficient than the alternative but if you're concerned about the cost, I can definitely calculate whitespace-only lines instead.


Yeah I really considered this but ended up sticking with the current implementation since it meant a byte-wise comparison which I thought would be much faster than a slice comparison.

However if we're doing the blank-line calculation once at the top of the function, making it whitespace instead is trivial. Done.

pulkit added a subscriber: pulkit.Oct 6 2020, 4:37 AM

@hooper can you please have a look again. Thank you!

Consider this C++ example:

#include <foo>

#include "foo.h"

Deleting the blank line would ideally lead clang-format to add the blank line back. Implementing this kind of behavior on the basis of adjacent blank lines will never provide a complete solution.

Now consider this example:

void foo() {
  int y;
  if (true) { 
                 int x;

Delete int y;, then format the two adjacent lines unconditionally. The indentation of int x; will be fixed, even though it has nothing to do with the diff. This behavior counteracts the purpose of the feature in an avoidable way.

I don't think there is any middle ground here, unless it involves language-aware analysis of the diff. I don't think that belongs here. A more robust solution is to communicate deletions in a precise way to the tools. For example, I think clang-format can reasonably accept it as a zero-length range at a byte offset. How to best express this in configuration is an open question, but it would maintain the separation of concerns between version control and language tools.

I think this change is reducing the robustness of the established interface. It provides some utility, but it also over-promises on its ability to handle deletions. It might be more appropriate as a separate extension that wraps the difflineranges function.


I would explain this more fully by framing it, roughly, as three points:

  1. There is a config that will expand *all* changed line ranges to include adjacent blank lines.
  2. This is useful because it inhibits the previously described omission of deleted line ranges, allowing code formatters to fix whitespace surrounding deletions.
  3. The subtleties of how we define "adjacent" and "blank line".

We can additionally make the extra work conditional on the config value.


This change introduces the possibility that the appended line range is adjacent to the previous one. I think we should extend the existing line range instead of adding another one. This will avoid exposing that corner case in formatter tools, which will make this extension more reliable in practice. It may also be a performance improvement for some tools that don't consider this case.

Please include another test case for that, regardless.

msuozzo marked 2 inline comments as done.Oct 19 2020, 10:07 PM
msuozzo updated this revision to Diff 23265.

I'm definitely sympathetic to your position.

As discussed offline, I think an improvement to generality would be to change this config option to relate to deletion diff policies instead and have adjacent blanks be a policy therein.

I'm flexible on naming here. I think "emitdeletedranges" is a bit clunky but I can't think of something that fits much better.

PTAL and thanks!


this is probably obsolete given the change but I did try to more rigorously define what I'm considering "whitespace" here.

The config changes are a big improvement, but I'm still not sure if it's worth supporting. One thing to note is that this feature doesn't help users who have multiple formatters behind a wrapper script configured as a single fixer tool. Configuring it as multiple fixer tools is possible, but not always practical, since it may duplicate some logic between the hg config and something else (sorry to be vague about that).


The O(lines) initialization of blanks2 is still happening whether or not we use it.


"Produce a line range for every addition and modification. Conditionally produce ranges for deletions."


Might be more readable to give this expression a name:

deletion = (firstline == lastline)


I would prefer ":" instead of ": ", or at least " : ".


I still suggest combining adjacent/overlapping ranges.

msuozzo updated this revision to Diff 23292.Oct 22 2020, 5:21 PM
msuozzo marked 7 inline comments as done.Oct 22 2020, 5:27 PM

Okay the implementation itself looks better to me.

As for the multi-formatter considerations, I just don't think it should play a big role in whether this feature is worthwhile. This clearly addresses a configurability knob that didn't previously exist and I think that alone should dictate the viability of this patch.


Reworded a bit. PTAL

hooper added inline comments.Oct 23 2020, 6:37 PM

This is where you can add simple tests for the overlapping line ranges issue.

You might want to expand these test cases anyway, so we can be more confident we're not changing the behavior when the new config is off.


Sorry, I just realized this gets passed through unionranges() later on (to combine line ranges computed from multiple "diff base" commits). Not sure if we really need to ensure that this function returns non-overlapping ranges, but it seems like it might prevent mistakes later. The code is not strictly necessary here, though.