Same-same: a Git diff highlighter
The highlighter accomplishes several things:
- Highlights add/remove lines using background colors.
- Tokenizes changed blocks and highlights per-token changes using text colors.
- Detects moved lines and highlights them with separate (yellowish) text colors.
- Distinguishes file and hunk metadata lines with background colors.
- Removes leading '+'/'-'/' ' characters from hunk lines, which makes copy/paste from the terminal more convenient.
- Reformats chunk headers to look like "dir/source.ext:line_num:", which allows editors such as VSCode to click through the source location.
- Rewrites invisible characters in the ASCII and Latin-1 Unicode blocks to display as escape codes, and highlights them using color inversion. Note that this includes the tab character, which is great if you use two-space code indentation like I do, but not so great for other settings. If someone wants to make this configurable for other tab widths, please get in touch!
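As a rough illustration of the last item, here is a minimal sketch of escaping invisible characters and marking them with color inversion. The exact character set and the helper name `escape_invisibles` are assumptions made for this example; the actual script may choose differently.

```python
import re

# Assumption: "invisible" means the C0 controls (including tab), DEL,
# the C1 controls, no-break space, and soft hyphen.
# The input is expected to be one line of text without its trailing newline.
INVISIBLE_RE = re.compile(r'[\x00-\x1f\x7f-\xa0\xad]')

INVERT = '\x1b[7m'       # SGR 7: reverse video on.
INVERT_OFF = '\x1b[27m'  # SGR 27: reverse video off.

def escape_invisibles(text):
    # Replace each invisible character with its escape code, shown inverted.
    def repl(match):
        # repr() yields a quoted escape code; strip the surrounding quotes.
        return INVERT + repr(match.group())[1:-1] + INVERT_OFF
    return INVISIBLE_RE.sub(repl, text)
```

Printing `escape_invisibles('a\tb')` to a terminal should show the tab as an inverted two-character escape code between `a` and `b`.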
An example diff: git's colored output on the left, and highlighted by Same-same on the right:
Same-same is dedicated to the public domain under CC0: https://creativecommons.org/publicdomain/zero/1.0/.
Same-same currently requires Python 3.x and an xterm-256 compatible terminal, as it generates 256-color output.
Same-same has been tested only on macOS 10.13 and Apple Terminal.
The program is a standalone Python script. To install, just copy it to some location on your shell's PATH, e.g.:
.../same-same $ cp same-same.py /usr/local/bin/same-same
Or you can install the program as a symlink to the source file in the developer directory:
.../same-same $ ./install-dev.sh
Then update your git configuration:
$ git config --global core.pager 'same-same | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS'
$ git config --global interactive.diffFilter 'same-same -interactive | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS'
Or edit your ~/.gitconfig or project .gitconfig by hand:
[core]
  pager = same-same | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS
[interactive]
  diffFilter = same-same -interactive | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS
As an alternative or in addition to core.pager, you can set per-command pager options such as pager.diff to use different highlighter/pager combinations for the various git commands.
If same-same misbehaves, please report the problem (with a repro if possible) as a GitHub issue.
To run same-same in passthrough mode, set the environment variable
To run same-same in debug mode (just classify each line, then print its kind and repr), set
Same-same does two things that are interesting algorithmically: movement detection, and per-token highlighting.
The movement detector runs first, as follows:
- Find all unique removed lines and unique added lines, ignoring whitespace (this allows for detection even with changed indentation).
- For each unique line that was both added and removed:
  - Greedily expand the moved block backwards and forwards, again ignoring whitespace but also allowing non-unique lines to match.
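The steps above can be sketched roughly as follows. This is a simplified illustration, not the script's actual code: it assumes the removed and added lines have already been collected into two flat lists, and the function name, the return shape, and the choice to skip blank lines as seeds are all assumptions made for the example.

```python
from collections import Counter

def find_moved_blocks(removed, added):
    # Returns a list of (removed_start, added_start, length) tuples.
    def key(line):
        # Ignore all whitespace so that re-indented lines still match.
        return ''.join(line.split())

    rem_keys = [key(line) for line in removed]
    add_keys = [key(line) for line in added]
    rem_counts = Counter(rem_keys)
    add_counts = Counter(add_keys)
    # For a key that occurs once, this maps it to its only index.
    add_index = {k: i for i, k in enumerate(add_keys)}

    blocks = []
    used_rem, used_add = set(), set()
    for r, k in enumerate(rem_keys):
        # Seeds must be unique on both sides (assumption: blank lines never seed).
        if not k or rem_counts[k] != 1 or add_counts[k] != 1:
            continue
        a = add_index[k]
        if r in used_rem or a in used_add:
            continue
        # Greedily expand backwards; non-unique lines may match here.
        start_r, start_a = r, a
        while (start_r > 0 and start_a > 0
               and start_r - 1 not in used_rem and start_a - 1 not in used_add
               and rem_keys[start_r - 1] == add_keys[start_a - 1]):
            start_r -= 1
            start_a -= 1
        # Greedily expand forwards.
        end_r, end_a = r + 1, a + 1
        while (end_r < len(rem_keys) and end_a < len(add_keys)
               and end_r not in used_rem and end_a not in used_add
               and rem_keys[end_r] == add_keys[end_a]):
            end_r += 1
            end_a += 1
        blocks.append((start_r, start_a, end_r - start_r))
        used_rem.update(range(start_r, end_r))
        used_add.update(range(start_a, end_a))
    return blocks
```

Seeding only on unique lines keeps common lines such as a lone closing brace from producing spurious matches, while the expansion step still lets them join a block that a unique line has anchored.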
Per-token highlighting is performed per "chunk", where a chunk is a block of consecutive removed and/or added lines. This is in contrast to git's notion of a "hunk", which also includes context lines. For each chunk:
- Assemble a list of tokens for each side, including newline tokens.
- Diff the lists on a token-by-token basis.
  - Currently we use Python's builtin difflib.SequenceMatcher algorithm, and treat non-newline whitespace tokens as junk (a feature of SequenceMatcher).
  - Anecdotally, I think that the "Patience" algorithm produces better diffs than difflib does, and Git's "histogram" algorithm is purported to be better still. I would love to find the time to write (or collaborate on!) a public domain implementation.
- Iterate over the resulting diff sequence and output the sequence of tokens with highlight colors inserted.
- Replace the text for each line with highlighted sequences.
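A minimal sketch of this per-chunk process, using `difflib.SequenceMatcher` with its `isjunk` parameter. The tokenizer regex, the specific colors, and the function names are assumptions for illustration; the real script's token rules and palette may differ.

```python
import re
from difflib import SequenceMatcher

# Assumption: a token is a run of word characters, a run of non-newline
# whitespace, a newline, or any other single character.
TOKEN_RE = re.compile(r'\w+|[^\S\n]+|\n|.')

RED = '\x1b[31m'    # Text color for removed tokens (illustrative choice).
GREEN = '\x1b[32m'  # Text color for added tokens (illustrative choice).
RESET = '\x1b[39m'  # SGR 39: default text color.

def tokenize(lines):
    # Lines are expected without trailing newlines; add a newline token per line.
    return [t for line in lines for t in TOKEN_RE.findall(line + '\n')]

def is_junk(token):
    # Non-newline whitespace tokens are treated as junk.
    return token.isspace() and token != '\n'

def paint(tokens, color):
    # Color each token separately so that no color span crosses a newline.
    return ''.join(t if t == '\n' else color + t + RESET for t in tokens)

def highlight_chunk(removed, added):
    rem_tokens, add_tokens = tokenize(removed), tokenize(added)
    matcher = SequenceMatcher(isjunk=is_junk, a=rem_tokens, b=add_tokens, autojunk=False)
    rem_out, add_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            rem_out.append(''.join(rem_tokens[i1:i2]))
            add_out.append(''.join(add_tokens[j1:j2]))
        else:
            rem_out.append(paint(rem_tokens[i1:i2], RED))
            add_out.append(paint(add_tokens[j1:j2], GREEN))
    # Every line ends with a newline token, so drop the empty final piece.
    return ''.join(rem_out).split('\n')[:-1], ''.join(add_out).split('\n')[:-1]
```

For example, `highlight_chunk(['x = 1'], ['x = 2'])` returns one highlighted line per side, with only the `1` and the `2` wrapped in color codes.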
I learned some interesting things that may be helpful for others creating git highlighters, other git tools, and tools with color console output.
Git can do line movement detection
git diff has a new --color-moved[=<mode>] option to detect line movement, and offers several modes. Same-same ignores all colors in the diff and does its own movement detection. This is partly out of simplicity, but also to allow for experimentation and improvements to the movement detection algorithm.
Limitation of Git's interactive mode
git add -p (interactive staging) works by slicing the diff by line positions. Therefore if a highlighter omits or inserts lines, then the output will get sliced incorrectly and will not make sense. For this reason, same-same -interactive disables the omission of unhelpful metadata lines and dims them instead.
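To make the constraint concrete, here is a small sketch of a filter that preserves the line count in interactive mode. Which metadata lines count as "unhelpful" is an assumption here (the `is_unhelpful` predicate is hypothetical), as is the use of the SGR "faint" attribute for dimming.

```python
DIM = '\x1b[2m'    # SGR 2: faint.
RESET = '\x1b[0m'  # SGR 0: reset all attributes.

def is_unhelpful(line):
    # Hypothetical predicate, for illustration only.
    return line.startswith(('diff --git ', 'index '))

def filter_metadata(lines, interactive):
    out = []
    for line in lines:
        if not is_unhelpful(line):
            out.append(line)
        elif interactive:
            # Keep the line so that positions still match git's slicing.
            out.append(DIM + line + RESET)
        # Otherwise the line is omitted entirely.
    return out
```

The invariant that matters is that, in interactive mode, the output has exactly as many lines as the input.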
Coloring to end-of-line
Some terminals, including Apple Terminal, make it difficult to set the background color of a complete line without printing trailing spaces to fill the screen. Filling with spaces requires querying for the terminal width, and is an unacceptable hack because it cannot cope with a resized terminal window. This StackOverflow answer shows how to use the ANSI CSI sequence for "erase to end-of-line" ('\x1b[0K') to get Terminal to highlight from the current position to the right margin.
This trick allows Same-same to use a different highlighting style than GitHub: the background color is used to indicate the line-by-line diff, and the text color is used to indicate the per-token diff.
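A minimal sketch of the technique: set a 256-color background, write the text, emit "erase to end-of-line" while the background is still active, then reset. The helper names and the color index in the usage note are arbitrary choices for this example.

```python
ERASE_TO_EOL = '\x1b[0K'  # CSI 0 K: erase from the cursor to the end of the line.
RESET = '\x1b[0m'         # SGR 0: reset all attributes.

def bg(color_index):
    # SGR 48;5;N selects background color N from the 256-color palette.
    return '\x1b[48;5;{}m'.format(color_index)

def fill_line(text, color_index):
    # The erase must come before the reset so that it uses the background color.
    return bg(color_index) + text + ERASE_TO_EOL + RESET
```

For instance, `print(fill_line('added line', 22))` should paint the whole row in a dark green on a terminal that honors the erase sequence, without any need to know the terminal width.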
Getting Less to respect the clear-to-eol trick
less will strip out the "erase" code used above, but the --RAW-CONTROL-CHARS option, in conjunction with the LESSANSIENDCHARS=mK environment variable, tells it to leave both SGR (color) and erasure sequences in the text stream.
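For tools that launch the pager themselves rather than through git's configuration, the same settings can be passed from Python. This is only a sketch: the helper names are made up for the example, and it assumes `less` is available on the PATH.

```python
import os
import subprocess

LESS_ARGS = ['less', '--RAW-CONTROL-CHARS']

def less_env(base_env=None):
    # Copy the environment and tell less that 'm' and 'K' end ANSI sequences.
    env = dict(os.environ if base_env is None else base_env)
    env['LESSANSIENDCHARS'] = 'mK'
    return env

def page(text):
    # Pipe already-highlighted text through less (assumes less is installed).
    subprocess.run(LESS_ARGS, input=text.encode(), env=less_env(), check=False)
```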
Contributors are welcome! Please get in touch via GitHub issues or email to discuss.
One obvious addition would be to query os.environ for custom colors.
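One possible shape for that addition is sketched below. Nothing here exists in the script today: the variable names, the default color indices, and the `load_colors` function are all hypothetical and only meant to start the discussion.

```python
import os

# Hypothetical environment variable names mapped to default 256-color indices.
DEFAULT_COLORS = {
    'SAME_SAME_ADD_BG': 22,
    'SAME_SAME_REM_BG': 52,
}

def load_colors(environ=None):
    environ = os.environ if environ is None else environ
    colors = {}
    for name, default in DEFAULT_COLORS.items():
        try:
            value = int(environ.get(name, ''))
        except ValueError:
            # Unset or non-numeric values fall back to the default.
            value = default
        # Indices outside the 256-color palette also fall back to the default.
        colors[name] = value if 0 <= value <= 255 else default
    return colors
```

Falling back silently on bad values keeps a typo in a shell profile from breaking every git command that goes through the pager.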