go-runewidth
Provides functions to get the fixed width of a character or string.
Usage
runewidth.StringWidth("つのだ☆HIRO") == 12
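The README's figure can be reproduced by hand. Below is a minimal sketch that assumes a hypothetical toy width function, `toyWidth`, which is not the library's. It counts Hiragana as 2 columns and ASCII as 1. It counts ☆ (U+2606, classified as East Asian Ambiguous in Unicode's EastAsianWidth.txt) as 1 or 2 columns, depending on an `eastAsian` flag:

```go
package main

import "fmt"

// toyWidth is a hypothetical, heavily simplified width function that only
// knows the three kinds of characters in the README example.
func toyWidth(r rune, eastAsian bool) int {
	switch {
	case r >= 0x3041 && r <= 0x3096: // Hiragana letters: wide
		return 2
	case r == 0x2606: // WHITE STAR: East Asian Ambiguous
		if eastAsian {
			return 2
		}
		return 1
	default: // everything else (here: ASCII letters) is narrow
		return 1
	}
}

// toyStringWidth sums the per-rune widths.
func toyStringWidth(s string, eastAsian bool) int {
	w := 0
	for _, r := range s {
		w += toyWidth(r, eastAsian)
	}
	return w
}

func main() {
	s := "つのだ☆HIRO"
	// 3 Hiragana x 2 columns + 1 star + 4 ASCII x 1 column
	fmt.Println(toyStringWidth(s, true))  // 6 + 2 + 4 = 12
	fmt.Println(toyStringWidth(s, false)) // 6 + 1 + 4 = 11
}
```

In this sketch, the README's 12 only comes out when ☆ is counted as 2 columns. Treating it as narrow gives 11.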
Author
Yasuhiro Matsumoto
License
under the MIT License: http://mattn.mit-license.org/2013
Category: Golang / Text Processing | Watchers: 15 | Star: 444 | Fork: 74 | Last update: May 17, 2022
This introduces an implementation of StringWidth() using Unicode grapheme clusters which should be the correct way to split a string into its individual characters. The built-in assumption is that if we have combined runes (emojis, flags etc.), their width is the width of the first non-zero-width rune. Many of these combined runes were previously not handled correctly by this package.
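The stated rule can be illustrated with a minimal sketch. It assumes the string has already been split into grapheme clusters, which the PR does with rivo/uniseg. Here, a cluster is passed in as a plain `[]rune`. `toyRuneWidth` is a hypothetical stand-in for a real per-rune width table:

```go
package main

import "fmt"

// toyRuneWidth is a hypothetical, simplified per-rune width: zero for the
// zero-width joiner and variation selectors, 2 for one emoji block, 1 otherwise.
func toyRuneWidth(r rune) int {
	switch {
	case r == 0x200D, r >= 0xFE00 && r <= 0xFE0F:
		return 0
	case r >= 0x1F300 && r <= 0x1F64F:
		return 2
	default:
		return 1
	}
}

// clusterWidth applies the rule described above: a grapheme cluster is as
// wide as its first rune that has non-zero width.
func clusterWidth(cluster []rune) int {
	for _, r := range cluster {
		if w := toyRuneWidth(r); w != 0 {
			return w
		}
	}
	return 0
}

func main() {
	// MAN, ZWJ, MAN, ZWJ, GIRL: one cluster, drawn as a single emoji.
	family := []rune{0x1F468, 0x200D, 0x1F468, 0x200D, 0x1F467}
	fmt.Println(clusterWidth(family)) // 2
}
```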
Please note: rivo/uniseg … over.) … TestStringWidth test case, but only the part where EastAsianWidth = true. I'm not very familiar with this flag, so I don't know how to fix that. You may want to review this.
Update runewidth to use Unicode 9 character width tables. This is now the default in Vim and Neovim, so it should be safe to use in any terminal.
The Condition is no longer calculated using IsEastAsian(), as terminals do not use the locale to determine how wide to draw ambiguous-width characters. Rather, they simply default to 1, and may offer an option to set it to 2 (which is discouraged).
All tests still pass. The API is very close to what it was, but not identical, due to the change in Condition.
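Below is a minimal sketch of the behavior this PR describes. It uses a hypothetical `condition` type, not the package's actual `Condition`. Ambiguous-width characters default to 1 column, and 2 is an explicit opt-in rather than something derived from the locale. For brevity, `isAmbiguous` knows only two sample code points:

```go
package main

import "fmt"

// condition is a hypothetical configuration type for illustration only.
type condition struct {
	// AmbiguousWidth is the column count for East Asian Ambiguous
	// characters; any value other than 2 (including the zero value) means 1.
	AmbiguousWidth int
}

// isAmbiguous recognizes just two ambiguous code points: PLUS-MINUS SIGN
// and EN DASH. A real implementation would use the full Unicode table.
func isAmbiguous(r rune) bool {
	return r == 0x00B1 || r == 0x2013
}

func (c condition) runeWidth(r rune) int {
	if isAmbiguous(r) {
		if c.AmbiguousWidth == 2 {
			return 2 // explicit opt-in
		}
		return 1 // the terminal default described above
	}
	return 1 // the sketch treats everything else as narrow
}

func main() {
	var standard condition                 // terminal default
	wide := condition{AmbiguousWidth: 2}   // explicit, discouraged opt-in
	fmt.Println(standard.runeWidth('±'), wide.runeWidth('±')) // 1 2
}
```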
I recognize this is a fairly large diff, so I'd be happy to work with you in any way you feel best to get this merged.
Check this for the definition of box-drawing (BD below) characters.
I found that these characters are defined as ambiguous-width, so passing them to RuneWidth returns 2 in my environment. This is somewhat inconvenient since, AFAIK, terminal fonts tend to render BD characters as half-width.
Is it possible to remove these characters from the ambiguous table? I can make the PR if you think this sounds sane.
Thanks.
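If box-drawing characters were taken out of the ambiguous treatment, the check itself would be a simple range test on the Box Drawing block (U+2500 to U+257F). Below is a minimal sketch that assumes a caller-supplied fallback width function. `narrowBoxDrawing` and `fallback` are hypothetical names, not part of go-runewidth:

```go
package main

import "fmt"

// isBoxDrawing reports whether r is in the Unicode Box Drawing block.
func isBoxDrawing(r rune) bool {
	return r >= 0x2500 && r <= 0x257F
}

// narrowBoxDrawing wraps another width function and forces box-drawing
// characters to a single column, regardless of what fallback says.
func narrowBoxDrawing(fallback func(rune) int) func(rune) int {
	return func(r rune) int {
		if isBoxDrawing(r) {
			return 1
		}
		return fallback(r)
	}
}

func main() {
	// A fallback that, like an East Asian setting, treats every rune in
	// this example as 2 columns wide.
	eastAsian := func(rune) int { return 2 }
	width := narrowBoxDrawing(eastAsian)
	fmt.Println(width('\u2500'), width('\u253C'), width('\u3042')) // 1 1 2
}
```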
Here's a short example that illustrates an issue with flags (or "regional indicators"):
fmt.Println(runewidth.StringWidth("🇩🇪")) // Should be "2", outputs "4".
The flag consists of two code points which are processed separately by runewidth. But most modern systems will combine them into one flag emoji.
This is part of a larger topic which I describe in more detail here: gdamore/tcell#264. It doesn't just affect flags but also characters in e.g. Arabic and Korean where there are more sophisticated rules than "combining characters" and zero-width joiners (which you added with #20).
I don't know exactly how you calculate the widths of characters. I'm also not sure how you would handle flags, or some of the other rules described in the Unicode specification, but it would certainly be nice, as printing these flags currently gives me trouble in tview. There have been multiple issues asking for better support for different languages and emojis, so it seems that quite a few people use the terminal with these characters.
(Maybe my new package uniseg can help you here.)
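Flags are encoded as pairs of regional indicator symbols (U+1F1E6 to U+1F1FF), so one approach is to consume two consecutive indicators as a single 2-column unit. Below is a minimal sketch covering only this one rule. `flagAwareWidth` is hypothetical, and counting every other rune as width 1 is a simplifying assumption, not the package's behavior:

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of the 26 regional
// indicator symbols used to spell flag emoji.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagAwareWidth counts a pair of regional indicators as one 2-column
// flag. A lone indicator is also counted as 2 columns, and every other
// rune as 1 (a deliberate simplification).
func flagAwareWidth(s string) int {
	runes := []rune(s)
	width := 0
	for i := 0; i < len(runes); i++ {
		if isRegionalIndicator(runes[i]) {
			if i+1 < len(runes) && isRegionalIndicator(runes[i+1]) {
				i++ // skip the second half of the pair
			}
			width += 2
			continue
		}
		width++
	}
	return width
}

func main() {
	de := "\U0001F1E9\U0001F1EA"   // D + E: the German flag
	fr := "\U0001F1EB\U0001F1F7"   // F + R: the French flag
	fmt.Println(flagAwareWidth(de))      // 2
	fmt.Println(flagAwareWidth(de + fr)) // 4
}
```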
This is an oversight in git commit ef4e261f1f0f2198dbea63dfc6639910969ef297.
It would be great if you could add support for zero-width joiners (ZWJ). I have the following code example which doesn't work as expected:
package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	// MAN, ZWJ, MAN, ZWJ, GIRL, written with escapes so the invisible
	// joiners survive copy and paste.
	e := "\U0001F468\u200D\U0001F468\u200D\U0001F467"
	r := []rune(e)
	var widths []int
	for _, c := range r {
		widths = append(widths, runewidth.RuneWidth(c))
	}
	fmt.Printf("%s : len=%d numrunes=%d width=%d widths=%v runes=%X\n",
		e, len(e), len(r), runewidth.StringWidth(e), widths, r)
}
The output is:
👨👨👧 : len=18 numrunes=5 width=6 widths=[2 0 2 0 2] runes=[1F468 200D 1F468 200D 1F467]
Specifically, width should be 2 instead of 6. I found this article which explains how they work. It affects not only emojis but also characters in some languages.
This came up in rivo/tview#161. It would be great if support for ZWJ could be added so I can implement support for these Unicode characters in tview. I understand that not all kinds of combinations are supported, and it's probably difficult to figure out which ones are. But assuming these characters are supported will help a lot. I don't expect users to try to print ZWJ combinations that aren't supported anyway.
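This sketch follows the issue's assumption that any ZWJ sequence a user prints is supported by the terminal. Under that assumption, a sequence can be measured by skipping each joiner and the rune it attaches to the previous glyph. This is a minimal sketch: `zwjWidth` and the toy per-rune widths are hypothetical, not go-runewidth's implementation:

```go
package main

import "fmt"

const zwj = 0x200D // ZERO WIDTH JOINER

// toyWidth is a hypothetical per-rune width: 0 for the joiner, 2 for the
// emoji used in the example, 1 for everything else.
func toyWidth(r rune) int {
	switch {
	case r == zwj:
		return 0
	case r >= 0x1F466 && r <= 0x1F469: // BOY, GIRL, MAN, WOMAN
		return 2
	default:
		return 1
	}
}

// zwjWidth sums rune widths but does not count a rune that directly
// follows a ZWJ, since it is merged into the preceding glyph.
func zwjWidth(s string) int {
	width := 0
	afterJoiner := false
	for _, r := range s {
		if afterJoiner {
			afterJoiner = false
			continue // joined onto the previous glyph
		}
		if r == zwj {
			afterJoiner = true
			continue
		}
		width += toyWidth(r)
	}
	return width
}

func main() {
	family := "\U0001F468\u200D\U0001F468\u200D\U0001F467"
	fmt.Println(zwjWidth(family)) // 2, rather than 6
}
```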
Thanks!
Hi,
Consider the following three similar unicode characters:
'-' - Unicode Character 'HYPHEN-MINUS' (U+002D)
'–' - Unicode Character 'EN DASH' (U+2013)
'—' - Unicode Character 'EM DASH' (U+2014)
From https://github.com/shurcooL/markdownfmt/issues/7#issuecomment-46792756, I've learned that go-runewidth considers the width of the first character to be 1, and the width of the second and third characters to be 2.
Is that intended?
I'm not sure how to test this reliably, but in most environments EN DASH seems to have a width closer to 1 than 2.
Any thoughts on this?
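The difference comes from Unicode's East Asian Width property. HYPHEN-MINUS is Narrow, while EN DASH and EM DASH are classified as Ambiguous, and ambiguous characters are typically given 2 columns in East Asian contexts. One way to store width tables like this is as sorted intervals searched with a binary search. Below is a minimal sketch. The `ambiguous` slice is a hand-picked excerpt for illustration, not the full table:

```go
package main

import (
	"fmt"
	"sort"
)

// interval is an inclusive range of code points.
type interval struct{ first, last rune }

// ambiguous is a small excerpt of East Asian Ambiguous code points, sorted
// by first code point. It is illustrative only, not complete.
var ambiguous = []interval{
	{0x00A1, 0x00A1}, // INVERTED EXCLAMATION MARK
	{0x00B1, 0x00B1}, // PLUS-MINUS SIGN
	{0x2013, 0x2014}, // EN DASH, EM DASH
}

// inTable binary-searches a sorted, non-overlapping interval table.
func inTable(r rune, table []interval) bool {
	// Find the first interval whose last code point is >= r.
	i := sort.Search(len(table), func(j int) bool { return table[j].last >= r })
	return i < len(table) && table[i].first <= r
}

func main() {
	for _, r := range []rune{'-', '\u2013', '\u2014'} {
		fmt.Printf("U+%04X ambiguous=%v\n", r, inTable(r, ambiguous))
	}
}
```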
I stumbled over a character that, when output to the console directly, takes up two columns. But StringWidth() gives me 1. This is because the first rune of this character has a width of 1, and that's what's being used, see here. I know I wrote this code, and I'm sure you cannot simply add up the widths of individual runes ("🏳️🌈" would then have a width of 4, which is obviously wrong), and using the first rune's width has worked fine so far. But it turns out that it fails in some cases.
I'm not familiar with Indic characters, but it seems to me that the second rune is a modifier that turns the character from a width of 1 into a width of 2. Are you aware of any logic that we could add to go-runewidth that makes this right?
Here's example code that illustrates the issue:
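package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	s := "खा"                 // DEVANAGARI LETTER KHA + VOWEL SIGN AA
	fmt.Println("0123456789") // column ruler
	fmt.Println(s + "<")      // "<" shows where the terminal ends the string
	fmt.Printf("String width: %d\n", runewidth.StringWidth(s))
	var i int // rune index (ranging over a string yields byte offsets)
	for _, r := range s {
		fmt.Printf("Rune %s (%d) width: %d\n", string(r), i, runewidth.RuneWidth(r))
		i++
	}
}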
Output (on macOS with iTerm2):
ZeroWidthJoiner was removed after v0.0.9: https://github.com/mattn/go-runewidth/blob/v0.0.9/runewidth.go#L14
The next version was v0.0.10, but it introduced a breaking API change.
While being at v0 means you can introduce breaking API changes, would it be possible to get a v1 release that ensures API stability?
It's fine to just keep cutting new versions when API changes happen, but right now it makes managing Go module dependencies rather painful, since Go modules assume that patch versions don't introduce breaking changes.
If LANG="zh_CN.UTF8", the box-drawing (table) characters (0x2500-0x257F) return a width of 2, but they should return 1.
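Whether ambiguous characters, such as this range, are counted as 2 columns depends on whether the locale is treated as East Asian. Below is a minimal sketch of deriving that decision from the environment. It assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and a hypothetical prefix rule for Chinese, Japanese, and Korean locales. go-runewidth's own detection logic may differ:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// localeFrom returns the effective locale name, using the POSIX precedence
// LC_ALL > LC_CTYPE > LANG. getenv is injected so the logic can be tested.
func localeFrom(getenv func(string) string) string {
	for _, key := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := getenv(key); v != "" {
			return v
		}
	}
	return ""
}

// isEastAsianLocale is a hypothetical rule: a locale whose language part
// is zh, ja, or ko is treated as East Asian.
func isEastAsianLocale(locale string) bool {
	lang := strings.ToLower(locale)
	if i := strings.IndexAny(lang, "_.@"); i >= 0 {
		lang = lang[:i]
	}
	switch lang {
	case "zh", "ja", "ko":
		return true
	}
	return false
}

func main() {
	fmt.Println(isEastAsianLocale(localeFrom(os.Getenv)))
	fmt.Println(isEastAsianLocale("zh_CN.UTF8"))  // true
	fmt.Println(isEastAsianLocale("en_US.UTF-8")) // false
}
```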
I stumbled over this while working on #47.
It seems that RuneWidth(r) is not always equal to StringWidth(string(r)) for a single rune r.
This is quite unexpected, TBH.
Please see https://github.com/markus-oberhumer/mattn--go-runewidth/commit/5da511d36b1ea1ad913590b7b27357e5fffd3512 for a test case.
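The invariant this issue expects can be checked generically over any pair of width functions: measuring a single rune should give the same answer as measuring the one-rune string. `mismatches` is a hypothetical helper. With go-runewidth, one would pass `runewidth.RuneWidth` and `runewidth.StringWidth`. The toy functions in `main` only demonstrate how a discrepancy is reported:

```go
package main

import "fmt"

// mismatches returns every rune in [lo, hi] for which the per-rune width
// and the width of the corresponding one-rune string disagree.
func mismatches(lo, hi rune, runeWidth func(rune) int, stringWidth func(string) int) []rune {
	var bad []rune
	for r := lo; r <= hi; r++ {
		if runeWidth(r) != stringWidth(string(r)) {
			bad = append(bad, r)
		}
	}
	return bad
}

func main() {
	// Toy functions that disagree only on U+00AD SOFT HYPHEN.
	runeW := func(rune) int { return 1 }
	stringW := func(s string) int {
		if s == "\u00AD" {
			return 0
		}
		return len([]rune(s))
	}
	fmt.Printf("%U\n", mismatches(0x20, 0x7E, runeW, stringW)) // []
	fmt.Printf("%U\n", mismatches(0xA0, 0xFF, runeW, stringW)) // [U+00AD]
}
```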
Added Power (ppc64le) support to the .travis.yml file. This is part of the Ubuntu distribution for ppc64le. This helps simplify testing later, when distributions are rebuilding and re-releasing packages. For more info, tag @gerrith3.
This is a question about how you define "width". I'm mostly looking for a solution that gives me the character width in monospaced fonts. For example, in #39 and #36, the "width" of a flag would still be 2: although modern renderers consider it one character, it still takes up the space of two normal characters.