go-runewidth
Provides functions to get fixed width of the character or string.
Usage
runewidth.StringWidth("つのだ☆HIRO") == 12
Author
Yasuhiro Matsumoto
License
under the MIT License: http://mattn.mit-license.org/2013
Category: Golang / Text Processing | Watchers: 14 | Star: 508 | Fork: 80 | Last update: Jan 24, 2023
This introduces an implementation of StringWidth() using Unicode grapheme clusters which should be the correct way to split a string into its individual characters. The built-in assumption is that if we have combined runes (emojis, flags etc.), their width is the width of the first non-zero-width rune. Many of these combined runes were previously not handled correctly by this package.
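The rule described above (a cluster is as wide as its first non-zero-width rune) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code from this change: `toyRuneWidth` is a hypothetical, heavily simplified stand-in for a real width table, and `clusterWidth` expects a cluster that has already been segmented (splitting a string into grapheme clusters is the part handled by rivo/uniseg).

```go
package main

import (
	"fmt"
	"unicode"
)

// toyRuneWidth is a hypothetical, simplified width function. It is NOT the
// table used by go-runewidth; it knows just enough to demonstrate the rule.
func toyRuneWidth(r rune) int {
	switch {
	case unicode.In(r, unicode.Mn, unicode.Me, unicode.Cf):
		return 0 // combining marks and format characters such as the zero-width joiner
	case r >= 0x1F300 && r <= 0x1FAFF:
		return 2 // rough emoji range (assumption for this sketch)
	case r >= 0x3040 && r <= 0x30FF, r >= 0x4E00 && r <= 0x9FFF:
		return 2 // kana and common CJK ideographs (assumption for this sketch)
	default:
		return 1
	}
}

// clusterWidth returns the width of the first rune in the cluster whose
// width is non-zero, or 0 if every rune in the cluster is zero-width.
func clusterWidth(cluster []rune) int {
	for _, r := range cluster {
		if w := toyRuneWidth(r); w > 0 {
			return w
		}
	}
	return 0
}

func main() {
	family := []rune("\U0001F469\u200d\U0001F467") // woman + zero-width joiner + girl
	accented := []rune("e\u0301")                  // e + combining acute accent
	fmt.Println(clusterWidth(family), clusterWidth(accented))
}
```

With this toy table the emoji sequence counts as 2 cells and the accented letter as 1, whereas summing the rune widths would give 4 for the emoji sequence.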
Please note:
rivo/uniseg
over.)
TestStringWidth test case but only the part where EastAsianWidth = true. I'm not very familiar with this flag so I don't know how to fix that. You may want to review this.
Update runewidth to use Unicode 9 character width tables. This is the default in vim and neovim now, so it should be safe to use in any terminal.
The Condition is no longer calculated using IsEastAsian(), as terminals do not use the locale to determine how wide to draw ambiguous-width characters. Rather, they simply default to 1, and may offer an option to set it to 2 (which is discouraged).
All tests still pass. The API is very close to the way it was, but not identical due to the change in Condition.
I recognize this is a fairly large diff, so I'd be happy to work with you in any way you feel best to get this merged.
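The ambiguous-width behavior described in this change can be sketched as follows. This is a minimal illustration, not the package's implementation: `toyCondition`, its field `ambiguousAsWide`, and `isAmbiguous` are hypothetical names, and the ambiguous set is assumed to be just the box-drawing range U+2500 to U+254B, whereas the real Unicode data lists many more characters.

```go
package main

import "fmt"

// toyCondition is a hypothetical stand-in for a width condition. The zero
// value gives the default behavior: ambiguous runes take 1 cell.
type toyCondition struct {
	ambiguousAsWide bool // set explicitly by the caller, never derived from the locale
}

// isAmbiguous reports whether r is treated as ambiguous-width in this sketch.
// Assumption: only U+2500..U+254B (box-drawing lines) are covered.
func isAmbiguous(r rune) bool {
	return r >= 0x2500 && r <= 0x254B
}

// runeWidth returns 2 only for ambiguous runes when the option is set.
// Every other rune is reported as 1 here; wide and zero-width runes are
// left out to keep the sketch short.
func (c toyCondition) runeWidth(r rune) int {
	if isAmbiguous(r) && c.ambiguousAsWide {
		return 2
	}
	return 1
}

func main() {
	def := toyCondition{}
	wide := toyCondition{ambiguousAsWide: true}
	fmt.Println(def.runeWidth('\u2500'), wide.runeWidth('\u2500'), wide.runeWidth('a'))
}
```

The point of the design is that the result depends only on the explicit option, so the same program reports the same widths regardless of the LANG setting.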
trim prefix method:
n cells from the beginning of the string
prefix to string if set
Check this for the definition of box-drawing (BD below) characters.
I found that these characters are defined to be of ambiguous width, so passing these to RuneWidth returns 2 in my environment. This is somewhat inconvenient since, AFAIK, terminal fonts tend to render BD characters as half-width.
Is it possible to remove these characters from the ambiguous table? I can make the PR if you think this sounds sane.
Thanks.
Here's a short example that illustrates an issue with flags (or "regional indicators"):
fmt.Println(runewidth.StringWidth("🇩🇪")) // Should be "2", outputs "4".
The flag consists of two code points which are processed separately by runewidth. But most modern systems will combine them into one flag emoji.
This is part of a larger topic which I describe in more detail here: gdamore/tcell#264. It doesn't just affect flags but also characters in e.g. Arabic and Korean where there are more sophisticated rules than "combining characters" and zero-width joiners (which you added with #20).
I don't know exactly how you calculate the widths of characters. I'm also not sure how you would solve flags as well as some of the other rules described in the Unicode specification but it would sure be nice as printing these flags currently gives me trouble in tview. There have been multiple issues asking for better support for different languages and emojis so it seems that there are quite a few people who use the terminal with these characters.
(Maybe my new package uniseg can help you here.)
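The flag case from this report can be sketched with the standard library. `flagAwareWidth` is a hypothetical helper, not a function of go-runewidth or uniseg. It assumes that a pair of regional indicators is drawn as one flag of width 2, that an unpaired indicator takes 1 cell, and that every other rune takes 1 cell.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of the 26 regional
// indicator symbols, U+1F1E6 through U+1F1FF.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagAwareWidth is a hypothetical helper for this sketch. Assumptions:
// two consecutive regional indicators form one flag that is 2 cells wide,
// an unpaired indicator takes 1 cell, and all other runes take 1 cell
// (wide and zero-width runes are ignored for brevity).
func flagAwareWidth(s string) int {
	runes := []rune(s)
	width := 0
	for i := 0; i < len(runes); i++ {
		if isRegionalIndicator(runes[i]) && i+1 < len(runes) && isRegionalIndicator(runes[i+1]) {
			width += 2 // two code points, one flag
			i++        // skip the second half of the pair
			continue
		}
		width++
	}
	return width
}

func main() {
	de := "\U0001F1E9\U0001F1EA" // regional indicators D + E
	fr := "\U0001F1EB\U0001F1F7" // regional indicators F + R
	fmt.Println(flagAwareWidth(de))
	fmt.Println(flagAwareWidth("a" + de + fr))
}
```

Pairing only covers flags. The other cases mentioned in the report (for example Arabic and Korean) need full grapheme cluster segmentation.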
This is an oversight in git commit ef4e261f1f0f2198dbea63dfc6639910969ef297.
The rivo/uniseg package has received a major update which also includes methods for grapheme cluster parsing that are much faster than the previously used Graphemes class.
I've upgraded your package accordingly and updated the relevant code to use these faster methods. It would be great if you could merge these changes.
Thank you!
ps. I noticed that some automatic checks did not complete successfully because they are still running on Go 1.15. Would you like me to look into upgrading them to the current version (1.18)?
I stumbled over a character that, when output to the console directly, takes up two characters. But StringWidth() gives me 1. This is because the first rune of this character has a width of 1 and that's what's being used, see here. I know I wrote this code and I'm sure that you cannot simply add up the widths of individual runes ("🏳️🌈" would then have a width of 4 which is obviously wrong) and using the first rune's width worked fine so far. But it turns out that it fails in some cases.
I'm not familiar with Indian characters but it seems to me that the second rune is a modifier that turns the character from a width of 1 into a width of 2. Are you aware of any logic that we could add to go-runewidth that makes this right?
Here's example code that illustrates the issue:
package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	s := "खा"
	fmt.Println("0123456789")
	fmt.Println(s + "<")
	fmt.Printf("String width: %d\n", runewidth.StringWidth(s))
	var i int
	for _, r := range s {
		fmt.Printf("Rune %s (%d) width: %d\n", string(r), i, runewidth.RuneWidth(r))
		i++
	}
}
Output (on macOS with iTerm2):
ZeroWidthJoiner was removed after v0.0.9: https://github.com/mattn/go-runewidth/blob/v0.0.9/runewidth.go#L14
The next version was v0.0.10, but this introduced a breaking API change.
While being v0 means you can introduce breaking API changes, would it be possible to get a v1 release that can ensure API stability?
It's fine to just keep cutting new versions when API changes happen, but right now it makes managing Go Module dependencies rather painful, since it just assumes patch versions don't introduce breaking changes.
If LANG="zh_CN.UTF8", the "Tabs" (0x2500-0x257F, the box-drawing block) return a width of 2, but should return 1.
I stumbled over this while working on #47.
It seems that RuneWidth is not always equal to the StringWidth of a single rune.
This is quite unexpected, TBH.
Please see https://github.com/markus-oberhumer/mattn--go-runewidth/commit/5da511d36b1ea1ad913590b7b27357e5fffd3512 for a test case.