zhwiki dictionary for fcitx5-pinyin and rime

Installation:
  - Arch Linux:
    Install from AUR: https://aur.archlinux.org/packages/fcitx5-pinyin-zhwiki/
  - Others:
    Download the latest version of "zhwiki.dict" from:
    https://github.com/felixonmars/fcitx5-pinyin-zhwiki/releases
    Copy into /usr/share/fcitx5/pinyin/dictionaries/ (create the folder if it does not exist)

Build Requirements:
  libime (https://github.com/fcitx/libime/)
  Python modules:
    opencc (https://pypi.org/project/OpenCC/)
    pypinyin (https://pypi.org/project/pypinyin/)

Manual Build & Installation:
  make
  sudo make install

Manual Build rime dict & Installation:
  make zhwiki.dict.yaml
  sudo make install_rime_dict

License:
  Unlicense
  Note that the generated dictionary follows Wikimedia's license: https://dumps.wikimedia.org/legal.html
Fcitx 5 Pinyin Dictionary from zh.wikipedia.org
Category: Python / HTML Manipulation | Watchers: 5 | Star: 136 | Fork: 9 | Last update: Jul 15, 2020
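Rime ships with OpenCC, and its built-in dictionaries are all Traditional Chinese, so no Simplified/Traditional conversion is needed. Rime also annotates pinyin automatically and handles polyphonic characters, so manual annotation is unnecessary; see the link for details.
On Rime's handling of polyphonic characters: for 長月達平, for example, it automatically adds both readings, chang yue da ping and zhang yue da ping, which is far more convenient than pypinyin 😅
Turning off pypinyin annotation has another benefit: the dictionary file is half the size.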
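The word-only approach described above could look roughly like the sketch below. It writes a Rime dictionary that lists entries without a pinyin column and leaves the encoding to Rime. The function name, dictionary name, and version string are hypothetical and not taken from this repository's scripts; the header layout follows the usual *.dict.yaml convention (a YAML header closed by "...", then one entry per line), and whether Rime can encode every entry depends on the user's schema.

```python
from typing import Iterable


def build_rime_dict(words: Iterable[str], name: str = "zhwiki",
                    version: str = "0.1") -> str:
    """Return the text of a Rime dictionary that lists words without codes."""
    # Hypothetical header values; adjust them to the real dictionary.
    header = [
        "---",
        f"name: {name}",
        f'version: "{version}"',
        "sort: by_weight",
        "...",
        "",
    ]
    # Keep first-seen order while dropping blanks and repeated entries.
    seen = set()
    entries = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            entries.append(word)
    return "\n".join(header + entries) + "\n"


if __name__ == "__main__":
    print(build_rime_dict(["長月達平", "維基百科"]), end="")
```

Because no reading is stored per entry, the output is roughly one column narrower than a pinyin-annotated file, which matches the size reduction reported above.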
I tried to update it through the AUR. It was working fine until a recent system update; now the build fails with this error:
[omitted]
INFO:root:1242000 words generated
INFO:root:1243000 words generated
INFO:root:1244000 words generated
INFO:root:1244188 words generated
libime_pinyindict zhwiki.raw zhwiki.dict
libime_pinyindict: symbol lookup error: /usr/lib/libIMECore.so.0: undefined symbol: _ZN5fcitx3Log9logStreamEv
make: *** [Makefile:26: zhwiki.dict] Error 127
==> ERROR: A failure occurred in build().
Aborting...
error making: fcitx5-pinyin-zhwiki
Use sort -u to remove duplicate words.
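If the deduplication is done inside the Python conversion step instead of the shell, a minimal sketch might look like this. The function name is hypothetical. Python compares strings by code point, so the ordering should correspond to a byte-wise sort of UTF-8 text rather than a locale-aware one.

```python
from typing import Iterable, List


def unique_sorted(lines: Iterable[str]) -> List[str]:
    """Drop duplicate lines and return the rest sorted, similar to `sort -u`."""
    # Strip trailing newlines so "word\n" and "word" count as the same entry.
    return sorted({line.rstrip("\n") for line in lines})


if __name__ == "__main__":
    for word in unique_sorted(["維基\n", "百科\n", "維基\n"]):
        print(word)
```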
Although "n" is the correct pinyin for "嗯", I think people are used to typing "en" in a pinyin input method. The situation is similar for "呣"; see this issue for reference.
I checked the zhwiki-20200801.dict.yaml file; only two words are affected, "呒染挖啊嗯啊" and "嗯哈哈乐团", so it is not a big deal.
Maybe the following code will help:
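# convert.py
# to replace line 64
# lazy_pinyin comes from pypinyin; title and _PINYIN_SEPARATOR are already defined in convert.py
pinyin_with_n_m = lazy_pinyin(title)
# Map the bare syllables 'n' (嗯) and 'm' (呣) to the spellings people usually type
fix_dic = {'n': 'en', 'm': 'mu'}
pinyin_fixed = [fix_dic.get(i, i) for i in pinyin_with_n_m]
pinyin = _PINYIN_SEPARATOR.join(pinyin_fixed)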
Thank you for the excellent dictionary, it helps me a lot.
I ran the command yay -S fcitx5-pinyin-zhwiki
and got this error:
yay -S fcitx5-pinyin-zhwiki
:: Checking for conflicts...
:: Checking for inner conflicts...
[Aur: 1] fcitx5-pinyin-zhwiki-20200601-1
1 fcitx5-pinyin-zhwiki (Build Files Exist)
==> Packages to cleanBuild?
==> [N]one [A]ll [Ab]ort [I]nstalled [No]tInstalled or (1 2 3, 1-3, ^4)
==>
:: PKGBUILD up to date, Skipping (1/1): fcitx5-pinyin-zhwiki
1 fcitx5-pinyin-zhwiki (Build Files Exist)
==> Diffs to show?
==> [N]one [A]ll [Ab]ort [I]nstalled [No]tInstalled or (1 2 3, 1-3, ^4)
==>
:: Parsing SRCINFO (1/1): fcitx5-pinyin-zhwiki
1 fcitx5-pinyin-zhwiki (Build Files Exist)
==> PKGBUILDs to edit?
==> [N]one [A]ll [Ab]ort [I]nstalled [No]tInstalled or (1 2 3, 1-3, ^4)
==>
==> Making package: fcitx5-pinyin-zhwiki 20200601-1 (Sat 20 Jun 2020 09:27:00 AM CST)
==> Retrieving sources...
-> Found fcitx5-pinyin-zhwiki-0.2.1.tar.gz
-> Found zhwiki-20200601-all-titles-in-ns0.gz
==> Validating source files with md5sums...
fcitx5-pinyin-zhwiki-0.2.1.tar.gz ... Passed
zhwiki-20200601-all-titles-in-ns0.gz ... Passed
==> Making package: fcitx5-pinyin-zhwiki 20200601-1 (Sat 20 Jun 2020 09:27:01 AM CST)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
-> Found fcitx5-pinyin-zhwiki-0.2.1.tar.gz
-> Found zhwiki-20200601-all-titles-in-ns0.gz
==> Validating source files with md5sums...
fcitx5-pinyin-zhwiki-0.2.1.tar.gz ... Passed
zhwiki-20200601-all-titles-in-ns0.gz ... Passed
==> Removing existing $srcdir/ directory...
==> Extracting sources...
-> Extracting fcitx5-pinyin-zhwiki-0.2.1.tar.gz with bsdtar
-> Extracting zhwiki-20200601-all-titles-in-ns0.gz with gzip
==> Starting prepare()...
==> Sources are ready.
==> Making package: fcitx5-pinyin-zhwiki 20200601-1 (Sat 20 Jun 2020 09:27:04 AM CST)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> WARNING: Using existing $srcdir/ tree
==> Removing existing $pkgdir/ directory...
==> Starting build()...
gzip -k -d zhwiki-20200601-all-titles-in-ns0.gz
./zhwiki-web-slang.py > web-slang.source
Traceback (most recent call last):
File "/usr/lib64/python3.8/urllib/request.py", line 1350, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib64/python3.8/http/client.py", line 1240, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib64/python3.8/http/client.py", line 1286, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.8/http/client.py", line 1235, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.8/http/client.py", line 1006, in _send_output
self.send(msg)
File "/usr/lib64/python3.8/http/client.py", line 946, in send
self.connect()
File "/usr/lib64/python3.8/http/client.py", line 1402, in connect
super().connect()
File "/usr/lib64/python3.8/http/client.py", line 917, in connect
self.sock = self._create_connection(
File "/usr/lib64/python3.8/socket.py", line 808, in create_connection
raise err
File "/usr/lib64/python3.8/socket.py", line 796, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./zhwiki-web-slang.py", line 11, in <module>
page = urllib.request.urlopen(_ZHWIKI_SOURCE_URL + urllib.parse.quote(_PAGE)).read()
File "/usr/lib64/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib64/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib64/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib64/python3.8/urllib/request.py", line 1393, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/lib64/python3.8/urllib/request.py", line 1353, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>
make: *** [Makefile:14: web-slang.source] Error 1
==> ERROR: A failure occurred in build().
Aborting...
Error making: fcitx5-pinyin-zhwiki
I have installed the dependencies:
sudo pacman -S --noconfirm libime opencc
And
pip install -U opencc pypinyin
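The traceback above shows the build script downloading a page at build time and failing with "Network is unreachable", so the installed dependencies are not the cause. As a sketch only, the download could be wrapped so that the failure is reported in one line instead of a nested traceback. The function name and the injectable opener parameter are assumptions for illustration and are not part of zhwiki-web-slang.py.

```python
import sys
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_page(url: str,
               opener: Callable = urllib.request.urlopen) -> Optional[bytes]:
    """Return the page body, or None when the request cannot be completed."""
    try:
        # The opener is injectable so the function can be exercised offline.
        with opener(url) as response:
            return response.read()
    except urllib.error.URLError as err:
        # Covers DNS failures, unreachable networks, and HTTP errors.
        print(f"could not fetch {url}: {err.reason}", file=sys.stderr)
        return None
```

A caller would then decide what a None result means, for example stopping the build with a hint to check the network or proxy settings used by the AUR helper.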
The content is duplicated.
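Could a Traditional Chinese version be published directly? After all, Rime dictionaries are supposed to be in Traditional Chinese. Thanks.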
Automatic packaging via GitHub Actions.
For the tag name, I used VERSION=xxx from the Makefile directly. Do you want to adjust it on your side to match?
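One way to keep the release tag and the Makefile in step is to read the version from the Makefile itself. The helper below is a hypothetical sketch that assumes a plain VERSION=... assignment at the start of a line; it is not taken from the workflow in this pull request.

```python
import re
from typing import Optional


def read_makefile_version(makefile_text: str) -> Optional[str]:
    """Return the value of the first VERSION assignment, or None if absent."""
    # Accepts "VERSION=x", "VERSION = x", "VERSION := x" and "VERSION ?= x".
    match = re.search(r"^VERSION[ \t]*[:?]?=[ \t]*(\S+)",
                      makefile_text, flags=re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(read_makefile_version("VERSION=20200601\nall: zhwiki.dict\n"))
```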
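First, Wikisource titles could probably also be used as dictionary material. Second, I wonder whether Wikipedia's public conversion groups (公共转换组) could help with Simplified/Traditional conversion; after all, they are a conversion effort backed by extensive practice and manual tuning.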
For example, for a cell dictionary (细胞词库) you can type 弗雷德霍姆行列式 to test whether it has been enabled successfully.
For example, 孟山都 meng shan dou should be 孟山都 meng shan du.
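Misreadings like this one could be handled with a small table of hand-checked overrides that is consulted after the automatic conversion. The sketch below is illustrative only: the table and function names are hypothetical, and only the 孟山都 entry comes from the report above.

```python
from typing import Dict, List

# Hand-maintained corrections; only the 孟山都 entry comes from the report.
_OVERRIDES: Dict[str, List[str]] = {
    "孟山都": ["meng", "shan", "du"],
}


def apply_override(word: str, syllables: List[str]) -> List[str]:
    """Return the corrected reading when one is listed, else the given one."""
    return _OVERRIDES.get(word, syllables)


if __name__ == "__main__":
    print(" ".join(apply_override("孟山都", ["meng", "shan", "dou"])))
```

pypinyin also documents a load_phrases_dict function for registering custom phrase readings, which may be a better fit if the correction should happen inside the conversion itself.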