pykakasi

Kana kanji simple inversion library

Pykakasi

Overview

pykakasi is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet). It can handle characters in NFC form.
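As an illustration of the NFC note above, here is a minimal sketch that brings input text into NFC form with the standard library before it is handed to a converter. The helper name `to_nfc` is hypothetical and not part of pykakasi, and whether pykakasi normalizes its input internally is not stated here, so treat the explicit normalization step as a precaution taken under that assumption.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed form).

    Hypothetical helper for illustration; not part of pykakasi.
    """
    return unicodedata.normalize("NFC", text)


# The syllable "ga" written as a base kana (U+304B) followed by a
# combining voiced sound mark (U+3099): two code points.
decomposed = "\u304b\u3099"

# NFC composes the pair into the single code point U+304C.
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u304c")  # True
```

The normalized string can then be passed to `kks.convert()` as in the Usage section.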

Its algorithms are based on the kakasi library, which is written in C.

Install (from PyPI): pip install pykakasi
Documentation is available on Read the Docs.

Supported Python versions

pykakasi supports Python 3.6, 3.7, 3.8, 3.9, and PyPy3.

Usage

Transliterate Japanese text to katakana, hiragana, and romaji:

import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字"
result = kks.convert(text)
for item in result:
    print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(item['orig'], item['kana'], item['hira'], item['hepburn']))

かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'
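Since the result is a list of dictionaries, a whole-sentence reading can be assembled by joining one field across all items. The sketch below is illustrative only: `join_field` is a hypothetical helper, not a pykakasi function, and the sample list is copied from the output shown above rather than produced by calling the library.

```python
from typing import Dict, List


def join_field(items: List[Dict[str, str]], key: str, sep: str = "") -> str:
    """Concatenate one field (e.g. 'hira' or 'hepburn') across all items."""
    return sep.join(item[key] for item in items)


# Sample data mirroring the documented output; in real use this list would
# be the return value of pykakasi.kakasi().convert(text).
sample = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]

print(join_field(sample, "hira"))              # かなかんじ
print(join_field(sample, "kana"))              # カナカンジ
print(join_field(sample, "hepburn", sep=" "))  # kana kanji
```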

Here is an example that produces output similar to furigana mode.

import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字交じり文"
result = kks.convert(text)
for item in result:
    print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
print()

かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
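The same formatting can be wrapped in a small reusable function that returns a string instead of printing piece by piece. This is a sketch under stated assumptions: `furigana_style` is a hypothetical name, and the lowercase romaji values in the sample are inferred from the capitalized output above rather than taken from a live call to the library.

```python
from typing import Dict, List


def furigana_style(items: List[Dict[str, str]], key: str = "hepburn") -> str:
    """Join converted items as 'orig[Reading]' chunks separated by spaces."""
    return " ".join(
        "{}[{}]".format(item["orig"], item[key].capitalize()) for item in items
    )


# Sample data mirroring the documented output; in real use this list would
# be the return value of pykakasi.kakasi().convert(text).
sample = [
    {"orig": "かな", "hepburn": "kana"},
    {"orig": "漢字", "hepburn": "kanji"},
    {"orig": "交じり", "hepburn": "majiri"},
    {"orig": "文", "hepburn": "bun"},
]

print(furigana_style(sample))
# かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
```

Returning a string rather than printing makes the helper easier to reuse, for example when writing annotated text to a file.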

Benchmark results

Benchmark results for various versions and platforms are available at https://github.com/miurahr/pykakasi/issues/123

Copyright and License

PyKakasi::

KAKASI Dictionary::

UniDic::

UniDic is released under any of the GPL2, the LGPL2.1, or the 3-clause BSD License (see src/data/unidic/BSD.txt). PyKakasi relicenses a part of UniDic under GPL3+.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

PyKakasi ChangeLog

All notable changes to this project will be documented in this file.

Unreleased

Added

dictionary: add nouns and adjectives from UniDic (#140)

Changed

Fixed

Fix segmentation (wakati) for text that combines Katakana and Hiragana (#142)

Deprecated

Removed

Security

v2.1.1 (16, May 2021)

Added

Provide Kakasi.normalize(text) class method
Add UniDic data into data (not used yet), and add a parse utility.

Fixed

Put type hint stub into package
Copyright notifications

Changed

Expand all cletter into dictionary (#139)
Change primary kanwadict index from str to int
test: gather all legacy tests into the test_pykakasi_legacy.py file.

v2.1.0 (6, May 2021)

Added

Deprecation warning when using the old API (#124)
Add type hint file (pyi) (#124)
Benchmark test code (#122)

Changed

Cache internal results and improve performance by about 30-40 times (#128)
Use standard pickle for the database file (#128)
Exceptions module is now pykakasi, not pykakasi.exceptions

Removed

Dependency for klepto(#128)

v2.0.8 (4, May 2021)

Added

test: Benchmark and profiling (#122)

Changed

Performance: avoid ord() when checking long-mark, speed up about 6%
Reformat code by black(#121)

v2.0.7 (26, Feb. 2021)

Fixed

Infinite loop after running for a while, handle independent HW VOICED SOUND MARK (#115, #118)

v2.0.6 (7, Feb. 2021)

Fixed

Hiragana for age counters (#116, #117)

v2.0.5 (5, Feb. 2021)

Changed

CLI: use argparse for option parse(#113)

Fixed

Handle 思った、言った、行った properly.(#114)
CI: fix coveralls error

Deprecated

CI: drop travis-ci test and badge

v2.0.4 (26, Nov. 2020)

Fixed

CLI: Fix -v and -h option crash on python 3.7 and before (#108).

v2.0.3 (25, Nov. 2020)

Fixed

CLI: Fix -v and -h option crash (#108).

v2.0.2 (23, Jul. 2020)

Fixed

Fix convert() to handle Katakana correctly.(#103)

v2.0.1 (23, Jul. 2020)

Changed

Update setup.py, setup.cfg, tox.ini(#102)

Fixed

Fix convert() missing the last part of a text (#99, #100)
Fix CI, coverage, and coveralls configurations(#101)

pykakasi 2.2.0b2