php - How to detect language of text?

asked Oct 24, 2021 in Technique by 深蓝

I have a form which lets users input text snippets. How can I figure out the language of the entered text?

Specifically these languages for now:

Arabic: ??? ?? ??? ?????? ???????

Chinese: 这是一些阿拉伯文字

Japanese: これは、いくつかのアラビア語のテキストです

[Edit] The detection has to work on text that is retrieved via an API too (no browser involved).

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:32:12+0000

You can figure out whether the characters are from the Arabic, Chinese, or Japanese sections of the Unicode map.

If you look at the list of Unicode blocks on Wikipedia, you'll see that each of those languages is spread over many blocks of the map. But you're not doing translation, so you don't need to worry about every last glyph.

For example, your Chinese text begins (in hex) 0x8FD9 0x662F 0x4E00, and those are all in the "CJK Unified Ideographs" block, which holds Chinese characters. Japanese kanji live in that same block, so text that also contains kana should be treated as Japanese rather than Chinese. Here are a few ranges to get you started:

Arabic (0600–06FF)

Japanese

  Hiragana (3040–309F)
  Katakana (30A0–30FF)
  Kanbun (3190–319F)

Chinese

  CJK Unified Ideographs (4E00–9FFF)
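
The ranges above could be checked in PHP roughly like this. This is only a minimal sketch: the function name `detectScript` and its return values are made up for illustration, it assumes the input is valid UTF-8, and it decides by priority (kana first, because Japanese text normally mixes kana with CJK ideographs) rather than by counting characters.

```php
<?php
// Hypothetical helper (not from any library): guesses the script of a
// UTF-8 string using only the Unicode ranges listed in this answer.
function detectScript(string $text): string
{
    // Kana and Kanbun marks: test these first, since Japanese text
    // usually contains CJK ideographs (kanji) as well.
    if (preg_match('/[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{3190}-\x{319F}]/u', $text) === 1) {
        return 'japanese';
    }
    // Arabic block.
    if (preg_match('/[\x{0600}-\x{06FF}]/u', $text) === 1) {
        return 'arabic';
    }
    // CJK Unified Ideographs with no kana present: assume Chinese.
    if (preg_match('/[\x{4E00}-\x{9FFF}]/u', $text) === 1) {
        return 'chinese';
    }
    return 'unknown';
}

echo detectScript('这是一些阿拉伯文字'), "\n";                         // chinese
echo detectScript('これは、いくつかのアラビア語のテキストです'), "\n"; // japanese
```

Because this works on the string itself, it behaves the same for form input and for text fetched from an API. A Japanese snippet written entirely in kanji would be reported as Chinese, which is a limitation of range checks in general.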

(I got the hex for your Chinese text by using a Chinese to Unicode converter.)
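
If you would rather get the hex values in PHP than from an online converter, something like the following should work. It is a sketch that assumes PHP 7.4 or later with the mbstring extension (for `mb_str_split` and `mb_ord`); the helper name `codePoints` is made up for illustration.

```php
<?php
// Hypothetical helper: returns the code points of a UTF-8 string
// as hex strings such as "0x4E00".
function codePoints(string $text): array
{
    $points = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $points[] = sprintf('0x%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

echo implode(' ', codePoints('这是一')), "\n"; // 0x8FD9 0x662F 0x4E00
```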
