Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

What is the best way to convert a string from Unicode to ASCII without changing it's length (that is very important in my case)? Also the characters without any conversion problems must be at the same positions as in the original string. So an "?" must be converted to "A" and not something cryptic that has more characters.

Edit:
@novalis - Such symbols (for example of asian languages) should just be converted to some placeholders. I am not too interested in those words or what they mean.

@MtnViewMark - I must preserve the number of all characters and the position of ASCII available characters under any circumstance.

Here some more info: I have some text mining tools that can only process ASCII strings. Most of the text that should be processed is in English, but some do contain non ASCII characters. I am not interested in those words, but I must be sure that the words I am interested in (those that only contain ASCII characters) are at the same positions after the string conversion.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
596 views
Welcome To Ask or Share your Answers For Others

1 Answer

As stated in this answer, the following code should work:

    String s = "口水雞 hello ?";

    String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
    String regex = "[\p{InCombiningDiacriticalMarks}\p{IsLm}\p{IsSk}]+";

    String s2 = new String(s1.replaceAll(regex, "").getBytes("ascii"), "ascii");

    System.out.println(s2);
    System.out.println(s.length() == s2.length());

Output is

??? hello A
true

So you first remove diactrical marks, the convert to ascii. Non-ascii characters will become question marks.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...