.net - C# Encoding.Converting Latin to Hebrew

Question

Welcome To Ask or Share your Answers For Others

.net - C# Encoding.Converting Latin to Hebrew

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to fetch and parse an online excel document which is written in hebrew but unfortunately in a non-hebrew encoding.

As an example I'm trying to convert the following string: "aìé??_1", which serves as the 1st sheet name to hebrew using C# code, but I'm unable to do so.

I know the above is convertible, since when I open it up in NotePad++ and select Encoding/Character Sets/Hebrew/Windows 1255, I can see: "?????_1" which is the correct hebrew representation of the above string.

I'm using the below code

            string str = "aìé??_1";

            Encoding windows = Encoding.GetEncoding("Windows-1255");
            Encoding ascii = Encoding.GetEncoding("Windows-1252");
            byte[] asciiBytes = ascii.GetBytes(str);
            byte[] windowsBytes = Encoding.Convert(ascii, windows, asciiBytes);

            char[] windowsChars = new char[windows.GetCharCount(windowsBytes, 0, windowsBytes.Length)];
            windows.GetChars(windowsBytes, 0, windowsBytes.Length, windowsChars, 0);
            string windowsString = new string(windowsChars);

I assumed that the encoding of the origin string is Windows-1252 since when I paste it in NotePad++ and change the encoding to Windows-1252 the string remains the same...

I'm probably doing something wrong here, anyone know how to convert the above correctly?

Thanks,

Mikey

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1.0k views

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:24:37+0000

const string Str = "aìé??_1";

Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
Encoding hebrewEncoding = Encoding.GetEncoding("Windows-1255");

byte[] latinBytes = latinEncoding.GetBytes(Str);

string hebrewString = hebrewEncoding.GetString(latinBytes);

hebrewString:

?????_1

In your supplied example "Window-1252" is not actualy ASCII, it is extended ASCII, and for some reason Encoding.Convert with these two encodings cannot convert extended range ASCII, so all +127 characters are converted as 63 (i.e. ?). When "converting" from one extended ASCII character byte[] to another, I would expect the bytes to be the same, it is only when you convert them to a .Net unicode string I would expect them to be different. Not sure why Convert is converting +127 chars to '?'.

Categories

.net - C# Encoding.Converting Latin to Hebrew

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags