Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am using a XML Text reader on a XML file that may contain characters that are invalid for the reader. My initial thought was to create my own version of the stream reader and clean out the bad characters but it is severely slowing down my program.

public class ClensingStream : StreamReader
{
        private static char[] badChars = { 'x00', 'x09', 'x0A', 'x10' };
    //snip
        public override int Read(char[] buffer, int index, int count)
        {
            var tmp = base.Read(buffer, index, count);

            for (int i = 0; i < buffer.Length; ++i)
            {
                //check the element in the buffer to see if it is one of the bad characters.
                if(badChars.Contains(buffer[i]))
                    buffer[i] = ' ';
            }

            return tmp;
        }
}

according to my profiler the code is spending 88% of its time in if(badChars.Contains(buffer[i])) what is the correct way to do this so I am not causing horrible slowness?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
141 views
Welcome To Ask or Share your Answers For Others

1 Answer

The reason that it spends so much time in that line is because the Contains method loops through the array to look for the character.

Put the characters in a HashSet<char> instead:

private static HashSet<char> badChars =
  new HashSet<char>(new char[] { 'x00', 'x09', 'x0A', 'x10' });

The code to check if the set contains the character looks the same as when looking in the array, but it uses the hash code of the character to look for it instead of looping through all the items in the array.

Alternatively, you could put the characters in a switch, that way the compiler would create an efficient comparison:

switch (buffer[i]]) {
  case 'x00':
  case 'x09':
  case 'x0A':
  case 'x10': buffer[i] = ' '; break;
}

If you have more characters (five or six IIRC), the compiler will actually create a hash table to look up the cases, so that would be similar to using a HashSet.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...