Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm reading in an XML stream that's approximately 100 MB, and I'd like to replace any values that are over 1 MB.

example input

<root>
    <visit>yes</visit>
    <filedata>SDFSFDSDFfgdfgsgdf==(this is 5 mb)</filedata>
    <type>pdf</type>
    <moredata>sssssssssssssss (this 2mb)</moredata>
</root>

expected output

<root>
    <visit>yes</visit>
    <filedata>REPLACED TEXT</filedata>
    <type>pdf</type>
    <moredata>REPLACED TEXT</moredata>
</root>

Here's what I am using to read the stream and check the size of each value:

XmlReader rdr = XmlReader.Create(new System.IO.StringReader(xml));
while (rdr.Read()) {
    if (rdr.Value.Length > ONEMEGABYTE) {
        // replace value with "REPLACED TEXT"
    }
}

How do I replace the value in rdr.Value?


1 Answer

You can subclass XmlReader to "filter" out undesired elements, then use XmlDocument.Load() with your reader instead of letting it create its own.

Note that this excludes only the value of the offending tags. If you put a breakpoint in your Read() loop, you'll find that <foo>bar</foo> comes in three pieces: <foo> has NodeType Element with no value, "bar" has NodeType Text with an empty LocalName, and </foo> has NodeType EndElement with no value. If "bar" were over the length limit, the "filter" below would turn <foo>bar</foo> into <foo></foo>. To exclude all of <foo>bar</foo> based on the length of "bar", you'd have to look ahead. Doable, but maybe not worth your time. Hopefully that's not a requirement here.
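To see those three pieces without a debugger, a minimal sketch like the following records what the reader reports for each node. The NodeDump class and its Describe method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeDump
{
    // Hypothetical helper: returns one "NodeType|LocalName|Value" entry
    // for every node the reader reports while walking the input.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                lines.Add($"{reader.NodeType}|{reader.LocalName}|{reader.Value}");
            }
        }
        return lines;
    }
}
```

Calling NodeDump.Describe("<foo>bar</foo>") should give three entries: "Element|foo|", "Text||bar" and "EndElement|foo|". Only the middle one carries a Value, which is why a length-based filter removes the text but leaves the surrounding tags.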

An alternative (or addition) to this class might be a version that passes every Value through a Func<string, string>, such as s => (s.Length > MAX_LEN) ? "" : s.
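One way to try the Func<string, string> idea without writing another wrapper class is to copy the input from an XmlReader to an XmlWriter and map each text value on the way through. This is a different technique from the subclass in this answer, and it is only a sketch: XmlValueReplacer and Transform are hypothetical names, only Text and CDATA values are mapped (attribute values are copied unchanged), and comments, processing instructions and the XML declaration are dropped to keep it short.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlValueReplacer
{
    // Hypothetical helper: copies the XML node by node and passes every
    // Text or CDATA value through map before writing it out.
    public static string Transform(string xml, Func<string, string> map)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        // Copies the attributes, then moves the reader back to the element.
                        writer.WriteAttributes(reader, true);
                        if (reader.IsEmptyElement)
                        {
                            writer.WriteEndElement();
                        }
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(map(reader.Value));
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(map(reader.Value));
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    // Other node types (comments, processing instructions,
                    // the XML declaration) are skipped in this sketch.
                }
            }
        }

        // The writer has been disposed, so everything is flushed to output.
        return output.ToString();
    }
}
```

For the question's case the call might look like XmlValueReplacer.Transform(xml, s => s.Length > MAX_LEN ? "REPLACED TEXT" : s). Note that reader.Value still materializes each full string before it is measured, so this does not address the memory concern for very large values.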

Also, for all I know, XmlTextReaderImpl (the actual type of _reader) may cache the whole text and kill your performance anyway. You may have to write your own guts for the thing as well.

using System;
using System.IO;
using System.Xml;

public class FilteredXmlReader : XmlReader
{
    public Func<XmlReader, bool> Filter;

    private XmlReader _reader;
    private FilteredXmlReader(TextReader input, Func<XmlReader, bool> filterProc)
    {
        Filter = filterProc;
        _reader = XmlReader.Create(input);
    }

    public static XmlReader Create(TextReader input, Func<XmlReader, bool> filterProc)
    {
        return new FilteredXmlReader(input, filterProc);
    }

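    public override bool Read()
    {
        var b = _reader.Read();

        // Skip nodes the filter rejects. Stop at end of input, and treat a
        // null filter as "accept everything" instead of throwing.
        while (b && Filter != null && !Filter(_reader))
        {
            b = _reader.Read();
        }

        return b;
    }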

    #region Wrapper Boilerplate

    public override XmlNodeType NodeType => _reader.NodeType;

    public override string LocalName => _reader.LocalName;

    public override string NamespaceURI => _reader.NamespaceURI;

    public override string Prefix => _reader.Prefix;

    public override string Value => _reader.Value;

    public override int Depth => _reader.Depth;

    public override string BaseURI => _reader.BaseURI;

    public override bool IsEmptyElement => _reader.IsEmptyElement;

    public override int AttributeCount => _reader.AttributeCount;

    public override bool EOF => _reader.EOF;

    public override ReadState ReadState => _reader.ReadState;

    public override XmlNameTable NameTable => _reader.NameTable;

    public override string GetAttribute(string name) => _reader.GetAttribute(name);

    public override string GetAttribute(string name, string namespaceURI) => _reader.GetAttribute(name, namespaceURI);

    public override string GetAttribute(int i) => _reader.GetAttribute(i);

    public override string LookupNamespace(string prefix) => _reader.LookupNamespace(prefix);

    public override bool MoveToAttribute(string name) => _reader.MoveToAttribute(name);

    public override bool MoveToAttribute(string name, string ns) => _reader.MoveToAttribute(name, ns);

    public override bool MoveToElement() => _reader.MoveToElement();

    public override bool MoveToFirstAttribute() => _reader.MoveToFirstAttribute();

    public override bool MoveToNextAttribute() => _reader.MoveToNextAttribute();

    public override bool ReadAttributeValue() => _reader.ReadAttributeValue();

    public override void ResolveEntity() => _reader.ResolveEntity();

    #endregion Wrapper Boilerplate
}

Usage:

var xml = "<test />";
XmlDocument doc = new XmlDocument();

XmlReader rdr = FilteredXmlReader.Create(new System.IO.StringReader(xml),
                    r => r.Value.Length < 20);

// Load through the filtering reader so rejected nodes never reach the document.
doc.Load(rdr);

var filteredXML = doc.OuterXml;
