Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript code). I tried two different HTML Tidy .NET ports and both are throwing exceptions...

Sorry, by "clean" I mean "indent". The HTML is not malformed, at all. It's XHTML strict.


I finally got something working with SGML, but this is seriously the most ridiculous chunk of code ever to indent some HTML.

private static string FormatHtml(string input)
{
    var sgml = new SgmlReader {DocType = "HTML", InputStream = new StringReader(input)};
    using (var sw = new StringWriter())
    using (var xw = new XmlTextWriter(sw) { Indentation = 2, Formatting = Formatting.Indented })
    {
        sgml.Read();
        while (!sgml.EOF)
            xw.WriteNode(sgml, true);
    }
    return sw.ToString();
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
423 views
Welcome To Ask or Share your Answers For Others

1 Answer

AngleSharp 100% c#

    var parser = new HtmlParser();

    var document = parser.Parse("<html><head></head><body><i></i></body></html>");

    var sw = new StringWriter();
    document.ToHtml(sw, new PrettyMarkupFormatter());

    var HTML_prettified = sw.ToString());

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...