I am just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript code). I tried two different HTML Tidy .NET ports and both are throwing exceptions...
Sorry, by "clean" I mean "indent". The HTML is not malformed, at all. It's XHTML strict.
I finally got something working with SGML, but this is seriously the most ridiculous chunk of code ever to indent some HTML.
private static string FormatHtml(string input)
{
var sgml = new SgmlReader {DocType = "HTML", InputStream = new StringReader(input)};
using (var sw = new StringWriter())
using (var xw = new XmlTextWriter(sw) { Indentation = 2, Formatting = Formatting.Indented })
{
sgml.Read();
while (!sgml.EOF)
xw.WriteNode(sgml, true);
}
return sw.ToString();
}
See Question&Answers more detail:os