Disabling html encoding

Nov 11, 2011 at 10:39 PM

Hi!

First of all, thanks for the great library!

Today I noticed that BBCodeParser.ToHtml() also HTML-encodes the string. Could this encoding be made optional? For example, if you're using an anti-XSS library, the content may already be escaped, so there is no need to encode it, and encoding it a second time actually garbles the string.

Could you please implement this?

Coordinator
Nov 12, 2011 at 12:19 PM
Edited Nov 12, 2011 at 12:19 PM

This is already possible. Say you wanted a tag called [unencoded] that passes its content through as HTML:

public static readonly BBCodeParser DefaultBBCodeParser = new BBCodeParser(new[]
    {
        new BBTag("unencoded", "<div>", "</div>",
            new BBAttribute("htmlString", "", null, HtmlEncodingMode.UnsafeDontEncode)),
    });

Done.

Nov 13, 2011 at 6:40 PM

Thanks. Yes, that would do it if one'd like to disable encoding for a tag. However I'd like to disable encoding altogether, for the whole string. Now of course it's possible to wrap the whole string into an unencoded tag, but that just feels a bit hack-ish.

Coordinator
Nov 16, 2011 at 12:32 PM

I am not sure what this would be useful for. I would not recommend you produce entirely unescaped output and only later try to sanitize it. Sanitization is either unsafe or destroys content.

Anyway, this is not possible right now and I don't think this feature would fit well in the library. You can, however, modify the source code and remove or disable all usages of HttpUtility.Html(Attribute)Encode.

Nov 16, 2011 at 12:55 PM

No, not sanitizing after parsing, I'd feed already escaped content to the parser.

As I mentioned, there are situations where the string to be parsed is already escaped, for example when it comes from a data source outside our control that delivers escaped data, or, more importantly, when we want to use an anti-XSS library. HttpUtility.HtmlEncode is not safe for avoiding XSS, but since the parser uses it to encode the output, there is no simple way to use any other library to defeat XSS attacks.

Coordinator
Nov 16, 2011 at 5:11 PM

HttpUtility.HtmlEncode _does_ safely avoid XSS (if used correctly). Any example of what would be unsafe?

But I understand what you mean. Currently, the only way to change the type of encoding is to change the source code or configure every single tag to not encode its output.

Nov 16, 2011 at 5:38 PM

Please see this Stack Overflow thread that outlines the (important) differences between HttpUtility.HtmlEncode and the standard ASP.NET Anti-XSS library (and why Anti-XSS is better for preventing XSS attacks), as well as the blog entry linked there that demonstrates a concrete attack that defeats HttpUtility.HtmlEncode. Also, the download page of Anti-XSS describes why HttpUtility.HtmlEncode is discouraged for preventing XSS.

Coordinator
Nov 23, 2011 at 4:17 PM

The blog post you linked is vulnerable because HtmlEncode is used to encode an attribute value. You need to use HtmlAttributeEncode which is 100% safe. Codekicker.BBCode does this correctly.

You always need to choose the correct encoding function depending on the syntactic context (HTML text vs. HTML attribute in this case). You wouldn't expect HtmlEncode to work in a JavaScript literal, would you? It is just the wrong encoder.
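
To make the "one encoder per context" point concrete, here is a minimal sketch. It is not taken from Codekicker.BBCode; the class EncodingContextDemo and the method RenderLink are hypothetical names, and it assumes System.Web.HttpUtility is available (a reference to System.Web on .NET Framework, built in on recent .NET versions).

```csharp
using System;
using System.Web; // HttpUtility (assumed to be referenced)

public static class EncodingContextDemo
{
    // Hypothetical helper: builds a link from two untrusted strings.
    // Each value is encoded for the context it is written into.
    public static string RenderLink(string url, string label)
    {
        return "<a href=\""
             + HttpUtility.HtmlAttributeEncode(url)   // inside a double-quoted attribute
             + "\">"
             + HttpUtility.HtmlEncode(label)          // inside HTML text
             + "</a>";
    }

    public static void Main()
    {
        // The embedded quote tries to close the attribute and add a handler.
        string url = "x\" onmouseover=\"alert(1)";
        string label = "<script>alert(2)</script>";

        // The quote is emitted as &quot; and the angle brackets as &lt; / &gt;,
        // so neither value can leave the context it was placed in.
        Console.WriteLine(RenderLink(url, label));
    }
}
```

Note that this only covers escaping. It does not check the URL scheme (for example javascript:), which is a separate validation step.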

AntiXSS is only useful if you want to keep some parts of the HTML or convert HTML to text. It adds nothing at all if you just want to keep all chars literally.

Dec 13, 2011 at 9:17 PM

I didn't know that about HtmlAttributeEncode, thanks.

Jan 31, 2012 at 8:50 PM
Edited Jan 31, 2012 at 9:08 PM

I too would like to be able to disable Html encoding for the entire thing. I have to run the string through two different versions of the parser (one before data reaches database and again when the data is pulled out) and this encoding is tripping me up. I think the choice should be up to us, even if the author thinks it is less secure.

In my scenario I have some special tags that need some pre-rendering before the content is saved to the database. The pre-rendering simplifies these special tags and turns them into basic tags. Then I store the content with basic tags in the DB. When I pull the content out I want to run through a different parser to turn all tags to HTML before rendering to client. I can't store HTML in the database, only BBCode because it may not necessarily be written out to a web browser so the parsing needs to be different based on how it is being viewed.

The problem is that this HTML encoding happens on the first parser (before the DB) so even content that's not going to be sent to a browser is HTML encoded. Also, when the data is pulled out the content is re-encoded by the second parser. So instead of encoding < to &lt; it ends up being &amp;lt;. I really need the ability to disable encoding in the first parser (before the DB).
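
The double-encoding effect is easy to reproduce outside the library. The sketch below uses System.Net.WebUtility.HtmlEncode as a stand-in for whatever each parser pass does internally; EncodePass is a hypothetical name and not part of CodeKicker.BBCode.

```csharp
using System;
using System.Net;

public static class DoubleEncodingDemo
{
    // Hypothetical stand-in for "a parser pass that HTML-encodes its text".
    public static string EncodePass(string text) => WebUtility.HtmlEncode(text);

    public static void Main()
    {
        string original = "a < b";

        // First pass: what would be stored in the database.
        string stored = EncodePass(original);

        // Second pass: what would be sent to the browser.
        string rendered = EncodePass(stored);

        Console.WriteLine(stored);    // a &lt; b
        Console.WriteLine(rendered);  // a &amp;lt; b

        // A browser decodes only once, so the user sees "a &lt; b" instead of "a < b".
        Console.WriteLine(WebUtility.HtmlDecode(rendered)); // a &lt; b
    }
}
```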

Great library, but this inability to turn this off is super super annoying.

Silly that I have to re-compile the source just to remove calls to htmlencode.

Jan 31, 2012 at 9:40 PM
Edited Jan 31, 2012 at 9:40 PM

For those of you interested in a version where you can disable HTML encoding, I downloaded the source and modified it to add an overload of the ToHtml() method.

To disable HTML encoding simply change this:

parser.ToHtml(text);

To this:

parser.ToHtml(text, false);

You can find this version here: http://www.CodeTunnel.com/content/CodeKicker.BBCode.zip
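
For readers who only want to see the shape of such an overload, here is a rough, self-contained sketch. It is not the code from the download above and not the CodeKicker.BBCode implementation: TinyBBRenderer is a hypothetical toy that handles only [b] tags, and it uses System.Net.WebUtility.HtmlEncode in place of the library's own encoding calls.

```csharp
using System;
using System.Net;

// Hypothetical toy renderer, for illustration only.
public static class TinyBBRenderer
{
    // Existing-style entry point: encoding stays on by default.
    public static string ToHtml(string text) => ToHtml(text, true);

    // Overload with a switch for callers whose input is already escaped.
    public static string ToHtml(string text, bool htmlEncode)
    {
        string body = htmlEncode ? WebUtility.HtmlEncode(text) : text;

        // Only [b]...[/b] is translated, to keep the sketch short.
        return body.Replace("[b]", "<b>").Replace("[/b]", "</b>");
    }
}

public static class Program
{
    public static void Main()
    {
        // Raw input, encoding on.
        Console.WriteLine(TinyBBRenderer.ToHtml("[b]1 < 2[/b]"));           // <b>1 &lt; 2</b>

        // Pre-escaped input, encoding off: same result, no double encoding.
        Console.WriteLine(TinyBBRenderer.ToHtml("[b]1 &lt; 2[/b]", false)); // <b>1 &lt; 2</b>
    }
}
```

Keeping encoding as the default means existing callers are unaffected; passing false is only safe when the input really has been escaped already.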