
HTML Entities

As of writing, there are various forms of HTML encoding:

  1. The old-school named entities, e.g. &lt;, &gt;, &amp;
  2. URL encoding, e.g. %20
  3. HTML entity numbers, e.g. &#33; (http://www.ascii.cl/htmlcodes.htm)

However, the built-in libraries in .NET seem to be capable of handling only #1 and #2.


For #1 we can either use System.Web.HttpUtility.HtmlDecode or System.Net.WebUtility.HtmlDecode.


For #2, we can use the same classes as for #1, but call UrlDecode instead of HtmlDecode.
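
The two built-in calls mentioned above can be sketched as follows. This is only a minimal illustration; the class name BuiltInDecodeDemo and the sample strings are made up for the example.

```csharp
using System;
using System.Net;

public static class BuiltInDecodeDemo
{
    public static void Main()
    {
        // Form #1: named entities are turned back into their characters.
        Console.WriteLine(WebUtility.HtmlDecode("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"));
        // prints: <b>Tom & Jerry</b>

        // Form #2: percent-encoded sequences are turned back into characters.
        Console.WriteLine(WebUtility.UrlDecode("hello%20world"));
        // prints: hello world
    }
}
```

System.Web.HttpUtility exposes methods with the same names, so the calls look the same if you prefer that class.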


However, it seems there is no built-in library to handle #3. The good news is that there is an easy way to do it with the libraries currently available in .NET.


Since all numeric HTML entities have the format &#[number]; we can use a regex to find every such pattern and replace it with the equivalent character, (char)number.


Sample code of such a function:

static string HtmlDecode(string s)
{
    // Needs: using System.Net; using System.Text.RegularExpressions;
    // Decode named entities such as &lt; first.
    var s2 = WebUtility.HtmlDecode(s);
    // Replace every remaining decimal reference &#NNN; in a single pass.
    return Regex.Replace(s2, @"&#([0-9]+);", m =>
        int.TryParse(m.Groups[1].Value, out var n) && n <= 0x10FFFF && (n < 0xD800 || n > 0xDFFF)
            ? char.ConvertFromUtf32(n)  // also works for code points above U+FFFF
            : m.Value);                 // not a valid code point: leave the text as-is
}
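
As a variation on the function above, here is a hedged, self-contained sketch that also accepts the hexadecimal form of a numeric reference (for example &#x21;) alongside the decimal one. The names NumericEntityDemo and DecodeNumericEntities are made up for illustration and are not part of .NET. Treating out-of-range numbers as plain text is an assumption of this sketch, not something the original function does.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumericEntityDemo
{
    // Matches &#xHHH; (hexadecimal, group 1) or &#NNN; (decimal, group 2).
    private static readonly Regex NumericRef =
        new Regex(@"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    public static string DecodeNumericEntities(string s)
    {
        return NumericRef.Replace(s, m =>
        {
            bool isHex = m.Groups[1].Success;
            string digits = isHex ? m.Groups[1].Value : m.Groups[2].Value;
            NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;

            int codePoint;
            // Accept only valid Unicode scalar values; surrogates and
            // out-of-range numbers are left in the text unchanged.
            bool valid = int.TryParse(digits, style, CultureInfo.InvariantCulture, out codePoint)
                && codePoint >= 0
                && codePoint <= 0x10FFFF
                && (codePoint < 0xD800 || codePoint > 0xDFFF);

            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }

    public static void Main()
    {
        Console.WriteLine(DecodeNumericEntities("Hi&#33; &#x41;&#66;C"));
        // prints: Hi! ABC
    }
}
```

The replacement is done through a match callback, so each reference is converted once in a single pass over the string. char.ConvertFromUtf32 is used instead of a (char) cast so that numbers above 65535 produce a proper surrogate pair rather than a truncated character.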

Categories: ASP.NET