Here is a simple extension method (provided in both VB and C#) that uses a regular expression to remove HTML from a string. This has worked well in every case I’ve used it in so far.
C#
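// Requires: using System.Text.RegularExpressions;
// Extension methods must be declared in a static class.
/// <summary>
/// Removes HTML from a string.
/// </summary>
public static string RemoveHtml(this string html)
{
    // Strip anything that looks like a tag, including tags that span lines.
    html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
    html = html.Replace("\t", " ");
    html = html.Replace("\r\n", string.Empty);
    // Turn non-breaking space entities into ordinary spaces.
    html = html.Replace("&nbsp;", " ");
    // Collapse double spaces (a single pass, so longer runs are only halved).
    return html.Replace("  ", " ");
}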
VB.Net
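' Requires: Imports System.Runtime.CompilerServices and System.Text.RegularExpressions.
' Extension methods must be declared in a Module, so the function is not marked Shared.
''' <summary>
''' Removes HTML from a string.
''' </summary>
<Extension()> _
Public Function RemoveHtml(html As String) As String
    ' Strip anything that looks like a tag, including tags that span lines.
    html = Regex.Replace(html, "<(.|\n)*?>", String.Empty)
    html = html.Replace(vbTab, " ")
    html = html.Replace(vbCrLf, String.Empty)
    ' Turn non-breaking space entities into ordinary spaces.
    html = html.Replace("&nbsp;", " ")
    ' Collapse double spaces (a single pass, so longer runs are only halved).
    Return html.Replace("  ", " ")
End Function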
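Usage

To show how the method reads at the call site, here is a minimal, self-contained C# sketch. The class names HtmlStringExtensions and Program are hypothetical, chosen only for this example. It also assumes the last two replacements target the &nbsp; entity and a double space.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; extension methods must live in a static class.
public static class HtmlStringExtensions
{
    public static string RemoveHtml(this string html)
    {
        html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
        html = html.Replace("\t", " ");
        html = html.Replace("\r\n", string.Empty);
        // Assumption: the entity being replaced is &nbsp;.
        html = html.Replace("&nbsp;", " ");
        // Assumption: the final step collapses a double space into one.
        return html.Replace("  ", " ");
    }
}

public static class Program
{
    public static void Main()
    {
        // Tags are removed and the text between them is kept.
        Console.WriteLine("<p>Hello, <b>world</b>!</p>".RemoveHtml()); // prints "Hello, world!"

        // The &nbsp; entity becomes a space and the trailing CRLF is dropped.
        Console.WriteLine("Price:&nbsp;<i>10</i>\r\n".RemoveHtml()); // prints "Price: 10"
    }
}
```

Because the method takes the string as its `this` parameter, it can be called directly on any string value, as in the two calls above.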