Remove HTML from a string (VB.Net & C#)


Here is a simple extension method (provided in both VB and C#) that uses a regular expression to remove HTML tags from a string, then tidies up tabs, line breaks, and repeated spaces. It has worked well in every case I’ve used it in so far.

C#

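using System.Text.RegularExpressions;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class StringExtensions
{
    /// <summary>
    /// Removes HTML from a string.
    /// </summary>
    public static string RemoveHtml(this string html)
    {
        // Strip anything that looks like a tag, including tags that span lines.
        html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
        // Turn tabs into spaces and drop Windows-style line breaks.
        html = html.Replace("\t", " ");
        html = html.Replace("\r\n", string.Empty);
        // Replace triple spaces, then double spaces, with a single space.
        html = html.Replace("   ", " ");
        return html.Replace("  ", " ");
    }
}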
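
To give a rough idea of how the method behaves when called on a string, here is a small, self-contained console sketch. The class names `DemoStringExtensions` and `Program` and the sample input are assumptions made up for this example; the method body simply repeats the logic shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class for the demo; any static class will do.
public static class DemoStringExtensions
{
    // Same steps as the method above, declared with "this" so it can be
    // called directly on a string instance.
    public static string RemoveHtml(this string html)
    {
        html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
        html = html.Replace("\t", " ");
        html = html.Replace("\r\n", string.Empty);
        html = html.Replace("   ", " ");
        return html.Replace("  ", " ");
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input (made up): two paragraphs, a CRLF line break, and a tab.
        string input = "<p>Hello, <b>world</b>!</p>\r\n<p>Second\tline</p>";

        // prints: Hello, world!Second line
        Console.WriteLine(input.RemoveHtml());
    }
}
```

Note that the line break is removed rather than replaced with a space, so "world!" and "Second" end up joined in the output.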

VB.Net

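    Imports System.Runtime.CompilerServices
    Imports System.Text.RegularExpressions

    ' Extension methods must be declared in a Module; the module name here is arbitrary.
    Public Module StringExtensions
        ''' <summary>
        ''' Removes HTML from a string.
        ''' </summary>
        <Extension()> _
        Public Function RemoveHtml(html As String) As String
            ' Strip anything that looks like a tag, including tags that span lines.
            html = Regex.Replace(html, "<(.|\n)*?>", String.Empty)
            ' Turn tabs into spaces and drop Windows-style line breaks.
            html = html.Replace(vbTab, " ")
            html = html.Replace(vbCrLf, String.Empty)
            ' Replace triple spaces, then double spaces, with a single space.
            html = html.Replace("   ", " ")
            Return html.Replace("  ", " ")
        End Function
    End Module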