Remove HTML from a string (VB.Net & C#)


Here is a simple extension method (provided in both VB and C#) that uses a regular expression to remove HTML tags from a string, then tidies up tabs, line breaks, and repeated spaces. It has worked well in every case I’ve used it in so far.

C#

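// Requires: using System.Text.RegularExpressions;
// Declare this inside a static class so it works as an extension method.

/// <summary>
/// Removes HTML tags from a string and tidies up the leftover whitespace.
/// </summary>
public static string RemoveHtml(this string html)
{
    // Strip anything that looks like a tag, including tags that span lines.
    html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
    // Turn tabs into spaces and drop Windows line breaks.
    html = html.Replace("\t", " ");
    html = html.Replace("\r\n", string.Empty);
    // Shrink short runs of spaces left behind by the removed markup.
    html = html.Replace("   ", " ");
    return html.Replace("  ", " ");
}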
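
To show how the method is meant to be called, here is a minimal, self-contained sketch. The class names `StringExtensions` and `Program` and the sample markup are my own assumptions for illustration; the method body is the same as above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class: extension methods must live in a static class.
public static class StringExtensions
{
    public static string RemoveHtml(this string html)
    {
        html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
        html = html.Replace("\t", " ");
        html = html.Replace("\r\n", string.Empty);
        html = html.Replace("   ", " ");
        return html.Replace("  ", " ");
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input, invented for this example.
        string markup = "<p>Hello, <b>world</b>!</p>";

        // Called as if it were an instance method on string.
        Console.WriteLine(markup.RemoveHtml()); // prints "Hello, world!"
    }
}
```

Because of the `this` modifier on the first parameter, the call reads `markup.RemoveHtml()` rather than `StringExtensions.RemoveHtml(markup)`, although both forms compile.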

VB.Net

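    ' Requires: Imports System.Runtime.CompilerServices
    '           Imports System.Text.RegularExpressions
    ' VB extension methods must be declared inside a Module, so the function is not Shared.

    ''' <summary>
    ''' Removes HTML tags from a string and tidies up the leftover whitespace.
    ''' </summary>
    <Extension()> _
    Public Function RemoveHtml(html As String) As String
        ' Strip anything that looks like a tag, including tags that span lines.
        html = Regex.Replace(html, "<(.|\n)*?>", String.Empty)
        ' Turn tabs into spaces and drop Windows line breaks.
        html = html.Replace(vbTab, " ")
        html = html.Replace(vbCrLf, String.Empty)
        ' Shrink short runs of spaces left behind by the removed markup.
        html = html.Replace("   ", " ")
        Return html.Replace("  ", " ")
    End Function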
