NReadability Sample with VB.Net and C#


NReadability is a tool for removing clutter from HTML pages. It works much like Readability.js from arc90labs, or the way Instapaper retrieves a web page, identifies the main content and then produces a simply formatted document that's easy to read (and stripped down to the bare bones).

Here I'm going to provide a simple snippet that shows how to download a web page, decode it as a UTF-8 string, run it through NReadability and write the extracted content to a file on the hard drive (change the output path to a folder that exists on your workstation). The decoding step matters for most web pages: if the bytes are decoded with the wrong encoding, certain characters come back garbled. Anyway, here are the examples:

VB.Net

Imports System.Net

Using wc As New WebClient
    ' Download the raw bytes of the page.
    Dim html As Byte() = wc.DownloadData("http://en.wikipedia.org/wiki/.NET_Framework")
    Dim tc As New NReadability.NReadabilityTranscoder
    ' Decode the bytes as UTF-8 before handing them to NReadability.
    Dim ti As New NReadability.TranscodingInput(System.Text.Encoding.UTF8.GetString(html))
    Dim tcr As NReadability.TranscodingResult = tc.Transcode(ti)
    ' Write the extracted content to disk, then open it with the default application.
    System.IO.File.WriteAllText("c:\temp\test.html", tcr.ExtractedContent, System.Text.Encoding.Unicode)
    System.Diagnostics.Process.Start("c:\temp\test.html")
End Using

C#

using System.Net;

using (var wc = new WebClient())
{
    // Download the raw bytes of the page.
    byte[] html = wc.DownloadData("http://en.wikipedia.org/wiki/.NET_Framework");
    NReadability.NReadabilityTranscoder tc = new NReadability.NReadabilityTranscoder();
    // Decode the bytes as UTF-8 before handing them to NReadability.
    NReadability.TranscodingInput ti = new NReadability.TranscodingInput(System.Text.Encoding.UTF8.GetString(html));
    NReadability.TranscodingResult tcr = tc.Transcode(ti);
    // Verbatim strings (@"...") keep the backslashes literal; without the @, "\t" would be read as a tab.
    System.IO.File.WriteAllText(@"c:\temp\test.html", tcr.ExtractedContent, System.Text.Encoding.Unicode);
    System.Diagnostics.Process.Start(@"c:\temp\test.html");
}
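
To see why the decoding step matters, here is a minimal sketch that decodes the same bytes two ways. It is only an illustration and not part of NReadability: the EncodingDemo class and its DecodeAs helper are made-up names, and the sample text "café" stands in for downloaded page bytes.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decode raw bytes using the named encoding.
    public static string DecodeAs(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    public static void Main()
    {
        // Stand-in for the bytes a WebClient would download from a UTF-8 page.
        byte[] bytes = Encoding.UTF8.GetBytes("café");

        // Decoding with the encoding the bytes were written in round-trips cleanly.
        Console.WriteLine(DecodeAs(bytes, "utf-8"));

        // Decoding as ISO-8859-1 turns the two-byte "é" into two separate characters.
        Console.WriteLine(DecodeAs(bytes, "iso-8859-1"));
    }
}
```

The second line is the kind of garble you get when a page's bytes are read with the wrong encoding. The snippets above assume the page is UTF-8; a page served in another encoding would need the matching Encoding instead.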
