Rails HTML Sanitize gem

Saturday, January 17th, 2009

I was recently working on improving the search engine rankings of a site with lots of user generated content and noticed that users were creating 404s through bad links. Users could add links to other sites in their comments and elsewhere, but some of those links were broken. Some were even local links, so the search engines were effectively seeing a bunch of internal 404s coming from the user generated content. This was essentially defeating any SEO being done elsewhere on the site and needed to be fixed quickly.

My original idea was to use Hpricot to find all the anchor tags and add a rel="nofollow" attribute to each of them. I was mulling over how to write the Hpricot parsing code when I found the Sanitize gem. It does exactly what I needed and saved me the hassle of writing that code myself. The gist of it is:

Sanitize.clean(html, Sanitize::Config::BASIC)

As an added bonus, it can also scrub out unwanted script tags and more. Now the site won't be penalized for internal 404s coming from the user generated content, since every user-submitted link will carry rel="nofollow".