Archive for October, 2007

Open Source Keyword Tracker

Tuesday, October 30th, 2007

I’ve reached my limit of frustration with current keyword trackers. The technology is simple enough that it baffles me why so many keyword rank trackers are commercial products. I couldn’t find a single decent open source keyword tracker anywhere. I want something open that runs on Linux, of course, but my searches have left me empty-handed.

As a result, I’ve started designing my own keyword tracker. I will release it under the GPL because I like to keep it real like that. It will be a Rails application, and I will host a version for people to use free of charge (with some limitations so it doesn’t kill my servers). Support for each search engine will come from a Rails plugin, so you can extend the app to a new engine by writing a plugin for it. I will write one for Google to start with, and hopefully I can get some community support to add more search engines. I’ve got the database mostly planned out and will start the project in the next week or two. I will also set up a Trac instance to help with collaboration and issue tracking.
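The post doesn’t spell out what a search engine plugin would look like, so here is a minimal sketch of one possible interface, written in plain Ruby rather than as an actual Rails plugin. Every name in it (`SearchEngineAdapter`, `register_as`, `result_urls`, `FakeEngine`) is hypothetical. Each engine plugin only has to supply an ordered list of result URLs for a keyword, and the shared `rank` method finds the first result on the tracked site’s host. The stand-in engine returns canned results; a real Google plugin would have to fetch them, which is left out here.

```ruby
require "uri"

# Hypothetical base class for per-search-engine plugins.
class SearchEngineAdapter
  ADAPTERS = {}

  # Subclasses call this to make themselves available by name.
  def self.register_as(name)
    ADAPTERS[name] = self
  end

  # Subclasses must return result URLs in rank order.
  def result_urls(_keyword)
    raise NotImplementedError, "#{self.class} must implement result_urls"
  end

  # 1-based position of the first result on +site+ (or a subdomain of it),
  # or nil if the site doesn't appear in the results.
  def rank(keyword, site)
    host = site.downcase.sub(/\Awww\./, "")
    index = result_urls(keyword).index do |url|
      result_host = URI.parse(url).host.to_s.downcase.sub(/\Awww\./, "")
      result_host == host || result_host.end_with?(".#{host}")
    end
    index && index + 1
  end
end

# Stand-in engine with canned results; a real plugin would query the engine.
class FakeEngine < SearchEngineAdapter
  register_as :fake

  def result_urls(_keyword)
    ["http://example.org/a", "http://www.example.com/page", "http://blog.example.com/"]
  end
end

puts SearchEngineAdapter::ADAPTERS[:fake].new.rank("rails", "example.com")
```

Keeping the ranking logic in the base class means a new engine plugin only deals with getting results, not with matching hosts.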

The app will support multiple users. A user can log in and either add a new site or track an existing one. Each site has a set of keywords that the app tracks over time. I want graphs of keyword activity over time, and I want the ability to import keywords and export the rank history. If anyone is interested in helping me out on this project, just comment on this post to let me know and I’ll set you up with a Trac account so we can get started.
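A minimal in-memory sketch of the site, keyword, and rank-history relationship, plus the keyword import and rank-history export described above. The class and field names are hypothetical, users are left out, and in the actual Rails app these would presumably be ActiveRecord models backed by database tables. The export uses Ruby’s standard `csv` library, and a position of `nil` means the site wasn’t found for that check.

```ruby
require "csv"
require "date"

# One rank check: which keyword, which engine, when, and the position found.
RankCheck = Struct.new(:keyword, :engine, :checked_on, :position)

class TrackedSite
  attr_reader :domain, :keywords, :history

  def initialize(domain)
    @domain = domain
    @keywords = []
    @history = []
  end

  # Import one keyword per line, ignoring blank lines and duplicates.
  def import_keywords(text)
    text.each_line.map(&:strip).reject(&:empty?).each do |kw|
      @keywords << kw unless @keywords.include?(kw)
    end
    @keywords
  end

  # Append a check for a keyword this site actually tracks.
  def record(keyword, engine, checked_on, position)
    raise ArgumentError, "untracked keyword: #{keyword}" unless @keywords.include?(keyword)
    @history << RankCheck.new(keyword, engine, checked_on, position)
  end

  # Export the rank history as CSV, sorted by keyword then date.
  # A nil position becomes an empty cell.
  def export_history_csv
    CSV.generate do |csv|
      csv << %w[keyword engine date position]
      @history.sort_by { |c| [c.keyword, c.checked_on] }.each do |c|
        csv << [c.keyword, c.engine, c.checked_on.iso8601, c.position]
      end
    end
  end
end

site = TrackedSite.new("example.com")
site.import_keywords("ruby on rails\nkeyword tracker\n\nruby on rails\n")
site.record("ruby on rails", "google", Date.new(2007, 10, 30), 12)
puts site.export_history_csv
```

Storing every check as its own row, rather than overwriting a current rank, is what makes both the history export and the graphs over time possible.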

404 error checker and site crawler

Friday, October 12th, 2007

Google punishes sites heavily for 404 errors. By the time you realize your site has an error, it’s usually too late and you’re already being penalized for it. I suggest you stay proactive about 404 errors and use a link checker. I found an extremely useful tool for this: Xenu’s Link Sleuth. It crawls your entire site and checks every internal and external link. You can choose to ignore external links if you want and focus only on internal ones. It even visits images, mailto links, and just about anything else that has a ‘src’ or ‘href’ attribute in your site’s HTML. I considered it a nice toy when I first used it, but when it quickly found numerous serious 404 issues on a few of my sites, it earned a permanent place in my toolbox. It keeps you ahead of the curve instead of constantly playing catch-up. Try it out on your site and I guarantee you’ll find 404s you had no idea existed. It also shows you every page that contains each bad link, so you know where to go to correct the problem. Best of all, this software is free! Unfortunately it’s a Windows application, but any self-respecting web developer has virtual machines running different operating systems, Windows among them, so that shouldn’t be much of a problem if you’re a serious developer.
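For anyone without a Windows machine handy, here is a minimal Ruby sketch of the same basic idea. It is not Xenu’s code or algorithm, just a breadth-first crawl under these assumptions: only internal pages on the starting host are followed, links are pulled out with a rough regex instead of a real HTML parser, redirects are not followed, and only 404s are reported. The function names and the `fetch` lambda parameter are hypothetical; injecting `fetch` keeps the crawler testable without a network.

```ruby
require "net/http"
require "set"
require "uri"

# Pull href/src values out of HTML and resolve them against +base+.
def extract_links(html, base)
  html.scan(/(?:href|src)\s*=\s*["']([^"']+)["']/i).flatten.filter_map do |ref|
    next if ref.start_with?("mailto:", "javascript:", "#")
    begin
      uri = URI.join(base, ref)
      uri.fragment = nil # /page#top and /page are the same page
      uri.to_s
    rescue URI::Error
      nil # skip malformed links
    end
  end
end

# Crawl one host breadth-first. +fetch+ takes a URL and returns
# [status_code, html_or_nil]. Returns { broken_url => [pages linking to it] }.
def find_broken_links(start_url, fetch, max_pages: 1000)
  host = URI(start_url).host
  queue = [start_url]
  seen = Set[start_url]
  referrers = Hash.new { |h, k| h[k] = [] }
  broken = {}
  crawled = 0

  until queue.empty? || crawled >= max_pages
    url = queue.shift
    crawled += 1
    status, html = fetch.call(url)
    if status == 404
      broken[url] = referrers[url] # same array, so later referrers show up too
      next
    end
    next unless status == 200 && html # only parse HTML pages that loaded

    extract_links(html, url).each do |link|
      referrers[link] << url
      # Follow internal links only; external links are skipped entirely.
      queue << link if URI(link).host == host && seen.add?(link)
    end
  end
  broken.transform_values(&:uniq)
end

# Live fetcher: only text/html bodies are returned for parsing.
HTTP_FETCH = lambda do |url|
  res = Net::HTTP.get_response(URI(url))
  html = res["content-type"].to_s.include?("text/html") ? res.body : nil
  [res.code.to_i, html]
end

# Example against a local dev server (hits the network):
# find_broken_links("http://localhost:3000/", HTTP_FETCH).each do |url, pages|
#   puts "#{url} linked from: #{pages.join(', ')}"
# end
```

Recording every referrer for each link is what gives you the “which pages have the bad link” report, so you know exactly where to fix it.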

Rails form select integer drop-down helper method

Friday, October 12th, 2007

I’ve often come across situations while developing Rails apps where I just want a simple integer drop-down box. The default Rails helpers for selects and their options aren’t really geared for something that simple. I don’t want to create a collection of integers and pass it into blocks, or use any other ridiculous workaround in my views. I want my views clean and simple. So I created a helper function that makes it easy to create integer drop-downs. Just toss this in your application_helper.rb.

And in your view simply call:

And you have yourself an integer drop-down from 1 to 20. I tried to keep the option and select formatting and the id/name conventions the same as in the rest of the Rails select/option helper methods for consistency.
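The helper code from this post isn’t reproduced here. As a rough illustration of what such a helper might look like, here is a minimal plain-Ruby sketch; the name `integer_select` and its signature are hypothetical. It follows the Rails naming convention of `name="object[method]"` and `id="object_method"` and builds the HTML string directly so it runs outside Rails. In a view you would call something like `<%= integer_select(:post, :rating, 1..20) %>`; on Rails 3 and later the returned string would need `.html_safe` (or be built with `select_tag` and `options_for_select`) so ERB doesn’t escape it.

```ruby
require "erb"

# Hypothetical helper: a <select> whose options are the integers in +range+,
# named and id'd like Rails form helpers (name="object[method]", id="object_method").
def integer_select(object, method, range = 1..20, selected = nil)
  name = ERB::Util.html_escape("#{object}[#{method}]")
  id   = ERB::Util.html_escape("#{object}_#{method}")
  options = range.map do |i|
    flag = (i == selected) ? ' selected="selected"' : ""
    %(<option value="#{i}"#{flag}>#{i}</option>)
  end
  %(<select id="#{id}" name="#{name}">\n#{options.join("\n")}\n</select>)
end

puts integer_select(:post, :rating, 1..20, 5)
```

Defaulting the range to `1..20` keeps the common case to a single short call in the view.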

The future of search

Tuesday, October 2nd, 2007

I’ve been ranting and raving lately about how Google’s search sucks. There are numerous reasons, but let’s just focus on the relevancy of the results for this rant. Anyone using Google lately has seen the spammy websites that come up in search results. By spammy I mean sites that are nothing more than screen scrapers, web directories, Google AdWords pages that use search results to generate static pages of scraped content mixed with more AdWords, and so on. Most of these junk sites are plastered with Google AdWords ads, so why would Google care if they rank #1? They don’t, and that’s precisely the problem.

I’m rambling on about Google because anyone with a profitable website knows that Google is usually your primary traffic driver. Google used to weight external links to your site very heavily. As a result, people started creating link farms and easily gamed the system. External links still count, of course, but now more for moving a site from one tier to the next in Google’s ranking scheme. Yes, there are multiple tiers; think of them in terms of primary and secondary indexes and you’ll know what I mean. Since everyone realized how easy it was to fool Google with external links, Google gradually altered its algorithm over the years to make internal linking much more important. That’s why you see all these junk sites nowadays. There is a very straightforward way to create a site with a good internal linking structure: combine tags, related tags, and similar concepts with a traditional hierarchical site structure so that every page is well connected to the others. Google eats this up, and the junk sites built this way are proof that it now weights internal linking much more heavily than external links to your site.
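To make the internal-linking point concrete, here is a sketch of the classic, publicly described PageRank power iteration run over a single site’s internal link graph. This is not Google’s actual ranking, which isn’t public, and the page names and link structure are hypothetical. It shows how a tag page that every post links to ends up collecting more score than the home page or any individual post.

```ruby
# Classic PageRank by power iteration over a hash of page => outbound links.
def pagerank(links, damping: 0.85, iterations: 50)
  pages = (links.keys + links.values.flatten).uniq
  n = pages.size.to_f
  rank = pages.to_h { |p| [p, 1.0 / n] }

  iterations.times do
    nxt = pages.to_h { |p| [p, (1 - damping) / n] }
    pages.each do |p|
      outs = links.fetch(p, [])
      if outs.empty?
        # Dangling page: spread its score evenly over every page.
        pages.each { |q| nxt[q] += damping * rank[p] / n }
      else
        # Otherwise split its score evenly across its outbound links.
        outs.each { |q| nxt[q] += damping * rank[p] / outs.size }
      end
    end
    rank = nxt
  end
  rank
end

site = {
  "home"  => %w[post1 post2 post3 tag],
  "post1" => %w[home tag],
  "post2" => %w[home tag],
  "post3" => %w[home tag],
  "tag"   => %w[post1 post2 post3]
}
pagerank(site).sort_by { |_, r| -r }.each { |page, r| puts format("%-6s %.3f", page, r) }
```

Because every post links back to the tag page, and the home page does too, the tag page accumulates score from almost every page on the site.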

Now what’s one to do about all this mess? People and businesses are always going to find ways to rank high in search engines. It’s the name of the game in online commerce. As a result, there will always be junk sites like the millions that Google is indirectly creating (because its algorithm favors them). The solution I see is a combination of ideas that already exist in one form or another. Search results need to learn who I am and what I mean when I use certain words. For instance, searching for the word ‘rails’ might mean I’m looking for trains, or it might mean I’m learning about Ruby on Rails. A good search engine of the future would learn from my search behavior and somehow pick the context out of the words I’m using. It needs to learn which sorts of sites I favor over others. I hate Google AdWords junk sites, yet I get them all the time. That kind of site structure, along with its abundant links to Google’s AdWords JavaScript, could easily be recognized as something I would rather not see. Learning will be key to the future of search.
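As a toy illustration of picking the context out of a user’s own search behavior, here is a minimal Ruby sketch that guesses which meaning of “rails” someone intends by counting how many words from their past queries fall into each meaning’s word profile. It is a crude bag-of-words overlap, not machine learning proper, and the `SENSE_PROFILES` table and `guess_sense` name are hypothetical; a real engine would have to learn such profiles from data.

```ruby
# Hypothetical sense profiles: words that tend to appear alongside each
# meaning of an ambiguous term.
SENSE_PROFILES = {
  "rails" => {
    "trains"        => %w[train trains railway track station locomotive],
    "ruby on rails" => %w[ruby gem activerecord migration controller erb]
  }
}.freeze

# Score each meaning by how many history words appear in its profile.
# Returns nil when the history gives no signal at all.
def guess_sense(term, history)
  profiles = SENSE_PROFILES.fetch(term, {})
  words = history.flat_map { |query| query.downcase.split }
  scores = profiles.transform_values { |profile| words.count { |w| profile.include?(w) } }
  best, score = scores.max_by { |_, s| s }
  score.to_i.positive? ? best : nil
end

puts guess_sense("rails", ["ruby gem install", "ActiveRecord migration error"])
```

Returning nil when there is no signal matters: with no history, the engine should fall back to ordinary ranking rather than guess.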

I mentioned understanding the context of my words without my providing that context (rails). That implies search engines will need to extract some kind of semantic meaning from pages, beyond just the words and which words appear near them. Some people are already attempting to solve that problem. It’s a huge scalability problem, though, since parsing semantic meaning takes much longer than the simple, dumb word indexing Google does. The future of search will definitely include semantic meaning, whether through a more sophisticated word indexing that effectively achieves semantic understanding or through one that truly parses sentences for parts of speech and such. Combine that with a little machine learning and you have yourself a pretty good search engine.

Finally, some suggest that social bookmarking and rating sites such as Reddit are the future of search. I disagree. Mob rule is never good. However, if such a system built a hidden set of like-minded individuals for me (based on who means what with their search terms), it could get a better understanding of who I am and what I mean when I say lisp. Then again, what happens if I’ve been a geek all my life and suddenly have a kid with a lisp? Will it always be up to the user to figure out how to find their results? Will businesses and individuals always be able to ruin search engines with junk sites built by people who have figured out the algorithm? So far that’s the case. A little learning and a little semantic understanding should do the trick, though.
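The “hidden set of like-minded individuals” idea resembles user-based collaborative filtering. Here is a minimal Ruby sketch under hypothetical assumptions: each user is represented as a bag of query words (word to count), some users have already resolved an ambiguous term (say, by which results they clicked), and a new user’s meaning is guessed by majority vote among the most similar of those users, measured by cosine similarity. The names `cosine` and `neighbor_sense` and all of the data are made up for illustration.

```ruby
# Cosine similarity between two bag-of-words hashes (word => count).
def cosine(a, b)
  dot = a.sum { |word, n| n * b.fetch(word, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |n| n * n }) }
  denom = norm.(a) * norm.(b)
  denom.zero? ? 0.0 : dot / denom
end

# Majority meaning of +term+ among the k most similar users who have
# already resolved it; users with zero similarity get no vote.
def neighbor_sense(me, others, term, k: 3)
  scored = others
    .select { |u| u[:meanings].key?(term) }
    .map { |u| [cosine(me[:words], u[:words]), u] }
    .select { |sim, _| sim.positive? }
  nearest = scored.max_by(k, &:first).map(&:last)
  nearest.map { |u| u[:meanings][term] }.tally.max_by { |_, votes| votes }&.first
end

users = [
  { words: { "emacs" => 4, "macro" => 2, "lambda" => 1 }, meanings: { "lisp" => "programming language" } },
  { words: { "ruby" => 2, "emacs" => 1 },                 meanings: { "lisp" => "programming language" } },
  { words: { "toddler" => 3, "speech" => 2 },             meanings: { "lisp" => "speech impediment" } },
  { words: { "speech" => 1, "therapy" => 2 },             meanings: { "lisp" => "speech impediment" } }
]
geek = { words: { "ruby" => 3, "emacs" => 2, "macro" => 1 } }
puts neighbor_sense(geek, users, "lisp")
```

The geek-turned-parent case from the post shows up naturally here: once that user’s recent queries shift toward words like “toddler” and “speech”, their nearest neighbors, and therefore the guessed meaning, shift with them.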