I’ve been ranting and raving lately about how Google’s search sucks. There are numerous reasons, but let’s focus on the relevancy of the results for this rant. Anyone using Google lately has seen the spammy websites that come up in search results. By spammy I mean those sites that are nothing more than screen scrapers, web directories, or AdWords pages that use search results to generate static pages of scraped content mixed with more ads, and so on. Most of these junk sites are plastered with Google ads, so of course why would Google care if they’re ranked #1? They don’t, and that’s precisely the problem.
I’m rambling on about Google because anyone with a profitable website knows that Google is usually your primary traffic driver. Google used to weight external links to your site very heavily. As a result, people started creating link farms and easily gaming that signal. External links still count, of course, but more for moving from one tier to the next in Google’s ranking scheme. Yes, there are multiple tiers; think of primary and secondary indexes and you’ll know what I mean. Once everyone realized how easy it was to fool Google with external links, Google altered its algorithm ever so slightly over the years to make internal linking much more important. That’s why you see all these junk sites nowadays. There is a very straightforward way to create a site with a good internal linking structure: use tags, relevant tags, and similar concepts along with your traditional site-hierarchy linking to build a well-connected internal link graph. Google will eat this up, and the junk sites that employ this sort of design are proof that it now weighs internal linking much more heavily than external links to your site.
I mentioned understanding the context of my words without me providing context (rails). That implies that search engines will need to extract some kind of semantic meaning from pages, beyond just which words appear and which words are near them. That’s a problem some are already attempting to solve. It’s a huge scalability problem, though, since parsing semantic meaning takes much longer than the simple, dumb indexing of words that Google does. The future of search will definitely include semantic meaning, whether through a more sophisticated word indexing that effectively achieves semantic understanding or through one that truly parses sentences for parts of speech and the like. Combine that with a little machine learning and you have yourself a pretty good search.
Finally, some suggest that social bookmarking and rating sites such as Reddit are the future of search. I disagree; mob rule is never good. However, if such a site were to build a hidden set of like-minded individuals for me (based on who means what with their search terms), it could get a better understanding of who I am and what I mean when I say lisp. Then again, what happens when I’ve been a geek all my life and I suddenly have a kid who has a lisp? Will it always be up to the user to figure out how to find their results? Will businesses and individuals always be able to ruin search engines with junk sites that have figured out the algorithm? So far, that’s been the case. A little learning and a little semantic understanding should do the trick, though.
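To make the “dumb indexing of words” point concrete, here’s a minimal sketch of a plain inverted index and why it can’t tell the two senses of lisp apart on its own. The toy corpus and code are my own illustrative assumptions, not anything Google actually does:

```python
from collections import defaultdict

# Toy corpus: "lisp" is ambiguous without semantic context.
docs = {
    0: "lisp is a programming language built on s-expressions",
    1: "a lisp is a speech impediment affecting the s sound",
}

# A simple inverted index (word -> set of doc ids): the "dumb"
# word indexing described above.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

# A bare keyword query for "lisp" matches both senses equally:
print(sorted(index["lisp"]))                        # both documents

# Only extra context (here, a co-occurring term) separates them:
print(sorted(index["lisp"] & index["programming"]))  # the language only
```

Whether that context comes from smarter indexing, parsed sentences, or a cluster of like-minded users, the point is the same: something beyond the raw word list has to supply it.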