I’ve been ranting and raving lately about how Google’s search sucks. There are numerous reasons, but let’s just focus on the relevancy of the results for this rant. Anyone using Google lately has seen the spammy websites that come up in search results. By spammy I mean sites that are nothing more than screen scrapers, web directories, Google AdWords pages that use search results to generate static pages with scraped content mixed with more AdWords, and on and on. Most of these junk sites have tons of Google AdWords all over them, so of course why would Google care if they are ranked #1? They don’t, and that’s precisely the problem.
I’m rambling on about Google because anyone with a profitable website knows that Google is your primary traffic driver (most of the time, of course). Google used to weight external links to your site very heavily. As a result, people started creating link farms and easily got around that. External links still count, of course, but more for moving from one tier to the next in Google’s ranking scheme. Yes, there are multiple tiers. Relate that to primary and secondary indexes and you’ll know what I mean. Once everyone realized how easy it was to fool Google with external links, Google altered its algorithm ever so slightly over the years to make internal linking much more important. That’s why you see all these junk sites nowadays. There is a very straightforward way to create a site with a good internal linking structure: combine tags, relevant tags, and similar concepts with your traditional site-hierarchy links, and you get a well-connected internal linking structure. Google will eat this up, and the junk sites that employ this sort of design are proof that it ranks internal linking much higher than external links to your site.
Now what’s one to do about all this mess? People and businesses are always going to find ways to get ranked high in search engines. It’s the name of the game in online commerce. As a result, there will always be junk sites like the millions that Google is indirectly creating (because its algorithm favors them). The solution I see is a combination of ideas that already exist in one form or another. Search engines need to learn who I am and what I mean when I use a certain word. For instance, searching for the word ‘rails’ might mean I’m looking for trains, or it might mean I’m learning about Ruby on Rails. A good search engine of the future would learn from my search behavior and somehow be able to pick the context out of the words I’m using. It needs to learn what sort of sites I favor over others. I hate Google AdWords junk sites, yet I get them all the time. That sort of site structure, along with its abundant links to Google’s AdWords JavaScript, could easily be recognized as something I would rather not see. Learning will be key to the future of search.
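To make the "learning" idea a bit more concrete, here is a rough sketch of re-ranking results using a profile built from past clicks, plus a penalty for ad-heavy pages. Everything in it is an assumption made up for illustration: the `Result` fields, the topic labels, the `ad_scripts` count, and the weights. It is not how Google or any real engine works.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    topic: str         # hypothetical topic label, e.g. "programming" or "trains"
    ad_scripts: int    # hypothetical count of ad script includes on the page
    base_score: float  # whatever score the ordinary ranking produced


class UserProfile:
    """Remembers which topics this user has clicked on in the past."""

    def __init__(self):
        self.topic_clicks = Counter()

    def record_click(self, result):
        self.topic_clicks[result.topic] += 1

    def topic_affinity(self, topic):
        # Fraction of all past clicks that landed on this topic (0.0 if no history).
        total = sum(self.topic_clicks.values())
        if total == 0:
            return 0.0
        return self.topic_clicks[topic] / total


def rerank(results, profile, affinity_weight=1.0, ad_penalty=0.2):
    """Boost topics the user favors and push down pages stuffed with ad scripts."""

    def score(r):
        return (r.base_score
                + affinity_weight * profile.topic_affinity(r.topic)
                - ad_penalty * r.ad_scripts)

    return sorted(results, key=score, reverse=True)


results = [
    Result("trains.example", "trains", 0, 0.6),
    Result("rubyonrails.example", "programming", 0, 0.5),
    Result("junk.example", "programming", 8, 0.9),
]

profile = UserProfile()
for _ in range(3):
    profile.record_click(results[1])  # this user keeps clicking programming pages

ranked = rerank(results, profile)
print([r.url for r in ranked])
```

With this made-up data, the Ruby on Rails page moves to the top for a user who keeps clicking programming results, and the ad-stuffed page drops to the bottom even though its base score was the highest.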
I mentioned understanding the context of my words without me providing context (rails). That implies that search engines will need to extract some kind of semantic meaning from pages, beyond just the words and which words are near them. That’s a problem that some are already attempting to solve. It’s a huge scalability problem, though, since parsing semantic meaning takes much longer than the simple, dumb indexing of words that Google does. The future of search will definitely include semantic meaning, whether that is just a more sophisticated word index that effectively achieves semantic understanding or one that truly parses sentences for parts of speech and such. Combine that with a little machine learning and you have yourself a pretty good search.
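For reference, the "words and which words are near them" style of indexing can be sketched in a few lines. This is a toy positional inverted index, not a description of Google’s actual index; the function names and the two sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the documents containing it and its positions in each."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def near(index, a, b, window=3):
    """Return documents where words a and b occur within `window` positions."""
    hits = set()
    for doc_id, a_positions in index.get(a, {}).items():
        b_positions = index.get(b, {}).get(doc_id, [])
        if any(abs(pa - pb) <= window for pa in a_positions for pb in b_positions):
            hits.add(doc_id)
    return hits


docs = {
    "trains": "Steel rails carry the train across the valley",
    "ruby": "Ruby on Rails is a web framework written in Ruby",
}
index = build_index(docs)

print(sorted(index["rails"]))        # both documents contain the bare word
print(near(index, "rails", "ruby"))  # only a second word narrows it down
```

The index happily returns both pages for the bare word "rails". It only tells them apart when the searcher supplies a second word, which is exactly the context a semantic approach would have to infer on its own.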
Finally, some suggest that social bookmarking and rating sites such as Reddit are the future of search. I disagree. Mob rule is never good. However, if a search engine were to build a hidden set of like-minded individuals for me (based on who means what with their search terms), it could get a better understanding of who I am and what I mean when I say lisp. Then again, what happens when I’ve been a geek all my life and I suddenly have a kid who has a lisp? Will it always be up to the user to figure out how to find their results? Will businesses and individuals always be able to ruin search engines with junk sites that have figured out the algorithm? So far that’s the case. A little learning and a little semantic understanding should do the trick, though.
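The "hidden set of like-minded individuals" idea could look something like the sketch below. It assumes, purely for illustration, that each user’s history boils down to a dictionary mapping a search term to the sense they usually mean by it. The function names, the sense labels, and the `min_agreement` threshold are all hypothetical.

```python
from collections import Counter


def agreement(a, b):
    """Fraction of shared search terms on which two users mean the same thing."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[t] == b[t] for t in shared) / len(shared)


def guess_sense(me, others, term, min_agreement=0.5):
    """Guess what `me` means by `term` from users who usually agree with me."""
    votes = Counter()
    for other in others:
        weight = agreement(me, other)
        if term in other and weight >= min_agreement:
            votes[other[term]] += weight
    return votes.most_common(1)[0][0] if votes else None


me = {"rails": "programming", "python": "programming"}
others = [
    {"rails": "programming", "python": "programming", "lisp": "programming"},
    {"rails": "trains", "python": "snakes", "lisp": "speech"},
    {"rails": "programming", "lisp": "programming"},
]

print(guess_sense(me, others, "lisp"))
```

Here the geek gets the programming sense of lisp because the people who agree with them on other terms mean that. It also shows the weakness raised above: a profile built entirely on past behavior has no way to notice when the same person suddenly means the speech sense.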