Archive for the 'SEO' Category

Google algorithm update

Saturday, November 3rd, 2007

For those of you paying attention to the latest changes Google has been making, you may be wondering what’s been going on. Some have speculated that they’ve been cracking down on paid link exchanges for high page rank sites. That may be part of the change but it certainly isn’t all of it. Besides updating page ranks they’ve also modified the weight they give to external links. For a while they were giving huge weight to good internal linking structures, so everyone started building really good internal linking. Now they’ve shifted their focus again to increase the weight of external links to your site.

This might not be a direct algorithm change but could be a side effect of a page rank update. I’m thinking that as the page ranks become stale, the weight on external links diminishes because Google is less certain of the reliability of those links. When they do a page rank update those external links count more, simply because they were just re-evaluated. They’re much more valid right after an update than they will be four to five months from now, just before the next PR update.

So if your ranking dropped as a result of the latest changes (but PR stayed the same) then I suggest you focus on building some better external links into your site. If your ranking increased, don’t just sit pretty and smile at yourself because you’re getting more traffic. You need to start solidifying your position by creating more content and continuing to work on your internal linking strategy. If your ranking pretty much stayed the same then you need a magic 8 ball because I have no answers for you.

On a side note, from my analysis of current traffic and ranking on many different sites it appears that Yahoo’s rankings have also been adjusted. This is less than a week after Google’s latest update. It seems odd to me that Yahoo rankings are adjusted along with Google’s. I’ve seen sites whose page rank went up while their Google traffic went down and, at the same time, their Yahoo traffic increased. Could there be some sort of link between Yahoo rankings and Google rankings? What purpose would Yahoo have in learning from Google ranks? Well, if they had a method to their madness then certainly they could combine what they learn from Google ranks with what they believe to be good and bad sites to improve their own algorithm.

Search engines are always trying to improve and filter out spammy sites. Yahoo does a much better job than Google when it comes to this. Their index is slow but steady. Google freaks out over every minor change you make to your site. Put a page up and a day later you take it down? Google flips out and tosses you into the 404 trash bin of junk sites. Yahoo is much more calm and collected when it comes to their index. It takes longer to get into it, but once you are in you don’t have to worry about Yahoo freaking out because your site has a 404 or two every now and then. It’s one of the biggest things that upsets me about Google’s index. The web is dynamic, not static. If I put a page up today and take it down tomorrow it doesn’t mean my site is junk or isn’t worthy of high rankings. It means I’m adaptable. Some things work. Some things don’t. That’s how you develop a good site. You find out what works. With Google’s index, though, if you publish something you better be damn sure that you want it to be online forever, or else.

Another note: the proper way to remove a page from the net without Google pissing on you is to first remove all links to the page. Then wait a few weeks. Even though some people claim Google updates their index continuously, it isn’t true. They cache pages and base their indexing off of those cached pages. You need to wait weeks before Google goes through and updates all the cached pages it has for your site. It may continuously update a tiny subset of its index (say your main page, to get the latest blog post) but if you remove a deeply linked page it will take a while for it to work its way out of the system. Next, you’ll want to submit a url removal request with Google using their webmaster tools. Finally, once you’ve waited long enough, you can remove the page.

This procedure is completely ridiculous by modern standards. Google has some serious catching up to do with other search engines in my opinion. Their indexing is inherently flawed and their results are littered with junk spammy sites. Oddly enough, as far as search results go, Yahoo has much more relevant results and a much more reliable algorithm than Google could ever hope for. Google needs to take a lesson from Yahoo on search. They seem to have focused on everything but search since their IPO. It’s time to get back to your roots, Google, and fix the problems with your search that we’ve known about for some time. You’re lagging and leaving the door completely open for a rival to move in. Someone just needs a better algorithm and enough momentum.

Open Source Keyword Tracker

Tuesday, October 30th, 2007

I’ve reached my limit of frustration with current keyword trackers. The technology is simple enough that it baffles me why so many keyword rank trackers are for profit. There doesn’t seem to be a single decent open source keyword tracker out there that I could find. I want something open that can run on Linux, of course, but my searches have left me empty-handed.

I’ve started designing my own keyword tracker as a result. I will release it under the GPL because I like to keep it real like that. It will be a Rails application and I will host a version for people to use free of charge (with some limitations so it doesn’t kill my servers). Basically, you can extend the app with support for a different search engine by creating a Rails plugin for it. I will just write one for Google for starters. Hopefully I can get some community support to get more search engines working. I’ve got the database mostly planned out and will be starting the project in the next week or two. I will set up an instance of Trac to help with collaboration and issue tracking.

Basically, I’ll be creating something that will have multiple users. A user can log in and enter a new site or track an existing site. Each site has a set of keywords which the app tracks over time. I want graphs of the keyword activity over time and I want the ability to import keywords and export the rank history. If anyone is interested in helping me out on this project, just comment on this post to let me know and I’ll set you up with a Trac account so we can get started.
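
To make the plan above a bit more concrete, here is a rough sketch of the data model: users own sites, sites own keywords, and each keyword keeps a rank history that can be exported. It is written in Python purely for illustration rather than as the Rails app itself, and every name in it (`User`, `Site`, `Keyword`, `export_history_csv`) is made up for this example, not the actual schema.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Keyword:
    phrase: str
    # One (day, engine, rank) entry per check; rank is None when the site was not found.
    history: List[Tuple[date, str, Optional[int]]] = field(default_factory=list)

    def record(self, day: date, engine: str, rank: Optional[int]) -> None:
        self.history.append((day, engine, rank))


@dataclass
class Site:
    url: str
    keywords: Dict[str, Keyword] = field(default_factory=dict)

    def import_keywords(self, phrases: List[str]) -> None:
        # Importing the same phrase twice keeps the existing history.
        for phrase in phrases:
            self.keywords.setdefault(phrase, Keyword(phrase))

    def export_history_csv(self) -> str:
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["keyword", "date", "engine", "rank"])
        for keyword in self.keywords.values():
            for day, engine, rank in keyword.history:
                writer.writerow([keyword.phrase, day.isoformat(), engine,
                                 "" if rank is None else rank])
        return buffer.getvalue()


@dataclass
class User:
    login: str
    sites: List[Site] = field(default_factory=list)
```

The `engine` field on each history entry is where a per-search-engine plugin would plug in: each plugin would only need to report a rank for a (site, keyword) pair.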

404 error checker and site crawler

Friday, October 12th, 2007

Google punishes sites heavily for 404 errors. By the time you realize your site has an error it’s usually too late and you’re already being punished for it. I suggest you stay proactive about your 404 errors and use a link checker. I found an extremely useful tool for this: Xenu’s Link Sleuth. It basically crawls your entire site and checks every single internal and external link. You can choose to ignore external links if you want and just focus on internal links. It even visits images and mailto links and just about anything that has a ‘src’ or ‘href’ in the html of your site. I considered it a nice toy when I first used it, but when it quickly found numerous serious 404 issues on a few of my sites I upgraded the importance of this tool in my toolbox. It will keep you ahead of the curve instead of constantly playing catch-up. Try it out on your site and I guarantee you’ll find 404s you had no idea existed. It shows you all the pages that have the bad links on them as well, so you know where to go to correct the problem. Best of all, this software is free! It’s a Windows application, unfortunately, but any self-respecting web developer has virtual machines with different operating systems on them, Windows being one of them, so that shouldn’t be much of a problem if you’re a serious developer.
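
If you’re curious what a checker like this does for a single page, here is a minimal Python sketch of the idea: collect every ‘src’ and ‘href’, resolve it against the page’s url, and report anything that comes back with an error status. This is not how Xenu’s Link Sleuth is implemented. The names `LinkCollector` and `find_broken_links` are made up for this example, and the `status_of` callback is an assumption standing in for whatever does the actual HTTP request.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects every href and src attribute value found in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_links(page_url: str, html: str,
                      status_of: Callable[[str], int]) -> Dict[str, int]:
    """Return {absolute_url: status} for every link whose status is 400 or above."""
    collector = LinkCollector()
    collector.feed(html)
    broken: Dict[str, int] = {}
    for link in collector.links:
        absolute = urljoin(page_url, link)  # resolve relative links
        if absolute.startswith("mailto:"):
            continue  # this sketch skips mailto links; there is nothing to fetch
        status = status_of(absolute)
        if status >= 400:
            broken[absolute] = status
    return broken
```

A real crawler would feed the pages it finds back into the same function and remember which page each bad link came from, so you know where to go to fix it.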

The future of search

Tuesday, October 2nd, 2007

I’ve been ranting and raving lately about how Google’s search sucks. There are numerous reasons but let’s just focus on the relevancy of the results for this rant. Anyone using Google lately has seen the spammy websites that come up in search results. By spammy I mean those sites which are nothing more than screen scrapers, web directories, Google AdWords pages that use search results to generate static pages with scraped content mixed with more AdWords, and on and on. Most of these junk sites have tons of Google AdWords all over them, so of course why would Google care if they are ranked #1? They don’t, and that’s precisely the problem.

I’m rambling on about Google because anyone with a profitable website knows that Google is your primary traffic driver (most of the time, of course). Google used to weight external links to your site very heavily. As a result, people started creating link farms and easily got around that. External links still count, of course, but more for going from one tier to the next in their ranking scheme. Yes, there are multiple tiers. Relate that to primary and secondary indexes and you’ll know what I mean. Since everyone realized how easy it was to fool Google with external links to your site, they altered their algorithm ever so slightly over the years to make internal linking much more important. That’s why you see all these junk sites nowadays. There is a very straightforward way to create a site with a good internal linking structure. Think of tags, relevant tags, and similar concepts, along with your traditional site hierarchy, as the way to create a well-connected internal linking structure. Google will eat this up, and the junk sites that employ this sort of design are proof that they rank internal linking much higher than external links to your site.

Now what’s one to do about all this mess? People and businesses are always going to find ways to get ranked high in search engines. It’s the name of the game in online commerce. As a result, there will always be junk sites like the millions that Google is indirectly creating (because their algorithm favors them). The solution that I see is a combination of ideas that already exist in one form or another. Search needs to learn who I am and what I mean when I use a certain word. For instance, searching for the word ‘rails’ might mean I’m looking for trains or it might mean I’m learning about Ruby on Rails. A good search engine of the future would learn from my search behavior and somehow be able to pick the context out of the words I’m using. It needs to learn what sort of sites I favor over others. I hate Google AdWords junk sites yet I get them all the time. This sort of site structure, along with its abundant links to Google’s JavaScript for AdWords, could easily be recognized as something I would rather not see. Learning will be key to the future of search.

I mentioned understanding the context of my words without me providing context (rails). That implies that search engines will need to figure out some type of semantic meaning from pages beyond just the words and what words are near them. That’s a problem that some are already attempting to solve. It’s a huge scalability problem, though, since parsing semantic meaning takes much longer than the simple dumb indexing of words that Google does. The future of search will definitely include semantic meaning, whether it’s just a more sophisticated word indexing that effectively achieves semantic understanding or one that truly parses sentences for parts of speech and such. Combine that with a little machine learning and you have yourself a pretty good search.

Finally, some suggest that social bookmarking and rating sites such as Reddit are the future of search. I disagree. Mob rule is never good. However, if a search engine were to create a hidden set of like-minded individuals for me (based on who means what with their search terms) it could get a better understanding of who I am and what I mean when I say lisp. Then again, what happens when I’m a geek all my life and I suddenly have a kid who has a lisp? Will it always be up to the user to figure out how to find their results? Will businesses and individuals always be able to ruin search engines with junk sites that have figured out the algorithm? So far that’s the case. A little learning and a little semantic understanding should do the trick though.

Problems with non-English characters in urls

Thursday, September 27th, 2007

I have a site that used non-English characters in the url. They were basically Spanish characters in the names of things. Some had little accent marks on them and such. Anyway, everything worked fine in my browser using those characters in the url. My browser sees a link with the strange character and escapes it to something like %E9. Great. The problem, though, is that if I change my default character encoding to, say, Traditional Chinese, that same character gets escaped into something completely different like %8f. That’s no good because when people try to visit that url it doesn’t always go to the same page. Why? I’m not entirely sure but I suspect it’s Apache or Rails translating that url using a certain character encoding.
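
Here is a small Python illustration of the effect. It is not what the browser literally runs, just the same escaping done by hand with the standard library’s `urllib.parse.quote`, and the path `/señor` is a made-up example. The same character turns into different escapes depending on which character encoding is used to produce the bytes.

```python
from urllib.parse import quote, unquote

path = "/señor"  # hypothetical url path with one accented character

# The "ñ" is first turned into bytes, then each byte is escaped as %XX.
utf8_escaped = quote(path, encoding="utf-8")      # ñ is two bytes in UTF-8
latin1_escaped = quote(path, encoding="latin-1")  # ñ is one byte in Latin-1

print(utf8_escaped)    # /se%C3%B1or
print(latin1_escaped)  # /se%F1or

# A server that assumes UTF-8 cannot recover the original path
# from the Latin-1 escape, so the two urls are not the same page.
print(unquote(latin1_escaped, encoding="utf-8") == path)  # False
```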

Logic would tell you that I should just put the character encoding in the html headers, right? Yes, that works in theory. Everything works in theory though. In practice, not every browser or spider actually listens to that. I mention spiders because some spiders will automatically assume a particular character encoding, which has the same effect as a browser with a different default character encoding set. What to do? What to do?

My solution was to just get rid of all non-English characters. No one is accidentally escaping an ‘a’ to %ef. So far the solution is working out fine. I don’t entirely like the urls now but it’s better than having characters escaped improperly by browsers and spiders.
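
One way to do that kind of cleanup, sketched in Python (the site itself is a Rails app, so this is only an illustration, and the function name `ascii_slug` is made up): decompose each accented character into a plain letter plus its accent mark, drop everything that isn’t ASCII, and then tidy up what is left.

```python
import re
import unicodedata


def ascii_slug(name: str) -> str:
    """Turn a name with accented characters into a plain ASCII url segment."""
    # NFKD splits "é" into "e" followed by a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping non-ASCII bytes removes the accent marks and keeps the base letters.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase, and collapse anything that is not a letter or digit into a hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(ascii_slug("Señor Café"))  # senor-cafe
```

Characters with no ASCII base letter simply disappear with this approach, so it suits Spanish-style accents better than it would, say, Chinese names.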

My beef with the Google god

Friday, September 7th, 2007

Google called me out of the blue the other day asking if I wanted a job. It sounded like a good idea at first so I followed through with my updated resume and such. At some point they said they had 3 separate positions for me and that I should try to get into their core team first. I had a brief interview with the core team where they judged my qualifications based on 3 questions. I don’t remember what they were but I answered them all wrong. Well, the second one I didn’t even try and just said I don’t know, because I was pissed that they were giving me a pop quiz and I got the first one wrong. No googling. After that interview it took less than 1 minute to find the answers. I never responded to the other requests from them because I don’t really want to work for a company that is so full of itself that it honestly thinks pop quizzes are the best way to weed people out. Pass. Good luck though, Google. Now, on to the real meat of this post.

Google’s index seems like it’s continuously updated. That would be great for search if there weren’t so many junk results.

  • Internal linking - Google loves internal linking, which is one reason why there are so many junk results in their search. From my experience, tossing in link dumps all over your site actually helps it do better. The result? Everyone link dumps and gets better rankings, so you get a bunch of crappy search results.
  • Sensitivity - Forget search for now, let’s focus on Google from a developer’s perspective. Google continuously updates their index. Great. What does that mean for a developer? It means that if you forget an apostrophe on an anchor tag you end up with dozens of 404s. No big deal if you catch the mistake early, right? Wrong. What happens is the missing apostrophe bleeds the link on to whatever follows, causing invalid links. If the Google god happens to see your mistake they will try adding those invalid links to their index. They won’t be valid so you will be penalized for having 404s on your site. That’s a surefire way to see your rankings drop off the map for some ridiculous mistake that was corrected a few hours after it was made.
  • Poor tools - Luckily Google provides you with a way to remove invalid links from their index, but good luck using that thing. Let’s say that one mistake created 50 404s across your site. You have to copy and paste each 404 url into the url removal form, one at a time. Not only that but you have to remove the domain name from the pasted version. So it’s copy paste edit, copy paste edit, 50 times in a row. Yay! Or you can copy paste them into a text editor, global replace the domain, and then copy paste them into the removal form 1 by 1.
  • Poor responsiveness - Ok, so what? At least they provide a way to remove your urls from their index instead of waiting around for weeks, right? Well, kind of. It’s not the same continuous updating that they do themselves. They’ll eventually listen to your request but only on their time. When they’re good and ready. I’ve had a request pending removal for over 2 weeks. That’s 2 weeks of being penalized for a missing apostrophe that was live for less than 1 hour. Way to go Google.
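
The “global replace the domain” step from the poor tools point can at least be scripted. Here is a minimal Python sketch, assuming you have your 404 urls in a list (the function name `paths_for_removal` and the example urls are made up), that strips the scheme and domain and leaves just the part the removal form wants.

```python
from typing import List
from urllib.parse import urlsplit


def paths_for_removal(urls: List[str]) -> List[str]:
    """Strip the scheme and domain from each url, keeping the path and query string."""
    paths = []
    for url in urls:
        parts = urlsplit(url.strip())  # strip() drops stray whitespace from pasting
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        paths.append(path)
    return paths


bad_urls = [
    "http://example.com/old-page",
    "http://example.com/tags?page=2\n",
]
for path in paths_for_removal(bad_urls):
    print(path)
```

You still have to paste them into the form 1 by 1, but at least the editing is gone.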

This wouldn’t be complete without some suggestions. First, get a clue about hiring good developers. You’re going to eventually end up with a crap gene pool like Microsoft and Yahoo and be usurped by my new search engine. Next, don’t be so harsh on the occasional 404 and at least provide a quick way to remove them. Your index is updated continuously; if you want feedback then use it. Don’t sit on my feedback for weeks. Next, give your web developer tools some love. They’re so primitive, with little thought put into usability. Also, internal linking shouldn’t count for nearly as much as you give it credit for. Look at the sites coming out these days. Huge link dumps that people pass right over. You’re forcing the creation of millions of junk sites on the internet. That’s not a good thing. And finally, your search obviously uses some type of machine learning and appears to be in a rut. People have your search figured out and are taking advantage of that. You need a smarter machine learning algorithm.

Computational justification for the use of meta descriptions and keywords

Sunday, August 5th, 2007

Search engines have a lot of work to do crawling the web constantly. There must be a lot of computational power required to constantly parse html pages and assign rankings for the enormous number of sites now out there. As such, it makes perfect sense for a search engine to want to speed up that process in any way it can. The use of meta descriptions and meta keywords helps search engines speed up their algorithms because they don’t have to parse your entire page. They just have to read the header information and can move on.

The problem is that people realize this, so they do a bit of keyword stuffing to try to give themselves a boost. Search engines don’t simply ignore your page when you use keywords and descriptions. They just don’t parse the entire page as often if your meta keywords and meta descriptions match the content on your page. If they don’t match, of course your site will require more processing because they have to parse the entire page and can’t just trust your keywords and descriptions.

The use of meta tags saves search engines tons of time. Since you do them a favor, they do you a favor and you get higher rankings.
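
If you want to check your own pages for the kind of mismatch described above, something like the following Python sketch would do. It says nothing about how any search engine actually works. It just lists the meta keywords that never show up in the body text, using crude substring matching, and the names `MetaAndText` and `unmatched_keywords` are made up for this example.

```python
from html.parser import HTMLParser
from typing import List


class MetaAndText(HTMLParser):
    """Collects the meta keywords from the head and the visible text from the body."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: List[str] = []
        self.text: List[str] = []
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.text.append(data.lower())


def unmatched_keywords(html: str) -> List[str]:
    """Return the meta keywords that do not appear anywhere in the body text."""
    parser = MetaAndText()
    parser.feed(html)
    body = " ".join(parser.text)
    # Plain substring check: good enough for a sanity test, nothing more.
    return [k for k in parser.keywords if k not in body]
```

An empty list means every keyword you declared is at least mentioned on the page.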

Brian's slick Wordpress titles for pseudo-indented Google search results

Thursday, June 7th, 2007

There are plenty of people talking about how you can make your WordPress blog titles more SEO friendly. No one I’ve found, however, has mentioned what I stumbled upon by accident: pseudo-indented listings in Google search results. What I’ve started doing on my blogs is making my title like this:


<title>
<?php if ( is_single() ) { ?> &raquo; <?php } ?> <?php wp_title(''); ?> <?php if ( is_single() ) { ?> &raquo; <?php } ?> <?php bloginfo('name'); ?>
</title>

This adds a » in front of your post title. The result is that when the search results show up in Google your link stands out more because of the » in the title. It almost makes it look like your site is more official and Google is giving you a little arrow in front of your link to prove it. It won’t affect your rankings as far as I know. It’s more of a psychological advantage than anything else, but I have no scientific data to back up that claim. Take a look at the title of this post to see an example title.

Google update this weekend

Friday, January 12th, 2007

For some strange reason, I have this funny feeling that Google will be doing some updating this weekend. If you have an AdWords account, you may have noticed the message that some services may be unavailable on January 13. It’s been a while since the last PR update so we’re about due for something from Google. All of you who constantly watch your rankings, pay close attention in the next few weeks if it’s not this weekend. It’s coming soon.

Google SEO, Adsense, and the junk it creates

Wednesday, January 3rd, 2007

The internet is being filled with junk. Sure, there have always been a lot of useless pages and sites out there, but there are new trends happening and we have only Google to blame. Google ranks sites based on how many links come into a site (a simplified explanation, I admit). The result? The internet now has millions more web directories than it did. People in the quest for better rankings submit their sites to these directories or hire some SEO people to do it for them. There are even people promoting submission to a large number of these web directories at once. Yay! Let’s be honest. Who visits these random web directories? Sure, someone may stumble upon one from a search engine and happen to click through it, but its purpose and usefulness are extremely limited. I consider them junk sites that clog up my search results on occasion and make the quest for rankings as meaningless as the millions of web directories themselves. Let’s fill the web with decent content like this site instead of useless junk like web directories, shall we?

Another trend in junk sites is sites that just contain a bunch of Google Adsense links. These sites are everywhere these days and are always coming up in my search results on Google. They are even more useless than the web directories. They fill up the internet with ads and screw up rankings and search results for everyone. If you’ve ever used Adsense and tried out the content network you’ll see how pointless they are. Sure, you’ll see a huge number of impressions and maybe even a few click-throughs. If you actually take a closer look at that traffic and where it’s coming from you’ll see that it comes from these pages with just Adsense crap on them. Really good ad targeting there. Where do I sign up? I wonder, if I had a few thousand Windows zombies out there, whether I could make easy money by having them all click through my ads whenever their IPs changed. Maybe I could just use Tor and save myself some time. I place the value of a content click-through at negative or at the most 0. I think they are just a waste of advertising money.

So why do I have Google Adsense running at the top now? I’m not averse to advertisements on legitimate sites that have content. Sites that are filled up with nothing but ads are not legitimate content in my opinion. They’re rubbish. I think the content network is a good thing and should be more targeted to the site’s content, but that’ll only work if Google can somehow get rid of all the junk sites.