Archive for September, 2007

Problems with non-English characters in URLs

Thursday, September 27th, 2007

I have a site that used non-English characters in its URLs. They were mostly Spanish characters in the names of things, and some had little accent marks on them and such. Everything worked fine in my browser using those characters in the URL: my browser sees a link with the strange character and escapes it to something like %3d. Great. The problem, though, is that if I change my default character encoding to, say, Traditional Chinese, that same character gets escaped into something completely different, like %8f. That’s no good, because when people try to visit that URL it doesn’t always go to the same page. Why? I’m not entirely sure, but I suspect it’s Apache or Rails translating the URL using a certain character encoding.
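The mismatch makes sense once you remember that percent-escaping works on bytes, and the bytes for an accented character depend on the encoding. Here is a minimal Ruby sketch of that effect. `percent_escape` is a hypothetical helper written for illustration (not what Apache or Rails actually run), and Latin-1 stands in for whatever other encoding a browser or spider might assume; the exact Traditional Chinese case is not reproduced here.

```ruby
# Hypothetical helper: percent-escape every byte outside the RFC 3986
# unreserved set (A-Z a-z 0-9 - . _ ~). It looks at raw bytes, so the
# result depends entirely on which encoding the string is in.
def percent_escape(text)
  text.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format("%%%02X", byte.ord) }
end

word = "año" # "year" in Spanish; this literal is UTF-8

utf8   = percent_escape(word)                      # ñ is two bytes in UTF-8
latin1 = percent_escape(word.encode("ISO-8859-1")) # ñ is one byte in Latin-1

puts utf8   # a%C3%B1o
puts latin1 # a%F1o
```

Same letter, two different URLs. Whichever encoding the server assumes when it decodes the path, one of the two is likely to come out as the wrong thing.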

Logic would tell you that I should just put the character encoding in the HTML headers, right? Yes, that works in theory. Everything works in theory, though. In practice, not every browser or spider actually listens to it. I mention spiders because some of them automatically assume a particular character encoding and then do the same thing as a browser with a different default character encoding set. What to do? What to do?

My solution was to just get rid of all non-English characters. No one is accidentally escaping an ‘a’ to %ef. So far the solution is working out fine. I don’t entirely like the URLs now, but it’s better than having characters escaped improperly by browsers and spiders.
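A minimal Ruby sketch of that kind of cleanup follows. It is an illustration rather than the exact code behind the site: `ascii_slug` is a hypothetical name, it assumes Ruby 2.2 or later for `String#unicode_normalize`, and it keeps the base letter of accented characters ("año" becomes "ano") instead of dropping the whole character.

```ruby
# Hypothetical helper: turn a name into an ASCII-only URL segment.
def ascii_slug(name)
  # NFD splits "ñ" into "n" followed by a combining tilde (U+0303).
  decomposed = name.unicode_normalize(:nfd)
  # Drop the combining accent marks, leaving the base letters.
  unaccented = decomposed.gsub(/\p{Mn}/, "")
  # Anything still outside A-Z, a-z, 0-9 (spaces, punctuation, letters with
  # no ASCII base such as "ß") collapses into a single hyphen.
  hyphenated = unaccented.gsub(/[^A-Za-z0-9]+/, "-")
  # Trim hyphens at either end and lowercase the result.
  hyphenated.gsub(/\A-+|-+\z/, "").downcase
end

puts ascii_slug("Año Nuevo")  # ano-nuevo
puts ascii_slug("¿Qué pasó?") # que-paso
```

Since every slug is plain ASCII, there is nothing left for a browser or spider to escape, whatever encoding it assumes.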

One-to-many associations made easy with ActiveScaffold

Friday, September 7th, 2007

I just dove in and started using ActiveScaffold for a new project. There was a bit of a learning curve, since I was picking it up along with RESTful Rails. I started with a simple one-to-many association. I set up my two models as usual with has_many :ads and belongs_to :affiliate. Then I created two controllers that just had something like:

[source language=":ruby"]
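# With no arguments, active_scaffold works out the model from the
# controller name (AdController -> Ad, AffiliateController -> Affiliate).
class AdController < ApplicationController
  active_scaffold
  layout 'main' # render these pages inside the 'main' layout
end

class AffiliateController < ApplicationController
  active_scaffold
  layout 'main'
end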
[/source]

And finally added this to my routes:

[source language=":ruby"]
map.resources :ad, :active_scaffold => true
map.resources :affiliate, :active_scaffold => true
[/source]

When I went to http://localhost:3000/affiliate I was just amazed to see that it actually worked. It let me create my affiliate and add ads to that affiliate on the fly. All my CRUD operations were already done without my having to link them manually in the controller, as I had been doing in previous projects. I’m not sure how well it’s going to scale with the project in the long run, but it’s certainly an improvement over the traditional Rails scaffolding, and I highly recommend giving it a try.

My beef with the Google god

Friday, September 7th, 2007

Google called me out of the blue the other day asking if I wanted a job. It sounded like a good idea at first, so I followed through with my updated resume and such. At some point they said they had 3 separate positions for me and that I should try to get onto their core team first. I had a brief interview with the core team, where they judged my qualifications based on 3 questions. I don’t remember what they were, but I answered them all wrong. Well, the second one I didn’t even try; I just said I didn’t know, because I was pissed that they were giving me a pop quiz and I had gotten the first one wrong. No googling. After that interview it took me less than 1 minute to find the answers. I never responded to their other requests, because I don’t really want to work for a company that is so full of itself that it honestly thinks pop quizzes are the best way to weed people out. Pass. Good luck, though, Google. Now, onto the real meat of this post.

Google’s index seems to be continuously updated. That would be great for search if there weren’t so many junk results.

  • Internal linking – Google loves internal linking, which is one reason there are so many junk results in their search. In my experience, tossing link dumps all over your site actually helps it rank better. The result? Everyone link dumps and gets better rankings, so you get a bunch of crappy search results.
  • Sensitivity – Forget search for now; let’s focus on Google from a developer’s perspective. Google continuously updates their index. Great. What does that mean for a developer? It means that if you forget an apostrophe on an anchor tag, you end up with dozens of 404s. No big deal if you catch the mistake early, right? Wrong. The missing apostrophe bleeds the link into whatever follows, causing invalid links. If the Google god happens to see your mistake, it will try adding those invalid links to its index. They won’t be valid, so you will be penalized for having 404s on your site. That’s a surefire way to see your rankings drop off the map for some ridiculous mistake that was corrected a few hours after it was made.
  • Poor tools – Luckily, Google provides a way to remove invalid links from their index, but good luck using that thing. Let’s say that one mistake created 50 404s across your site. You have to copy and paste each 404 URL into the URL removal form, one at a time. Not only that, but you have to remove the domain name from each pasted URL. So it’s copy, paste, edit; copy, paste, edit, 50 times in a row. Yay! Or you can paste them all into a text editor, globally replace the domain, and then copy and paste them into the removal form one by one.
  • Poor responsiveness – OK, so what? At least they provide a way to remove your URLs from their index instead of waiting around for weeks, right? Well, kind of. It’s not the same continuous updating that they do for themselves. They’ll eventually act on your request, but only on their own time, when they’re good and ready. I’ve had a removal request pending for over 2 weeks. That’s 2 weeks of being penalized for a missing apostrophe that was live for less than 1 hour. Way to go, Google.
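Here is how one missing apostrophe turns into a bad link. The Ruby below is a simplified sketch: the regex only approximates how a parser reads a single-quoted attribute (the value runs until the next apostrophe, wherever that is), and `single_quoted_hrefs` and `suspicious_hrefs` are hypothetical helpers, not part of any Google tool.

```ruby
# Simplified model of a single-quoted attribute: the value runs from the
# opening apostrophe to the NEXT apostrophe. A regex approximation for
# illustration only, not a real HTML parser.
def single_quoted_hrefs(html)
  html.scan(/href='([^']*)'/).flatten
end

# Hypothetical pre-deploy check: a real path should not contain markup
# characters or whitespace, so those suggest the link bled.
def suspicious_hrefs(html)
  single_quoted_hrefs(html).select { |href| href =~ /[<>\s]/ }
end

good = "<a href='/ads/1'>Ad one</a> <a href='/ads/2'>Ad two</a>"
bad  = "<a href='/ads/1>Ad one</a> <a href='/ads/2'>Ad two</a>" # apostrophe missing

p single_quoted_hrefs(good) # ["/ads/1", "/ads/2"]
p single_quoted_hrefs(bad)  # ["/ads/1>Ad one</a> <a href="]
p suspicious_hrefs(bad)     # ["/ads/1>Ad one</a> <a href="]
```

In the broken version the second link disappears and the first one points at a path full of markup, which a crawler can resolve against the site and request, getting a 404. Running a check like `suspicious_hrefs` over rendered pages before deploying might catch the mistake before a spider does.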
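The edit part of copy, paste, edit can at least be scripted. A minimal sketch, assuming the removal form wants everything after the host name (path plus query string); `strip_domain` is a hypothetical helper, and example.com and the sample paths are placeholders. It uses a plain string substitution rather than a URI parser so that mangled URLs full of junk characters still come through untouched.

```ruby
# Hypothetical helper: remove the scheme and host ("http://www.example.com")
# from a URL, keeping the path and query exactly as reported.
def strip_domain(url)
  path = url.strip.sub(%r{\A[a-z][a-z0-9+.\-]*://[^/]*}i, "")
  path.empty? ? "/" : path
end

# Placeholder list of reported 404 URLs.
reported = [
  "http://www.example.com/ads/1%3EAd%20one",
  "http://www.example.com/affiliate/7",
]

reported.each { |url| puts strip_domain(url) }
# /ads/1%3EAd%20one
# /affiliate/7
```

In practice you could feed it `File.readlines` on a text file of the reported URLs instead of the hard-coded array; the pasting into the form is still one at a time.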

This wouldn’t be complete without some suggestions. First, get a clue about hiring good developers. Otherwise you’re eventually going to end up with a crap gene pool like Microsoft and Yahoo and be usurped by my new search engine. Next, don’t be so harsh on the occasional 404, and at least provide a quick way to remove them. Your index is updated continuously; if you want feedback, then use it. Don’t sit on my feedback for weeks. Next, give your web developer tools some love. They’re so primitive, with little thought put into usability. Also, internal linking shouldn’t count for nearly as much as you give it credit for. Look at the sites coming out these days: huge link dumps that people pass right over. You’re forcing the creation of millions of junk sites on the internet. That’s not a good thing. And finally, your search obviously uses some type of machine learning, and it appears to be stuck in a rut. People have your search figured out and are taking advantage of that. You need a smarter machine learning algorithm.