Wednesday, February 20, 2008
Tuesday, February 19, 2008
It begins...
I used to be an avid fan of automated web crawlers. I'd use them for all sorts of things: finding torrents online (for, *ahem*, legal downloads like OpenOffice! Yeah! OK, OK, let's just leave it at looking for torrents and not talk about what for), finding anonymous proxies (because when you're looking for torrents, you don't want half the script kiddies in the world portscanning your IP five minutes after you hit their site), and, more recently, doing marketing research (keyword placement, AdWords info, etc.; anyone even remotely into online marketing knows the drill).
I've kinda been "out of it", though, for the past month, while I was promoting an eBook that I just released.
So I was surprised yesterday when one of my old automated searching tools, which I'd picked up again for the first time in 2008, chugged along and got absolutely nowhere.
Odd.
So I started looking into it and noticed that all of my searches were getting a 302 response that redirected them to a page at sorry.google.com with a captcha. Hmm.
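Spotting this block from inside a tool comes down to checking each response for a 302 whose `Location` header points at sorry.google.com. The snippet below is a minimal sketch of that check, assuming the redirect looks the way I saw it (status 302, captcha page on that host); the function name `is_captcha_redirect` is just something I made up for illustration, not part of any library.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: the captcha page is served from this host, as observed above.
CAPTCHA_HOST = "sorry.google.com"


def is_captcha_redirect(status: int, location: Optional[str]) -> bool:
    """Hypothetical helper: True if a response is a 302 redirect to the captcha host."""
    if status != 302 or not location:
        return False
    # urlparse().hostname is lowercased, so the comparison ignores case.
    return urlparse(location).hostname == CAPTCHA_HOST
```

A crawler would call this on every response and stop hammering the server the moment it returns `True`, rather than blindly following the redirect.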
My immediate response was, "Cute idea, Google, but why?" Has abuse really gotten so bad that Google blocks everyone with captchas these days? I mean, blocking a search with a captcha is a bit ridiculous, and after I'd used the automated tool for five minutes, Google was blocking many of my normal browser searches with a captcha too! Talk about the end of the information age.
Being a programmer, I thought, "There's got to be an intelligent way to get around this without compromising what Google's obviously trying to block: completely automated, 24x7, unmonitored bots." And there is, I think. My thought is that completely banning bots is ridiculous. I mean, where would Google be if we all played turnabout and blocked THEIR automated bot, the Googlebot? I'll tell you where they'd be: in bankruptcy.
At the very least, they should let monitored bots run. So my idea was to build a proxy server that would intercept captcha requests and let the user solve the captcha to unblock Google. That way the bots can keep crunching away, Google knows there's a real person there, and everyone's happy.
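Here's a rough sketch of the "human in the loop" part of that idea, not the proxy itself. It assumes a hypothetical `fetch` function that returns `(status, location, body)` and a hypothetical `ask_human` callback that shows the captcha URL to a person and returns once they've solved it. It also assumes that once the captcha is solved, the block is lifted for the bot's traffic too (same IP, or the proxy shares the resulting cookie); I haven't confirmed how Google actually ties the unblock to a client.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical fetcher: takes a URL, returns (status, Location header or None, body).
Fetcher = Callable[[str], Tuple[int, Optional[str], str]]

CAPTCHA_HOST = "sorry.google.com"  # assumption: captcha page host, as observed


def _blocked(status: int, location: Optional[str]) -> bool:
    # A 302 pointing at the captcha host means we've been blocked.
    return status == 302 and bool(location) and urlparse(location).hostname == CAPTCHA_HOST


def fetch_with_human_gate(
    url: str,
    fetch: Fetcher,
    ask_human: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[int, str]:
    """Fetch url; on a captcha redirect, hand the captcha to a human, then retry."""
    for _ in range(max_attempts):
        status, location, body = fetch(url)
        if not _blocked(status, location):
            return status, body
        # Blocked: pause the bot until a real person has solved the captcha.
        ask_human(location)  # type: ignore[arg-type]
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")
```

The design choice here is that the bot never tries to solve the captcha itself; it simply stops and waits for a person, which is the whole point of a "monitored" bot.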
I decided to start this blog because I think this tool could be useful for many people, and I want to get feedback from the community on it. I'm just getting started and not sure yet where this is going to go.