<a href="/index.cgi?p=blog;tag=General"> as opposed to <a href="/index.cgi?p=blog&tag=General">
The reasoning behind it was something along these lines: "&" characters should be fully typed out as "&amp;", because HTML 4.01 no longer allows a single & without any kind of escape sequence following it.
So, http://www.cuvou.com/?p=blog;id=36 looks right in the Google search results, but after it gets chewed up by Google's outgoing statistics gathering and is finally accessed by the browser, the latter part of that request comes to my site looking more like this: /?p=blog%3Bid=36. CGI.pm has no idea what to make of this and it can't be blamed. I've tried substituting it in $ENV{QUERY_STRING} before CGI.pm can get its hands on it, but it doesn't help.
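To make the failure concrete, here is a minimal Python sketch of a query-string parser that, like CGI.pm, accepts both "&" and ";" as delimiters. It is a simplified stand-in, not CGI.pm itself; parse_query is a hypothetical helper, and the exact value my CGI saw may differ. The point is only that an escaped semicolon no longer separates anything.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query_string):
    """Split on '&' or ';' first, then percent-decode each name and value."""
    params = {}
    for pair in re.split(r"[&;]", query_string):
        if not pair:
            continue  # skip empty pieces such as a trailing delimiter
        name, _, value = pair.partition("=")
        params[unquote_plus(name)] = unquote_plus(value)
    return params


# The link as written on the site: two separate parameters.
intended = parse_query("p=blog;id=36")

# The link after the semicolon was escaped: the split never happens,
# so everything after "p=" ends up inside the value of "p".
mangled = parse_query("p=blog%3Bid=36")

print(intended)
print(mangled)
```

Because the delimiter split happens before decoding, the "id" parameter never exists in the mangled request, and "p" carries a value the site does not recognize.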
So effectively the user is greeted with a "Forbidden" page of mine, which is triggered because the value of "p=" contains an invalid character (notably, that % symbol there).
So there's a conundrum here: semicolons as delimiters work as far as CGI is concerned, the markup validates perfectly as HTML 4.01 Strict, and you don't need to write "&amp;" all the time inside your internal site links. I mean seriously, how ugly is this HTML code?
<a href="/index.cgi?p=blog&amp;id=36">
The semicolon version validates and works as expected provided you're using it "properly"; however, it breaks your links in Google and possibly other search engines, at least in Firefox.
For my CMS, none of my links are "properly" written to begin with. They're like <a href="$link:blog;id=36"> which is translated on-the-fly, so it was fairly trivial to change the code to fix these things on the way out the door.
For the W3C's HTML validator, my links are translated to include the full and proper &amp; text. It's ugly and I'm only glad I don't have to write the links like that directly; my Perl code does it for me.
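The on-the-fly translation can be sketched roughly as follows. This is a Python illustration rather than the Perl the CMS actually runs, and it assumes that a $link: placeholder expands to /index.cgi?p= followed by the rest of the placeholder; BASE, LINK_PLACEHOLDER and expand_links are hypothetical names.

```python
import re

# Assumed expansion target for "$link:" placeholders (not confirmed by the CMS).
BASE = "/index.cgi?p="

# Match "$link:" and everything up to the closing quote, whitespace, or ">".
LINK_PLACEHOLDER = re.compile(r'\$link:([^"\s>]*)')


def expand_links(html):
    """Rewrite $link: placeholders into real hrefs with escaped ampersands."""
    def replace(match):
        # Swap each semicolon delimiter for the entity the validator expects.
        target = match.group(1).replace(";", "&amp;")
        return BASE + target
    return LINK_PLACEHOLDER.sub(replace, html)


print(expand_links('<a href="$link:blog;id=36">'))
```

The placeholders stay short and readable in the page source, and only the generated HTML carries the escaped form.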
The other half of the dirty hack is to detect when a troublesome URL has been linked to: particularly if %3B is found. If so, the CGI fixes the query string and sends an HTTP 301 redirect to the proper version of the URL, using the real semicolons (I could replace them with &'s here, but, why? The CGI module takes care of it anyway ;-) ).
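The redirect half of the hack might look something like this. Again, it is a hedged Python sketch, not the site's Perl; redirect_for is a hypothetical helper that only works out the status line and Location value and leaves the actual header output to the caller.

```python
import re

# Percent-encoding is case-insensitive, so accept both %3B and %3b.
ENCODED_SEMICOLON = re.compile(r"%3B", re.IGNORECASE)


def redirect_for(path, query_string):
    """Return (status, location) for a mangled query string, else None."""
    if not ENCODED_SEMICOLON.search(query_string):
        return None  # nothing to fix, serve the page normally
    fixed = ENCODED_SEMICOLON.sub(";", query_string)
    return ("301 Moved Permanently", f"{path}?{fixed}")


print(redirect_for("/", "p=blog%3Bid=36"))
print(redirect_for("/", "p=blog;id=36"))
```

One caveat with this approach: a semicolon that was legitimately escaped inside a parameter value would also be rewritten into a delimiter, so it only suits sites whose values never contain semicolons.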
I'll have to investigate what other web developers do with their query string delimiters...
There is 1 comment on this page.
Using a semicolon as a delimiter has been recommended by the W3C since 1994 (see http://www.ietf.org/rfc/rfc1738.txt ). It's insane for Google to escape it.
There's an open trouble ticket at Google at http://www.google.com/support/forum/p/Webmasters/thread?tid=661e8286964e2195&hl=en .