It’s just data

Making the Web Safe for application/xhtml+xml

Rami Kayyali: Funny how Google (a leader in the Web space) doesn’t recognize Intertwingly’s (a leader in Web standards) Content-Type.

If you check again today, you will see that this situation is changing.  As Google re-crawls my site, it is starting to recognize the content type.


What changed, you or Google?

Posted by Mark at

I think it’s Google. Sam’s still serving application/xhtml+xml.

Posted by Rami Kayyali at

Google simply has a bunch of different bots to crawl the web, each with its own job, so this doesn’t seem to be a case of “working on mistakes”.

Posted by Mika at

I think it’s Google.

Me too.

Posted by Sam Ruby at

Are you sure you’re not serving text/html to Google, like you do for IE?

Posted by Lachlan Hunt at

What Accept: header does Google’s bot send?
Does it explicitly mention application/xhtml+xml? (*/* is, almost by definition, a lie.)

Posted by Jacques Distler at

Are you sure you’re not serving text/html to Google, like you do for IE?

Unless they changed their accept header recently, yes, I’m sure.

What Accept: header does Google’s bot send?

*/*

Posted by Sam Ruby at
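To make the distinction in the question above concrete, here is a minimal sketch of checking whether an Accept header names application/xhtml+xml outright rather than only matching it through a wildcard. The function names parse_accept and explicitly_accepts are hypothetical, and this is not a description of how this site or Google's crawler actually negotiates content.

```python
def parse_accept(header):
    """Return a list of (media_type, q) pairs from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # the quality value defaults to 1 when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        entries.append((media_type, q))
    return entries


def explicitly_accepts(header, media_type="application/xhtml+xml"):
    """True only if media_type is listed by name with q > 0; wildcards do not count."""
    return any(t == media_type and q > 0 for t, q in parse_accept(header))
```

With this sketch, explicitly_accepts("*/*") is False, while a header that lists application/xhtml+xml by name returns True. A server that wants to be cautious could treat a bare */* as "no stated preference" rather than as a promise that XHTML will be handled.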

The fact that Google is a leader in the web space doesn’t really mean squat when it comes to conformity, validity and following standards. Just try to validate Google.com. Or Google Code. Just look at the source code of GMail. It’s not obvious to me that they pay a lot of attention to web standards, but I might of course be missing something.

Posted by Asbjørn Ulsberg at

This would explain why feeds are showing up in search results.

Posted by James at

So a slightly related question: I just noticed that Google Reader doesn’t seem to like relative links at the top of your atom feed hence your blog home page isn’t linked.

That must be a recent regression on their part, or did you change your feed recently? Bloglines seems to be OK.

<link href="."/>

Our job is never done.

Posted by koranteng Ofosu-Amaah at
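For reference, a relative link such as href="." is meant to be resolved against the feed's base URI. A minimal sketch using Python's standard urljoin, assuming a hypothetical feed address (FEED_URL is invented for illustration, and an xml:base attribute in the feed would change the base):

```python
from urllib.parse import urljoin

# Hypothetical feed location, used only for illustration.
FEED_URL = "http://example.org/blog/index.atom"


def resolve_link(href, base=FEED_URL):
    """Resolve a possibly relative href against the feed's base URI."""
    return urljoin(base, href)


print(resolve_link("."))  # http://example.org/blog/
```

A reader that skips this resolution step ends up with a bare "." and nothing useful to link to, which would match the symptom described above.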

“Our job is never done”

Right. It just becomes less unfinished.

Posted by roberthahn at

Since your content-type is a matter of discussion again, I should take this opportunity to mention something which has been a minor annoyance for a while.

In IE6, if I follow a link to your blog, everything works just fine. 

But if I open a link to your blog in a new window (like by shift-clicking, or by normal-clicking a link within GMail), I get a File Download dialog which says, “Do you want to save this file?” and shows the file type as “Unknown File Type, 10.5 KB”.

Posted by Kevin H at

But if I open a link to your blog in a new window (like by shift-clicking, or by normal-clicking a link within GMail), I get a File Download dialog

Can you experiment a bit with test.cgi and tell me what is different?

Posted by Sam Ruby at

When following that link, I see:

HTTP_ACCEPT image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*

When shift-clicking that link, I see:

HTTP_ACCEPT */*

There are also changes in the REMOTE_PORT and UNIQUE_ID fields, but those are to be expected.

I don’t think this is affecting the outcome in this case, but I should mention that we have an ISA proxy that all our outbound connections run through.

HTTP_VIA 1.1 SVGMSISA

Lastly, just FYI:

HTTP_USER_AGENT Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2)



Posted by Kevin H at

One more thing I just noticed:

If I follow your link to test.cgi, and then REFRESH that page, the variables change.  On refresh I get

HTTP_ACCEPT */*

and I see a new variable as well

HTTP_PRAGMA no-cache

Refreshing when viewing your blog does not cause the Unknown File Type dialog box to appear, however.

Posted by Kevin H at

Sam Ruby: Making the Web Safe for application/xhtml+xml

I think it’s Google. Sam’s still serving application/xhtml+xml....

Excerpt from del.icio.us/rami/comments at

For the record, I also get HTTP_ACCEPT */* when opening the link in a new tab (ctrl-click) or a new window (shift-click) with IE 7.0.6000.16473 on Vista. More importantly, I’m seeing the same thing testing on a local server while watching with a packet sniffer. It looks like a client problem to me.

Posted by James Holderness at

Does it really surprise anyone that there might be bugs in Internet Explorer’s (any version) content sniffing algorithm? Why would anyone tech savvy enough to read this blog be using that browser anyway?

Posted by Asbjørn Ulsberg at

[speaking as a Google employee]

I spoke with Matt the last time this issue arose, and we did some research internally to see what the impact would be (besides Sam’s site).  He thinks we may have made a change that would result in what you’re seeing, but we don’t have definite confirmation yet.

Also, we’re working on the “feeds show up in web results” problem.

That’s all I know.

Posted by Mark at

“Why would anyone tech savvy enough to read this blog be using that browser anyway?”

Arrogance at its finest — even when couched in supposed objectivity.

Posted by Dilip at

“The fact that Google is a leader in the web space doesn’t really mean squat when it comes to conformity, validity and following standards.”

How true! I’ve mentioned this at other locations in the past... What’s scary is that it also sounds just like M$!

btw: nice to know I’m tech-savvy :-)

Posted by BillyG at

[from mlinksva] Sam Ruby: Making the Web Safe for application/xhtml+xml

[link]...

Excerpt from del.icio.us/network/gojomo at

I forwarded Mark 2-3 internal email threads where we’d been discussing it. Google did make a change for xhtml+xml data, and I hope things are better. If not, post a query that shows what’s still bad and explain it so that my non-native-to-XML brain can understand, and we’ll get the ball rolling again. :)

Posted by Matt Cutts at

And will application/xhtml+xml content rank as well as if it were text/html content?

Posted by Ruben Verborgh at
