google crawlers - What happens when GET robots.txt returns an unrelated HTML file?

- January 15, 2011

I have a web server capable of serving the assets of various web apps. When a requested asset doesn't exist, it sends index.html. In other words:

GET /img/exists.png -> exists.png
GET /img/inexistent.png -> index.html

This means that:

GET /robots.txt -> index.html

How do Google (and other) crawlers handle this? Do they detect that the robots.txt is invalid and ignore it (the same as if it returned 404)? Or do they penalize my ranking for serving an invalid robots.txt? Is this acceptable, or should I make a point of returning 404 when the app I'm serving has no robots.txt?

Every robots.txt handler that I know of deals with invalid lines by discarding them. So an HTML file (which presumably does not contain any valid robots.txt directives) would be treated as if it were a blank file. That is not part of any official standard, though. The (semi-)official standard assumes that a robots.txt file will contain robots.txt directives. The behavior when a robots.txt file contains HTML is undefined.
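To illustrate the "discard invalid lines" behavior, here is a minimal sketch of a lenient parser. It is not any real crawler's implementation; the function name `parse_leniently`, the `KNOWN_FIELDS` set, and the sample inputs are all made up for this example.

```python
# Hypothetical set of directive names this sketch recognizes.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap"}


def parse_leniently(text):
    """Return (field, value) pairs, silently dropping anything unrecognized."""
    directives = []
    for raw_line in text.splitlines():
        # Strip comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # not of the form "field: value", so discard it
        field, value = line.split(":", 1)
        field = field.strip().lower()
        if field in KNOWN_FIELDS:
            directives.append((field, value.strip()))
        # Any other field name (including HTML fragments) is discarded.
    return directives


# An index.html served in place of robots.txt (made-up content).
html_body = """<!DOCTYPE html>
<html>
  <head><title>My app</title></head>
  <body><a href="https://example.com/">Home</a></body>
</html>"""

# A genuine robots.txt for comparison.
real_robots = """User-agent: *
Disallow: /private/
"""

html_directives = parse_leniently(html_body)
real_directives = parse_leniently(real_robots)
```

Under these assumptions the HTML body yields no directives at all, so a parser of this kind would treat it like an empty robots.txt, that is, as "nothing is disallowed".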

If you care about crawlers, your bigger problem is not that you serve an invalid robots.txt file; it's that you have no mechanism to tell crawlers (or anyone else) when a resource does not exist. From a crawler's point of view, your site will contain some normal pages plus an infinite number of exact copies of your home page. I encourage you to find a way to change your setup so that resources that don't exist return status 404.
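One way such a setup could look is sketched below. This is only an illustration, not the asker's actual server: `respond`, `assets`, and `app_routes` are hypothetical names, and it assumes the server can list both the files that exist and the routes the app itself handles.

```python
def respond(path, assets, app_routes):
    """Return (status, served_file) for a request path.

    assets     -- hypothetical set of asset paths that actually exist
    app_routes -- hypothetical set of paths the web app handles itself
    """
    if path in assets:
        return 200, path            # the asset exists: serve it
    if path in app_routes:
        return 200, "/index.html"   # a known app page: serve the app shell
    return 404, None                # everything else does not exist


# Made-up example data matching the requests in the question.
assets = {"/index.html", "/img/exists.png"}
app_routes = {"/", "/about"}

existing = respond("/img/exists.png", assets, app_routes)
missing = respond("/img/inexistent.png", assets, app_routes)
robots = respond("/robots.txt", assets, app_routes)
home = respond("/", assets, app_routes)
```

With this arrangement, an app that ships no robots.txt answers `GET /robots.txt` with a 404, and crawlers no longer see unlimited copies of the home page under arbitrary URLs.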
