google crawlers - What happens when GET robots.txt returns an unrelated HTML file?


I have a web server capable of serving the assets of various web apps. When a requested asset doesn't exist, it sends index.html instead. In other words:

  • GET /img/exists.png -> exists.png
  • GET /img/inexistent.png -> index.html

This means that:

GET /robots.txt -> index.html

How do Google (and other) crawlers handle this? Do they detect that the robots.txt is invalid and ignore it (the same as if it returned a 404)? Or do they penalize my ranking for serving an invalid robots.txt? Is this acceptable, or should I make a point of returning 404 when the app I'm serving has no robots.txt?

Every robots.txt handler I know of deals with invalid lines by discarding them. An HTML file (which presumably does not contain any valid robots.txt directives) would therefore be treated as if it were a blank file. This is not part of the official standard, though. The (semi-)official standard assumes that a robots.txt file contains robots.txt directives, so the behavior when a robots.txt file contains HTML is undefined.
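To illustrate the "treated as a blank file" behavior, here is a small check using Python's standard-library `urllib.robotparser`. This is only one lenient parser, not Google's, and the HTML body and `example.com` URLs are made-up placeholders. The snippet parses an HTML page, an empty file, and a real rule set, then asks whether a crawler may fetch a path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical body: what the server in the question might return for /robots.txt.
INDEX_HTML = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My App</title>
    <script src="https://example.com/app.js"></script>
  </head>
  <body><div id="root"></div></body>
</html>"""


def parser_for(body: str) -> RobotFileParser:
    """Build a parser from a robots.txt body; unrecognized lines are skipped."""
    rp = RobotFileParser()
    rp.parse(body.splitlines())
    return rp


html_rp = parser_for(INDEX_HTML)                           # HTML served as robots.txt
blank_rp = parser_for("")                                  # empty robots.txt
real_rp = parser_for("User-agent: *\nDisallow: /private/")  # an actual rule set

url = "https://example.com/private/page"
print(html_rp.can_fetch("Googlebot", url))   # True: no directives survived parsing
print(blank_rp.can_fetch("Googlebot", url))  # True: same result as a blank file
print(real_rp.can_fetch("Googlebot", url))   # False: the Disallow rule applies
```

The HTML and the empty file give the same answer, because none of the HTML lines parse as a known `field: value` directive.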

If you care about crawlers, the bigger problem is not that you serve an invalid robots.txt file. It's that you have no mechanism for telling crawlers (or anyone else) when a resource does not exist. From a crawler's point of view, your site contains its normal pages plus an infinite number of exact copies of the home page. I would strongly encourage you to find a way to change your setup so that resources that don't exist return status 404.
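A minimal sketch of that change, assuming a Python static server. The hypothetical `resolve()` helper serves a file only if it actually exists under the web root and returns 404 otherwise. The `APP_ROUTES` allowlist is also an assumption, meant for apps that still need `index.html` for a few client-side routes. Drop it if none of your apps do.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical client-side routes that should still receive index.html.
APP_ROUTES = frozenset({"/", "/about", "/settings"})


def resolve(root: Path, request_path: str,
            app_routes: frozenset = APP_ROUTES) -> tuple[int, Path | None]:
    """Return (status, file to serve). request_path is assumed to have no query string."""
    root = root.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse paths that escape the web root, e.g. "/../etc/passwd".
    if candidate != root and root not in candidate.parents:
        return 404, None
    if candidate.is_file():
        return 200, candidate                 # a real asset such as /img/exists.png
    if request_path in app_routes:
        return 200, root / "index.html"       # a known page of the app
    return 404, None                          # /img/inexistent.png, /robots.txt, ...


def demo() -> dict[str, int]:
    """Build a throwaway web root and report the status for a few request paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "img").mkdir()
        (root / "img" / "exists.png").write_bytes(b"\x89PNG")
        (root / "index.html").write_text("<!DOCTYPE html><html></html>")
        paths = ["/img/exists.png", "/img/inexistent.png", "/robots.txt",
                 "/about", "/../etc/passwd"]
        return {p: resolve(root, p)[0] for p in paths}


print(demo())
# {'/img/exists.png': 200, '/img/inexistent.png': 404, '/robots.txt': 404, '/about': 200, '/../etc/passwd': 404}
```

In a real server you would pass the returned status and file to whatever framework you use. What matters is that `/robots.txt` and `/img/inexistent.png` now produce a 404 instead of another copy of the home page.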

