This is not a long post, but I wanted to post it somewhere. It may be useful to anyone writing an article about Google or something like that.
While I was changing some things in my server configuration, a user accessed a public folder on my site. I was watching its access logs at the time, and everything was completely normal until, 10 SECONDS AFTER the user's request, a request from a Google IP address with the Googlebot/2.1; +http://www.google.com/bot.html user agent hit the same public folder. I then noticed that the user agent of the user who had accessed that folder was Chrome/131.0.0.0.
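For anyone who wants to check their own logs for this pattern, here's a rough sketch of how I'd correlate it. Assumptions on my part: your server writes the standard nginx/Apache "combined" log format, the log file is passed as an argument, and the 10-second window is just a placeholder you can tune.

```python
#!/usr/bin/env python3
# Sketch: flag Googlebot requests that land on a path shortly after a
# regular (non-bot) visitor requested the same path. Assumes the
# nginx/Apache "combined" log format; the 10s window is arbitrary.
import re
import sys
from datetime import datetime

LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \S+ \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def parse(line):
    m = LOG_LINE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, m.group("path"), m.group("agent")

last_human_hit = {}  # path -> timestamp of the most recent non-bot request

for line in open(sys.argv[1]):
    parsed = parse(line)
    if parsed is None:
        continue
    ts, path, agent = parsed
    if "Googlebot" in agent:
        seen = last_human_hit.get(path)
        if seen and (ts - seen).total_seconds() <= 10:
            delta = (ts - seen).total_seconds()
            print(f"{path}: Googlebot arrived {delta:.0f}s after a visitor")
    else:
        last_human_hit[path] = ts
```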
I have a subdomain, and some folders on that subdomain are actually indexed by Google, but that specific public folder doesn't appear to be indexed at all and doesn't show up in searches.
Could it be that Google uses Chrome users to discover unindexed paths on the internet and add them to its index?
I know it doesn't sound very shocking, because most people here know that Google Chrome is a privacy nightmare and should be avoided at all costs, but I've never seen this type of behavior mentioned in articles about "why you should avoid Google Chrome" or similar.
I'm not against anyone scraping the page either, since it's public anyway, but the fact that they discover new pages on the internet by making use of Google Chrome surprised me a little.
100%, if you have "Safe Browsing" enabled (which it is by default). This also applies to Firefox, but I don't know whether it's enabled by default there.
Are you using Google’s DNS?
DNS will only leak domains (and subdomains), not paths.
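To make that concrete, here's a minimal sketch: the resolver only ever sees the hostname, while the path exists only inside the HTTP request that goes to the web server afterwards (encrypted, if it's HTTPS). The host and path below are placeholders.

```python
# Why a DNS resolver never sees the path: name resolution only
# involves the hostname; the path travels inside the HTTP request.
import socket

host = "example.com"          # placeholder hostname
path = "/some/public/folder/" # placeholder path

# Step 1: DNS lookup -- the only thing exposed here is `host`.
addr = socket.getaddrinfo(host, 443)[0][4][0]

# Step 2: the path appears only in the HTTP request, which is sent
# to the resolved server (over TLS for HTTPS), not to the DNS server.
request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"
print(f"resolver saw: {host}")
print(f"server at {addr} would see: {request!r}")
```

So using Google's DNS wouldn't explain Googlebot discovering a specific unindexed folder; only something that sees full URLs (like the browser itself) could.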