
The site adheres to a strict URL structure: /state/city/id/schoolname. Entering from the homepage, the only way to crawl the site one level at a time is to crawl the shortest URLs first. This structure is also emphasised in the breadcrumbs on every page, and the shortest URLs are also the ones with the most internal links.

Why would you crawl the site in any other way?
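
To illustrate the point, here is a minimal sketch of a level-by-level (breadth-first) crawl. The link graph, the state, city and school names, and the helpers `depth` and `crawl_by_level` are all made up for illustration, and path depth (number of segments) stands in for "URL length"; this is not a claim about how Googlebot actually schedules its crawl.

```python
from collections import deque

# Hypothetical link graph following the /state/city/id/schoolname pattern.
# Each page links only to pages one level below it, as the breadcrumbs suggest.
LINKS = {
    "/": ["/texas", "/ohio"],
    "/texas": ["/texas/austin"],
    "/ohio": ["/ohio/akron"],
    "/texas/austin": ["/texas/austin/101/lincoln-high"],
    "/ohio/akron": ["/ohio/akron/202/east-elementary"],
}


def depth(url):
    """Number of non-empty path segments, used here as a proxy for URL length."""
    return len([segment for segment in url.split("/") if segment])


def crawl_by_level(start="/"):
    """Breadth-first crawl: visit every page at one level before going deeper."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


if __name__ == "__main__":
    for url in crawl_by_level():
        print(depth(url), url)
```

On a graph shaped like this, the visit order never goes from a deeper URL back to a shallower one, so a crawl that starts at the homepage ends up looking like it is sorted by URL depth.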



If you look at a particular city page, you will notice that the cities are in alphabetical order; however, Googlebot still crawls by length of URL...


"why would you crawl the site any other way?"

I could see crawling the pages most likely to have changed first, since those pages would most likely lead to fresh content.
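
As a rough sketch of that alternative, assume each URL comes with some estimate of how likely it is to have changed. The scores below and the function `crawl_order_by_freshness` are invented for illustration; how a real crawler would estimate change likelihood is not covered here.

```python
import heapq


def crawl_order_by_freshness(change_scores):
    """Return URLs ordered from most to least likely to have changed.

    change_scores maps URL -> estimated probability of change (hypothetical).
    """
    # heapq is a min-heap, so negate the score to pop the highest score first.
    heap = [(-score, url) for url, score in change_scores.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order


if __name__ == "__main__":
    # Made-up scores: a deep school page that changes often outranks
    # the shallow state page that rarely changes.
    scores = {
        "/texas": 0.2,
        "/texas/austin": 0.5,
        "/texas/austin/101/lincoln-high": 0.9,
    }
    for url in crawl_order_by_freshness(scores):
        print(url)
```

With these scores the deepest URL is fetched first, which is the opposite of the shortest-URL-first order described above.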



