Crawling Sites to Generate a List of URLs
07 Mar 2016 | Ben Robertson

Here's the command to crawl a site and generate a list of URLs:
wget --spider -r http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt
It will crawl http://www.example.com and write the list of URLs to urls.txt
in whatever directory you're currently in.
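In case the one-liner is hard to parse, here's the same pipeline spread across multiple lines, with a comment on what each stage is doing:

# --spider checks each URL without downloading it; -r recurses through links
wget --spider -r http://www.example.com 2>&1 |  # wget writes its log to stderr, so merge it into stdout
  grep '^--' |                                  # log lines containing a requested URL start with --
  awk '{ print $3 }' |                          # the URL is the third field on those lines
  grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt  # skip static assets, save the rest

Have fun!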