@johndavidsimmons
Created January 30, 2017 21:39

Return a set of the relative anchors on the given page
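
```python
import requests
from bs4 import BeautifulSoup as bs4


def findAllAnchors(url):
    # Use requests to get the page source without needing a browser driver
    r = requests.get(url)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    # Turn the page source into soup for parsing (the "html5lib" parser needs the html5lib package)
    soup = bs4(r.text, "html5lib")
    # Return a set of <a> tags whose href is site-relative ("/..."); the suffix check
    # presumably drops protocol-relative external links such as "//example.com"
    return {a for a in soup.find_all('a', href=True)
            if a['href'].startswith('/') and not a['href'].endswith(('.com', '.org', '.net'))}
```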
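The suffix test in the filter above is a heuristic: a protocol-relative link such as `//cdn.example.com/lib.js` starts with `/` and does not end in `.com`, `.org`, or `.net`, so it is kept as "relative". A minimal sketch of a stricter check, assuming "relative" should mean "no scheme and no host", uses the standard library's `urllib.parse.urlsplit`. To stay runnable without network access or third-party packages, it parses an HTML string with `html.parser.HTMLParser` instead of fetching with requests and parsing with html5lib, and it returns href strings rather than Tag objects. The names `is_site_relative`, `relative_hrefs`, and `_AnchorCollector` are hypothetical and not part of the original gist.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def is_site_relative(href):
    """True for hrefs like "/about" that point into the same site.

    Hypothetical helper: any href with a scheme ("https:") or a network
    location ("//example.com/...") is treated as external, whatever it ends with.
    """
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc and href.startswith('/')


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')  # None if missing or value-less
            if href is not None:
                self.hrefs.append(href)


def relative_hrefs(html):
    """Return the set of site-relative href strings found in an HTML string."""
    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    return {h for h in collector.hrefs if is_site_relative(h)}


page = '''
<a href="/about">About</a>
<a href="/blog/post-1">Post</a>
<a href="//cdn.example.com/lib.js">CDN</a>
<a href="//example.com">Protocol-relative</a>
<a href="https://example.org/">External</a>
<a name="top">No href</a>
'''
print(sorted(relative_hrefs(page)))  # ['/about', '/blog/post-1']
```

To apply the same check to a live page, one option is to pass `requests.get(url).text` to `relative_hrefs`, or to swap `is_site_relative(a['href'])` into the comprehension in `findAllAnchors`.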