@colindix
Last active July 8, 2019 08:39
Apache Spark Regex match for User Tokens in Dataframe and SQL syntax
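# ### These patterns work in the DataFrame syntax, where the regex is passed as a
# ### plain Python string: each literal backslash is written as \\ in a raw string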
dfuserrgx1 = r"\\users\\[^\\]+\\"
dfuserrgx2 = r"\\userdata\\[^\\]+\\"
#userrgx3 = r'\bS-1-5-21-\d{8,10}-\d{8,10}-\d{8,10}-\d{5,10}\b'
dfuserrgx3 = r'S-1-5-21-\d{8,10}-\d{8,10}-\d{8,10}-\d{5,10}'
dfusermatch = f"(?i)(?:{dfuserrgx1})|(?:{dfuserrgx2})|(?:{dfuserrgx3})"
################################################################
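# ### These patterns work in SQL syntax, where the SQL parser unescapes the string
# ### literal before the regex engine sees it. With raw strings each literal
# ### backslash needs 4 backslashes (8 would be needed without raw strings)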
sqluserrgx1 = r"\\\\users\\\\[^\\\\]+\\\\"
sqluserrgx2 = r"\\\\userdata\\\\[^\\\\]+\\\\"
#userrgx3 = r'\\bS-1-5-21-\\d{8,10}-\\d{8,10}-\\d{8,10}-\\d{5,10}\\b'
sqluserrgx3 = r'S-1-5-21-\\d{8,10}-\\d{8,10}-\\d{8,10}-\\d{5,10}'
sqlusermatch = f"(?i)(?:{sqluserrgx1})|(?:{sqluserrgx2})|(?:{sqluserrgx3})"
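
The relationship between the two sets of patterns can be checked without a Spark session. The sketch below is only a local sanity check under stated assumptions: it uses Python's `re` module in place of the Java regex engine that Spark uses (the two agree for these simple patterns, but not in general), and `simulate_sql_unescape` is a hypothetical helper that mimics only the backslash-collapsing step of SQL string-literal parsing. It is not a Spark API, and the sample paths and SID are made up for illustration.

```python
import re

# DataFrame-style patterns, as in the snippet above.
dfuserrgx1 = r"\\users\\[^\\]+\\"
dfuserrgx2 = r"\\userdata\\[^\\]+\\"
dfuserrgx3 = r"S-1-5-21-\d{8,10}-\d{8,10}-\d{8,10}-\d{5,10}"
dfusermatch = f"(?i)(?:{dfuserrgx1})|(?:{dfuserrgx2})|(?:{dfuserrgx3})"

# SQL-style patterns: every backslash is doubled once more.
sqluserrgx1 = r"\\\\users\\\\[^\\\\]+\\\\"
sqluserrgx2 = r"\\\\userdata\\\\[^\\\\]+\\\\"
sqluserrgx3 = r"S-1-5-21-\\d{8,10}-\\d{8,10}-\\d{8,10}-\\d{5,10}"
sqlusermatch = f"(?i)(?:{sqluserrgx1})|(?:{sqluserrgx2})|(?:{sqluserrgx3})"


def simulate_sql_unescape(literal: str) -> str:
    """Hypothetical stand-in for SQL literal parsing: collapse each
    pair of backslashes into a single backslash."""
    return literal.replace("\\\\", "\\")


# After one round of unescaping, the SQL pattern equals the DataFrame pattern.
assert simulate_sql_unescape(sqlusermatch) == dfusermatch

# Made-up sample values: two user paths, one SID, one path with no user token.
samples = [
    r"C:\Users\alice\Documents\report.docx",
    r"D:\UserData\bob\cache",
    "S-1-5-21-1234567890-123456789-1234567890-12345",
    r"C:\Windows\System32\cmd.exe",
]

for value in samples:
    found = re.search(dfusermatch, value)
    print(value, "->", found.group(0) if found else None)
```

If the Spark session is configured with `spark.sql.parser.escapedStringLiterals=true`, SQL string literals are not unescaped in this way, so the extra level of backslashes would presumably not be needed; that case is not covered by this sketch.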