Last active
July 8, 2019 08:39
Apache Spark Regex match for User Tokens in Dataframe and SQL syntax
| # ### These patterns work in the DataFrame syntax: the raw strings reach the regex engine unchanged | |
| dfuserrgx1 = r"\\users\\[^\\]+\\" | |
| dfuserrgx2 = r"\\userdata\\[^\\]+\\" | |
| #userrgx3 = r'\bS-1-5-21-\d{8,10}-\d{8,10}-\d{8,10}-\d{5,10}\b' | |
| dfuserrgx3 = r'S-1-5-21-\d{8,10}-\d{8,10}-\d{8,10}-\d{5,10}' | |
| dfusermatch = f"(?i)(?:{dfuserrgx1})|(?:{dfuserrgx2})|(?:{dfuserrgx3})" | |
| ################################################################ | |
| # ### Works in SQL: the SQL parser unescapes string literals, so each regex backslash is doubled again (4 backslashes in a raw string, 8 in a normal string) | |
| sqluserrgx1 = r"\\\\users\\\\[^\\\\]+\\\\" | |
| sqluserrgx2 = r"\\\\userdata\\\\[^\\\\]+\\\\" | |
| #userrgx3 = r'\\bS-1-5-21-\\d{8,10}-\\d{8,10}-\\d{8,10}-\\d{5,10}\\b' | |
| sqluserrgx3 = r'S-1-5-21-\\d{8,10}-\\d{8,10}-\\d{8,10}-\\d{5,10}' | |
| sqlusermatch = f"(?i)(?:{sqluserrgx1})|(?:{sqluserrgx2})|(?:{sqluserrgx3})" |
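The two sets of patterns differ only in escaping depth, and that can be checked without a Spark session. The sketch below is a rough illustration, not the gist's own code: it uses Python's `re` module as a stand-in for the Java regex engine Spark uses, and `sql_unescape` is a hypothetical helper that approximates how a SQL string literal loses one level of backslashes. The sample paths and the SID are invented for the example.

```python
import re

# DataFrame-style patterns: raw strings with one level of regex escaping.
df_rgx1 = r"\\users\\[^\\]+\\"
df_rgx2 = r"\\userdata\\[^\\]+\\"
df_rgx3 = r"S-1-5-21-\d{8,10}-\d{8,10}-\d{8,10}-\d{5,10}"
df_match = f"(?i)(?:{df_rgx1})|(?:{df_rgx2})|(?:{df_rgx3})"

# SQL-style patterns: every backslash doubled once more for the SQL literal.
sql_rgx1 = r"\\\\users\\\\[^\\\\]+\\\\"
sql_rgx2 = r"\\\\userdata\\\\[^\\\\]+\\\\"
sql_rgx3 = r"S-1-5-21-\\d{8,10}-\\d{8,10}-\\d{8,10}-\\d{5,10}"
sql_match = f"(?i)(?:{sql_rgx1})|(?:{sql_rgx2})|(?:{sql_rgx3})"


def sql_unescape(literal: str) -> str:
    """Hypothetical helper: collapse each doubled backslash into one,
    approximating what a SQL parser does to a string literal."""
    return literal.replace("\\\\", "\\")


def find_token(text: str):
    """Return the first user token found in text, or None."""
    m = re.search(df_match, text)
    return m.group(0) if m else None


# Invented sample values (assumptions for illustration only).
samples = [
    r"C:\Users\alice\Documents\report.docx",
    r"D:\UserData\bob\cache",
    r"HKU\S-1-5-21-1234567890-1234567890-1234567890-10012",
    r"C:\Windows\System32\cmd.exe",
]

if __name__ == "__main__":
    # After one round of unescaping, the SQL pattern is the DataFrame pattern.
    print(sql_unescape(sql_match) == df_match)
    for s in samples:
        print(s, "->", find_token(s))
```

In Spark itself, the first form would typically be passed to `Column.rlike` and the second embedded in a SQL `RLIKE '...'` literal, with your own column and table names. Note that the SID pattern requires at least five digits in the final group, so SIDs with a shorter relative identifier will not match.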