<?php
function rfc3986_validate_uri($uri)
{
    // Play around with this regexp online:
    // http://regex101.com/r/hZ5gU9/1
    // Links to relevant RFC documents:
    // RFC 3986: http://tools.ietf.org/html/rfc3986 (URI scheme)
    // RFC 2234: http://tools.ietf.org/html/rfc2234#section-6.1 (ABNF notation)
    $regex = '/
        # URI scheme RFC 3986
        (?(DEFINE)
            # ABNF notation of RFC 2234
            (?<ALPHA> [\x41-\x5A\x61-\x7A] ) # Latin character (A-Z, a-z)
            (?<CR> \x0D ) # Carriage return (\r)
            (?<DIGIT> [\x30-\x39] ) # Decimal number (0-9)
            (?<DQUOTE> \x22 ) # Double quote (")
            (?<HEXDIG> (?&DIGIT) | [\x41-\x46\x61-\x66] ) # Hexadecimal digit (0-9, A-F, a-f)
            (?<LF> \x0A ) # Line feed (\n)
            (?<SP> \x20 ) # Space
            # RFC 3986 body
            (?<uri> (?&scheme) \: (?&hier_part) (?: \? (?&query) )? (?: \# (?&fragment) )? )
            (?<hier_part> \/\/ (?&authority) (?&path_abempty)
                        | (?&path_absolute)
                        | (?&path_rootless)
                        | (?&path_empty) )
            (?<uri_reference> (?&uri) | (?&relative_ref) )
            (?<absolute_uri> (?&scheme) \: (?&hier_part) (?: \? (?&query) )? )
            (?<relative_ref> (?&relative_part) (?: \? (?&query) )? (?: \# (?&fragment) )? )
            (?<relative_part> \/\/ (?&authority) (?&path_abempty)
                        | (?&path_absolute)
                        | (?&path_noscheme)
                        | (?&path_empty) )
            (?<scheme> (?&ALPHA) (?: (?&ALPHA) | (?&DIGIT) | \+ | \- | \. )* )
            (?<authority> (?: (?&userinfo) \@ )? (?&host) (?: \: (?&port) )? )
            (?<userinfo> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \: )* )
            (?<host> (?&ip_literal) | (?&ipv4_address) | (?&reg_name) )
            (?<port> (?&DIGIT)* )
            (?<ip_literal> \[ (?: (?&ipv6_address) | (?&ipv_future) ) \] )
            (?<ipv_future> [\x56\x76] (?&HEXDIG)+ \. (?: (?&unreserved) | (?&sub_delims) | \: )+ ) # "v" or "V" (ABNF literals are case-insensitive)
            (?<ipv6_address> (?: (?&h16) \: ){6} (?&ls32)
                        | \:\: (?: (?&h16) \: ){5} (?&ls32)
                        | (?&h16)? \:\: (?: (?&h16) \: ){4} (?&ls32)
                        | (?: (?: (?&h16) \: ){0,1} (?&h16) )? \:\: (?: (?&h16) \: ){3} (?&ls32)
                        | (?: (?: (?&h16) \: ){0,2} (?&h16) )? \:\: (?: (?&h16) \: ){2} (?&ls32)
                        | (?: (?: (?&h16) \: ){0,3} (?&h16) )? \:\: (?&h16) \: (?&ls32)
                        | (?: (?: (?&h16) \: ){0,4} (?&h16) )? \:\: (?&ls32)
                        | (?: (?: (?&h16) \: ){0,5} (?&h16) )? \:\: (?&h16)
                        | (?: (?: (?&h16) \: ){0,6} (?&h16) )? \:\: )
            (?<h16> (?&HEXDIG){1,4} )
            (?<ls32> (?: (?&h16) \: (?&h16) ) | (?&ipv4_address) )
            (?<ipv4_address> (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) )
            (?<dec_octet> (?&DIGIT)
                        | [\x31-\x39] (?&DIGIT)
                        | \x31 (?&DIGIT){2}
                        | \x32 [\x30-\x34] (?&DIGIT)
                        | \x32\x35 [\x30-\x35] )
            (?<reg_name> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) )* )
            (?<path> (?&path_abempty)
                        | (?&path_absolute)
                        | (?&path_noscheme)
                        | (?&path_rootless)
                        | (?&path_empty) )
            (?<path_abempty> (?: \/ (?&segment) )* )
            (?<path_absolute> \/ (?: (?&segment_nz) (?: \/ (?&segment) )* )? )
            (?<path_noscheme> (?&segment_nz_nc) (?: \/ (?&segment) )* )
            (?<path_rootless> (?&segment_nz) (?: \/ (?&segment) )* )
            (?<path_empty> (?&pchar){0} ) # Zero characters; defined for explicitness only
            (?<segment> (?&pchar)* )
            (?<segment_nz> (?&pchar)+ )
            (?<segment_nz_nc> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \@ )+ )
            (?<pchar> (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \: | \@ )
            (?<query> (?: (?&pchar) | \/ | \? )* )
            (?<fragment> (?: (?&pchar) | \/ | \? )* )
            (?<pct_encoded> \% (?&HEXDIG) (?&HEXDIG) )
            (?<unreserved> (?&ALPHA) | (?&DIGIT) | \- | \. | \_ | \~ )
            (?<reserved> (?&gen_delims) | (?&sub_delims) )
            (?<gen_delims> \: | \/ | \? | \# | \[ | \] | \@ )
            (?<sub_delims> \! | \$ | \& | \' | \( | \)
                        | \* | \+ | \, | \; | \= )
        )
        ^(?&uri)$
    /xD'; // D: "$" matches only at the very end of the subject, not before a trailing newline
    return preg_match($regex, $uri) === 1;
}
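The function relies on PCRE's `(?(DEFINE) ...)` block: every ABNF rule is declared once as a named group and reused by name through `(?&name)` subroutine calls, so the pattern mirrors the RFC grammar. Below is a reduced sketch of the same technique, limited to the `dec_octet` and `ipv4_address` rules; `is_ipv4_dotted_quad()` is a hypothetical name, not part of the gist. It lists the octet alternatives longest-first because PCRE releases before PCRE2 10.30 (used by PHP before 7.3) treat subroutine calls as atomic, and with a shortest-first list a call on an octet such as `192` could then lock in a one-digit match.

```php
<?php
// Hypothetical helper (not part of the gist): checks a dotted-quad IPv4
// address with the same (?(DEFINE) ...) + (?&name) technique as
// rfc3986_validate_uri(), limited to the RFC 3986 dec-octet rule.
function is_ipv4_dotted_quad(string $s): bool
{
    $regex = '/
        (?(DEFINE)
            (?<DIGIT> [\x30-\x39] )
            # Longest alternatives first, so the result does not depend on
            # whether the engine can backtrack into a subroutine call.
            (?<dec_octet> \x32\x35 [\x30-\x35]       # 250-255
                        | \x32 [\x30-\x34] (?&DIGIT) # 200-249
                        | \x31 (?&DIGIT){2}          # 100-199
                        | [\x31-\x39] (?&DIGIT)      # 10-99
                        | (?&DIGIT) )                # 0-9
            (?<ipv4_address> (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) )
        )
        ^(?&ipv4_address)$
    /xD';
    return preg_match($regex, $s) === 1;
}

var_dump(is_ipv4_dotted_quad('192.168.0.1')); // bool(true)
var_dump(is_ipv4_dotted_quad('256.1.1.1'));   // bool(false)
```

Because of the `D` modifier, an input such as `"1.2.3.4\n"` is rejected too, rather than slipping through on the trailing newline.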
Excellent! Works great.
Is there any way to get each part of the URI using the $matches parameter of preg_match()?
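On the `$matches` question: the named groups in the gist's pattern live inside `(?(DEFINE) ...)` and are only reached through `(?&name)` subroutine calls, and PCRE restores capture values when a subroutine call returns, so `$matches` ends up holding only the whole URI. One workaround, sketched here under the assumption that the input has already passed `rfc3986_validate_uri()`, is a second pass with the generic splitting regex from RFC 3986 Appendix B, rewritten with named groups. `rfc3986_split_uri()` is a hypothetical name, and `PREG_UNMATCHED_AS_NULL` requires PHP 7.2 or later.

```php
<?php
// Hypothetical helper (not part of the gist): splits a URI into the five
// top-level components using the regex from RFC 3986, Appendix B.
// It does not validate the input.
function rfc3986_split_uri(string $uri): array
{
    // "~" is the delimiter because the pattern itself contains "/" and "#".
    $regex = '~^(?:(?<scheme>[^:/?#]+):)?'
           . '(?://(?<authority>[^/?#]*))?'
           . '(?<path>[^?#]*)'
           . '(?:\?(?<query>[^#]*))?'
           . '(?:\#(?<fragment>.*))?~s';

    $parts = [];
    if (preg_match($regex, $uri, $m, PREG_UNMATCHED_AS_NULL) === 1) {
        foreach (['scheme', 'authority', 'path', 'query', 'fragment'] as $name) {
            // null: component absent; '': component present but empty.
            // "?? null" also covers trailing unmatched groups on PHP < 7.4.
            $parts[$name] = $m[$name] ?? null;
        }
    }
    return $parts;
}

print_r(rfc3986_split_uri('http://user@example.com:8080/a/b?x=1#top'));
```

Keeping `null` distinct from `''` matters because RFC 3986 treats an absent query differently from an empty one (`http://example.com` versus `http://example.com?`).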
Would you be able to share this under a GPL-2 compatible license? Something like MIT or even explicitly state it's public domain? Thanks!
Sure, consider this code public domain. Or do you need me to add it as a comment in the source or something?
The regex throws an unknown modifier '/' error. Here's my test script:
The error specifically is:
PHP Warning: preg_match(): Unknown modifier '/' in /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php on line 112
PHP Stack trace:
PHP 1. {main}() /home/jaith/biz/erep/2017/05-12-url/test.php:0
PHP 2. rfc3986_validate_uri() /home/jaith/biz/erep/2017/05-12-url/test.php:38
PHP 3. preg_match() /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php:112
GitHub did not send me a notification about your comment :(
Apparently, newer versions of PHP do not ignore slashes within regex comments, and I had a couple of URLs there, so I've updated the code.
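A minimal reproduction of the "Unknown modifier '/'" warning, using only standard `preg_match()`: PHP finds the end of a pattern by scanning for the first unescaped delimiter character before PCRE parses anything, so `#` comments under the `x` modifier offer no protection. Whatever follows that early delimiter is read as modifiers. Escaping the slashes, or choosing a delimiter that does not occur in the pattern, avoids it. The sample pattern and URL below are illustrative only.

```php
<?php
// "/" is the delimiter, and the x-mode comment contains an unescaped "//".
// PHP stops at the first unescaped "/", so the pattern becomes "\d+ # see http:"
// and the rest ("/example.com", newline, "/x") is parsed as modifiers.
$broken = "/\\d+ # see http://example.com\n/x";

// Same comment with the slashes escaped: PHP skips "\/" while scanning.
$escaped = "/\\d+ # see http:\\/\\/example.com\n/x";

// Same pattern with "~" as the delimiter, which does not occur in it.
$fixed = "~\\d+ # see http://example.com\n~x";

// "@" silences the "Unknown modifier '/'" warning for this demonstration only.
var_dump(@preg_match($broken, '42')); // bool(false)
var_dump(preg_match($escaped, '42')); // int(1)
var_dump(preg_match($fixed, '42'));   // int(1)
```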
I think that since this is a Gist, your comment is fine. Thanks.