Created March 31, 2024 12:39
This Python script removes accents from the values of class and id attributes in an HTML file. It reads the file, uses a regular expression to find each class or id attribute, transliterates the attribute value to ASCII with the unidecode package, and writes the result back to the same file. Everything outside those attribute values, including the visible text of the page, is left unchanged.
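import re

import unidecode  # third-party package: pip install Unidecode

def remove_accents(text):
    # Transliterate accented characters to their closest ASCII equivalents.
    return unidecode.unidecode(text)

def replace_function(match):
    # Rebuild the attribute, changing only the value between the quotes.
    attribute, quote, value = match.groups()
    return f'{attribute}={quote}{remove_accents(value)}{quote}'

file_path = 'file.html'

with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()

# Match class="..." or id='...'; \2 makes the closing quote match the opening one.
modified_content = re.sub(r'\b(class|id)=(["\'])(.+?)\2', replace_function, content)

# Note: this overwrites the original file in place.
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(modified_content)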
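To try the substitution without touching a file, the same idea can be run against an in-memory string. The following is a minimal sketch, not the gist's own code: it assumes the standard library's `unicodedata` as a stand-in for `unidecode` so that it runs without extra packages, and the names `strip_accents`, `normalize_attributes`, and `ATTRIBUTE_PATTERN` are hypothetical. The stand-in only drops combining marks, so characters that `unidecode` would transliterate (for example `ß` or `ø`) pass through unchanged here.

```python
import re
import unicodedata

# Hypothetical name: matches class="..." or id='...' with matching quotes.
ATTRIBUTE_PATTERN = re.compile(r'\b(class|id)=(["\'])(.+?)\2')


def strip_accents(text):
    # Assumption: stdlib stand-in for unidecode. Decompose each character,
    # then drop the combining marks (accents, cedillas, tildes).
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_attributes(html):
    # Replace only the attribute value; the name and quotes are kept as-is.
    def replace(match):
        attribute, quote, value = match.groups()
        return f'{attribute}={quote}{strip_accents(value)}{quote}'

    return ATTRIBUTE_PATTERN.sub(replace, html)


sample = '<div id="seção" class=\'título café\'>Olá, título</div>'
result = normalize_attributes(sample)
print(result)
# → <div id="secao" class='titulo cafe'>Olá, título</div>
```

The attribute values lose their accents while the visible text "Olá, título" keeps them, which is the behaviour the description above calls for. A regular expression is adequate for simple, well-formed markup; for HTML with unquoted attributes or attribute-like text inside scripts, an HTML parser would likely be a safer choice.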