@iandark
Created March 31, 2024 12:39
This Python script removes accents from the values of class and id attributes in an HTML file. It reads the file, uses a regular expression to find each class="..." or id="..." attribute, transliterates the value to ASCII with unidecode, and writes the result back to the same file, overwriting it.
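import re
import unidecode  # third-party package: pip install Unidecode

def remove_accents(text):
    # Transliterate Unicode text to its closest ASCII form (e.g. "ação" -> "acao").
    return unidecode.unidecode(text)

def replace_function(match):
    # group(0) is the whole attribute; group(2) is the value between the quotes.
    return match.group(0).replace(match.group(2), remove_accents(match.group(2)))

file_path = 'file.html'
with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()

# Rewrite the value of every class="..." or id='...' attribute.
modified_content = re.sub(r'(class|id)=["\']([^"\']+?)["\']', replace_function, content)

# Overwrite the original file in place.
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(modified_content)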
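The regular expression in the script also matches attributes whose names merely end in `id` or `class` (for example `data-id="..."`), and it accepts a closing quote that differs from the opening one. Below is a minimal sketch of a stricter variant, not the author's method. The names `ATTR_PATTERN`, `strip_accents`, and `clean_attributes` are hypothetical, and the standard-library `unicodedata` module is swapped in for `unidecode` so the example has no third-party dependency. Unlike `unidecode`, this approach only drops combining marks and leaves letters such as `ø` or `ß` unchanged.

```python
import re
import unicodedata

# Hypothetical stricter pattern (an assumption, not part of the original gist):
# - (?<![\w-]) rejects names like data-id or classid that only end in id/class
# - (["\']) ... \2 requires the closing quote to match the opening quote
ATTR_PATTERN = re.compile(r'(?<![\w-])(class|id)=(["\'])(.*?)\2', re.DOTALL)

def strip_accents(text):
    # Decompose characters (e.g. "é" -> "e" + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def clean_attributes(html):
    # Return html with accents removed from class and id values only.
    def _replace(match):
        name, quote, value = match.groups()
        return f'{name}={quote}{strip_accents(value)}{quote}'
    return ATTR_PATTERN.sub(_replace, html)
```

Because `clean_attributes` takes and returns a string, it can be tried on a small snippet before being applied to a whole file; text outside class and id values is left as it was.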