Samarth Garge SamarthGarge

SamarthGarge / data_cleaning_functions.py
Created March 26, 2025 10:58
Outliers feature: two ways to use it: 1) call the function in outliers.py directly, or 2) enable it through the clean_dataframe function
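The two usage paths in the description can be sketched roughly as follows. This is a minimal illustration, not the gist's actual code: `zscore_outlier_mask` and `clean_with_outliers` are hypothetical names, the z-score logic is an assumed implementation of `outlier_method='zscore'`, and only the `'remove'` processing mode is covered. The parameter names mirror the `clean_dataframe` signature.

```python
import numpy as np
import pandas as pd


def zscore_outlier_mask(df, columns=None, threshold=3):
    """Mark rows whose |z-score| exceeds threshold in any selected column.

    Hypothetical helper: the name and body are illustrative, not from the gist.
    """
    if columns is None:
        # Assumption: default to every numeric column.
        columns = df.select_dtypes(include=np.number).columns
    values = df[columns]
    # ddof=0 gives the population standard deviation. A constant column has
    # std 0, which is replaced by NaN so its rows are never flagged.
    std = values.std(ddof=0).replace(0, np.nan)
    z = (values - values.mean()) / std
    return (z.abs() > threshold).any(axis=1)


def clean_with_outliers(df, detect_outliers_flag=False, outlier_columns=None,
                        outlier_threshold=3, outlier_processing='remove'):
    """Hypothetical stand-in for the outlier step inside clean_dataframe."""
    if not detect_outliers_flag:
        return df
    mask = zscore_outlier_mask(df, outlier_columns, outlier_threshold)
    if outlier_processing == 'remove':
        return df[~mask].reset_index(drop=True)
    raise ValueError(f"unsupported outlier_processing: {outlier_processing!r}")


data = pd.DataFrame({"x": [10.0] * 19 + [1000.0]})

# Way 1: call the detection function on its own and inspect the mask.
mask = zscore_outlier_mask(data, threshold=3)

# Way 2: let the cleaning pipeline handle outliers through a flag.
cleaned = clean_with_outliers(data, detect_outliers_flag=True)
```

Keeping detection as a separate function that returns a mask lets the pipeline decide what to do with flagged rows, so other processing modes could reuse the same mask.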
def clean_dataframe(df, null_method='nan', fix_numeric=True, remove_dups=True, dup_subset=None,
                    detect_outliers_flag=False, outlier_columns=None, outlier_method='zscore',
                    outlier_threshold=3, outlier_processing='remove', outlier_cap_values=None):
    """
    Apply all cleaning functions in sequence

    Parameters:
    -----------
    df : pandas.DataFrame
        The dataframe to clean
import pandas as pd
import numpy as np
import re
import os
from pathlib import Path
def handle_null_values(df, method='nan', columns=None):
    """
    Check for null values and replace them with the specified value.