Classification/preprocessing/ishwari (!1) · Merge requests · Mishra, Ritwik (PG/T - Comp Sci & Elec Eng) / ml_mavericks_coursework

This MR contains the preprocessing steps for the census income dataset:

Cleaned missing-value markers (?, NaN, Null, Na, etc.) and standardized text values.
Dropped irrelevant or skewed features (capital-gain, capital-loss, education-num).
Performed EDA, plotting the distributions of numerical and categorical features, a heatmap, a correlation matrix, and boxplots for outliers.
Encoded categorical features using appropriate strategies:
- Binary encoding for sex
- One-hot encoding for race, relationship, marital-status, workclass, and grouped native-country
- Grouped rare/unknown categories to reduce sparsity
Cleaned numerical features by clipping extreme outliers in hours-per-week.

Finalized the processed dataset and notebook.
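
For reviewers who want a quick picture of the cleaning, encoding, and clipping steps listed above, here is a minimal sketch in pandas. It is not the notebook's actual code: the toy data, the `preprocess` helper, the `rare_min_count` threshold, the `hours_bounds` clipping limits, and the choice to map missing values to an "unknown" category are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for the census income data.
df = pd.DataFrame({
    "sex": [" Male", "Female ", "male", "Female"],
    "workclass": ["Private", "?", "State-gov", "Private"],
    "native-country": ["United-States", "India", "?", "United-States"],
    "hours-per-week": [40, 99, 1, 38],
    "capital-gain": [0, 0, 5000, 0],
})

# Placeholder strings treated as missing (compared after lower-casing).
MISSING_TOKENS = ["?", "", "nan", "null", "na"]
DROP_COLS = ["capital-gain", "capital-loss", "education-num"]
ONE_HOT_COLS = ["race", "relationship", "marital-status",
                "workclass", "native-country"]


def preprocess(df, rare_min_count=2, hours_bounds=(10, 80)):
    """Illustrative pipeline; thresholds and bounds are assumed values."""
    out = df.drop(columns=DROP_COLS, errors="ignore").copy()

    # Standardize text: strip whitespace, lower-case, then turn
    # placeholder tokens into an explicit "unknown" category.
    text_cols = [c for c in out.columns
                 if not pd.api.types.is_numeric_dtype(out[c])]
    for col in text_cols:
        cleaned = out[col].str.strip().str.lower()
        cleaned = cleaned.where(~cleaned.isin(MISSING_TOKENS))
        out[col] = cleaned.fillna("unknown")

    # Group rare native-country values into a single "other" bucket.
    counts = out["native-country"].value_counts()
    rare = counts[counts < rare_min_count].index
    out["native-country"] = out["native-country"].where(
        ~out["native-country"].isin(rare), "other")

    # Binary-encode sex (1 = male, 0 = anything else, including unknown).
    out["sex"] = (out["sex"] == "male").astype(int)

    # Clip extreme values in hours-per-week to the assumed bounds.
    low, high = hours_bounds
    out["hours-per-week"] = out["hours-per-week"].clip(lower=low, upper=high)

    # One-hot encode whichever of the listed columns are present.
    present = [c for c in ONE_HOT_COLS if c in out.columns]
    return pd.get_dummies(out, columns=present, dtype=int)


result = preprocess(df)
print(result)
```

On the real dataset the clipping bounds and the rarity threshold would be chosen from the EDA plots rather than fixed up front.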
