diff --git a/notebooks/clustering/Kmeans.ipynb b/notebooks/clustering/Kmeans.ipynb
index b315d00542a37600f917734c3c9f9624ec3d76cf..4834b1457972b7695878118a2a936e2cd198a7a9 100644
--- a/notebooks/clustering/Kmeans.ipynb
+++ b/notebooks/clustering/Kmeans.ipynb
@@ -836,13 +836,11 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5d232e89-e0b4-4b6b-a12f-994a461247cf",
+   "cell_type": "markdown",
+   "id": "48f94202-5633-436e-807e-b3183ad9b9b0",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "Sa"
+    "### Saving final file"
    ]
   },
   {
@@ -874,17 +872,56 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a004e5d7-748a-4d2f-a219-072a1bac110e",
+   "cell_type": "markdown",
+   "id": "dfac0ebd-1b9e-47c1-a12e-831503541f63",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "### **Key Takeaways**\n",
+    "#### **1. Finding the Optimal Number of Clusters**\n",
+    "\n",
+    "* The **elbow method** showed a significant drop in inertia at `k = 3`, with diminishing gains after that.\n",
+    "* **Silhouette scores** peaked at **`k = 2`** (0.5037), indicating the best balance between compactness and separation.\n",
+    "* Although the elbow points to `k = 3`, the silhouette peak favors `k = 2`, so **`k = 2` was selected** for final modeling.\n",
+    "\n",
+    "#### **2. Final KMeans Model (k = 2)**\n",
+    "\n",
+    "* KMeans clustering was performed using all features.\n",
+    "* Cluster labels were assigned and added to the dataset.\n",
+    "* Distribution:\n",
+    "\n",
+    "  * **Cluster 0**: 80.4% of records\n",
+    "  * **Cluster 1**: 19.6% of records\n",
+    "\n",
+    "#### **3. Cluster Profiling**\n",
+    "\n",
+    "* **Cluster 0**: Higher average height, weight, `CH2O` (water intake), and physical activity; likely a healthier group.\n",
+    "* **Cluster 1**: Very low `NCP` (meal frequency) and slightly lower activity; possibly a distinct behavioral group or an anomaly.\n",
+    "* **Standard deviation analysis** showed that Cluster 1 had more variability in age, eating patterns, and technology use.\n",
+    "\n",
+    "#### **4. PCA Visualization**\n",
+    "\n",
+    "* PCA (applied after clustering) showed **clear separation** between the two clusters along the first principal component.\n",
+    "* This visually supports the conclusion that KMeans captured meaningful structure in the data.\n",
+    "\n",
+    "#### **5. Final Evaluation Metrics**\n",
+    "\n",
+    "| Metric                  | Value   |\n",
+    "| ----------------------- | ------- |\n",
+    "| Silhouette Score        | 0.5037  |\n",
+    "| Davies–Bouldin Index    | 0.8034  |\n",
+    "| Calinski–Harabasz Index | 1947.96 |\n",
+    "\n",
+    "* Together, the metrics indicate that the **`k = 2` clusters are reasonably compact and well separated**.\n",
+    "\n",
+    "### **Conclusion**\n",
+    "\n",
+    "The KMeans clustering pipeline segmented the data into two well-separated clusters, supported by the PCA visualization and the evaluation metrics. Cluster profiling highlights lifestyle differences between the groups, setting the stage for targeted recommendations or downstream modeling."
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1b9c6962-b58f-4b3c-a32d-ce02a705ba5e",
+   "id": "426ed14a-9136-4192-9b1d-2a92576da4e4",
    "metadata": {},
    "outputs": [],
    "source": []