eda

Categorical vs Categorical Heatmap

Identify high-impact categorical variables in classification datasets

414 words/2 min read

Classification datasets often contain a number of categorical variables, and there is frequently a need to select the most important of them for modelling, especially in high-dimensional datasets.
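
The teaser does not state which association measure the post uses. One common choice for a categorical-vs-categorical heatmap is Cramér's V, which is derived from the chi-square statistic of a contingency table and ranges from 0 (no association) to 1 (perfect association). Below is a minimal NumPy/pandas sketch, assuming Cramér's V without the Yates continuity correction or any small-sample bias correction. The helper names `cramers_v` and `cramers_v_matrix` are hypothetical.

```python
import numpy as np
import pandas as pd


def cramers_v(x, y):
    """Hypothetical helper: uncorrected Cramér's V between two categorical series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, k = table.shape
    denom = n * (min(r, k) - 1)
    # V = sqrt(chi2 / (n * (min(r, k) - 1))); a single-level variable carries no information.
    return float(np.sqrt(chi2 / denom)) if denom > 0 else 0.0


def cramers_v_matrix(df, cols):
    """Hypothetical helper: symmetric matrix of pairwise Cramér's V values."""
    m = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            v = cramers_v(df[a], df[b])
            m.loc[a, b] = m.loc[b, a] = v
    return m
```

Plotting the resulting matrix as a heatmap and reading the row for the target variable shows which categorical predictors are most strongly associated with it.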

Naive Bayes With Quantile Discretization

Discretization saves the day!

208 words/1 min read

Often, classification datasets have a mix of continuous and categorical data. The continuous data typically suffer from problems such as outliers, noise and the lack of a well-defined distribution.

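
The teaser does not show the post's code. As a hedged sketch of the idea, the snippet below bins each continuous feature at its training-set quantiles, so every bin holds roughly the same number of training rows, and then fits a categorical Naive Bayes with Laplace smoothing on the bin indices. The class name `QuantileNB` and its `n_bins`/`alpha` parameters are hypothetical; in scikit-learn a similar pipeline could be assembled from `KBinsDiscretizer(encode="ordinal", strategy="quantile")` and `CategoricalNB`.

```python
import numpy as np


class QuantileNB:
    """Hypothetical helper: quantile binning followed by categorical Naive Bayes."""

    def __init__(self, n_bins=4, alpha=1.0):
        self.n_bins = n_bins  # number of equal-frequency bins per feature
        self.alpha = alpha    # Laplace smoothing added to every bin count

    def _discretize(self, X):
        # np.digitize maps each value to 0..n_bins-1 using the interior cut points;
        # values outside the training range land in the first or last bin.
        return np.column_stack(
            [np.digitize(X[:, j], edges) for j, edges in enumerate(self.edges_)]
        )

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Interior cut points at the 1/n_bins, 2/n_bins, ... quantiles of each column.
        qs = np.linspace(0, 1, self.n_bins + 1)[1:-1]
        self.edges_ = [np.quantile(X[:, j], qs) for j in range(X.shape[1])]
        Xd = self._discretize(X)

        n_classes, n_features = len(self.classes_), X.shape[1]
        counts = np.zeros((n_classes, n_features, self.n_bins))
        for k, c in enumerate(self.classes_):
            Xc = Xd[y == c]
            for j in range(n_features):
                counts[k, j] = np.bincount(Xc[:, j], minlength=self.n_bins)
        counts += self.alpha
        # log_lik_[k, j, b] = log P(feature j falls in bin b | class k)
        self.log_lik_ = np.log(counts / counts.sum(axis=2, keepdims=True))
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        Xd = self._discretize(np.asarray(X, dtype=float))
        cols = np.arange(Xd.shape[1])
        # Naive Bayes score: log prior plus the sum of per-feature log likelihoods.
        scores = np.stack(
            [self.log_prior_[k] + self.log_lik_[k, cols, Xd].sum(axis=1)
             for k in range(len(self.classes_))],
            axis=1,
        )
        return self.classes_[np.argmax(scores, axis=1)]
```

Because every continuous value is reduced to a bin index, an extreme outlier can only ever fall into the first or last bin, and no distributional shape (such as a Gaussian) has to be assumed for the feature.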
Choosing Right Colormap for Heatmap

Improve the interpretation of a heatmap

177 words/1 min read

Heatmaps are commonly used to visually represent the correlations between continuous features in a dataset. Each cell is coloured according to its value, which gives the plot visual appeal and makes the correlation matrix easier to interpret.
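
The teaser does not name a specific colormap. As a hedged illustration of one common choice for correlation matrices, the sketch below uses a diverging map (matplotlib's `RdBu_r`) with the colour limits pinned to -1 and 1, so that zero correlation maps to the neutral midpoint and strong positive and negative correlations get opposite hues. The function name `correlation_heatmap` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def correlation_heatmap(df, cmap="RdBu_r"):
    """Hypothetical helper: Pearson correlation heatmap with a diverging colormap."""
    corr = df.corr(numeric_only=True)
    fig, ax = plt.subplots()
    # Fixing vmin/vmax to the full correlation range keeps 0 at the colormap's midpoint,
    # so colours are comparable across different datasets.
    im = ax.imshow(corr.to_numpy(), cmap=cmap, vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)), labels=corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)), labels=corr.index)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return fig, ax, im
```

A sequential colormap (for example a single-hue map) would instead be the natural fit when all plotted values share one sign, such as absolute correlations.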

Non-parametric One-way ANOVA

Correlate categorical predictor with continuous response variable

934 words/5 min read

Regression datasets often have a mix of categorical and continuous predictor variables. When the number of categorical variables is large, how do you pick the ones that are relevant to the regression (i.e. correlated with the response variable)?
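
The teaser does not name the test, but the standard non-parametric counterpart of one-way ANOVA is the Kruskal-Wallis H test. Assuming that is the test in question, a minimal screening sketch groups the response by each predictor's levels, runs `scipy.stats.kruskal` on those groups, and sorts the predictors by p-value. The function name `rank_categorical_predictors` is hypothetical.

```python
import pandas as pd
from scipy.stats import kruskal


def rank_categorical_predictors(df, response, categorical_cols):
    """Hypothetical helper: Kruskal-Wallis screen of categorical predictors."""
    rows = []
    for col in categorical_cols:
        # One sample of the response per level of the categorical predictor.
        groups = [g[response].to_numpy() for _, g in df.groupby(col)]
        if len(groups) < 2:
            continue  # a single level cannot be compared against anything
        stat, p_value = kruskal(*groups)
        rows.append({"feature": col, "H": stat, "p_value": p_value})
    # Smallest p-value first: strongest evidence that the response differs across levels.
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)
```

Kruskal-Wallis compares rank distributions rather than means, so it does not require the response to be normally distributed within each level; a small p-value suggests, but does not prove, that the predictor is relevant.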

Neat Wordcount Histograms

The convenience of horizontal histograms

230 words/2 min read

Most NLP projects require us to look at word counts in documents. The traditional way is to draw histograms with vertical bars. But are they convenient?
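
A minimal sketch of the horizontal layout, assuming the counts come from a simple lowercase regex tokenizer (an illustrative choice, not necessarily the post's): `collections.Counter` gives the most frequent words, and `Axes.barh` draws one horizontal bar per word, so even long words stay readable as y-axis labels. The function name `plot_word_counts` is hypothetical.

```python
import re
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_word_counts(text, top_n=10):
    """Hypothetical helper: horizontal bar chart of the top_n most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    # barh draws from the bottom up, so reverse to put the most frequent word on top.
    labels = [w for w, _ in counts][::-1]
    values = [c for _, c in counts][::-1]
    fig, ax = plt.subplots(figsize=(6, 0.4 * len(labels) + 1))
    ax.barh(labels, values)
    ax.set_xlabel("Count")
    fig.tight_layout()
    return fig, ax, counts
```

Scaling the figure height with the number of words keeps the bars evenly spaced whether ten or fifty words are shown.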