Neat Wordcount Histograms

The convenience of horizontal histograms

September 12, 2022

230 words/2 min read

Most NLP projects require us to look at word counts in documents. The traditional way is to draw histograms with vertical bars. But are they convenient?

Here is a code fragment that draws a horizontal histogram with seaborn instead:

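import matplotlib.pyplot as plt
import seaborn as sns

# rawdf is the DataFrame loaded earlier in the notebook
sns.set(rc={"figure.figsize": (6, 60)})  # width=6, height=60 (inches)
sns.countplot(data=rawdf, y='keyword', hue='target')  # y= draws horizontal bars
plt.title('Counterintuitive keyword sense')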

We count the number of occurrences of each word in the keyword column of the dataset, stratified by the target variable.
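To make the counting step concrete, here is a minimal sketch of the same tally computed directly with pandas. The toy_df DataFrame and its rows are made up for illustration and only stand in for rawdf; the column names keyword and target are taken from the fragment above.

```python
import pandas as pd

# Toy stand-in for rawdf; the rows are invented for illustration only
toy_df = pd.DataFrame({
    "keyword": ["fire", "flood", "fire", "storm", "flood", "fire"],
    "target": [1, 0, 0, 1, 1, 1],
})

# Rows are keywords, columns are target values, cells are occurrence counts.
# These are the numbers a count plot with hue='target' would draw as bars.
counts = pd.crosstab(toy_df["keyword"], toy_df["target"])
print(counts)
```

Each row of the resulting table corresponds to one group of bars in the plot, one bar per target value.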

The traditional histogram with vertical bars does not work well for large vocabularies – the labels on the word axis overlap and appear garbled, and the entire histogram is squeezed to fit within the page width.

The horizontal histogram on the left has none of these problems. It can grow downward to keep pace with the vocabulary, the word axis is not garbled, and the words are easier to read.
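One way to let the plot grow with the vocabulary is to derive the figure height from the number of distinct keywords rather than hard-coding it. The sketch below is only an illustration under assumptions: the helper names figure_height and horizontal_countplot, the 0.25 inches per keyword, and the toy data are all hypothetical and not from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def figure_height(n_keywords, inches_per_keyword=0.25, min_height=3.0):
    """Return a figure height in inches that grows with the vocabulary size."""
    return max(min_height, n_keywords * inches_per_keyword)


def horizontal_countplot(df, word_col="keyword", hue_col="target", width=6.0):
    """Draw one horizontal bar group per keyword, split by hue_col."""
    height = figure_height(df[word_col].nunique())
    fig, ax = plt.subplots(figsize=(width, height))
    # Passing y= puts the keywords on the vertical axis, so bars run sideways
    sns.countplot(data=df, y=word_col, hue=hue_col, ax=ax)
    ax.set_title("Keyword counts by target")
    return fig, ax


# Toy stand-in for rawdf; the rows are invented for illustration only
toy_df = pd.DataFrame({
    "keyword": ["fire", "flood", "fire", "storm", "flood", "fire"],
    "target": [1, 0, 0, 1, 1, 1],
})
fig, ax = horizontal_countplot(toy_df)
```

With these assumed defaults, a vocabulary of 240 keywords would give a 60-inch-tall figure, the same height as in the fragment above, while small vocabularies fall back to the minimum height.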

Note: This is a code fragment and will not run as is. To see the code in context, refer to this Kaggle notebook.