Neat Wordcount Histograms
The convenience of horizontal histograms
Most NLP projects require us to look at word counts in documents. The traditional way is to draw histograms with vertical bars. But are they convenient?
Here is a code fragment that draws the histogram with horizontal bars instead:
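import matplotlib.pyplot as plt
import seaborn as sns
# rawdf is the dataset's DataFrame, defined elsewhere in the notebook
sns.set(rc={"figure.figsize": (6, 60)})  # width=6, height=60 (inches): a tall, narrow figure
sns.countplot(data=rawdf, y='keyword', hue='target')  # y= puts the words on the vertical axis, so the bars run horizontally
plt.title('Counterintuitive keyword sense')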
We count the number of occurrences of each word in the keyword column of the dataset, stratified by the target variable.
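To make the counting concrete, here is a minimal sketch of the tally behind the plot: one count per (keyword, target) pair. The sample rows are made up for illustration and use only the standard library; they are not taken from the actual dataset.

```python
from collections import Counter

# Hypothetical (keyword, target) rows standing in for the real dataset.
rows = [
    ("fire", 1), ("fire", 1), ("fire", 0),
    ("flood", 1), ("ablaze", 0), ("ablaze", 0),
]

# One tally per (keyword, target) pair; each tally is the length of one bar.
counts = Counter(rows)

for (keyword, target), n in sorted(counts.items()):
    print(f"{keyword:<8} target={target} count={n}")
```

On a pandas DataFrame, the same tally can be obtained with rawdf.groupby(['keyword', 'target']).size().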
The traditional histogram with vertical bars does not work well for large vocabularies: the labels on the word axis overlap and become unreadable, and the bars are squeezed to fit within the page width.
The histogram on the left has none of these problems. It can grow to keep pace with the vocabulary, the word axis is not garbled and the words are easier to read.
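The fragment hard-codes a figure height of 60. One way to let the height keep pace with the vocabulary is to derive it from the number of distinct words. The helper below is a sketch, not part of the original notebook: the name figure_height and the 0.25 inches per row are assumptions chosen for illustration.

```python
def figure_height(n_categories, inches_per_row=0.25, min_height=4.0):
    """Return a figure height in inches that gives each category its own row.

    Hypothetical helper: inches_per_row and min_height are illustrative
    defaults, not values taken from the original notebook.
    """
    return max(min_height, n_categories * inches_per_row)


# A vocabulary of 240 distinct keywords gets a 60-inch-tall figure.
height = figure_height(240)
print(height)
```

With the DataFrame from the fragment, the height could then be passed as figure_height(rawdf['keyword'].nunique()) in place of the constant 60.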
Note: This is a code fragment and will not run as is. To see the code in context, refer to this Kaggle notebook.