Using LSTMs Is Easy

Make your model understand text just as you would read it -- by learning the sequence of words

Sequence is an inherent part of language: shuffle the words of a document and its meaning falls apart. A classifier, then, should be able to exploit the order of words. The LSTM (long short-term memory network) is a neural network designed to learn exactly this kind of sequential structure in data.

Unlike a vanilla neural network, an LSTM expects its input in a specific three-dimensional shape, illustrated below. The first dimension indexes the samples (individual time series), the second dimension is the time step within each series, and the third dimension holds the feature(s) recorded at each time step.

LSTM: General data structure
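To make the shape concrete, here is a minimal sketch in NumPy (the sizes are hypothetical, chosen only for illustration): a batch of four time series, each ten steps long, with three features per step.

```python
import numpy as np

# Hypothetical sizes: 4 time series (samples), each 10 time steps long,
# with 3 features recorded at every step.
n_samples, n_timesteps, n_features = 4, 10, 3

# An LSTM expects its input batched into this 3-D shape:
# (samples, time steps, features per step).
batch = np.zeros((n_samples, n_timesteps, n_features))

print(batch.shape)  # (4, 10, 3)
```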

We can think of an analogous structure for textual data: a document corresponds to a sample, each word to a time step, and each word's embedding vector to the features at that time step.

LSTM: Doc data structure
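The text-to-tensor mapping can be sketched with a toy example (the vocabulary, documents, and 5-dimensional random embeddings below are all hypothetical): token ids index rows of an embedding matrix, and padding every document to the same length yields the 3-D batch an LSTM consumes.

```python
import numpy as np

# Hypothetical toy vocabulary and a 5-dimensional embedding matrix;
# row i holds the embedding vector for token id i (row 0 = padding).
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "mat": 4}
embedding_dim = 5
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

# Two short documents, tokenized and padded to the same length (4 words).
docs = [["the", "cat", "sat"], ["the", "mat"]]
max_len = 4
token_ids = np.array(
    [[vocab[w] for w in doc] + [0] * (max_len - len(doc)) for doc in docs]
)

# Looking up each token id produces the LSTM-ready 3-D batch:
# (documents, words per document, embedding dimensions).
batch = embedding_matrix[token_ids]
print(batch.shape)  # (2, 4, 5)
```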

This data structure is built with a vectorizer layer and an embedding matrix that initializes an embedding layer. To see how this is done, watch the video below. You can also follow along by cloning my Kaggle notebook on which this video is based.
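As a rough sketch of how those pieces fit together in Keras (the corpus, sequence length, and embedding size here are hypothetical, and a random matrix stands in for pretrained vectors such as GloVe):

```python
import numpy as np
import tensorflow as tf

# Hypothetical corpus; in practice this would be the training documents.
corpus = ["the cat sat on the mat", "the dog ate my homework"]

# 1) Vectorizer layer: maps raw strings to padded sequences of token ids.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_sequence_length=8
)
vectorizer.adapt(corpus)

# 2) Embedding matrix: random here; in practice each row would hold a
# pretrained vector for the corresponding word in the vectorizer's vocabulary.
vocab_size = vectorizer.vocabulary_size()
embedding_dim = 16
embedding_matrix = np.random.default_rng(0).normal(
    size=(vocab_size, embedding_dim)
)

# 3) Embedding layer initialized from that matrix, feeding an LSTM.
model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Embedding(
        vocab_size,
        embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    ),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Inside the model each document becomes an (8, 16) sequence of embeddings,
# exactly the 3-D structure described above.
out = model(tf.constant(corpus))
```

Note that the embedding layer turns each padded id sequence into the (samples, time steps, features) tensor automatically, so the raw strings can be fed straight into the model.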