Keras RNN with unequal sequence lengths

I am struggling to find an approach for training an RNN with unequal sequence lengths. I have a varying number of sequential observations for each sample in my dataset. Each observation represents a connection that was made in a social network graph. If two samples have networks of different sizes, their sequences will have different lengths.

The recommended approach in Keras seems to be to pad all sequences so that each is as long as the longest one in the dataset. This ensures they are all the same size, but it uses a lot more memory if the longest sequence is very long.

Are there other approaches to feeding sequence data of varying length to an RNN, specifically using the RStudio version of Keras? An example using the fit_generator method would be very welcome, as I am quite new to DL.

I also work with varying sequence lengths. As far as I have been able to read up on it, extension (padding) is the way to go. If you have only a few very long sequences, you might consider omitting them.
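
To keep the memory cost of padding down, one option is to pad per batch instead of per dataset. The sketch below is only an illustration of that idea: it is plain Python/NumPy rather than R, the function name `padded_batches` and its parameters are made up for this example, and it assumes each sequence is an array of shape (timesteps, n_features). Sequences are grouped by length, and each batch is padded only to its own longest member.

```python
import numpy as np

def padded_batches(sequences, labels, batch_size=32, pad_value=0.0, seed=0):
    """Yield (x, y) batches forever; each batch is padded to its own longest sequence.

    sequences: list of arrays, each shaped (timesteps_i, n_features)
    labels:    one label per sequence
    """
    labels = np.asarray(labels)
    # Sort indices by length so that a batch holds sequences of similar length.
    order = np.argsort([len(s) for s in sequences], kind="stable")
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    rng = np.random.default_rng(seed)
    while True:
        # Visit the batches in a fresh random order on every pass.
        for b in rng.permutation(len(batches)):
            idx = batches[b]
            max_len = max(len(sequences[i]) for i in idx)
            n_features = sequences[idx[0]].shape[1]
            x = np.full((len(idx), max_len, n_features), pad_value, dtype="float32")
            for row, i in enumerate(idx):
                seq = sequences[i]
                x[row, :len(seq)] = seq  # real values first, padding at the end
            yield x, labels[idx]
```

A generator like this could be handed to a generator-based fit method, with the model's time dimension left unspecified so that batch lengths can differ. In Keras, a Masking layer with the same pad value is the usual way to make the RNN skip the padded steps.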

Thanks, I will try that when I return to the office on Monday. About 95% of my sequences have fewer than 200 elements, with a maximum of 3100. I can probably safely omit the longest ones.
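
Dropping the long tail could look roughly like the sketch below. It is Python/NumPy, not R, and the helper name `drop_long_sequences` and the cutoff of 200 are assumptions chosen to match the numbers above; the cutoff could just as well be taken from a percentile of the observed lengths.

```python
import numpy as np

def drop_long_sequences(sequences, labels, max_len=200):
    """Keep only the samples whose sequence has at most max_len elements."""
    keep = [i for i, s in enumerate(sequences) if len(s) <= max_len]
    return [sequences[i] for i in keep], [labels[i] for i in keep]

def length_cutoff(sequences, percentile=95):
    """Suggest a cutoff: the given percentile of the sequence lengths."""
    lengths = [len(s) for s in sequences]
    return int(np.percentile(lengths, percentile))
```

It is worth checking that the dropped samples are not systematically different from the rest (for example, all from the largest networks) before discarding them.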

Another approach I was considering was to use subsequences of a fixed length, say 20 or 50 elements. Yet I'm unclear how to apply such a model to the full sequences once it is trained. Do you have any recommendations on that front?
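
For what it's worth, here is a rough sketch of the fixed-length subsequence idea in Python/NumPy (not R). The names `split_into_windows` and `predict_full_sequence` are hypothetical, each sequence is assumed to be an array of shape (timesteps, n_features), and averaging the per-window predictions is just one possible way to get back to a prediction for the full sequence, not an established recipe.

```python
import numpy as np

def split_into_windows(seq, window=20, pad_value=0.0):
    """Cut one sequence into non-overlapping windows of equal length.

    The last window is padded with pad_value if the sequence length
    is not a multiple of `window`.
    Returns an array shaped (n_windows, window, n_features).
    """
    seq = np.asarray(seq, dtype="float32")
    n_steps, n_features = seq.shape
    n_windows = max(1, -(-n_steps // window))  # ceiling division
    padded = np.full((n_windows * window, n_features), pad_value, dtype="float32")
    padded[:n_steps] = seq
    return padded.reshape(n_windows, window, n_features)

def predict_full_sequence(predict_fn, seq, window=20):
    """Average window-level predictions into one prediction per sequence.

    predict_fn is any callable mapping (n_windows, window, n_features)
    to (n_windows, n_outputs), e.g. a trained model's predict method.
    """
    windows = split_into_windows(seq, window)
    return np.mean(predict_fn(windows), axis=0)
```

Whether averaging is appropriate depends on the problem: it treats every window as equally informative and ignores anything that spans window boundaries.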

I think that depends on the nature of your problem. I, for one, cannot truncate the sequences I am working with, so I have to use extension.

I used an approach that works well for my needs. All of my time series consist of events of two different types occurring after a start time.

I simply discretized the time into 16 spans and, within each span, counted the events of each type that occurred. This transformed the highly variable-length sequences of individual events into fixed-length sequences of summary counts.
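
In case it helps, the binning can be sketched like this in Python/NumPy (not R). The function name `bin_event_counts` is made up for this example, equal-width spans between a start and an end time are an assumption, and event types are assumed to be coded as 0 and 1.

```python
import numpy as np

def bin_event_counts(times, types, start, end, n_bins=16, n_types=2):
    """Turn a variable-length event list into an (n_bins, n_types) count matrix.

    times: event timestamps
    types: integer event type per event (0 .. n_types - 1)
    Events outside [start, end] are ignored.
    """
    times = np.asarray(times, dtype=float)
    types = np.asarray(types, dtype=int)
    edges = np.linspace(start, end, n_bins + 1)  # equal-width spans
    counts = np.zeros((n_bins, n_types), dtype="float32")
    for t in range(n_types):
        # Histogram of the timestamps belonging to event type t.
        counts[:, t], _ = np.histogram(times[types == t], bins=edges)
    return counts
```

Every sample then has the same shape, 16 steps by 2 features, so the samples can be stacked into one array and fed to an RNN without any padding.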

Hopefully someone else may find this approach useful.
