Univariate time series forecasting

A time series is a set of observations ordered in time and dependent on each other. Because of this ordering, the value of a variable \(y\) at time \(t\) is likely to reflect the past history of the series; that is, the observations of a time series are likely to be correlated.

The term "univariate time series" refers to a time series that consists of single (scalar) observations recorded sequentially over equal time increments.

These are data sets in which only one variable is observed at each time step, for example the hourly temperature used in the accompanying notebook.

1. Data preparation

1.1. Data extraction

First, we train a model using only a single feature (the temperature of node 28) and use it to predict future values of that feature.

We select two months of data during which node 28 has no missing values.

The dataset contains \(1441\) rows. The first \(1200\) rows form the training dataset (\(TRAIN\_SPLIT = 1200\)), and the remaining \(241\) rows form the validation dataset. With hourly observations, this amounts to \(\sim 50\) days of training data and \(\sim 10\) days of validation data. (More information on the operating process of machine learning methods is presented in this link.)

1.2. Data scaling

It is important to scale features before training a neural network. Standardization is a common way of doing this: subtract the mean and divide by the standard deviation of each feature. Both statistics are computed on the training data only, so that no information from the validation data leaks into training:

uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()
uni_data = (uni_data-uni_train_mean)/uni_train_std
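
As a minimal sketch of the same idea on toy numbers, the snippet below standardizes a short series using statistics from the training slice only. The values and the boundary `split` are made up for illustration, and the population standard deviation is an assumption (the estimator used by `uni_data.std()` depends on the type of `uni_data`, which the text does not state).

```python
from statistics import fmean, pstdev

series = [10.0, 12.0, 14.0, 16.0, 30.0]  # toy values, not the real data
split = 4                                 # hypothetical train/validation boundary

train = series[:split]
train_mean = fmean(train)   # mean of the training slice only
train_std = pstdev(train)   # population standard deviation of the training slice

# Every point, including the validation point, is scaled with training statistics.
scaled = [(x - train_mean) / train_std for x in series]
```

After scaling, the training slice has mean 0 and standard deviation 1, while validation points may fall well outside that range.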

1.3. Data modelling

The function below returns the windows of time for the model to train on. The parameter history_size is the size of the past window of information. The parameter target_size is how far into the future the model needs to learn to predict; the observation at that offset is the label to be predicted:

import numpy as np

def univariate_data(dataset, start_index, end_index, history_size, target_size):
  data = []
  labels = []

  # The first window needs history_size past observations
  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    # Reshape data from (history_size,) to (history_size, 1)
    data.append(np.reshape(dataset[indices], (history_size, 1)))
    # The label is the observation at index i + target_size
    labels.append(dataset[i+target_size])
  return np.array(data), np.array(labels)
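
To make the windowing concrete, here is a pure-Python sketch of the same logic on a six-element toy series. The helper name `make_windows` is hypothetical, and it assumes that each label is the observation at index `i + target_size`, that is, the value right after the window when `target_size` is 0.

```python
def make_windows(series, history_size, target_size):
    """Pure-Python mirror of the windowing logic, for illustration only."""
    windows, labels = [], []
    for i in range(history_size, len(series) - target_size):
        windows.append(series[i - history_size:i])  # the past history_size values
        labels.append(series[i + target_size])      # the value to predict
    return windows, labels

windows, labels = make_windows([0, 1, 2, 3, 4, 5], history_size=3, target_size=0)
print(windows)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(labels)   # [3, 4, 5]
```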

1.4. Data separation

Next, the training and validation windows are created. The model is given the last 20 recorded temperature observations and needs to learn to predict the temperature at the next time step:

univariate_past_history = 20
univariate_future_target = 0

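x_train_uni, y_train_uni = univariate_data(uni_data, 0, TRAIN_SPLIT,
    univariate_past_history, univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, TRAIN_SPLIT, None,
    univariate_past_history, univariate_future_target)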
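
The number of windows produced by each call follows from the index arithmetic in univariate_data. The short calculation below is a sketch that assumes the \(1441\)-row dataset, a history of 20 and a target offset of 0, as described above; the variable names are illustrative.

```python
N_ROWS = 1441
TRAIN_SPLIT = 1200
history_size = 20
target_size = 0

# Training call: indices run from 0 + history_size up to TRAIN_SPLIT (exclusive).
n_train = TRAIN_SPLIT - (0 + history_size)

# Validation call: end_index is None, so it becomes N_ROWS - target_size.
n_val = (N_ROWS - target_size) - (TRAIN_SPLIT + history_size)

print(n_train, n_val)  # 1180 221
```

Under these assumptions, x_train_uni would have shape (1180, 20, 1) and x_val_uni shape (221, 20, 1).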

2. Baseline

Before training a model, a simple baseline was established. Given an input window, the baseline method looks at the whole history and predicts the next point to be the average of the last 20 observations (in the plot, the history given to the network is shown in pink, and the value to predict is marked by a red cross).
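
The baseline itself is not shown in the text; a minimal sketch, assuming it is simply the arithmetic mean of the input window, could look like this. The function name `baseline` and the toy window are illustrative.

```python
def baseline(history):
    """Predict the next value as the mean of the observed window."""
    return sum(history) / len(history)

window = [0.1, 0.2, 0.3, 0.4]   # toy window; the real one holds 20 observations
prediction = baseline(window)   # approximately 0.25
```

Any model worth training should achieve a lower error than this averaging rule on the validation windows.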

3. Recurrent neural network (RNN)

In order to beat this baseline, a recurrent neural network was used to predict the temperature at time \(t\) based on the last 20 observations.

3.1. Shuffling, batching and caching data

We used tf.data to shuffle, batch, and cache the dataset:


train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()

  • The batch size is the number of training examples used in one iteration, that is, the number of samples propagated through the network before the weights are updated. It can have a significant impact on the model’s performance and training time.

  • The buffer size is the number of elements held in the shuffle buffer, from which each next element is drawn at random. It must be large enough, or else shuffling will not be very effective: a buffer size greater than or equal to the length of the dataset ensures a uniform shuffle.
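
The effect of the buffer size can be illustrated with a simplified pure-Python model of a shuffle buffer. This is a sketch of the general technique, not TensorFlow's implementation, and the function name `buffered_shuffle` is hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Simplified model of a shuffle buffer (not TensorFlow's implementation)."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Buffer is full: emit one element chosen at random.
            out.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:
        # Input exhausted: drain what is left, still at random.
        out.append(buffer.pop(rng.randrange(len(buffer))))
    return out

data = list(range(10))
no_shuffle = buffered_shuffle(data, buffer_size=1)    # order is unchanged
partial = buffered_shuffle(data, buffer_size=3)       # only local mixing
full = buffered_shuffle(data, buffer_size=len(data))  # any order is possible
```

With a buffer of 3, the element emitted at position k can only come from the first k + 3 inputs, so early and late observations never mix; this is why a buffer close to the dataset length is preferred.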

3.2. Definition of the model

Then comes the definition of the model. For this, we used a recurrent neural network (RNN) architecture called long short-term memory (LSTM). Unlike standard feedforward neural networks, an LSTM has feedback connections, which allow it to carry information from earlier time steps.

simple_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(8, input_shape=x_train_uni.shape[-2:]),
    # Dense output layer: one unit for the single predicted temperature
    tf.keras.layers.Dense(1)
])

The model is a sequential model consisting of two layers: an LSTM input layer with 8 units, to which we pass the input_shape argument, and a dense output layer.

  • A sequential model is a linear stack of layers.

  • A dense layer is just the regular densely-connected neural network layer.
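
To give a sense of the model's size, the trainable parameters can be counted by hand. The sketch below assumes a standard LSTM layer with biases, one input feature per time step (the series is univariate), and a dense output layer with a single unit; the helper names are illustrative.

```python
def lstm_params(units, input_dim):
    # Four gate blocks (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (units * input_dim + units * units + units)

def dense_params(inputs, outputs):
    # One weight per input/output pair, plus one bias per output.
    return inputs * outputs + outputs

lstm = lstm_params(units=8, input_dim=1)
dense = dense_params(inputs=8, outputs=1)
print(lstm, dense, lstm + dense)  # 320 9 329
```

Under these assumptions the network has only a few hundred parameters, which is small relative to the number of training windows.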

Before training the model, the learning process was configured by calling the compile method, with stochastic gradient descent (SGD) as the optimizer and the root mean square error (RMSE) as the loss:

simple_lstm_model.compile(optimizer='SGD', loss=root_mean_squared_error)
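
The function root_mean_squared_error is not defined in this excerpt. As a sketch of the quantity it presumably computes, here is the RMSE in pure Python; a version usable as a Keras loss would need to take (y_true, y_pred) tensors and use tensor operations instead, and the name `rmse` is illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # no error at all
off = rmse([0.0, 0.0], [3.0, 4.0])                # sqrt((9 + 16) / 2)
```

Because the errors are squared before averaging, large individual errors weigh more heavily than they would in a mean absolute error.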

3.3. Training the model

For training, we chose \(20\) epochs. An epoch is an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation. Here, because the training dataset repeats indefinitely, the length of an epoch is set by steps_per_epoch.

history=simple_lstm_model.fit(train_univariate, epochs=EPOCHS, steps_per_epoch=TRAIN_SPLIT, validation_data=val_univariate, validation_steps=50)
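
Since the training dataset is repeated, an "epoch" here means steps_per_epoch batches rather than one pass over the data. The arithmetic below sketches the relation between the two; the batch size of 256 is hypothetical, because BATCH_SIZE is not given in the text.

```python
import math

n_train_windows = 1200 - 20   # TRAIN_SPLIT minus the 20-step history
batch_size = 256              # hypothetical; BATCH_SIZE is not given in the text
steps_per_epoch = 1200        # the value passed to fit (TRAIN_SPLIT)

# Batches needed to see every training window once.
steps_for_one_pass = math.ceil(n_train_windows / batch_size)
print(steps_for_one_pass)  # 5

# Approximate number of passes over the data inside one "epoch" (roughly 260).
passes_per_epoch = steps_per_epoch * batch_size / n_train_windows
```

With a batch size of this order, each of the 20 epochs would therefore cover the training windows many times over.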

We then obtain the following plot:

We notice that the two curves behave in the same way:

  • a sharp decrease in the loss (the root mean square error, RMSE) from the 1st epoch, with a value of \(0.2657\) on the training set and \(0.2057\) on the validation set, to the 4th epoch, with a value of \(0.0769\) on the training set and \(0.0868\) on the validation set.

  • a slight decrease in the loss from the 5th epoch onwards, converging to a value of about \(0.06\).

Having obtained a low loss value, we conclude that the model performs well and should be able to accurately predict the temperature at time \(t\) from the last \(20\) observations.
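
Because the data were standardized before training, the loss is expressed in units of the training standard deviation rather than in degrees. A minimal sketch of the conversion follows; the standard deviation of 5.0 is a made-up value, since uni_train_std is not reported in the text.

```python
def rmse_in_original_units(rmse_standardized, train_std):
    """Undo the division by the training standard deviation."""
    return rmse_standardized * train_std

# 5.0 is a hypothetical standard deviation, used only to show the scaling.
example = rmse_in_original_units(0.06, train_std=5.0)
```

The mean cancels out in the difference between prediction and target, so only the standard deviation is needed for this conversion.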

Using our simple LSTM model, we then check this conclusion by making some predictions.

