How to encode cyclic time data for machine learning models
Updated: Sep 25, 2020
Time is often one of the most valuable features in a dataset, and the way we feed it to a machine learning model can affect how well the model performs. In this post, we will look at a simple but effective approach to encoding cyclic time data.
Cyclicity of time
The periodic parts of date and time information (months, days, hours, minutes and seconds) raise two main problems. First, their steps need a richer representation than a single integer so that a model can distinguish them effectively. Second, their reset points (e.g. 23 to 0 for hours, 59 to 0 for minutes) should count as a single step, just like any other step.
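To make the reset-point problem concrete, the short sketch below compares the plain integer difference between two hours with the number of steps that actually separate them on a 24-hour clock. The helper name `circular_steps` is made up for this illustration.

```python
def circular_steps(a, b, period):
    """Smallest number of steps between a and b on a cycle of length `period`."""
    diff = abs(a - b) % period
    return min(diff, period - diff)

# Hour 23 and hour 0 are neighbours on the clock,
# yet as plain integers they look far apart.
integer_gap = abs(23 - 0)
clock_gap = circular_steps(23, 0, period=24)

print(integer_gap)  # 23
print(clock_gap)    # 1
```

A model that only sees the raw integer has no way of knowing that the gap of 23 is really a gap of 1.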
Since we are trying to describe periodic data, we can use the periodic functions "sine" and "cosine" to encode it.
Advantages of "sine" and "cosine"
Besides their periodicity:
They are orthogonal to each other over a full period, and neither value can be recovered from the other alone: two different points in the cycle share the same sine, and only the cosine tells them apart.
Their values lie in the [-1, 1] range, so in most cases you don't need to normalize them.
Last but not least, if you use them together as a 2D vector, the resulting points lie on a unit circle.
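These properties are easy to check numerically. The snippet below is a small sketch using only Python's standard `math` module on the 24 hours of a day; the variable names are chosen for this illustration.

```python
import math

period = 24
angles = [2 * math.pi * h / period for h in range(period)]
sines = [math.sin(a) for a in angles]
cosines = [math.cos(a) for a in angles]

# Every (sin, cos) pair lies on the unit circle: sin^2 + cos^2 = 1.
radii = [math.hypot(s, c) for s, c in zip(sines, cosines)]

# Over one full period the two signals are orthogonal:
# their dot product is zero up to floating-point error.
dot_product = sum(s * c for s, c in zip(sines, cosines))

# Sine alone is ambiguous: hours 2 and 10 share the same sine value,
# and only the cosine tells them apart.
same_sine = math.isclose(sines[2], sines[10])
different_cosine = not math.isclose(cosines[2], cosines[10])

print(all(math.isclose(r, 1.0) for r in radii))  # True
print(abs(dot_product) < 1e-9)                   # True
print(same_sine, different_cosine)               # True True
```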
How do we encode time by using "sine" and "cosine"?
Let's do some practice on hours and minutes.
To do this, we first need to scale our time values to the [0, 2π) range. After that, we can apply the "sine" and "cosine" functions to the scaled values.
scaled_hour = (hour/24)*2*π
sine_encoded_hour = sin(scaled_hour)
cosine_encoded_hour = cos(scaled_hour)

scaled_minute = (minute/60)*2*π
sine_encoded_minute = sin(scaled_minute)
cosine_encoded_minute = cos(scaled_minute)
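The two formulas differ only in the period (24 or 60), so they can be folded into one helper. The following is a minimal sketch with NumPy; the function name `encode_cyclic` and its `period` parameter are assumptions made for this example, not part of any library.

```python
import numpy as np

def encode_cyclic(values, period):
    """Return the (sin, cos) encodings of `values` on a cycle of length `period`."""
    scaled = (np.asarray(values, dtype=float) / period) * 2 * np.pi
    return np.sin(scaled), np.cos(scaled)

hours = np.array([0, 6, 12, 18, 23])
sin_hour, cos_hour = encode_cyclic(hours, period=24)

minutes = np.array([0, 15, 30, 45])
sin_minute, cos_minute = encode_cyclic(minutes, period=60)

# Stack the two columns to get one 2D vector per original hour value.
hour_features = np.column_stack([sin_hour, cos_hour])
print(hour_features.shape)  # (5, 2)
```

Passing the period as an argument keeps the scaling step in one place, so the same helper can serve hours, minutes, or any other cycle.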
Each original time feature thus becomes two features, a sine value and a cosine value, which together form a 2D vector.
That's it. Let's have a look at some sample Python code and its output.
import numpy as np
import plotly.graph_objects as go

sin_encoded_min = np.sin((np.linspace(0, 59, num=60) / 60) * 2 * np.pi)
cos_encoded_min = np.cos((np.linspace(0, 59, num=60) / 60) * 2 * np.pi)
sin_encoded_hour = np.sin((np.linspace(0, 23, num=24) / 24) * 2 * np.pi)
cos_encoded_hour = np.cos((np.linspace(0, 23, num=24) / 24) * 2 * np.pi)

fig1 = go.Figure()
fig1.add_trace(go.Scatter(
    x=sin_encoded_min,
    y=cos_encoded_min,
    name='Sin vs Cos for Minute',
    mode='lines+markers+text',
    text=["<b>" + str(i) + "</b>" for i in range(0, 60)],
    showlegend=False
))
fig1.update_traces(textfont_size=16, marker=dict(size=22))
fig1.update_layout(
    hovermode="x",
    xaxis_title="Sin of Minute",
    yaxis_title="Cos of Minute",
    yaxis=dict(scaleanchor="x", scaleratio=1),
    width=800,
    height=800,
    legend=dict(orientation="h", x=0.05),
    title="Sin/Cos Encoding of Minute",
    title_x=0.5
)
fig1.show()

fig2 = go.Figure()
fig2.add_trace(go.Scatter(
    x=sin_encoded_hour,
    y=cos_encoded_hour,
    name='Sin vs Cos for Hour',
    mode='lines+markers+text',
    text=["<b>" + str(i) + "</b>" for i in range(0, 24)],
    showlegend=False
))
fig2.update_traces(textfont_size=16, marker=dict(size=24))
fig2.update_layout(
    hovermode="x",
    xaxis_title="Sin of Hour",
    yaxis_title="Cos of Hour",
    yaxis=dict(scaleanchor="x", scaleratio=1),
    width=800,
    height=800,
    legend=dict(orientation="h", x=0.05),
    title="Sin/Cos Encoding of Hour",
    title_x=0.5
)
fig2.show()
With this encoding method, each time value is expressed as a 2D vector instead of a single scalar, and the reset points that come from cyclicity become a single step like any other.
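One way to see the "single step" claim is to measure Euclidean distances between encoded hours. This is a small sketch using the standard `math` module (`math.dist` needs Python 3.8 or later); the helper names `encode_hour` and `encoded_distance` are invented for this example.

```python
import math

def encode_hour(hour):
    """Map an hour (0-23) to its (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def encoded_distance(h1, h2):
    """Euclidean distance between two hours in the encoded 2D space."""
    return math.dist(encode_hour(h1), encode_hour(h2))

wrap_step = encoded_distance(23, 0)    # across the reset point
regular_step = encoded_distance(0, 1)  # an ordinary one-hour step
half_day = encoded_distance(0, 12)     # opposite sides of the circle

print(math.isclose(wrap_step, regular_step))  # True
print(math.isclose(half_day, 2.0))            # True
```

The step from 23 to 0 has the same length as the step from 0 to 1, while hours half a day apart sit at the maximum distance of 2, the diameter of the unit circle.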
You can apply this method to any other cyclic data, such as the day of the week or the month of the year.
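As an illustration of reusing the idea on another cycle, the sketch below encodes the day of the week with a period of 7. It relies on `date.weekday()` from the standard library, which numbers Monday as 0 and Sunday as 6; the helper name `encode_weekday` is an assumption for this example.

```python
import math
from datetime import date

def encode_weekday(d):
    """Encode a date's weekday (Monday=0 ... Sunday=6) as a (sin, cos) pair."""
    angle = 2 * math.pi * d.weekday() / 7
    return math.sin(angle), math.cos(angle)

monday = encode_weekday(date(2020, 9, 21))
tuesday = encode_weekday(date(2020, 9, 22))
sunday = encode_weekday(date(2020, 9, 27))

# Sunday -> Monday crosses the weekly reset point, but in the encoded
# space it is the same size as the ordinary Monday -> Tuesday step.
print(math.isclose(math.dist(sunday, monday), math.dist(monday, tuesday)))  # True
```

Only the period changes from one kind of cyclic data to another; the scaling and the sine/cosine step stay the same.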
Hope you like it!