Dengue is a vector-borne disease that affects many people in the tropics and subtropics. In its most severe form, dengue hemorrhagic fever (DHF), patients bleed and suffer greatly. This form mainly affects the very young and the very old.
Predicting outbreaks can help in planning, e.g., in carrying out vector control before an outbreak occurs. We present two selected articles by our students that illustrate both a good analysis of the forecasting problem and a good solution.
Both submissions impressed us with their well-reasoned approaches to the problem and, in particular, with the innovative and insightful feature engineering methods used by each team.
One problem faced by both teams was how to deal with the substantial amount of missing data from some weather stations:
Team Mopiko 2.0 chose to truncate the data to the last 8 years to avoid the time period with the most missing data. They then constructed a dendrogram from the correlations between weather stations and used it to cluster highly correlated stations. The remaining missing data for each station was filled in from the nearest station within the same cluster. A dendrogram is suitable for clustering; however, the clustering conditions could have been justified more fully, for example by stating whether a correlation threshold determined which stations were clustered together.
The good performance of Team Mopiko 2.0's model is interesting considering their truncation of the data. It suggests that a shorter span of higher-quality data may be sufficient to learn the dynamics of dengue transmission.
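To make the truncation, clustering and gap-filling steps described above more concrete, here is a minimal sketch in Python. It is not the team's code. The function names, the average-linkage method and the 0.5 distance threshold are assumptions chosen for illustration. The sketch also treats the "nearest" station as the most correlated station in the same cluster, which is one possible reading; geographic distance would be another.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def keep_last_years(df, years=8):
    """Keep only the rows from the final `years` years of a date-indexed frame."""
    cutoff = df.index.max() - pd.DateOffset(years=years)
    return df.loc[df.index >= cutoff]


def cluster_stations(df, threshold=0.5):
    """Cluster station columns using correlation distance (1 - r).

    `threshold` is a hypothetical cut height on the dendrogram:
    stations that merge below it end up in the same cluster.
    """
    corr = df.corr()  # pairwise-complete Pearson correlation
    dist = 1.0 - corr.to_numpy()
    # Assumption: pairs with no overlapping data count as uncorrelated.
    dist = np.nan_to_num(dist, nan=1.0)
    dist = np.clip(dist, 0.0, 2.0)  # guard against tiny negative values
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")
    return pd.Series(labels, index=df.columns), corr


def fill_from_cluster(df, labels, corr):
    """Fill each station's gaps from cluster-mates, most correlated first."""
    filled = df.copy()
    for station in df.columns:
        mates = [s for s in df.columns
                 if s != station and labels[s] == labels[station]]
        ranked = corr.loc[station, mates].sort_values(ascending=False)
        for mate in ranked.index:
            filled[station] = filled[station].fillna(df[mate])
    return filled
```

A station with no cluster-mates keeps its gaps under this sketch, so the choice of threshold directly controls how much missing data can be recovered. That is one reason the clustering condition deserves an explicit justification.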
With data from more than fifty weather stations provided, each team devised its own method for feature selection or dimensionality reduction.
Regardless of the approach used, domain knowledge was critical in the design of the features. Both teams windowed their input data in sizes that were informed by their knowledge of dengue transmission cycles and seasonal patterns of weather in Singapore.
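As a rough illustration of what windowing the input data means, the sketch below pairs each run of consecutive observations with the target value a fixed number of steps later. The function name `make_windows` and the `window` and `horizon` parameters are hypothetical. The window sizes the teams actually used are not stated here, so any value passed in is an assumption.

```python
import numpy as np


def make_windows(features, target, window, horizon=1):
    """Pair each run of `window` consecutive rows with the target
    observed `horizon` steps after the end of that run.

    features: array-like of shape (n_steps, n_features)
    target:   array-like of shape (n_steps,)
    Returns X of shape (n_samples, window, n_features) and y of shape (n_samples,).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(features) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[i:i + window] for i in range(n)])
    start = window + horizon - 1  # index of the target for the first window
    y = target[start:start + n]
    return X, y
```

With weekly data, for example, `make_windows(weather, cases, window=4, horizon=2)` would use four weeks of weather readings to predict the case count two weeks after the window ends. In practice the window and horizon would be chosen from what is known about mosquito breeding and incubation periods.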
Another unavoidable factor is experimentation, which both teams did in abundance, not only during feature engineering but also when deciding on the neural network architecture, performing hundreds of trials before finally settling on their optimal models.