How should I process music play data

Question

I have music play data organized by the day on which each track was played, from March 1st, 2015 to August 30th, 2015. The data set contains count data for every day a song was played. I'd like to predict play counts for each track for an out-of-time window, from 2015-09-01 to 2015-09-30.
Details of the data
The data is from Alibaba's TianChi Competition
Action Table (mars_tianchi_user_actions)

Any user’s action data for a song is indicated by a unique row
Songs Table（mars_tianchi_songs）

Result Set
The participants need to predict the artists’ plays data in the following two months (20150901-20151030).
Participant’s Result Table (mars_tianchi_artist_plays_predict)

For song id daa234f183aee2373d20987b247cd768(all the song ids are hash values). The play plot is:

As you can see it, the variance of each day's play for this track is quite large.
I also have data on who played each track, with each user ids represented as a hash value.
I'd like some guidance on which analysis method I could use here. I am considering time series analysis.

Kyle. · Answer

Interesting problem. You certainly could use time series methods for this, but you should consider whether you think there is face-validity to song plays following a temporal pattern. From your plot above, I don't see any structure in the variation, but perhaps there is something that an algorithm can pick up. A good starting point would be R's auto.arima function for autoregressive integrated moving average (ARIMA). The ARIMA method is more fully described elsewhere (e.g., here), but an interesting extension to this approach might be using exogenous predictors in your model. For example, does the presence or absence of rain affect the likelihood of some songs being played? My guess would be yes (I love listening to post-rock bands on rainy days), but perhaps that's not true of everyone. This extension to ARIMA is referred to as ARIMAX in the literature. A really nice breakdown of how to handle covariates in your type of analysis can be found on Rob Hyndman's blog.

A second thought is that, with sufficient computational resources, there's an interesting hidden markov model (HMM) problem here. You could potentially model the transition probabilities from one song to another. Presumably there's a high likelihood of transitioning from track 1 to track 2 of a given record, but sometimes people might switch to a different record after listening to a favorite track on an album. The extent to which this is true for different tracks could be a part of this analysis, and you could attempt to model the hidden underlying states that motivate a listener to stay within a record, or to move to a new one.

How should I process music play data

Details of the data

One Answer

Add your own answers!

Ask a Question