TransWikia.com

How should I process music play data

Data Science Asked by GoingMyWay on January 28, 2021

I have music play data organized by the day on which each track was played, from March 1st, 2015 to August 30th, 2015. The data set contains count data for every day a song was played. I’d like to predict play counts for each track for an out-of-time window, from 2015-09-01 to 2015-09-30.

Details of the data

The data is from Alibaba’s TianChi Competition

Action Table (mars_tianchi_user_actions)
enter image description here

Any user’s action data for a song is indicated by a unique row

Songs Table(mars_tianchi_songs)
enter image description here

Result Set

The participants need to predict the artists’ plays data in the following two months (20150901-20151030).

Participant’s Result Table (mars_tianchi_artist_plays_predict)
enter image description here

For song id daa234f183aee2373d20987b247cd768(all the song ids are hash values). The play plot is:

enter image description here

As you can see it, the variance of each day’s play for this track is quite large.

I also have data on who played each track, with each user ids represented as a hash value.

I’d like some guidance on which analysis method I could use here. I am considering time series analysis.

One Answer

Interesting problem. You certainly could use time series methods for this, but you should consider whether you think there is face-validity to song plays following a temporal pattern. From your plot above, I don't see any structure in the variation, but perhaps there is something that an algorithm can pick up. A good starting point would be R's auto.arima function for autoregressive integrated moving average (ARIMA). The ARIMA method is more fully described elsewhere (e.g., here), but an interesting extension to this approach might be using exogenous predictors in your model. For example, does the presence or absence of rain affect the likelihood of some songs being played? My guess would be yes (I love listening to post-rock bands on rainy days), but perhaps that's not true of everyone. This extension to ARIMA is referred to as ARIMAX in the literature. A really nice breakdown of how to handle covariates in your type of analysis can be found on Rob Hyndman's blog.

A second thought is that, with sufficient computational resources, there's an interesting hidden markov model (HMM) problem here. You could potentially model the transition probabilities from one song to another. Presumably there's a high likelihood of transitioning from track 1 to track 2 of a given record, but sometimes people might switch to a different record after listening to a favorite track on an album. The extent to which this is true for different tracks could be a part of this analysis, and you could attempt to model the hidden underlying states that motivate a listener to stay within a record, or to move to a new one.

Answered by Kyle. on January 28, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP