Is there a way to implement nested features in unsupervised models?

Question

Our project has built an unsupervised model on data about a number of companies, some public and some private. Public companies face much stricter financial reporting requirements than private ones, so I have far more information about the public companies than about the private ones.
Because the data has a large number of features, we don't build the model on the raw values. Instead, we compute a set of principal components and use those components as features. I would like the available financial data to inform the model, but so far I have had to exclude it because it is "missing" (i.e. unavailable) for the private companies.
What I think I need is a way to incorporate a "nested" feature: if the company is public (1), its financial data is used, but if it is private (0), no financial data is used. That way, only the private companies would sit in a reduced-dimensional space, rather than the entire dataset.
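
To make the idea concrete, here is a minimal sketch of one way such a nested encoding might look. It is only an illustration under my own assumptions, not an established method: the names `pca_scores`, `nested_features`, `X_common`, and `X_fin` are hypothetical, the data is random, and filling the financial components with zeros for private companies is an assumed design choice. Principal components are computed with a plain NumPy SVD.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # component scores

def nested_features(X_common, X_fin, is_public, k_common=2, k_fin=2):
    """Return [common PCs | public flag | financial PCs (0 for private rows)]."""
    is_public = np.asarray(is_public, dtype=bool)
    # Components from features that exist for every company.
    common = pca_scores(X_common, k_common)
    # Financial components are fitted on public companies only.
    # ASSUMPTION: private rows get zeros, which equals the public mean
    # score because the scores are centered.
    fin = np.zeros((X_common.shape[0], k_fin))
    fin[is_public] = pca_scores(X_fin[is_public], k_fin)
    flag = is_public[:, None].astype(float)
    return np.hstack([common, flag, fin])

# Made-up data: 6 public and 4 private companies.
rng = np.random.default_rng(0)
is_public = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
X_common = rng.normal(size=(10, 5))              # available for everyone
X_fin = np.full((10, 4), np.nan)                 # financials, NaN if private
X_fin[is_public == 1] = rng.normal(size=(6, 4))

features = nested_features(X_common, X_fin, is_public)
print(features.shape)
```

With this layout the private companies never contribute to, or vary along, the financial components, and the flag column keeps the two groups distinguishable. Whether zero is a sensible stand-in depends on the downstream model, since a distance-based method will treat every private company as financially "average".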
Are there any references to papers or examples of how to do something like this?
