TransWikia.com

Dividing a dataset to parallelize machine learning training on the cloud

Data Science Asked by ptushev on June 7, 2021

I’m very new to machine learning. I am doing a project for a subject called parallel and distributed computing, in which we have to speed up a heavy computation using parallelism or distributed computing. My idea was to divide a dataset into equal parts and, for each subset, train a neural network on a separate machine in the cloud. Once the models are trained, they would be returned to me and somehow combined into a single model. I am aware of federated learning, but it doesn’t quite fit my scenario, in which I split the dataset myself and send the parts to the cloud. Does anyone know of feasible approaches (maybe a variant of federated learning) for doing this?

One Answer

There are many ways to parallelize machine learning. It is often better to distribute the model parameters, not the data.

Training each model on only a fixed subset of the data will result in worse parameter estimates than training a single model on random samples drawn from the whole dataset.

Additionally, moving data around is more expensive than moving parameters.
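As a rough illustration of the idea, here is a minimal single-process sketch in Python/NumPy that simulates several workers sharing one set of parameters. It is not taken from any particular framework: the function names (`make_data`, `worker_gradient`, `train_data_parallel`), the linear model, and all hyperparameters are assumptions chosen for illustration, and each simulated worker is assumed to be able to sample from the full dataset (for example via shared storage). In every step each worker computes a gradient on its own random mini-batch, and only the gradients are averaged into the shared parameters.

```python
import numpy as np

def make_data(n=2000, d=5, seed=0):
    """Synthetic linear-regression data (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    true_w = rng.normal(size=d)
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y, true_w

def worker_gradient(w, X, y, batch_size, rng):
    """Mean-squared-error gradient on one random mini-batch."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    residual = Xb @ w - yb
    return 2.0 * Xb.T @ residual / batch_size

def train_data_parallel(X, y, n_workers=4, steps=300, lr=0.05,
                        batch_size=32, seed=1):
    """Simulate synchronous training: workers exchange gradients, not data."""
    # One independent random generator per simulated worker.
    rngs = [np.random.default_rng(seed + k) for k in range(n_workers)]
    w = np.zeros(X.shape[1])  # shared parameters
    for _ in range(steps):
        # In a real cluster these calls would run on separate machines.
        grads = [worker_gradient(w, X, y, batch_size, r) for r in rngs]
        # Average the workers' gradients and update the shared parameters.
        w -= lr * np.mean(grads, axis=0)
    return w

X, y, true_w = make_data()
w = train_data_parallel(X, y)
print("max abs error vs. true weights:", np.max(np.abs(w - true_w)))
```

In this sketch the only thing exchanged per step is a gradient vector of the same size as the model, which is the sense in which parameters (or their updates) are moved instead of data. For real neural networks, frameworks such as PyTorch and TensorFlow ship their own distributed training utilities that implement this pattern across machines.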

Answered by Brian Spiering on June 7, 2021
