TransWikia.com

Reasoning behind using Deep Learning on non-local data

Data Science Asked on August 31, 2021

I understand the use of deep learning for data that has "local" structure, for example images, videos, or text, since convolutional layers reduce the number of dimensions.

However, I have seen people use it on non-local data such as database tables, for example here or here on the Titanic dataset.

My question is: since a single hidden layer with enough neurons can theoretically create as many dimensions as we want, why would one use several hidden layers (deep learning) instead of a single, larger hidden layer?

2 Answers

It appears that I forgot one component of the ANN, or at least one of its effects: the activation function.

It is true that with a linear activation, multiple layers can be reduced to a single one, but with a non-linear function,

a two-layer neural network can be proven to be a universal function approximator.
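The first half of that statement is easy to check numerically. Below is a minimal NumPy sketch, with made-up layer sizes and random weights (none of it comes from the question), showing that two dense layers with a linear activation are exactly one merged dense layer, and that the same merge stops working once the hidden activation is a ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs -> 8 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
X = rng.normal(size=(5, 4))  # 5 example rows


def two_layer(X, activation):
    """Forward pass through two dense layers with the given hidden activation."""
    return activation(X @ W1 + b1) @ W2 + b2


# With a linear (identity) activation the two layers fold into one:
# (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2)
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
merged_out = X @ W_merged + b_merged

linear_out = two_layer(X, lambda z: z)
collapses_when_linear = np.allclose(linear_out, merged_out)

# With ReLU in between, the merged layer no longer reproduces the output.
relu_out = two_layer(X, lambda z: np.maximum(z, 0.0))
collapses_when_relu = np.allclose(relu_out, merged_out)

print(collapses_when_linear, collapses_when_relu)
```

So stacking layers only buys anything once there is a non-linearity between them.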

Sources

However, I still don't understand why one would use more than two hidden layers...

Answered by EzrielS on August 31, 2021

A couple answers:

Yes, it could be overkill in scenarios where simpler models suffice. Linear and logistic regression are trivially representable as neural networks too, but that's not the most efficient way to fit them.
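To make the "representable as a neural network" point concrete, here is a small NumPy sketch. The data and coefficients are made up, and `dense_layer` is just an illustrative helper, not any framework's API. It shows that logistic regression gives the same predictions as a network with no hidden layer and a single sigmoid output unit.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_regression(X, w, b):
    """Classic form: P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(X @ w + b)


def dense_layer(X, W, b, activation):
    """Generic dense layer: activation(X W + b)."""
    return activation(X @ W + b)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # 6 rows, 3 features (made-up data)
w = rng.normal(size=3)       # made-up coefficients
b = 0.5

p_logreg = logistic_regression(X, w, b)

# The same weights, viewed as a network with one sigmoid output unit.
p_network = dense_layer(X, w.reshape(3, 1), np.array([b]), sigmoid).ravel()

same_predictions = np.allclose(p_logreg, p_network)
print(same_predictions)
```

The model is identical either way; what differs is how it is fit, and a dedicated solver is usually the cheaper route for something this small.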

On the plus side, deep learning frameworks are good at exploiting specialized hardware like GPUs. Where a problem also fits deep learning, it could be a performance win if GPUs are available.

It can learn non-linear relationships via the activation functions. That doesn't mean it easily learns, say, interaction features. Yes it's possible to approximate anything with two wide enough dense layers, but they would have to be ridiculously wide to learn some arbitrary functions.
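For readers unsure what an "interaction feature" is, here is a rough NumPy illustration on synthetic data. It is a plain least-squares fit, not a neural network, and `r_squared` is a helper written for this sketch. The target is the product of two inputs. A model that only adds up per-feature contributions explains almost none of it, while handing the model the product as an extra column makes the fit essentially exact. A dense network has to build that product out of its hidden units, and that is where the width (or depth) goes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(-1, 1, size=n)
x2 = rng.uniform(-1, 1, size=n)
y = x1 * x2  # the target is a pure interaction of the two inputs


def r_squared(features, y):
    """In-sample R^2 of a least-squares fit on the given columns plus an intercept."""
    A = np.column_stack(features + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - residual.var() / y.var()


r2_additive = r_squared([x1, x2], y)               # no interaction term
r2_with_product = r_squared([x1, x2, x1 * x2], y)  # hand-made interaction term

print(r2_additive, r2_with_product)
```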

They're useful for time series data, but that is a kind of data with 'locality' in the time dimension, which you're already ruling in.

The intermediate representation could be meaningful for other purposes. For example, a network that learns to classify purchase intent from customer attributes produces an intermediate representation that might yield to clustering more meaningfully than the raw input. The embedding captures the input in a space that is meaningful with respect to the target.
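Mechanically, "intermediate representation" just means the hidden-layer activations. The sketch below is only an assumption about how such a network might look: the shapes are hypothetical, and random weights stand in for a trained network, so the numbers themselves mean nothing. It shows how the same hidden layer feeds both the prediction and a reusable embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 10 customer attributes -> 4 hidden units -> 1 output.
# Random weights stand in for a trained network.
W1, b1 = rng.normal(size=(10, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)


def embed(X):
    """Hidden-layer (ReLU) activations: the intermediate representation."""
    return np.maximum(X @ W1 + b1, 0.0)


def predict(X):
    """Purchase-intent probability, computed from the embedding."""
    logits = (embed(X) @ W2 + b2).ravel()
    return 1.0 / (1.0 + np.exp(-logits))


customers = rng.normal(size=(100, 10))  # made-up customer attributes
embedding = embed(customers)            # shape (100, 4)
intent = predict(customers)             # shape (100,)

print(embedding.shape, intent.shape)
```

With a trained network, `embedding` rather than `customers` is what you would pass to a clustering routine such as scikit-learn's `KMeans`.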

Answered by Sean Owen on August 31, 2021
