
Can my neural network learn conditional rules when classifying?

Data Science, asked on February 5, 2021

I’m concerned that I’m attempting the impossible with my neural network. This is the scenario:

I have a 2D square world. In it, I create five circles of random size and position. I then classify one of them as the correct answer, based on the following rules:

  • If any circle’s radius is > THRESHOLD, I choose the largest circle
  • Otherwise, I choose the circle whose center is nearest the center of the world

I send the inputs as serial coordinates, like this: [X0, Y0, RADIUS0, X1, Y1, RADIUS1, …].

The output is a one-hot array, e.g. [0, 0, 1, 0, 0].
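
For concreteness, a single case might be generated like this (an illustrative sketch, not my actual pipeline; the 0.45 threshold is just an example value):

```python
import numpy as np

N_CIRCLES = 5
THRESHOLD = 0.45  # example value

# Five circles: random centers in the square, random radii
centers = np.random.uniform(-1, 1, (N_CIRCLES, 2))
radii = np.random.uniform(0.1, 0.5, N_CIRCLES)

# Serialize as [X0, Y0, RADIUS0, X1, Y1, RADIUS1, ...]
inputs = np.column_stack((centers, radii)).flatten()

# Apply the two rules to pick the "correct" circle
if np.any(radii > THRESHOLD):
    winner = np.argmax(radii)                            # largest circle
else:
    winner = np.argmin(np.linalg.norm(centers, axis=1))  # nearest the center

target = np.eye(N_CIRCLES)[winner]  # one-hot, e.g. [0, 0, 1, 0, 0]
```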

I’ve modeled this in TensorFlow without success. My best-scoring result appears to always choose the largest circle, ignoring the else clause of the arbitrary rule.

Am I fundamentally misunderstanding the capabilities of neural networks? I’ve tried many (many) different configurations (layer counts, node counts, activation functions … you name it). All of my networks have been feed-forward, so far.

Thanks in advance for any insight!


Here are some details of my network and data (a minimal sketch of this setup follows the list):

  • I have tried with up to 500k cases. I separate 10% for generalization checks after training, and train on the remaining 90% with a 50/50 validation split.
  • I’ve tried with the test data weighted 75% toward rule A, 50/50, and 75% toward rule B.
  • I’ve tried 0-10 hidden layers, and neuron counts from 2 to 256 (each hidden layer gets the same number of neurons).
  • I change the number of epochs as time allows, but generally it’s 10-100. My longest runs have been several hours (with very large case counts, and dropout to prevent overfitting).
  • I’ve used batch sizes of 1-50.
  • I’ve tried learning rates of 0.0001 – 0.1.
  • I’m currently using ReLU activation, initializing bias to const(0.1) and kernel w/ heNormal. I have tried several other approaches for all three.
  • I standardize the inputs to center on zero w/ variance of one.
  • The loss function is categoricalCrossentropy.
  • The optimizer is Adam.
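
Putting those pieces together, a minimal Python/Keras sketch of this setup might look like the following (the const(0.1)/heNormal initializers map to Constant(0.1) and 'he_normal' here; layer and unit counts are just one of the configurations I tried):

```python
import tensorflow as tf

N_CIRCLES = 5

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation='relu',
        kernel_initializer='he_normal',
        bias_initializer=tf.keras.initializers.Constant(0.1),
        input_shape=(3 * N_CIRCLES,)),
    tf.keras.layers.Dense(
        64, activation='relu',
        kernel_initializer='he_normal',
        bias_initializer=tf.keras.initializers.Constant(0.1)),
    tf.keras.layers.Dense(N_CIRCLES, activation='softmax'),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
# model.fit(X_train, y_train, validation_split=0.5, epochs=50, batch_size=32)
```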

3 Answers

Yes. The Universal Approximation Theorem states that a feed-forward network with a single hidden layer, a finite number of neurons, and a non-linear activation function can approximate any continuous function on a compact subset of $\mathbb{R}^n$ to arbitrary accuracy. So the capacity is there in principle; the difficulty is training. There are many things that can go wrong when training a network; for one, have you tried graphing its loss over time to see whether it is converging at all?
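
For reference, one common form of the statement (Cybenko/Hornik): for any continuous $f$ on a compact set $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$, there exist a width $N$, weights $w_i \in \mathbb{R}^n$, and scalars $b_i, \alpha_i$ such that

$$\left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^\top x + b_i) \right| < \varepsilon \quad \text{for all } x \in K.$$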

Answered by Cameron Chandler on February 5, 2021

Centering the data on zero and scaling to unit variance helps with a lot of classification problems, but in this case it would remove information that's needed to solve your problem, as I understand it: the first rule depends on a circle's absolute radius crossing a fixed threshold, and rescaling obscures that absolute scale.

Another possible problem is the loss function. I would suggest one that stays fairly high while your neural network has only learned one of the two rules.

Answered by Jeremy List on February 5, 2021

Alrighty, I wrote some code to emulate your problem. I found the same issues, and so simplified the problem. When I modified the label function to instead always choose the biggest radius, regardless of the arbitrary rule, the network still could not figure it out, and instead converged to predicting 0.2 for each of the 5 circles. It appears that if you don't order the circles at the input, the network cannot differentiate between them. This makes sense if you think about the flow through the densely connected network. There may be some success if we order the circles before inputting them (see the sketch after the code).

```python
import numpy as np
from tqdm import tqdm

N_CIRCLES = 5
CENTRE_RANGE = 1
RMIN, RMAX = 0.1, 0.5
THRESHOLD = 0.45

def label(x):
    # If any radius is above the threshold, choose the largest circle
    if np.any(x[:N_CIRCLES] > THRESHOLD):
        return np.argmax(x[:N_CIRCLES])

    # Else choose the circle nearest to (0, 0): argmin of the distances,
    # not argmax
    return np.argmin([np.linalg.norm(x[i:i+2]) for i in range(N_CIRCLES, 3*N_CIRCLES, 2)])

def generate_sample():
    # {r0, r1, r2, r3, r4, x0, y0, x1, y1, x2, y2, x3, y3, x4, y4}
    x = np.concatenate((np.random.uniform(RMIN, RMAX, N_CIRCLES), 
                        np.random.uniform(-CENTRE_RANGE, CENTRE_RANGE, 2*N_CIRCLES)))
    
    return x, label(x)

def generate_samples(n):
    x = np.zeros((n, N_CIRCLES*3))
    y = np.zeros(n)
    
    for i in range(n):
        x[i], y[i] = generate_sample()
    
    return x, y

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Fully-connected layers: 15 inputs -> 32 -> 64 -> 5 class scores
        self.fc1 = nn.Linear(3*N_CIRCLES, 32)
        self.fc2 = nn.Linear(32, 64)
        self.fc3 = nn.Linear(64, N_CIRCLES)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # No ReLU on the output layer; softmax expects unconstrained logits
        x = self.fc3(x)
        return F.softmax(x, dim=1)
    
net = Net()

import torch.optim as optim

optimizer = optim.Adam(net.parameters(), lr=0.001)
loss_function = nn.MSELoss()
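# An alternative worth trying here: cross-entropy on raw logits (i.e. drop
# the final softmax in forward()), which keeps stronger gradients than MSE
# when the net settles on the uniform 0.2 prediction:
# loss_function = nn.CrossEntropyLoss()  # targets become class indices, not one-hots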

BATCH_SIZE = 100
EPOCHS = 1_000

losses = []
for epoch in tqdm(range(EPOCHS)):
    X, y = generate_samples(BATCH_SIZE)
    y = np.array(y, dtype=int)

    # One-hot encode with a fixed width of N_CIRCLES, so a batch that
    # happens to be missing a class still matches the network's output shape
    ohe = np.zeros((y.size, N_CIRCLES))
    ohe[np.arange(y.size), y] = 1
    
    X = torch.Tensor(X).view(-1, 3*N_CIRCLES)
    y = torch.Tensor(ohe)

    net.zero_grad()
    yhat = net(X)
    loss = loss_function(yhat, y)
    loss.backward()
    optimizer.step()
    
    losses.append(float(loss.detach().numpy()))    
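
# Sanity check: accuracy on fresh samples (chance level is 1/N_CIRCLES = 0.2)
X_test, y_test = generate_samples(1_000)
with torch.no_grad():
    preds = net(torch.Tensor(X_test).view(-1, 3*N_CIRCLES)).argmax(dim=1).numpy()
print(f'test accuracy: {(preds == y_test.astype(int)).mean():.3f}')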
    
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(losses)
ax.set_xlabel('training step')
ax.set_ylabel('MSE loss')
plt.show()
```
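
And here is a rough sketch of what I mean by ordering the circles before feeding them in; `sort_circles` is a hypothetical helper I haven't tested beyond this snippet:

```python
def sort_circles(x, y):
    # x uses the layout from generate_sample: {r0..r4, x0, y0, ..., x4, y4}
    radii = x[:N_CIRCLES]
    centres = x[N_CIRCLES:].reshape(N_CIRCLES, 2)
    order = np.argsort(-radii)                       # largest radius first
    x_sorted = np.concatenate((radii[order], centres[order].flatten()))
    y_sorted = int(np.where(order == int(y))[0][0])  # label follows its circle
    return x_sorted, y_sorted
```

With every sample passed through this before training, the largest circle always sits in slot 0, which should make the threshold rule much easier for the dense layers to pick up.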

Answered by Cameron Chandler on February 5, 2021
