Data Science Asked by Thiedent on April 8, 2021
I am using Keras with the Theano backend in Python.
I have 2117 samples, and each sample has its own individual target (on purpose), i.e. 2117 outputs.
Rather than categories, the targets are ratings, e.g. (16.4714494876, 17.4129353234, 17.4476570289), and the full value of each number is important.
I am having problems and don't know where to start.
1) When I run the NN, it only outputs the targets as whole integers rather than in the format of the actual values, e.g. 16 instead of 16.xxxxxx.
2) Presumably I will only be able to gauge the accuracy of predictions by how close the output is to the target, since there are so many targets. Does this type of classification problem have a name that I can research?
3) Three research papers I have read that apply NNs to my specific classification problem list the output layer as having only 1 neuron but provide no further explanation. How could this be?
Here is my model.
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
X = np.array(df[FEATURES].values)
Y = np.array(df["MTPS"].values)
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(10, input_dim=(len(FEATURES)), init='normal', activation='relu'))
    model.add(Dense(2117, init='normal', activation='softmax'))
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
#build model
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=2)
#cross validation
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=seed)
estimator.fit(X_train, Y_train)
#print class predictions
print estimator.predict(X_test)
print Y_test
Thanks for any help.
I'm confused. As you have said, the targets are ratings. It's definitely a regression problem to me, instead of a classification problem.
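To make the difference concrete, here is a minimal sketch in plain Python (not Keras). The helper names classify and regress and all the numbers are hypothetical, chosen only for illustration. A classification head picks the index of the most probable class, so its prediction is an integer; a regression head is a single linear unit, so its prediction is a real number.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Classification head: return the index of the most probable class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


def regress(hidden, weights, bias):
    """Regression head: one linear unit, i.e. a weighted sum plus a bias."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias


# Hypothetical values, not taken from the question's data.
class_prediction = classify([0.1, 2.3, 0.7])                # an integer class index
rating_prediction = regress([1.5, 0.2], [10.0, 7.0], 1.0)   # a real-valued rating

print(type(class_prediction).__name__, class_prediction)
print(type(rating_prediction).__name__, rating_prediction)
```

The first print shows an int and the second a float, which mirrors the whole-number outputs described in the question.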
There are several problems in your code:
1) For a regression problem, the last layer usually uses linear as its activation function (sometimes relu, or even sigmoid).
2) For a regression problem, the loss (and metric) is usually mse (sometimes mae, msle, etc.). categorical_crossentropy is used for classification problems, and sparse_categorical_crossentropy is used for classification problems with integer-encoded labels.
3) KerasClassifier is used for classification problems; use KerasRegressor instead.
4) metrics=['accuracy'] is used for classification problems, and it is meaningless in a regression problem.
5) If df is a Pandas DataFrame, then df.values is already an ndarray, so there is no need to wrap it in np.array.
Now answering your questions:
1) This happens because you are using KerasClassifier and .predict (in the scikit-learn API, .predict returns an integer, basically the predicted class, and .predict_proba returns floats, indicating the probability of each class). Try using .predict_proba, it would help. BTW, you should really use KerasRegressor.
2) It is called regression.
3) Use Dense(1, activation='linear') as the output layer.
Here's my version of your code, it might work:
# imports are not shown in the original post; these paths are assumed
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import train_test_split

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
X = df[FEATURES].values # no need to cast np.array
Y = df["MTPS"].values
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(10, input_dim=len(FEATURES), init='normal', activation='relu'))
    model.add(Dense(1, init='normal', activation='linear')) # one neuron, linear activation
    # compile model
    model.compile(loss='mse', optimizer='adam') # mse loss
    return model
# build model (init= and nb_epoch= are Keras 1 names; Keras 2 renamed them to kernel_initializer= and epochs=)
estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=2) # KerasRegressor for regression problem
# hold-out split: 80% train, 20% test
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=seed)
estimator.fit(X_train, Y_train)
# print predicted ratings, then the true targets
print(estimator.predict(X_test))
print(Y_test)
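Once the model predicts real numbers, the fit is judged by how far the predictions are from the targets, not by accuracy. The following is a rough sketch in plain Python of that idea: the three targets are the example ratings from the question, while the predictions and the helper function names are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors are penalised more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / float(len(y_true))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same units as the ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / float(len(y_true))


def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions exactly equal to the target."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))


# Targets are the example ratings from the question; predictions are hypothetical.
y_true = [16.4714494876, 17.4129353234, 17.4476570289]
y_pred = [16.5, 17.3, 17.6]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
accuracy = exact_match_accuracy(y_true, y_pred)

print("mse:", mse)
print("mae:", mae)
print("accuracy:", accuracy)
```

With continuous targets, exact-match accuracy is zero here even though every prediction is within about 0.15 of its target, which is why mse or mae is the more informative number.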
Correct answer by Icyblade on April 8, 2021