Error with pandas dataframe (needs to be 1-dimensional)

Data Science Asked by Biohacker on September 1, 2021

I am trying to compute conformal predictions for my model on my data, but the following error occurs at icp.calibrate(X_cal, y_cal):

Exception: Data must be 1-dimensional

The exception above comes from the most recent traceback. Unfortunately, I am not sure what it actually means for the code below. I am using a pandas DataFrame for this.

#Code Snippet

from sklearn.tree import DecisionTreeRegressor
from nonconformist.cp import IcpRegressor
from nonconformist.base import RegressorAdapter
from nonconformist.nc import RegressorNc, AbsErrorErrFunc, RegressorNormalizer, NcFactory
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# -----------------------------------------------------------------------------
# Setup training, calibration and test data
# -----------------------------------------------------------------------------
df = pd.read_csv("prepared_data.csv")

# Initial split into train/test data
train = df.loc[df['split']== 'train']
valid = df.loc[df['split']== 'valid']

# Proper Validation Set (Split the Validation set into features and target)
X_valid = valid.drop(['expression'], axis = 1)
y_valid = valid.drop(columns = ['new_host', 'split', 'sequence'])

# Create Training Set (Split the Training set into features and target)
X_train = valid.drop(['expression'], axis = 1)
y_train = valid.drop(columns = ['new_host', 'split', 'sequence'])

# Split Training set into further training set and calibration set
X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size=0.2)

# -----------------------------------------------------------------------------
# Train and calibrate underlying model
# -----------------------------------------------------------------------------
underlying_model = RegressorAdapter(DecisionTreeRegressor(min_samples_leaf=5))
print("Underlying model loaded")
model = RegressorAdapter(underlying_model)
nc = RegressorNc(model, AbsErrorErrFunc())

print("Nonconformity Function Applied")
icp = IcpRegressor(nc)  # Create an inductive conformal Regressor
print("ICP Regressor Created")

#Dataset Review
print('{} instances, {} features, {} classes'.format(y_train.size,
                                                   X_train.shape[1],
                                                   np.unique(y_train).size))

icp.fit(X_train, y_train)
icp.calibrate(X_cal, y_cal)

#Example Dataframe

new_host  split     sequence    expression
FALSE     train     AQVPYGVS    0.039267878
FALSE     train     ASVPYGVSI   0.039267878
FALSE     train     STNLYGSGR   0.261456561
FALSE     valid     NLYGSGLVR   0.265188519
FALSE     valid     SLGPSNLYG   0.419680588
FALSE     valid     ATSLGTTNG   0.145710993

I’ve tried splitting the dataset in various ways, but I keep running into this error. I want to split the data into train and test sets according to each observation’s split value, and then, in a second step, split the train set into train and calibration sets, where my features are X_train and my target is y_train.

One Answer

It seems to me that this question would be better off on Stack Overflow.

Nevertheless: X_cal is generated from X_train, and X_train is built from valid. That is a DataFrame with at least two dimensions, since it contains columns such as new_host and sequence. As the error says, you should only pass 1-dimensional data where 1-dimensional data is expected.
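
To make the shape issue concrete, here is a minimal sketch using only pandas and a toy frame copied from the example rows in the question. It rests on an assumption, not a confirmed diagnosis: that the likely culprit is the target. Building y with drop() leaves a DataFrame of shape (n, 1), whereas selecting the expression column by name gives a 1-dimensional Series. The sketch also builds the training features from train rather than valid. The name y_as_frame is only for illustration.

```python
import pandas as pd

# Toy frame mirroring the example rows from the question.
df = pd.DataFrame({
    "new_host": [False] * 6,
    "split": ["train", "train", "train", "valid", "valid", "valid"],
    "sequence": ["AQVPYGVS", "ASVPYGVSI", "STNLYGSGR",
                 "NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.039267878, 0.039267878, 0.261456561,
                   0.265188519, 0.419680588, 0.145710993],
})

train = df.loc[df["split"] == "train"]
valid = df.loc[df["split"] == "valid"]

# Dropping every other column still returns a DataFrame: 2-D, shape (n, 1).
y_as_frame = train.drop(columns=["new_host", "split", "sequence"])

# Selecting the column by name returns a Series: 1-D, shape (n,).
y_train = train["expression"]
y_valid = valid["expression"]

# Features for training come from `train`, not from `valid`.
X_train = train.drop(columns=["expression"])
X_valid = valid.drop(columns=["expression"])

print("y via drop():   ndim =", y_as_frame.ndim, "shape =", y_as_frame.shape)
print("y via column:   ndim =", y_train.ndim, "shape =", y_train.shape)
```

If the model still complains after this change, note that split and sequence are string columns; scikit-learn tree models need numeric features, so those columns would have to be dropped or encoded first. How best to encode the sequences is outside what this sketch assumes.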

Answered by N. Kiefer on September 1, 2021
