TransWikia.com

How to increase number of outliers in a dataset?

Data Science Asked on January 20, 2021

I have a dataset with 1000 rows and 4 columns that contains 3 outliers. I want to add another 7 outliers related to them, to be detected by clustering.

     Example of what I did:
          Col1 Col2 Col3 Col4
     Out1 a1   b1   c1   d1
     Out2 a2   b2   c2   d2
     Out3 a3   b3   c3   d3

I get the mean and std for 7 columns of normal data, then calculate:

     Out4  normal1+mean+stdcol1   normal1+mean+stdcol2
     Out5  normal2+mean+stdcol1   normal2+mean+stdcol2
     Out6  ...

I don’t know whether what I did is right, or whether it is a good solution.

I don’t want the outliers to be too easy to detect.

Thanks

One Answer

I'm assuming you want to create points where each column looks normal on its own, but the point looks like an outlier once all the columns are considered together (which is why you'd need some sort of outlier detection). Generating such an outlier therefore requires looking at all the dimensions in relation to each other. And since we haven't assumed the data is normally distributed, generating them is not straightforward.
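To illustrate what "normal per column, outlier overall" can look like, here is a minimal sketch. The synthetic data, the correlation between the first two columns, and the use of Mahalanobis distance are all my assumptions for illustration; your real data may not be Gaussian, in which case this distance is only a rough guide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 1000 x 4 dataset:
# columns 0 and 1 are strongly correlated, columns 2 and 3 are independent.
cov = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
X = rng.multivariate_normal(mean=np.zeros(4), cov=cov, size=1000)

# Candidate point: every coordinate is about 1.5 std from its column mean,
# but columns 0 and 1 move in opposite directions, which the data rarely does.
candidate = np.array([1.5, -1.5, 0.0, 0.0])

mu = X.mean(axis=0)
sigma = X.std(axis=0)
z_scores = np.abs((candidate - mu) / sigma)  # per-column view

# Mahalanobis distance uses the covariance, so it sees the broken correlation.
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(points):
    diff = np.atleast_2d(points) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

d_candidate = mahalanobis(candidate)[0]
d_data = mahalanobis(X)

print("per-column |z|:", z_scores.round(2))
print("candidate distance:", round(d_candidate, 2))
print("99th percentile of data distances:", round(np.percentile(d_data, 99), 2))
```

The per-column z-scores stay small while the joint distance is large, which is the kind of point a column-by-column check would miss.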

I would recommend first running some kind of outlier detection method from the list linked below on the original dataset (something like an Isolation Forest would work):

Outlier detection methods
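As a rough sketch of that first step, assuming scikit-learn is available: the array `X` below is a synthetic stand-in for your data, the three planted rows stand in for your known outliers, and the parameter values are arbitrary choices of mine.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical stand-in: 997 ordinary rows plus 3 planted outliers (rows 997-999).
planted = np.array([
    [6.0, -6.0, 6.0, 6.0],
    [-7.0, 7.0, 0.0, 5.0],
    [5.0, 5.0, -6.0, -6.0],
])
planted_idx = [997, 998, 999]
X = np.vstack([rng.normal(size=(997, 4)), planted])

forest = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
forest.fit(X)

labels = forest.predict(X)        # +1 = inlier, -1 = outlier
scores = forest.score_samples(X)  # lower = more anomalous

print("rows flagged as outliers:", int((labels == -1).sum()))
print("ten most anomalous rows:", np.argsort(scores)[:10])
```

With your own data, replace `X` with the 1000 x 4 array and check that your 3 known outliers appear among the most anomalous rows.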

Then you can generate random numbers (or use the numbers you already generated) and test whether they are flagged as outliers. This should be easy to do by hand, since you only want 7 points and each point has only 4 dimensions. As an additional tip, test the points with a method that returns a score instead of a 0/1 prediction, so that you can make sure they are not too obvious as outliers (since you didn't want that).
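One possible way to do this with scores is sketched below. The stand-in data, the candidate pool, and the "not too obvious" band (more anomalous than 99% of the original rows, but no more anomalous than the most extreme original row) are all assumptions of mine; pick whatever band suits your case.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # hypothetical stand-in for the real data

forest = IsolationForest(n_estimators=200, random_state=0).fit(X)
data_scores = forest.score_samples(X)  # lower = more anomalous

# Assumed band for "an outlier, but not an obvious one".
upper = np.percentile(data_scores, 1)  # more anomalous than 99% of the rows
lower = data_scores.min()              # no more anomalous than the worst row

# Candidate pool with a wider spread than the data,
# so that some candidates land inside the band.
candidates = rng.normal(scale=2.0, size=(2000, 4))
cand_scores = forest.score_samples(candidates)

in_band = (cand_scores >= lower) & (cand_scores <= upper)
new_outliers = candidates[in_band][:7]  # keep the first 7 that qualify

print(new_outliers.round(2))
print("scores:", cand_scores[in_band][:7].round(3))
```

If fewer than 7 candidates land in the band, widening the pool or the band should fix it.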

Lastly, once you have generated the points, a sanity check would be to append them to the dataset, apply PCA to reduce the dimensions down to 2, and plot the PCA result with a separate colour for the appended outlier points. You can then check by eye whether the outliers are far from your dataset, but not too far.
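A minimal sketch of that sanity check, assuming scikit-learn and matplotlib. Both arrays are synthetic placeholders, and standardising the columns before PCA is my own choice rather than a requirement.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                     # hypothetical original data
new_outliers = rng.normal(scale=2.5, size=(7, 4))  # hypothetical generated points

# Append the generated points and remember which rows they are.
combined = np.vstack([X, new_outliers])
is_new = np.r_[np.zeros(len(X), dtype=bool), np.ones(len(new_outliers), dtype=bool)]

# Standardise first so that no single column dominates the projection.
scaled = StandardScaler().fit_transform(combined)
projected = PCA(n_components=2).fit_transform(scaled)

fig, ax = plt.subplots()
ax.scatter(*projected[~is_new].T, s=8, alpha=0.4, label="original rows")
ax.scatter(*projected[is_new].T, s=40, color="red", label="appended outliers")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.legend()
fig.savefig("pca_outliers.png")
```

Keep in mind that a 2D projection can hide a point that only deviates along the discarded components, so treat the plot as a visual check alongside the scores, not a replacement for them.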

Hope this helps and gives you some ideas.

Answered by A Kareem on January 20, 2021
