Feature selection/Outliers in a dataset

Should feature selection be done after handling outliers or vice versa?
Can anyone point me to a paper that supports this?
Thanks!

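It is generally recommended to handle outliers before performing feature selection. Outliers can distort the distributions of individual variables and the relationships between them, so the statistics that feature selection relies on (correlations, variances, univariate test scores) can be skewed. A feature may then be kept or dropped because of a handful of extreme rows rather than because of its real relationship with the target.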
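To make the effect concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the feature names `signal` and `noise`, the 1.5 × IQR rule for flagging outliers, and the correlation threshold of 0.3 used as a simple filter-style feature selector. It is not taken from either reference, and a real project would more likely use pandas or scikit-learn.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def iqr_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outlier_rows(features, target, k=1.5):
    """Drop every row in which any feature falls outside its IQR fences."""
    fences = {name: iqr_fences(col, k) for name, col in features.items()}
    keep = [
        i for i in range(len(target))
        if all(fences[name][0] <= col[i] <= fences[name][1]
               for name, col in features.items())
    ]
    clean = {name: [col[i] for i in keep] for name, col in features.items()}
    return clean, [target[i] for i in keep]

def select_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]

# Toy data (hypothetical): 'signal' tracks the target, 'noise' does not,
# but 'noise' has one extreme value in the last row.
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
features = {
    "signal": [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 9.1, 9.9, 11.0],
    "noise": [5, 3, 6, 2, 7, 4, 6, 3, 5, 4, 100],
}

# Selecting first: the single extreme row pushes 'noise' over the threshold.
selected_raw = select_features(features, target)

# Handling outliers first: 'noise' no longer looks related to the target.
clean_features, clean_target = drop_outlier_rows(features, target)
selected_clean = select_features(clean_features, clean_target)

print(selected_raw)    # ['signal', 'noise']
print(selected_clean)  # ['signal']
```

Dropping rows is only one option. Clipping values to the fences (winsorizing) or using a rank-based measure such as Spearman correlation are alternatives, and which one fits depends on whether the extreme values are errors or genuine observations.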

One paper that supports this approach is "A Study on Outlier Detection: Do Outliers Really Need Special Treatment?" by Aggarwal and Yu (2010). The authors note that outliers can skew the results of data mining algorithms, and they recommend removing or adjusting outliers before applying data mining techniques.

Another reference that supports this approach is "Outlier Analysis" by Aggarwal (2013), in which the author recommends performing outlier analysis before data mining. It also gives a good overview of different methods for handling outliers.
