This is for an assignment in which I have been given a dataset (Survey Methodology for Enterprise Surveys - World Bank Group) and asked to predict which businesses have growth potential. However, some of the numerical variables contain extreme values. The "employee" variable refers to the number of full-time employees.
Below is the box plot:
The question given by the lecturer is: "Check the dataset for outliers and replace or delete those values as appropriate. You must provide justifications for your cleaning strategies and discuss the potential issues associated with your chosen strategies."
1st quartile: 9
3rd quartile: 72
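For reference, here is a minimal Python sketch of the cut-offs these quartiles imply. It assumes the quartiles describe the "employee" variable and that the box plot uses the common 1.5 × IQR (Tukey) rule. The function name `tukey_fences` is made up for illustration.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences for the k * IQR box-plot rule."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


# Quartiles quoted above; assumed to belong to the "employee" variable.
lower, upper = tukey_fences(9, 72)
print(lower, upper)  # -85.5 166.5
```

The lower fence is negative, and an employee count cannot be negative, so under this rule only firms with more than about 166 full-time employees would be flagged as outliers.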
Please advise whether I should delete the outliers or not, and how I should justify that decision.
Thank you! Your help is much appreciated!