I need help getting started with an assessment. First of all, I am not asking you to do the assignment for me; I'd just like some help getting started. I'm trying to become a Data Scientist, but my employer is not sure about my skills, so I have to complete an assessment. I am already familiar with the basics of RStudio; I learned it (and am still learning it) just by doing it.
My questions are:
- How do I get started?
- What are the next steps?
- Which models should I use for predictive analytics?
The assignment is as follows:
_You are asked to analyze the dataset included with this assignment using either Python 3.x or R. If you are using IPython Notebook, please convert your code to a nicely formatted separate .py file._
The dataset contains data from 157 patients diagnosed with three different types of breast cancer, including their DNA measurements. Specifically, we are interested in the type of breast cancer (‘class’ variable), and whether it can be predicted from other variables in the dataset (i.e. the DNA measurements).
Your analysis should include the following parts:
1. Create a breast cancer type classification model
2. Explain which rules and patterns in the data your model captures
3. Estimate the performance of your model on unseen data
4. List the top 5 most important predictors
5. Discuss how confident you are about these results (and why, or why not)
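This is not a solution to the assignment, just a minimal sketch of the workflow behind points 1, 3 and 4, written in plain Python 3 with only the standard library so it runs without extra packages. Several things here are assumptions for illustration only: `make_synthetic_data` is a hypothetical helper that generates fake "measurements" (it is not the real dataset, whose file format isn't described here); the classifier is a simple nearest-centroid model rather than a recommendation; and predictor importance is ranked with a one-way ANOVA F statistic per column, which is only one of several ways to rank predictors. With the real data you would more likely load the file with pandas and compare stronger models (for example a random forest or regularized logistic regression from scikit-learn), but the evaluation logic stays the same.

```python
import random
import statistics


def make_synthetic_data(n=157, n_features=20, n_informative=5, seed=0):
    """HYPOTHETICAL stand-in for the real dataset: random 'measurements'
    where only the first n_informative columns differ between the 3 classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 3  # three hypothetical classes, roughly balanced
        row = [rng.gauss(1.5 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def f_statistic(values, labels):
    """One-way ANOVA F statistic: between-class vs within-class variance."""
    classes = sorted(set(labels))
    groups = [[v for v, l in zip(values, labels) if l == c] for c in classes]
    grand = statistics.fmean(values)
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(classes) - 1
    df_within = len(values) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def top_k_features(X, y, k=5):
    """Indices of the k columns with the largest F statistic."""
    scores = [f_statistic(col, y) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y):
    """Nearest-centroid classifier on standardized features."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.stdev(c) or 1.0 for c in cols]  # guard constant columns

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    scaled = [scale(r) for r in X]
    centroids = {}
    for c in set(y):
        members = [r for r, l in zip(scaled, y) if l == c]
        centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return scale, centroids


def predict(model, row):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    scale, centroids = model
    z = scale(row)
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c])))


def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds that keep the class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean and standard deviation of accuracy over k held-out folds."""
    scores = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        # the scaler and centroids are fitted on the training folds only
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return statistics.fmean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    acc, sd = cross_val_accuracy(X, y)
    print(f"5-fold CV accuracy: {acc:.2f} (sd across folds {sd:.2f})")
    print("Top 5 columns by F statistic:", top_k_features(X, y))
```

The design choice worth noting is that standardization is fitted inside each training fold, so the held-out fold never influences preprocessing; doing it once on the full dataset would leak information and make the estimate for point 3 look better than it is. The spread of accuracy across folds gives a rough feel for how stable that estimate is, which is one input for the discussion in point 5.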
Thanks in advance for any input, tips, or tricks.