Need help getting on track with an assessment

Hi all,

I need help getting started with an assessment. To be clear, I am not asking you to do the assignment for me; I am only asking for help getting started. I'm trying to become a Data Scientist, but my employer is not yet sure about my skills, so I have to complete an assessment. I am already familiar with the basics of RStudio. I learned it (and am still learning it) just by doing it.

My questions are:

  • How do I get started?
  • What are the next steps?
  • Which models should I use for predictive analytics?

The assignment is as follows:

_You are asked to analyze the dataset included with this assignment using either Python 3.x or R. If you are using IPython Notebook, please convert your code to a nicely formatted separate .py file._

The dataset contains data from 157 patients diagnosed with three different types of breast cancer, including their DNA measurements. Specifically, we are interested in the type of breast cancer (‘class’ variable), and whether it can be predicted from other variables in the dataset (i.e. the DNA measurements).

Your analysis should include the following parts:
1. Create a breast cancer type classification model
2. Explain which rules and patterns in the data your model captures
3. Estimate the performance of your model on unseen data
4. List the top 5 most important predictors
5. Discuss how confident you are about these results (and why, or why not)
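
The assignment allows Python 3.x, so here is a minimal sketch of parts 1 to 5 in Python with scikit-learn (an R workflow would follow the same steps). Because the actual dataset file is not shown, it assumes a synthetic stand-in built with `make_classification` (157 rows, 3 classes, hypothetical column names `dna_0`, `dna_1`, ...) and an outcome column named `class`. The random forest and the shallow decision tree are illustrative choices, not the only reasonable models.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in for the real dataset: 157 rows, 3 classes,
# 30 numeric "DNA" columns. Replace this with pd.read_csv(...) on the real file.
X_arr, y_arr = make_classification(
    n_samples=157, n_features=30, n_informative=6,
    n_classes=3, random_state=0,
)
feature_names = [f"dna_{i}" for i in range(X_arr.shape[1])]
df = pd.DataFrame(X_arr, columns=feature_names)
df["class"] = y_arr  # assumed name of the outcome column

X = df.drop(columns="class")
y = df["class"]

# Part 1: a random forest classifier (one reasonable choice among many).
model = RandomForestClassifier(n_estimators=300, random_state=0)

# Parts 3 and 5: stratified 5-fold cross-validation estimates accuracy on
# unseen data; the spread across folds hints at how stable that estimate is.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

# Part 4: impurity-based importances from the forest refit on all rows.
model.fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
top5 = importances.sort_values(ascending=False).head(5)
print(top5)

# Part 2: a separate, shallow decision tree prints readable if/then rules.
# It is a simpler surrogate for explanation, not the forest itself.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```

With only 157 patients, a single train/test split can vary noticeably, which is why cross-validation is used for the performance estimate. For part 4, `sklearn.inspection.permutation_importance` is an alternative to impurity-based importances and can be compared against them as a sanity check.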

Thank you all in advance for any input, tips, or tricks.

See the draft textbook, which covers all of this.

I suggest you start by identifying whether the outcome variable is continuous or categorical.
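
A minimal sketch of that first check, assuming the outcome has been loaded as a pandas Series. The `outcome_type` helper and its `max_levels` cut-off are hypothetical, and the rule is only a rough heuristic: a numeric code such as 1/2/3 for cancer type is still categorical.

```python
import pandas as pd

def outcome_type(y: pd.Series, max_levels: int = 10) -> str:
    """Rough heuristic: non-numeric or few distinct values -> categorical."""
    # max_levels is an arbitrary, hypothetical cut-off; adjust it to your data.
    if not pd.api.types.is_numeric_dtype(y):
        return "categorical"
    return "categorical" if y.nunique() <= max_levels else "continuous"

# Hypothetical example values; the real 'class' column will differ.
labels = pd.Series(["type_A", "type_B", "type_C", "type_A"])
print(outcome_type(labels))                        # categorical
print(outcome_type(pd.Series(range(100)) * 0.5))   # continuous
```

A categorical outcome like the cancer type points to classification models rather than regression.
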
See Chapter 56 on creating training and test data sets, and Chapter 64 for illustrations of evaluating test results against unseen data.
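
A hedged sketch of the train/test idea in Python with scikit-learn, again on a hypothetical synthetic stand-in for the 157-patient data. The 25% hold-out size and the scaled logistic regression are illustrative assumptions, not necessarily what the referenced chapters use.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the 157-patient dataset (3 classes).
X, y = make_classification(n_samples=157, n_features=30, n_informative=6,
                           n_classes=3, random_state=1)

# Hold out 25% as "unseen" data; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# The scaler is fit on the training data only (inside the pipeline),
# so no information from the test set leaks into training.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Evaluate only on the held-out rows.
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Test accuracy:", round(acc, 2))
print(cm)
```

The confusion matrix shows which cancer types get mixed up with each other, which is often more informative than accuracy alone.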
