Censoring my data for a regression

Dear RStudio Community members,

I have what is probably a very simple question, but I am still very new to RStudio.

My data looks like this:
y x
7 0
23 1
0 1
222 0

and so on.

I would like to run a simple linear regression, but first omit the highest and lowest 1% of the values of y, i.e. censor (trim) the top and bottom 1%.

I tried the winsorize function from the robustHD package, but the linear regression [ lm(..) ] seems to ignore it. I am probably making a mistake somewhere.

Could someone please show me how to do this with an example?

Here is an example that flags the rows in the top and bottom 1% of y and then drops them:

library(tidyverse)

(example_input <- as_tibble(select(iris,
  y=Petal.Length,
  x=Petal.Width
)))

# flag rows whose y is in the bottom or top 1% (adds columns y_low and y_high)
(step1 <- example_input %>% mutate(across(
  y,
  list(
    low = function(x) percent_rank(x) < 0.01,
    high = function(x) percent_rank(x) > 0.99
  )
)))

#for info only 
table(step1$y_low)
table(step1$y_high)

# keep only the rows that are flagged neither low nor high
(step2 <- filter(step1, !y_low, !y_high))
#for info only 
table(step2$y_low)
table(step2$y_high)

# drop the helper columns again, keeping only the original y and x
(final <- select(
  step2,
  all_of(names(example_input))
))
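
To actually run the regression, the filtered data has to be what gets passed to lm(). Below is a rough, self-contained sketch in base R of two options: trimming (dropping the extreme rows, which is what you describe) and winsorizing (keeping every row but clamping y to the cut-offs). The names dat, cuts, trimmed and winsorized are made up for illustration, and the cut-offs come from quantile(), which can differ slightly from the percent_rank() approach above. My guess, since your original call is not shown, is that the winsorized result was never assigned back to the data, so lm() still saw the untouched y.

```r
# Illustrative data: same columns as the example above (names are assumptions)
dat <- data.frame(y = iris$Petal.Length, x = iris$Petal.Width)

# 1st and 99th percentile cut-offs of y
cuts <- unname(quantile(dat$y, probs = c(0.01, 0.99)))

# Option A: trimming, i.e. drop the rows whose y lies outside the cut-offs
trimmed <- dat[dat$y >= cuts[1] & dat$y <= cuts[2], ]
fit_trimmed <- lm(y ~ x, data = trimmed)

# Option B: winsorizing, i.e. keep every row but clamp y to the cut-offs
# (the clamped values are assigned back to y, otherwise lm() would not see them)
winsorized <- transform(dat, y = pmin(pmax(y, cuts[1]), cuts[2]))
fit_winsorized <- lm(y ~ x, data = winsorized)

summary(fit_trimmed)
summary(fit_winsorized)
```

With the dplyr pipeline above, the equivalent of option A would simply be lm(y ~ x, data = final).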
