I need a piece of logic that makes sure the data has both "Unfit" and "Fit" statuses and that no more than 60% of them are "Unfit", for a CSV file.

The Excel sheet has this status in column AB, under the heading "Estimated_Commercial_Viability__c".

Perhaps think of another way to describe what you want to do in R.
I find this incomprehensible. 🙁

So I have an Excel sheet with a bunch of columns, and one of them holds the values "Fit" or "Unfit", under the heading "Estimated_Commercial_Viability__c". I want a piece of logic that makes sure "Unfit" values are no more than 60% of all values. Does that make more sense?

How would it "make sure"? Would it throw away records based on some rule?

Yes, because if more than 60% are "Unfit", the next tool in the process won't pick up the CSV after this check happens.
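A minimal sketch of that gate, assuming the sheet has already been read into a data frame that has the column `Estimated_Commercial_Viability__c`. The helper name `viability_ok`, the output path, and the choices to ignore missing (NA) cells and compare case-insensitively are assumptions, not something stated in the thread. The CSV is only written when the check passes, so the next tool never sees a file that fails it.

```r
# Hypothetical helper: TRUE when the status column contains both "Fit" and
# "Unfit" and the share of "Unfit" is at most `max_unfit` (0.6 = 60%).
viability_ok <- function(status, max_unfit = 0.6) {
  status <- tolower(trimws(status))  # assumption: compare case-insensitively
  status <- status[!is.na(status)]   # assumption: missing (NA) cells are ignored
  has_both <- all(c("fit", "unfit") %in% status)
  unfit_share <- mean(status == "unfit")
  has_both && unfit_share <= max_unfit
}

# Stand-in for the Excel data (assumption: only the relevant column is shown)
dat <- data.frame(
  Estimated_Commercial_Viability__c = c("Fit", "Unfit", "Fit", "Unfit", "Fit")
)

out_path <- file.path(tempdir(), "checked_output.csv")  # hypothetical path

if (viability_ok(dat$Estimated_Commercial_Viability__c)) {
  write.csv(dat, out_path, row.names = FALSE)
  message("Check passed: CSV written to ", out_path)
} else {
  stop("Check failed: need both statuses and at most 60% 'Unfit'; CSV not written.")
}
```

Using `stop()` means a script run with `Rscript` exits with an error status that a downstream step can detect; swap it for `message()` if you would rather just skip writing the file.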

# Example data: treat setosa as "Fit" and the other two species as "Unfit"
df <- iris
df$isfit <- ifelse(df$Species == "setosa", "Fit", "Unfit")

table(df$isfit)

#   Fit Unfit
#    50   100
# so only 33% are really fit

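# Resample 200 rows with replacement; "Fit" rows get weight 6 and "Unfit"
# rows weight 1, so about 75% of the sampled rows are expected to be "Fit"
sampled_df_index <- sample(x = seq_len(nrow(df)),
                           size = 200,
                           replace = TRUE,
                           prob = ifelse(df$isfit == "Fit", 6, 1))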


sampled_df <- df[sampled_df_index,]
table(sampled_df$isfit)
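To read the shares directly instead of from raw counts, `prop.table()` turns the `table()` counts into proportions. With the 6:1 weights used above, the expected share of "Fit" rows in the resample is 50 × 6 / (50 × 6 + 100 × 1) = 300 / 400 = 75%, so roughly 25% "Unfit", under the 60% limit. Note that resampling with replacement duplicates rows, so it changes the data rather than only checking it. The sketch below recomputes both numbers; the seed value is an arbitrary assumption, and any single sample of 200 rows will scatter around the expectation.

```r
df <- iris
df$isfit <- ifelse(df$Species == "setosa", "Fit", "Unfit")

# Observed proportions in the original data: Fit 1/3, Unfit 2/3
shares <- prop.table(table(df$isfit))
print(round(shares, 3))

# Expected "Fit" share under weighted resampling (weight 6 for Fit, 1 for Unfit)
w <- ifelse(df$isfit == "Fit", 6, 1)
expected_fit <- sum(w[df$isfit == "Fit"]) / sum(w)
print(expected_fit)

# "Unfit" share in one actual resample (varies with the seed)
set.seed(42)
idx <- sample(seq_len(nrow(df)), size = 200, replace = TRUE, prob = w)
print(mean(df$isfit[idx] == "Unfit"))
```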
