I have a gene expression matrix in which each row is one gene in one cell line and each patient has its own column. The columns are laid out as follows:
Column 1: cell line - A, B, C
Column 2: gene - abc, def, ghi (repeats for each cell line)
Column 3: expression level for patient 1
Column 4: expression level for patient 2
Columns 5 onward: expression levels for the remaining patients
I want to use the apply function to build, for each cell line, the set of genes that are active across patients. A gene counts as active if its expression level is > 0 in more than 70% of patients.
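A minimal sketch of the per-row test, written in plain Python rather than R, so it does not use R's `apply` itself; `is_active` and `threshold` are hypothetical names. It takes one row's patient values and checks whether strictly more than 70% of them are above 0. In R, a function like this is what you would pass to `apply` over the rows (MARGIN = 1) of the numeric patient columns.

```python
def is_active(levels, threshold=0.7):
    """Return True if strictly more than `threshold` of the values are > 0.

    `levels` holds one row's expression values, one per patient
    (hypothetical helper, not from the original question).
    """
    if not levels:
        # No patients: treat the gene as inactive.
        return False
    n_positive = sum(1 for x in levels if x > 0)
    # "More than 70%" is read as a strict inequality.
    return n_positive / len(levels) > threshold


print(is_active([1.2, 0.0, 3.4, 0.5]))  # 3/4 = 0.75 > 0.7 -> True
print(is_active([1.2, 0.0, 0.0, 0.5]))  # 2/4 = 0.50       -> False
```

Note that with exactly 70% of patients above 0 the gene is not counted as active, matching the "more than 70%" wording.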
The output should contain one element per cell line (A, B and C, in that order), in the following format:
( [abc, def], [abc], [] )
For example, the second element means that in cell line B only gene abc had an expression level > 0 in more than 70% of patients.
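A sketch of the full grouping step, again in plain Python, assuming each row is a tuple of (cell line, gene, patient 1 value, patient 2 value, …). The function name `active_genes_by_cell_line` is hypothetical, and the expression values below are made up purely so the result reproduces the example output. A dict keyed by cell line is used so that a cell line with no active genes (C here) still gets an empty entry instead of disappearing.

```python
def is_active(levels, threshold=0.7):
    """True if strictly more than `threshold` of the values are > 0."""
    if not levels:
        return False
    return sum(1 for x in levels if x > 0) / len(levels) > threshold


def active_genes_by_cell_line(rows, threshold=0.7):
    """Group active genes by cell line (hypothetical helper).

    Each row is (cell_line, gene, value_patient1, value_patient2, ...).
    Dicts keep insertion order, so cell lines come out in the order
    they first appear in `rows`.
    """
    result = {}
    for cell_line, gene, *levels in rows:
        # Register the cell line even if none of its genes are active.
        genes = result.setdefault(cell_line, [])
        if is_active(levels, threshold):
            genes.append(gene)
    return result


# Made-up data with 4 patients, so "active" means at least 3 of 4 > 0.
rows = [
    ("A", "abc", 2.1, 0.4, 1.3, 0.8),  # 4/4 > 0
    ("A", "def", 0.9, 1.1, 0.0, 2.5),  # 3/4 > 0
    ("A", "ghi", 0.0, 0.0, 1.7, 0.2),  # 2/4 > 0
    ("B", "abc", 1.5, 0.3, 0.6, 0.0),  # 3/4 > 0
    ("B", "def", 0.0, 0.0, 0.0, 0.4),  # 1/4 > 0
    ("B", "ghi", 0.7, 0.0, 0.0, 0.0),  # 1/4 > 0
    ("C", "abc", 0.0, 0.2, 0.0, 0.0),  # 1/4 > 0
    ("C", "def", 0.0, 0.0, 0.0, 0.0),  # 0/4 > 0
    ("C", "ghi", 0.5, 0.0, 0.9, 0.0),  # 2/4 > 0
]

result = active_genes_by_cell_line(rows)
print(list(result.values()))  # [['abc', 'def'], ['abc'], []]
```

Because the per-cell-line results have different lengths, the natural R container for this output is a list rather than an atomic vector.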
Any suggestions are welcome.
Many thanks!