Question
Assume that we have 10000 genes, which is quantitied in normal ppl and turmor ppl. We have a algorithm to convert it to a bool table, than we want to pick up the combinations of gene have satisfied sensitivity and specificity
Sensitivity (also called the true positive rate, the recall, or probability of detection[1] in some fields) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
First idea
Seem that intutive idea is like this
- read bool table
- generate combination?
- check the Sen. and Spe.
use python’s itertools.combination
to generate a iterator of combination of potential result and check it. However, even while handling 100 genes, the result may be tremendous. for example
The Language like Python or perl, handle these kind of loop was terriblly slow, one of my mate takes nearly 35 hours to finish a 5C100 recombination in Perl.
Is every combination needed?
After pick some genes, we actually have a bool matrix like this:
than we can calculate the spe, filter the set whose spe or sen < 0.8:
Here we feel diffcult to reduct the execute time of this part of codes. However, we find out that if one gene’s spe or sen < 0.8, than the combination won’t satisfied the requirements. So some improvements could be made.
Little improvements
so assume we have a list initial
, than we can filter the unsatisfied single gene.
Than we run first combination, 2 C len(candidate), far less than the inital count.
Go deeper
when we get the result of 2 elements’s combination, we next want to generate the 3 element’s combination.
|
|