here is an example of someone profiling code to achieve faster runtime, they had similar challenge of combinatorial explosion, but given the speed improvements we made to inner loop of the calculation the problem became tractable.
I have a list of many sets, and I want to calculate the overlap of the elements for each pairwise comparison. In my actual application, the sets contain names of genes grouped by some shared characteristic (e.g. from an annotation database such as gene ontology or reactome ). But to distill the problem, the input data is a named list of character vectors.
I have written a brute force approach to calculate each pairwise comparison. It works fine when the number of sets is small (<1000 sets). The problem is that these biological ontologies can have tens of thousands of sets, and the combinatorial explosion kills my brute force approach. For example, 10,000 sets requires {{10,000}\choose{2}}=49…
1 Like