Error Correction
Because all IMIs from the same parent molecule share a binning index, the number of unique binning indexes observed within a specific barcode and gene is determined by the number of molecules, not by the number of IMIs those molecules produced. The probabilistic relationship between the number of unique bins and the true number of molecules in a barcode and gene combination is therefore constant: it results from random sampling from the 64 possible bin indexes when each molecule is captured. For the subset of barcode and gene combinations with between 5 and 32 unique bin indexes, dividing the total number of IMIs by the average number of molecules expected for the observed number of unique bin indexes gives the estimated average IMIs per molecule (IPM).
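The relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual implementation: it assumes each molecule lands in one of the 64 bins uniformly at random, it estimates the expected number of molecules by inverting the expected-occupancy formula, and it pools all qualifying barcode and gene combinations when computing the IPM. The function names and the `(total_imis, unique_bins)` input layout are hypothetical.

```python
import math

N_BINS = 64  # number of possible bin indexes, as stated in the text


def expected_unique_bins(n_molecules, n_bins=N_BINS):
    """Expected number of distinct bins hit when n_molecules are each
    assigned a bin uniformly at random (assumption: uniform sampling)."""
    return n_bins * (1.0 - (1.0 - 1.0 / n_bins) ** n_molecules)


def molecules_from_unique_bins(unique_bins, n_bins=N_BINS):
    """Invert the expectation above to estimate the molecule count.

    Assumption: this simple inversion stands in for the 'average number
    of molecules expected' used by the real method, which may differ.
    """
    if not 0 <= unique_bins < n_bins:
        raise ValueError("unique_bins must be in [0, n_bins)")
    return math.log(1.0 - unique_bins / n_bins) / math.log(1.0 - 1.0 / n_bins)


def estimate_ipm(observations, min_bins=5, max_bins=32):
    """Estimate the sample-wide IMIs per molecule (IPM).

    observations: iterable of (total_imis, unique_bins) pairs, one per
    barcode and gene combination. Only combinations with between
    min_bins and max_bins unique bins contribute (pooling is an assumption).
    """
    total_imis = 0
    total_expected_molecules = 0.0
    for imis, bins in observations:
        if min_bins <= bins <= max_bins:
            total_imis += imis
            total_expected_molecules += molecules_from_unique_bins(bins)
    if total_expected_molecules == 0:
        raise ValueError("no combinations fall in the qualifying bin range")
    return total_imis / total_expected_molecules
```

For example, `estimate_ipm([(30, 10), (2, 1)])` uses only the first combination, because the second has fewer than 5 unique bins.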
The estimated molecular count for a barcode and gene is the total number of IMIs divided by the IPM, rounded down. The more true molecules a barcode and gene combination has, the closer its true average IMIs per molecule should be to the average IPM of the sample. For barcode and gene combinations with very few molecules, the number of unique bins is expected to be a better predictor of the molecular count than the number of IMIs, because with so few molecules in each combination the variance in the true IMIs per molecule is high. For this reason, IPM correction is applied only to barcode and gene combinations with more than 10 unique bin indexes; otherwise, the corrected count is equal to the number of unique bin indexes.
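The correction rule in this paragraph can be written as a small function. This is a minimal sketch: the function name, argument names, and the `bin_threshold` parameter are hypothetical, and the IPM is assumed to have been estimated for the sample beforehand.

```python
import math


def corrected_count(total_imis, unique_bins, ipm, bin_threshold=10):
    """Corrected molecular count for one barcode and gene combination.

    With more than bin_threshold unique bin indexes, the count is the
    total IMIs divided by the sample IPM, rounded down. Otherwise the
    number of unique bin indexes is used directly.
    """
    if ipm <= 0:
        raise ValueError("ipm must be positive")
    if unique_bins > bin_threshold:
        return math.floor(total_imis / ipm)
    return unique_bins
```

With an IPM of 3.0, a combination with 100 IMIs across 25 unique bins gets a count of 33, while a combination with 40 IMIs across 8 unique bins keeps a count of 8.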