David Kastrup writes:

> Another option avoiding the realloc would be not to use linked lists
> at all, but just to collect all entries sequentially in "packed" form
> and sort them into order afterwards. That's an O(n lg n) operation
> with a large n. Even if one has a sorting algorithm with good memory
> locality, I doubt that the locality would compensate for the lg n
> factor, even when taking into account that we save ourselves the
> copying. And that does not even take into account the possibility of
> having to cull some buckets, another O(n) operation (which amounts to
> copying everything once, too).

As an illustration, I have implemented a proof of concept. This is a
patch (not formatted for submission) against the current master. It
takes about a 20% performance hit on the whole git-pack-objects run,
so the diffing code itself is likely hit even harder.

For the proof of concept I did not do the hash-bucket collision
culling, nor did I realloc down to the possibly reduced size. So a
fairer comparison would probably require removing the culling code
from master (or implementing the culling in the proof of concept, but
I am too lazy for that).

I hope this addresses the not-in-place complaint. It is conceivable
that under tight memory conditions this approach might actually work
out better. However, the glibc qsort implementation reverts (or at
least did at one time) to a mergesort for large inputs, and allocates
a temporary buffer of the same size as the original data. If that is
still the case, the memory impact here would actually be worse. One
would need to implement a custom _real_ in-place sorter to get around
that.

All in all, I don't think this approach is worth the trouble: the
lg n factor of sorting overhead is just too much.
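
For readers who want to see the shape of the idea rather than the
actual patch, here is a minimal, self-contained sketch in C of the
packed-array-plus-sort layout being discussed. Everything in it is
made up for illustration: struct packed_entry, index_source, the toy
hash over fixed 16-byte windows, and the constants are stand-ins, not
identifiers from the patch or from git's diff-delta code.

/*
 * Hypothetical sketch of the "packed" indexing approach: append all
 * entries into one flat buffer (no per-entry allocation, no linked
 * lists), then qsort so that entries with the same hash end up
 * adjacent. Names and hash are placeholders, not git code.
 */
#include <stdlib.h>

struct packed_entry {
	unsigned int hash;   /* bucket key */
	unsigned int offset; /* position of the indexed window in the source */
};

/* Sort by hash so that same-bucket entries become adjacent. */
static int cmp_entry(const void *a, const void *b)
{
	const struct packed_entry *x = a, *y = b;
	if (x->hash != y->hash)
		return x->hash < y->hash ? -1 : 1;
	return 0;
}

/*
 * Collect entries sequentially, then sort: O(n) appends plus one
 * O(n lg n) sort. Returns a malloc'ed array and sets *nr_out.
 */
static struct packed_entry *index_source(const unsigned char *buf,
					 size_t len, size_t *nr_out)
{
	size_t i, nr = 0;
	struct packed_entry *entries;

	if (len < 16)
		return NULL;
	entries = malloc((len / 16) * sizeof(*entries));
	if (!entries)
		return NULL;

	for (i = 0; i + 16 <= len; i += 16) {
		unsigned int hash = 0;
		size_t j;
		/* stand-in for the real rolling hash */
		for (j = 0; j < 16; j++)
			hash = hash * 101 + buf[i + j];
		entries[nr].hash = hash;
		entries[nr].offset = (unsigned int)i;
		nr++;
	}

	qsort(entries, nr, sizeof(*entries), cmp_entry);
	*nr_out = nr;
	return entries;
}

The point of the layout is that insertion is a plain append with no
per-entry malloc, and after the single qsort all entries of a bucket
sit next to each other, so a culling pass (not shown, and not done in
the proof of concept either) would be one linear scan over the sorted
array. The lg n cost discussed above is entirely in the qsort call,
and with glibc's mergesort fallback that call may also allocate a
temporary buffer as large as the array itself.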