Hi Elijah,
On 11/11/2023 05:46, Elijah Newren wrote:
> * filename similarity is extraordinarily expensive compared to exact
> renames, and if not carefully handled, can sometimes rival the cost of
> file content similarity computations given our spanhash
> representations.
I've not heard of the spanhash representation before. Are there any
references or further reading?
> Exact renames are tasked with finding renames even
> if they are known to not be relevant, simply because exact renames can
> do so very quickly. If we change that, we throw a monkey wrench in
> our performance handling elsewhere and have to rethink a number of
> other things.
--
Philip