Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
Ævar Arnfjörð Bjarmason
Johannes Schindelin via GitGitGadget
Junio C Hamano
Johannes Schindelin
Jeff King
See Also
Prev Ref 1 Ref 2 Ref 3
2019-03-13 16:53:21 UTC
On Wed, Mar 13, 2019 at 12:35:16PM -0400, Jeff King wrote:

> On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > > And this is where the loose object cache interferes with this feature:
> > > if *some* loose object was read whose hash shares the same first two
> > > digits with a commit that was not yet created when that loose object was
> > > created, then we fail to find that new commit by its short name in
> > > `get_oid()`, and the interactive rebase fails with an obscure error
> > > message like:
> > >
> > > 	error: invalid line 1: pick 6568fef
> > > 	error: please fix this using 'git rebase --edit-todo'.
> Are we 100% sure this part is necessary? From my understanding of the
> problem, even without any ambiguity get_oid() could fail due to just
> plain not finding the object in question.

Sorry, I was being dumb. We do need a two-digit collision, not because
we need an ambiguity, but because the loose-object cache is filled in
one directory at a time. So we must be unlucky enough to hit the same
directory twice, and using these objects ensures that unluckiness. And
the commit message does describe this.

If we didn't want to depend on a particular hash, we could simply do it
N times, where N is enough to get us any two entries which collide in
the first byte. By the birthday paradox, that's 50% at only 2^4. But 50%
is not very good odds for the test working. You'd need 257 iterations to
ensure a collision by the pigeon-hole principle. That's enough that I
kind of prefer the found collision here, even if we'll have to update it
for a new hash (it is correctly marked with SHA1, so I don't think
finding it later will be a problem).

By the way, while reading the test more carefully, I did notice two
funny things:

> +test_expect_failure SHA1 'loose object cache vs re-reading todo list' '
> +	GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
> +	export GIT_REBASE_TODO &&
> +	write_script <<-\EOS &&
> +	# For values 5 and 6, this yields SHA-1s with the same first two digits
> +	echo "pick $(git rev-parse --short \
> +		$(printf "%s\\n" \
> +			"tree $EMPTY_TREE" \
> +			"author A U Thor <> $1 +0000" \
> +			"committer A U Thor <> $1 +0000" \
> +			"" \
> +			"$1" |
> +		  git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO

Here we redirect the output into $GIT_REBASE_TODO, not stdout.

> +	shift
> +	test -z "$*" ||
> +	echo "exec $0 $*" >>$GIT_REBASE_TODO

And here we do the same thing. That second redirection is unnecessary.

I also find it interesting that it iterates over its arguments by
recursive processes. Wouldn't:

  for i in "$@"; do
	echo "pick ..." >>$GIT_REBASE_TODO

be a bit more efficient (as well as more obvious?).