[PATCH] object-name: reject too-deep recursive ancestor queries
Jeff King
Patrick Steinhardt
Junio C Hamano
Carlos Andrés Ramírez Cataño
Taylor Blau
2023-11-20 16:13:45 UTC
When trying to resolve a revision query like "HEAD~~~~~", our call
pattern looks something like:

  - object-name.c::get_oid_with_context()
  - object-name.c::get_oid_1()
  - object-name.c::get_nth_ancestor()
  - object-name.c::get_oid_1()
  - ...

With `get_nth_ancestor()` and `get_oid_1()` mutually recurring, popping
one '~' off of the revision query for each round of the recursion.

Since this recursive behavior is unbounded, having too many "~"'s
contained in a revision query will cause us to blow the stack.
Generating a message like this when compiled under SANITIZE=address:

    $ valgrind git rev-parse "HEAD$(perl -e "print \"~\" x 1000000000000")"
    ==597453== Memcheck, a memory error detector
    ==597453== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
    ==597453== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
    ==597453== Command: /home/ttaylorr/local/bin/git.compile diff HEAD~~~~~~~~~~~~[...]
    ==597453==ERROR: AddressSanitizer: stack-overflow on address 0x7fffdd838ff8 (pc 0x7f2726082748 bp 0x7fffdd839110 sp 0x7fffdd839000 T0)
        #0 0x7f2726082748 in __asan::GetTLSFakeStack() ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:176
        #1 0x7f2726082748 in GetFakeStackFast ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:193
        #2 0x7f27260833de in OnMalloc ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:207
        #3 0x7f27260833de in __asan_stack_malloc_1 ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:256
        #4 0x563f9077d9d8 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1087
        #5 0x563f9077e957 in get_oid_1 /home/ttaylorr/src/git/object-name.c:1295
        #6 0x563f9077da64 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1092
        #7 0x563f9077e957 in get_oid_1 /home/ttaylorr/src/git/object-name.c:1295
        #8 0x563f9077da64 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1092
        #247 0x563f9077e957 in get_oid_1 /home/ttaylorr/src/git/object-name.c:1295
        #248 0x563f9077da64 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1092

    SUMMARY: AddressSanitizer: stack-overflow ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:176 in __asan::GetTLSFakeStack()

(Note that the actual stack is much deeper. GDB reports that the bottom
of the stack looks something like the following):

    #54866 0x0000555555c6d3bf in get_oid_with_context_1 (repo=0x5555563849a0 <the_repo>, name=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., flags=128, prefix=0x0, oid=0x7ffff5713d40, oc=0x7ffff5713d90) at object-name.c:1947
    #54867 0x0000555555c6e2fa in get_oid_with_context (repo=0x5555563849a0 <the_repo>, str=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., flags=128, oid=0x7ffff5713d40, oc=0x7ffff5713d90) at object-name.c:2096
    #54868 0x0000555555d8eed8 in handle_revision_arg_1 (arg_=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., revs=0x7ffff5b000d0, flags=0, revarg_opt=0) at revision.c:2174
    #54869 0x0000555555d8f1a9 in handle_revision_arg (arg=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., revs=0x7ffff5b000d0, flags=0, revarg_opt=0) at revision.c:2189
    #54870 0x0000555555d97ca9 in setup_revisions (argc=2, argv=0x7fffffff4970, revs=0x7ffff5b000d0, opt=0x0) at revision.c:2932
    #54871 0x00005555557d6a63 in cmd_diff (argc=2, argv=0x7fffffff4970, prefix=0x0) at builtin/diff.c:502
    #54872 0x00005555557367bf in run_builtin (p=0x5555561c4c30 <commands+816>, argc=2, argv=0x7fffffff4970) at git.c:469
    #54873 0x000055555573716b in handle_builtin (argc=2, argv=0x7fffffff4970) at git.c:723
    #54874 0x000055555573785a in run_argv (argcp=0x7ffff56028b0, argv=0x7ffff56028e0) at git.c:787
    #54875 0x0000555555738626 in cmd_main (argc=2, argv=0x7fffffff4970) at git.c:922
    #54876 0x00005555559d3fdd in main (argc=3, argv=0x7fffffff4968) at common-main.c:62

Fortunately, we can impose a limit on the maximum recursion depth we're
willing to accept when resolving queries like the above without
significantly impeding users. This patch sets the limit at 4096, though
we could probably increase that limit depending on the size of each

The limit introduced here is large enough that any reasonable query
should still run to completion, but small enough that if the frame size
were to significantly increase, our protection would still be effective.

The change here is straightforward: each call to get_nth_ancestor()
increases a counter, and then decrements that counter before returning.

The diff is a little noisy since there are a handful of return paths
from `get_nth_ancestor()`, all of which need to decrement the depth

Since this is a local-only exploit, a user would have to be tricked into
running such a query by an adversary. Even if they were successfully
tricked into running the malicious query, the blast radius is limited to
a local stack overflow, which does not have meaningful paths to remote
code execution, arbitrary memory reads, or any more grave security

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
 object-name.c                  | 26 ++++++++++++++++++++------
 t/t1506-rev-parse-diagnosis.sh |  5 +++++
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 0bfa29dbbf..675e0a759e 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1080,6 +1080,9 @@ static enum get_oid_result get_parent(struct repository *r,
+static int get_nth_ancestor_max_depth = 4096;
+static int get_nth_ancestor_curr_depth;
 static enum get_oid_result get_nth_ancestor(struct repository *r,
 					    const char *name, int len,
 					    struct object_id *result,
@@ -1089,20 +1092,31 @@ static enum get_oid_result get_nth_ancestor(struct repository *r,
 	struct commit *commit;
 	int ret;
+	if (++get_nth_ancestor_curr_depth > get_nth_ancestor_max_depth)
+		 return error(_("exceeded maximum ancestor depth"));
 	ret = get_oid_1(r, name, len, &oid, GET_OID_COMMITTISH);
 	if (ret)
-		return ret;
+		goto done;
 	commit = lookup_commit_reference(r, &oid);
-	if (!commit)
+	if (!commit) {
+		goto done;
+	}
 	while (generation--) {
-		if (repo_parse_commit(r, commit) || !commit->parents)
-			return MISSING_OBJECT;
+		if (repo_parse_commit(r, commit) || !commit->parents) {
+			goto done;
+		}
 		commit = commit->parents->item;
 	oidcpy(result, &commit->object.oid);
-	return FOUND;
+	ret = FOUND;
+	get_nth_ancestor_curr_depth--;
+	return ret;
 struct object *repo_peel_to_type(struct repository *r, const char *name, int namelen,
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index ef40511d89..b3b9f6c8c5 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -244,6 +244,11 @@ test_expect_success 'reject Nth ancestor if N is too high' '
 	test_must_fail git rev-parse HEAD~100000000000000000000000000000000
+test_expect_success 'reject too-deep recursive ancestor queries' '
+	test_must_fail git rev-parse "HEAD$(perl -e "print \"~\" x 4097")" 2>err &&
+	grep "error: exceeded maximum ancestor depth" err
 test_expect_success 'pathspecs with wildcards are not ambiguous' '
 	echo "*.c" >expect &&
 	git rev-parse "*.c" >actual &&

base-commit: cfb8a6e9a93adbe81efca66e6110c9b4d2e57169