[PATCH v4 0/8] repack: support repacking into a geometric sequence
To: git@vger.kernel.org
Cc: peff@peff.net, dstolee@microsoft.com, gitster@pobox.com
From: Taylor Blau
Date: 2021-02-23 02:24:59 UTC

Here's a very lightly modified version of v3 of Peff's and my series
to add a new 'git repack --geometric' mode. Almost nothing has changed
since last time, with the exception of:

  - Packs listed over standard input to 'git pack-objects --stdin-packs'
    are sorted in descending mtime order (and objects are strung
    together in pack order as before) so that objects are laid out
    roughly newest-to-oldest in the resulting pack.

  - Swapped the order of two paragraphs in patch 5 to make the perf
    results clearer.

  - Mention '--unpacked' specifically in the documentation for 'git
    repack --geometric'.

  - Typo fixes.
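
To illustrate the first point, here is a minimal standalone sketch of a
descending-mtime comparator. It mirrors the shape of the comparison in
patch 3, but 'struct pack_info' and 'sort_packs_newest_first()' are
hypothetical stand-ins for illustration only. The real code compares
'struct packed_git' entries reached through string_list items.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for git's struct packed_git (illustration only). */
struct pack_info {
	const char *name;
	time_t mtime;
};

/*
 * Order packs by descending mtime: a newer pack (larger mtime) sorts
 * before an older one, so objects end up roughly newest-to-oldest.
 */
static int pack_mtime_cmp(const void *_a, const void *_b)
{
	const struct pack_info *a = _a;
	const struct pack_info *b = _b;

	if (a->mtime < b->mtime)
		return 1;
	else if (b->mtime < a->mtime)
		return -1;
	else
		return 0;
}

/* Hypothetical helper: sort an array of packs in place, newest first. */
void sort_packs_newest_first(struct pack_info *packs, size_t nr)
{
	qsort(packs, nr, sizeof(*packs), pack_mtime_cmp);
}
```

Returning 1 when 'a' is older than 'b' is what flips the usual ascending
order. Swapping the two return values gives back the v3 behavior.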

Range-diff is below. It would be good to start merging this down since
we have a release candidate coming up soon, and I'd rather focus future
reviewer efforts on the multi-pack reverse index and bitmaps series
instead of this one.

Jeff King (4):
  p5303: add missing &&-chains
  p5303: measure time to repack with keep
  builtin/pack-objects.c: rewrite honor-pack-keep logic
  packfile: add kept-pack cache for find_kept_pack_entry()

Taylor Blau (4):
  packfile: introduce 'find_kept_pack_entry()'
  revision: learn '--no-kept-objects'
  builtin/pack-objects.c: add '--stdin-packs' option
  builtin/repack.c: add '--geometric' option

 Documentation/git-pack-objects.txt |  10 +
 Documentation/git-repack.txt       |  23 ++
 builtin/pack-objects.c             | 333 ++++++++++++++++++++++++-----
 builtin/repack.c                   | 187 +++++++++++++++-
 object-store.h                     |   5 +
 packfile.c                         |  67 ++++++
 packfile.h                         |   5 +
 revision.c                         |  15 ++
 revision.h                         |   4 +
 t/perf/p5303-many-packs.sh         |  36 +++-
 t/t5300-pack-object.sh             |  97 +++++++++
 t/t6114-keep-packs.sh              |  69 ++++++
 t/t7703-repack-geometric.sh        | 137 ++++++++++++
 13 files changed, 926 insertions(+), 62 deletions(-)
 create mode 100755 t/t6114-keep-packs.sh
 create mode 100755 t/t7703-repack-geometric.sh

Range-diff against v3:
1:  aa94edf39b = 1:  bb674e5119 packfile: introduce 'find_kept_pack_entry()'
2:  82f6b45463 = 2:  c85a915597 revision: learn '--no-kept-objects'
3:  033e4e3f67 ! 3:  649cf9020b builtin/pack-objects.c: add '--stdin-packs' option
    @@ builtin/pack-objects.c: static int git_pack_config(const char *k, const char *v,
     +	struct packed_git *a = ((const struct string_list_item*)_a)->util;
     +	struct packed_git *b = ((const struct string_list_item*)_b)->util;
     +
    ++	/*
    ++	 * order packs by descending mtime so that objects are laid out
    ++	 * roughly as newest-to-oldest
    ++	 */
     +	if (a->mtime < b->mtime)
    -+		return -1;
    -+	else if (b->mtime < a->mtime)
     +		return 1;
    ++	else if (b->mtime < a->mtime)
    ++		return -1;
     +	else
     +		return 0;
     +}
4:  f9a5faf773 = 4:  6de9f0c52b p5303: add missing &&-chains
5:  181c104a03 ! 5:  94e4f3ee3a p5303: measure time to repack with keep
    @@ Metadata
      ## Commit message ##
         p5303: measure time to repack with keep
     
    -    Add two new tests to measure repack performance. Both test split the
    +    Add two new tests to measure repack performance. Both tests split the
         repository into synthetic "pushes", and then leave the remaining objects
         in a big base pack.
     
    @@ Commit message
           5303.17: repack (1000)                      216.87(490.79+14.57)
           5303.18: repack with kept (1000)            665.63(938.87+15.76)
     
    -    Likewise, the scaling is pretty extreme on --stdin-packs:
    -
    -      5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)
    -      5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)
    -      5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)
    -
         That's because the code paths around handling .keep files are known to
         scale badly; they look in every single pack file to find each object.
         Our solution to that was to notice that most repos don't have keep
    @@ Commit message
         single .keep, that part of pack-objects slows down again (even if we
         have fewer objects total to look at).
     
    +    Likewise, the scaling is pretty extreme on --stdin-packs (but each
    +    subsequent test is also being asked to do more work):
    +
    +      5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)
    +      5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)
    +      5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)
    +
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
6:  67af143fd1 = 6:  a116587fb2 builtin/pack-objects.c: rewrite honor-pack-keep logic
7:  e9e04b95e7 = 7:  db9f07ec1a packfile: add kept-pack cache for find_kept_pack_entry()
8:  bd492ec142 ! 8:  51f57d5da2 builtin/repack.c: add '--geometric' option
    @@ Documentation/git-repack.txt: depth is 4095.
     +packs determined to need to be combined in order to restore a geometric
     +progression.
     ++
    -+Loose objects are implicitly included in this "roll-up", without respect
    -+to their reachability. This is subject to change in the future. This
    -+option (implying a drastically different repack mode) is not guarenteed
    -+to work with all other combinations of option to `git repack`).
    ++When `--unpacked` is specified, loose objects are implicitly included in
    ++this "roll-up", without respect to their reachability. This is subject
    ++to change in the future. This option (implying a drastically different
    ++repack mode) is not guaranteed to work with all other combinations of
    ++option to `git repack`).
     +
      Configuration
      -------------
-- 
2.30.0.667.g81c0cbc6fd