[PATCH 00/10] repack: support repacking into a geometric sequence
To
git@vger.kernel.org
Cc
peff@peff.net
dstolee@microsoft.com
From
Taylor Blau
Date
2021-01-19 23:23:55 UTC
This series introduces a new mode of 'git repack' where (instead of packing just
loose objects or packing everything together into one pack), the set of packs
left forms a geometric progression by object count.

It does not depend on either series of the revindex patches I sent recently.

Roughly speaking, for a given factor, say "d", each pack has at least "d" times
the number of objects as the next largest pack. So, if there are "N" packs,
"P1", "P2", ..., "PN" ordered by object count (where "PN" has the most objects,
and "P1" the fewest), then:

  objects(Pi) > d * objects(P(i-1))

for all 1 < i <= N.

This is done by first ordering packs by object count, and then determining the
longest sequence of large packs which already form a geometric progression. All
packs on the small side of that cut must be repacked together, and so we check
that the existing progression can be maintained with the new pack, and adjust as
necessary.

In actuality, this is approximated in order for 'git repack' to have to create
at most one new pack. The details of this approximation are discussed at length
in the final patch.

'git repack' implements this new option by marking the packs that don't need to
be touched as "frozen" and it does this by marking them as pack_keep_in_core,
and then using a new option pack-objects option '--assume-kept-packs-closed' to
stop the reachability traversal once it encounters any objects in the kept
packs.

When repacking in this mode, the caller implicitly trusts that the unchanged
packs are closed under reachability, and thus they can halt the traversal as
soon as an object in any one of those packs is found.

The first three patches introduce the new revision and pack-objects options
necessary for this to work. The next four patches introduce an MRU cache for
kept packs only. Then a new pack-objects mode is introduced to allow callers to
specify the list of kept packs over stdin in case they are too long to be listed
as arguments. Finally, geometric repacking is introduced

Thanks in advance for your review.

Jeff King (4):
  p5303: add missing &&-chains
  p5303: measure time to repack with keep
  pack-objects: rewrite honor-pack-keep logic
  packfile: add kept-pack cache for find_kept_pack_entry()

Taylor Blau (6):
  packfile: introduce 'find_kept_pack_entry()'
  revision: learn '--no-kept-objects'
  builtin/pack-objects.c: learn '--assume-kept-packs-closed'
  builtin/pack-objects.c: teach '--keep-pack-stdin'
  builtin/repack.c: extract loose object handling
  builtin/repack.c: add '--geometric' option

 Documentation/git-pack-objects.txt |  19 +++
 Documentation/git-repack.txt       |  11 ++
 Documentation/rev-list-options.txt |   7 +
 builtin/pack-objects.c             | 161 ++++++++++++++--------
 builtin/repack.c                   | 206 ++++++++++++++++++++++++++---
 list-objects.c                     |   7 +
 object-store.h                     |  10 ++
 packfile.c                         |  69 ++++++++++
 packfile.h                         |   2 +
 revision.c                         |  15 +++
 revision.h                         |   4 +
 t/perf/p5303-many-packs.sh         |  18 ++-
 t/t6114-keep-packs.sh              | 128 ++++++++++++++++++
 t/t7703-repack-geometric.sh        |  81 ++++++++++++
 14 files changed, 663 insertions(+), 75 deletions(-)
 create mode 100755 t/t6114-keep-packs.sh
 create mode 100755 t/t7703-repack-geometric.sh

-- 
2.30.0.138.g6d7191ea01