[PATCH 00/20] pack-revindex: prepare for on-disk reverse index
Taylor Blau
2021-01-08 18:16:39 UTC

This is the first of two series to introduce an on-disk alternative to the
reverse index which is currently generated once per-process and stored in

As a reminder, the reverse index is used to translate an object's position
relative to other objects within the same packfile to that object's position
relative to the pack's index.

Generating the reverse index in memory for repositories with large packs has two
significant drawbacks:

  - It requires allocating sizeof(struct revindex_entry) per packed object.

  - It requires us to sort the entries by their pack offset. This is implemented
    in sort_revindex() using a radix sort, but still takes considerable time (as
    benchmarks found in the second series demonstrate).

Both of these can be addressed by storing the reverse index in a new '.rev' file
alongside the packs. This file is written once (during pack creation), and does
not require sorting when accessed, since it is stored in a sorted order.

This series lays the groundwork necessary to introduce the file format (which is
implemented in the second series). I'm splitting these two up since I think the
changes in this series can be queued independently from those in the latter

The goal of this series is to remove direct access of the `struct
revindex_entry` type, as well as `struct packed_git`'s `revindex` field. The
on-disk format will be mmap'd and accessed directly, but the format is
sufficiently different that the whole `revindex` array can't be written as-is.

Separately from our main goal, the new API that is introduced in this series is
IMHO easier to read, and more clearly describes what order the given and
returned values are relative to.

The changes are structured as follows:

  - First, a new API is proposed.

  - Then, uses of the old API are removed one by one and replaced with their
    new counterparts.

  - Finally, without any callers remaining, the old API is removed.

Thanks in advance for your review.

Taylor Blau (20):
  pack-revindex: introduce a new API
  write_reuse_object(): convert to new revindex API
  write_reused_pack_one(): convert to new revindex API
  write_reused_pack_verbatim(): convert to new revindex API
  check_object(): convert to new revindex API
  bitmap_position_packfile(): convert to new revindex API
  show_objects_for_type(): convert to new revindex API
  get_size_by_pos(): convert to new revindex API
  try_partial_reuse(): convert to new revindex API
  rebuild_existing_bitmaps(): convert to new revindex API
  get_delta_base_oid(): convert to new revindex API
  retry_bad_packed_offset(): convert to new revindex API
  packed_object_info(): convert to new revindex API
  unpack_entry(): convert to new revindex API
  for_each_object_in_pack(): convert to new revindex API
  builtin/gc.c: guess the size of the revindex
  pack-revindex: remove unused 'find_pack_revindex()'
  pack-revindex: remove unused 'find_revindex_position()'
  pack-revindex: hide the definition of 'revindex_entry'
  pack-revindex.c: avoid direct revindex access in

 builtin/gc.c           |  2 +-
 builtin/pack-objects.c | 33 +++++++++++++++++-----------
 pack-bitmap.c          | 44 ++++++++++++++++++-------------------
 pack-revindex.c        | 49 +++++++++++++++++++++++++++--------------
 pack-revindex.h        | 10 +++------
 packfile.c             | 50 ++++++++++++++++++++++++++----------------
 6 files changed, 109 insertions(+), 79 deletions(-)