[PATCH 0/8] pack-revindex: introduce on-disk '.rev' format
To: git@vger.kernel.org
Cc: peff@peff.net, jrnieder@gmail.com
From: Taylor Blau
Date: 2021-01-08 18:19:52 UTC

Hi,

This is the second of two series to implement support for an on-disk format for
storing the reverse index. (It depends on the patches in the previous series
[1]).

The format is described in the first patch, but it is roughly as follows:

  - It begins with a 12-byte header, containing a magic string, a version
    identifier, and a hash function identifier.

  - It then contains a table of index positions occupying 4 * N bytes (where
    'N' is the number of objects in the pack), sorted by each object's offset
    within the corresponding packfile.

  - Finally, a trailer contains a checksum of the corresponding packfile, and a
    checksum of the above contents.
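
As a rough illustration of the layout above (a sketch, not code from the
patches), the following Python builds such a file in memory. The cover letter
only states that the header is 12 bytes and names its three fields; the 'RIDX'
magic, the 4-byte width of each field, the big-endian byte order, the version
and hash identifier values, and the use of SHA-1 for both checksums are
assumptions made for the example. The names `build_rev_table` and
`serialize_rev` are hypothetical.

```python
import hashlib
import struct

# Assumed values for illustration; the text above only fixes the header
# at 12 bytes (magic, version, hash function identifier).
MAGIC = b"RIDX"
VERSION = 1
HASH_ID_SHA1 = 1


def build_rev_table(offsets_by_index_pos):
    """Return index positions sorted by each object's pack offset.

    offsets_by_index_pos[i] is the pack offset of the object stored at
    position i of the pack index.
    """
    return sorted(range(len(offsets_by_index_pos)),
                  key=lambda pos: offsets_by_index_pos[pos])


def serialize_rev(offsets_by_index_pos, pack_checksum):
    """Serialize header, 4 * N byte table, and the two trailer checksums."""
    header = MAGIC + struct.pack(">II", VERSION, HASH_ID_SHA1)
    table = build_rev_table(offsets_by_index_pos)
    body = header + struct.pack(">%dI" % len(table), *table)
    # Trailer: the packfile's checksum, then a checksum of everything
    # written so far (assumed here to include the packfile's checksum).
    body += pack_checksum
    return body + hashlib.sha1(body).digest()


if __name__ == "__main__":
    fake_pack_checksum = hashlib.sha1(b"not a real pack").digest()
    data = serialize_rev([300, 12, 150], fake_pack_checksum)
    print(build_rev_table([300, 12, 150]))  # [1, 2, 0]
    print(len(data))                        # 12 + 4 * 3 + 20 + 20 = 64
```

Note that the table stores index positions rather than pack offsets, so a
reader still needs the `*.idx` file to turn an entry into an offset.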

Since this is a large change, a new 'pack.writeReverseIndex' option is
introduced, which defaults to 'false'. When false, `*.rev` files are not
written, and Git gracefully falls back to generating each reverse index in
memory. This could optionally be tied to the "feature.experimental" option, and
the default eventually changed to 'true' in a couple of releases.

To test these new changes, the test suite now understands
'GIT_TEST_WRITE_REV_INDEX' to mean that 'pack.writeReverseIndex' should be
'true' everywhere. Some minor test fall-out is addressed in the sixth patch
before enabling this new mode in the seventh patch.

One option that is _not_ pursued in this series is to store the (pack) offset of
each object in the `.rev` file. This would at worst triple the size of the file
(by having to store an additional eight bytes per entry), and add complexity
(like storing an extended offset table as in the `*.idx` format). An extensive
discussion about why this option was not pursued can be found in the first
patch.
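
To make the "at worst triple" estimate concrete, here is a small
back-of-the-envelope calculation in Python. It assumes 20-byte (SHA-1)
checksums in the trailer and ignores any extended offset table; the function
name `rev_file_size` and the object count are made up for the example.

```python
def rev_file_size(num_objects, entry_bytes=4, header_bytes=12, hash_bytes=20):
    """Approximate '.rev' size: header, one entry per object, two checksums."""
    return header_bytes + entry_bytes * num_objects + 2 * hash_bytes


if __name__ == "__main__":
    n = 1_000_000
    positions_only = rev_file_size(n)                # 4 bytes per entry
    with_offsets = rev_file_size(n, entry_bytes=12)  # 4 + 8 bytes per entry
    print(positions_only)  # 4000052
    print(with_offsets)    # 12000052
    print(with_offsets / positions_only)
```

For any pack with more than a handful of objects the fixed header and trailer
are negligible, so the ratio approaches 12 / 4 = 3.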

Thanks in advance for your review.

[1]: https://lore.kernel.org/git/cover.1610129796.git.me@ttaylorr.com/

Taylor Blau (8):
  packfile: prepare for the existence of '*.rev' files
  pack-write.c: prepare to write 'pack-*.rev' files
  builtin/index-pack.c: write reverse indexes
  builtin/pack-objects.c: respect 'pack.writeReverseIndex'
  Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
  t: prepare for GIT_TEST_WRITE_REV_INDEX
  t: support GIT_TEST_WRITE_REV_INDEX
  pack-revindex: ensure that on-disk reverse indexes are given
    precedence

 Documentation/config/pack.txt           |   7 ++
 Documentation/git-index-pack.txt        |  20 ++--
 Documentation/technical/pack-format.txt |  17 ++++
 builtin/index-pack.c                    |  67 +++++++++++--
 builtin/pack-objects.c                  |   9 ++
 builtin/repack.c                        |   1 +
 object-store.h                          |   3 +
 pack-revindex.c                         | 116 ++++++++++++++++++++--
 pack-revindex.h                         |   3 +
 pack-write.c                            | 123 +++++++++++++++++++++++-
 pack.h                                  |   4 +
 packfile.c                              |  13 ++-
 packfile.h                              |   1 +
 t/README                                |   3 +
 t/t5319-multi-pack-index.sh             |   2 +-
 t/t5325-reverse-index.sh                |  94 ++++++++++++++++++
 t/t5604-clone-reference.sh              |   2 +-
 t/t5702-protocol-v2.sh                  |   4 +-
 t/t6500-gc.sh                           |   4 +-
 t/t9300-fast-import.sh                  |   2 +-
 tmp-objdir.c                            |   4 +-
 21 files changed, 463 insertions(+), 36 deletions(-)
 create mode 100755 t/t5325-reverse-index.sh

-- 
2.30.0.138.g6d7191ea01