Hi,
This is the first of two series to introduce an on-disk alternative to the
reverse index which is currently generated once per-process and stored in
memory.
As a reminder, the reverse index is used to translate an object's position
relative to other objects within the same packfile to that object's position
relative to the pack's index.
Generating the reverse index in memory for repositories with large packs has two
significant drawbacks:
- It requires allocating sizeof(struct revindex_entry) per packed object.
- It requires us to sort the entries by their pack offset. This is implemented
in sort_revindex() using a radix sort, but still takes considerable time (as
benchmarks found in the second series demonstrate).
Both of these can be addressed by storing the reverse index in a new '.rev' file
alongside the packs. This file is written once (during pack creation), and does
not require sorting when accessed, since it is stored in a sorted order.
This series lays the groundwork necessary to introduce the file format (which is
implemented in the second series). I'm splitting these two up since I think the
changes in this series can be queued independently from those in the latter
series.
The goal of this series is to remove direct access of the `struct
revindex_entry` type, as well as `struct packed_git`'s `revindex` field. The
on-disk format will be mmap'd and accessed directly, but the format is
sufficiently different that the whole `revindex` array can't be written as-is.
Separately from our main goal, the new API that is introduced in this series is
IMHO easier to read, and more clearly describes what order the given and
returned values are relative to.
The changes are structured as follows:
- First, a new API is proposed.
- Then, uses of the old API are removed one by one and replaced with their
new counterparts.
- Finally, without any callers remaining, the old API is removed.
Thanks in advance for your review.
Taylor Blau (20):
pack-revindex: introduce a new API
write_reuse_object(): convert to new revindex API
write_reused_pack_one(): convert to new revindex API
write_reused_pack_verbatim(): convert to new revindex API
check_object(): convert to new revindex API
bitmap_position_packfile(): convert to new revindex API
show_objects_for_type(): convert to new revindex API
get_size_by_pos(): convert to new revindex API
try_partial_reuse(): convert to new revindex API
rebuild_existing_bitmaps(): convert to new revindex API
get_delta_base_oid(): convert to new revindex API
retry_bad_packed_offset(): convert to new revindex API
packed_object_info(): convert to new revindex API
unpack_entry(): convert to new revindex API
for_each_object_in_pack(): convert to new revindex API
builtin/gc.c: guess the size of the revindex
pack-revindex: remove unused 'find_pack_revindex()'
pack-revindex: remove unused 'find_revindex_position()'
pack-revindex: hide the definition of 'revindex_entry'
pack-revindex.c: avoid direct revindex access in
'offset_to_pack_pos()'
builtin/gc.c | 2 +-
builtin/pack-objects.c | 33 +++++++++++++++++-----------
pack-bitmap.c | 44 ++++++++++++++++++-------------------
pack-revindex.c | 49 +++++++++++++++++++++++++++--------------
pack-revindex.h | 10 +++------
packfile.c | 50 ++++++++++++++++++++++++++----------------
6 files changed, 109 insertions(+), 79 deletions(-)
--
2.30.0.138.g6d7191ea01