[PATCH v2 0/4] Hash Abstraction
Jonathan Nieder
Stefan Beller
Brandon Williams
brian m. carlson
2017-10-28 18:12:35 UTC
This is a series proposing a basic abstraction for hash functions.

As we get closer to converting the remainder of the codebase to use
struct object_id, we should think about the design we want our hash
function abstraction to take.  This series is a proposal for one idea.
Input on any aspect of this proposal is welcome.

This series exposes a struct git_hash_algo that contains basic
information about a given hash algorithm that distinguishes it from
other algorithms: name, identifiers, lengths, implementing functions,
and empty tree and blob constants.  It also exposes an array of hash
algorithms, and a constant for indexing them.
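
For readers who want a concrete picture, here is a minimal sketch of what a
structure and table of this shape might look like.  It is not the code from
the patches: every identifier ending in _sketch, the enum constants, and the
lookup helper are assumptions made for illustration, and the implementing
function pointers are left out to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct object_id (SHA-1 sized). */
struct object_id_sketch {
	unsigned char hash[20];
};

/* Hypothetical layout; the real struct git_hash_algo may differ. */
struct hash_algo_sketch {
	const char *name;      /* human-readable name */
	uint32_t format_id;    /* four-byte identifier; value assumed */
	size_t rawsz;          /* binary hash length in bytes */
	size_t hexsz;          /* hex representation length */
	const struct object_id_sketch *empty_tree;
	const struct object_id_sketch *empty_blob;
};

/* Constants for indexing the table below (names are assumptions). */
enum {
	HASH_SKETCH_UNKNOWN = 0,
	HASH_SKETCH_SHA1 = 1,
	HASH_SKETCH_NALGOS
};

/* SHA-1 of the empty tree: 4b825dc642cb6eb9a060e54bf8d69288fbee4904 */
static const struct object_id_sketch sha1_empty_tree = {{
	0x4b, 0x82, 0x5d, 0xc6, 0x42, 0xcb, 0x6e, 0xb9, 0xa0, 0x60,
	0xe5, 0x4b, 0xf8, 0xd6, 0x92, 0x88, 0xfb, 0xee, 0x49, 0x04
}};

/* SHA-1 of the empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 */
static const struct object_id_sketch sha1_empty_blob = {{
	0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
	0x29, 0xae, 0x77, 0x5a, 0xd8, 0xc2, 0xe4, 0x8c, 0x53, 0x91
}};

/* Table of algorithms, indexed by the enum above. */
static const struct hash_algo_sketch hash_algos_sketch[HASH_SKETCH_NALGOS] = {
	{ NULL, 0x00000000, 0, 0, NULL, NULL },
	/* 0x73686131 is "sha1" in ASCII; using it here is an assumption. */
	{ "sha-1", 0x73686131, 20, 40, &sha1_empty_tree, &sha1_empty_blob },
};

/* Return the table index for a name, or HASH_SKETCH_UNKNOWN if absent. */
int hash_algo_sketch_by_name(const char *name)
{
	int i;

	for (i = 1; i < HASH_SKETCH_NALGOS; i++)
		if (!strcmp(name, hash_algos_sketch[i].name))
			return i;
	return HASH_SKETCH_UNKNOWN;
}
```

With a table like this, a caller can reach the empty tree for an algorithm
as hash_algos_sketch[HASH_SKETCH_SHA1].empty_tree rather than through a
SHA-1-specific constant.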

The series also demonstrates a simple conversion using the abstraction
over empty blob and tree values.

In order to avoid conflicting with the struct repository work and with
the goal of avoiding global variables as much as possible, I've pushed
the hash algorithm into struct repository and exposed it via a #define.
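
As a rough sketch of that arrangement (not the actual patch), the algorithm
pointer can live in the repository structure while a macro keeps call sites
short.  The struct layouts, the the_repo_sketch variable, and the
current_hash_sketch macro name below are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down algorithm description. */
struct hash_algo_sketch {
	const char *name;
	size_t rawsz;
};

/* Hypothetical repository struct: the algorithm is per-repository state. */
struct repository_sketch {
	const char *gitdir;
	const struct hash_algo_sketch *hash_algo;
};

const struct hash_algo_sketch sha1_sketch = { "sha-1", 20 };
struct repository_sketch the_repo_sketch = { ".git", &sha1_sketch };

/* Call sites use the macro instead of naming the repository object. */
#define current_hash_sketch (the_repo_sketch.hash_algo)

/* Example caller: the raw hash size of the current repository. */
size_t current_raw_size(void)
{
	return current_hash_sketch->rawsz;
}
```

The macro expands to a member access, so the only global it depends on is
the repository object itself.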

I propose this series now as it will inform the way we go about
converting other parts of the codebase, especially some of the pack
algorithms.  Because we share some hash computation code between pack
checksums and object hashing, we need to decide whether to expose pack
checksums as struct object_id, even though they are technically not
object IDs.  Furthermore, if we end up needing to stuff an algorithm
value into struct object_id, we'll no longer be able to directly
reference object IDs in a pack without a copy.
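
To make the copy concern concrete, here is a small sketch under stated
assumptions: both struct layouts and both helpers are hypothetical, and the
in-place view relies on a struct holding only an unsigned char array having
no padding, which holds on common ABIs but is not guaranteed by the C
standard.

```c
#include <stddef.h>
#include <string.h>

#define RAWSZ_SKETCH 20

/* Object ID holding only the hash: same layout as the raw bytes. */
struct oid_plain {
	unsigned char hash[RAWSZ_SKETCH];
};

/* Object ID that also carries an algorithm value (hypothetical). */
struct oid_tagged {
	unsigned char hash[RAWSZ_SKETCH];
	int algo;
};

/* View entry n of a packed table of raw hashes in place, without copying. */
const struct oid_plain *oid_plain_at(const unsigned char *table, size_t n)
{
	return (const struct oid_plain *)(table + n * RAWSZ_SKETCH);
}

/* With the extra field the layouts differ, so the hash must be copied. */
struct oid_tagged oid_tagged_at(const unsigned char *table, size_t n, int algo)
{
	struct oid_tagged oid;

	memcpy(oid.hash, table + n * RAWSZ_SKETCH, RAWSZ_SKETCH);
	oid.algo = algo;
	return oid;
}
```

The first helper returns a pointer into the table itself; the second has to
build a new object for every lookup.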

I've updated this series in some significant ways to reflect and better
implement the transition plan as it's developed.  If there are ways
in which this series (or future series) can converge better on the
transition plan, that input would be valuable.

This series is available from the usual places as branch hash-struct,
based against master as of 2.15-rc2.

Changes from v1:
* Rebase onto 2.15-rc2.
* Fix the uninitialized value that Peff pointed out.  This fixes the
  error, but leaves the code in the same place, since I think it's where
  it should be.
* Improve commit message to explain the meaning of current_hash WRT the
  transition plan.
* Add an unknown hash algorithm constant and value to better implement
  the transition plan.
* Explain in the commit message why hex size and binary size are both
  provided.
* Add a format_id field to the struct, in coordination with the
  transition plan.
* Improve comments for struct fields and constants.

brian m. carlson (4):
  setup: expose enumerated repo info
  Add structure representing hash algorithm
  Integrate hash algorithm support with repo setup
  Switch empty tree and blob lookups to use hash abstraction

 builtin/am.c       |  2 +-
 builtin/checkout.c |  2 +-
 builtin/diff.c     |  2 +-
 builtin/pull.c     |  2 +-
 cache.h            | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 diff-lib.c         |  2 +-
 merge-recursive.c  |  2 +-
 notes-merge.c      |  2 +-
 repository.c       |  7 ++++++
 repository.h       |  5 ++++
 sequencer.c        |  6 ++---
 setup.c            | 49 ++++++++++++++++++++++-----------------
 sha1_file.c        | 43 +++++++++++++++++++++++++++++++++++
 submodule.c        |  2 +-
 14 files changed, 157 insertions(+), 36 deletions(-)