[PATCH v2] doc/reftable: document how to handle windows
To: git@vger.kernel.org
Cc: Han-Wen Nienhuys
From: Han-Wen Nienhuys via GitGitGadget
Date: 2021-02-23 16:57:23 UTC
From: Han-Wen Nienhuys <hanwen@google.com>

On Windows we can't delete or overwrite files opened by other processes. Here we
sketch how to handle this situation.

We propose to use a random element in the filename. It's possible to design an
alternate solution based on counters, but that would assign semantics to the
filenames, which complicates the implementation.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
    doc/reftable: document how to handle windows
    
    On Windows we can't delete or overwrite files opened by other processes.
    Here we sketch how to handle this situation.
    
    Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
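
For illustration only, here is a minimal Python sketch of how a writer might
build a table name with a random element, as proposed in the commit message.
The helper name `table_name`, the 8-hex-digit random part, and the decimal
zero-padding (copied from the `00000001-00000001` examples in the document)
are assumptions of this sketch, not something the patch specifies.

```python
import secrets


def table_name(min_update_index: int, max_update_index: int, ext: str = "ref") -> str:
    """Build `${min_update_index}-${max_update_index}-${random}.${ext}`.

    Hypothetical helper: the padding width and the length of the random
    part are assumptions made for this sketch.
    """
    if ext not in ("ref", "log"):
        raise ValueError("extension must be 'ref' or 'log'")
    # 4 random bytes rendered as 8 hex digits; no meaning is attached to them.
    random_part = secrets.token_hex(4)
    return f"{min_update_index:08d}-{max_update_index:08d}-{random_part}.{ext}"


if __name__ == "__main__":
    print(table_name(3, 3))
    print(table_name(1, 1, "log"))
```

With a fresh random part on every call, a newly written table should in
practice never share a name with a table that another process still has
open, so the rename in the write protocol does not have to replace an
existing file.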

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-951%2Fhanwen%2Fwindows-doc-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-951/hanwen/windows-doc-v2
Pull-Request: https://github.com/git/git/pull/951
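
The two cleanup strategies described in the new "Windows" section of the
patch can be sketched roughly as follows. This is a Python illustration
under stated assumptions, not the reftable implementation: the function
names are hypothetical, `cleanup_on_close` only considers tables that the
closing stack itself had open (one reading of "no longer mentioned"), and a
failed delete is simply skipped so that a later close or cleanup can retry.

```python
import os
from pathlib import Path
from typing import Iterable, List, Set


def read_tables_list(reftable_dir: Path) -> Set[str]:
    """Return the table names currently listed in tables.list."""
    text = (reftable_dir / "tables.list").read_text()
    return {line for line in text.splitlines() if line}


def try_unlink(path: Path) -> bool:
    """Delete a file; report False if it is gone or still in use."""
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        return False  # another process already removed it
    except PermissionError:
        return False  # still open elsewhere (e.g. on Windows); retry later


def cleanup_on_close(reftable_dir: Path, opened_tables: Iterable[str]) -> List[str]:
    """On closing a stack, reload tables.list and delete obsolete tables.

    Assumption: only tables this stack had open are candidates.
    """
    live = read_tables_list(reftable_dir)
    removed = []
    for name in opened_tables:
        if name not in live and try_unlink(reftable_dir / name):
            removed.append(name)
    return removed


def cleanup_stale(reftable_dir: Path) -> List[str]:
    """Delete unreferenced *.ref files older than tables.list."""
    # Note the timestamp before reading the list, so a concurrent update
    # can only make this pass more conservative.
    list_mtime = (reftable_dir / "tables.list").stat().st_mtime
    live = read_tables_list(reftable_dir)
    removed = []
    for path in sorted(reftable_dir.glob("*.ref")):
        if path.name in live:
            continue
        if path.stat().st_mtime < list_mtime and try_unlink(path):
            removed.append(path.name)
    return removed
```

Unreferenced files that are newer than `tables.list` are left alone in this
sketch, since they may belong to a writer that is about to add them to the
stack.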

Range-diff vs v1:

 1:  a952bc478f86 ! 1:  e3854f2cc106 doc/reftable: document how to handle windows
     @@ Commit message
          On Windows we can't delete or overwrite files opened by other processes. Here we
          sketch how to handle this situation.
      
     +    We propose to use a random element in the filename. It's possible to design an
     +    alternate solution based on counters, but that would assign semantics to the
     +    filenames, which complicates the implementation.
     +
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Documentation/technical/reftable.txt ##
     -@@ Documentation/technical/reftable.txt: A collection of reftable files are stored in the `$GIT_DIR/reftable/`
     - directory:
     +@@ Documentation/technical/reftable.txt: A repository must set its `$GIT_DIR/config` to configure reftable:
     + Layout
     + ^^^^^^
       
     - ....
     +-A collection of reftable files are stored in the `$GIT_DIR/reftable/`
     +-directory:
     +-
     +-....
      -00000001-00000001.log
      -00000002-00000002.ref
      -00000003-00000003.ref
     -+00000001-00000001-RANDOM1.log
     -+00000002-00000002-RANDOM2.ref
     -+00000003-00000003-RANDOM3.ref
     - ....
     - 
     - where reftable files are named by a unique name such as produced by the
     +-....
     +-
     +-where reftable files are named by a unique name such as produced by the
      -function `${min_update_index}-${max_update_index}.ref`.
     -+function `${min_update_index}-${max_update_index}-${random}.ref`.
     ++A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
     ++Their names should have a random element, such that each filename is globally
     ++unique; this helps avoid spurious failures on Windows, where open files cannot
     ++be removed or overwritten. It is suggested to use
     ++`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
       
       Log-only files use the `.log` extension, while ref-only and mixed ref
       and log files use `.ref`. extension.
     @@ Documentation/technical/reftable.txt: current files, one per line, in order, fro
       ....
       
       Readers must read `$GIT_DIR/reftable/tables.list` to determine which
     -@@ Documentation/technical/reftable.txt: Reftable files not listed in `tables.list` may be new (and about to be
     - added to the stack by the active writer), or ancient and ready to be
     - pruned.
     - 
     -+The random suffix added to table filenames ensures that we never attempt to
     -+overwrite an existing table, which is necessary for this scheme to work on
     -+Windows
     -+
     - Backward compatibility
     - ^^^^^^^^^^^^^^^^^^^^^^
     - 
      @@ Documentation/technical/reftable.txt: new reftable and atomically appending it to the stack:
       3.  Select `update_index` to be most recent file's
       `max_update_index + 1`.


 Documentation/technical/reftable.txt | 42 +++++++++++++++++-----------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 8095ab2590c8..3ef169af27d8 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -872,17 +872,11 @@ A repository must set its `$GIT_DIR/config` to configure reftable:
 Layout
 ^^^^^^
 
-A collection of reftable files are stored in the `$GIT_DIR/reftable/`
-directory:
-
-....
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
-....
-
-where reftable files are named by a unique name such as produced by the
-function `${min_update_index}-${max_update_index}.ref`.
+A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
+Their names should have a random element, such that each filename is globally
+unique; this helps avoid spurious failures on Windows, where open files cannot
+be removed or overwritten. It is suggested to use
+`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
 
 Log-only files use the `.log` extension, while ref-only and mixed ref
 and log files use `.ref`. extension.
@@ -893,9 +887,9 @@ current files, one per line, in order, from oldest (base) to newest
 
 ....
 $ cat .git/reftable/tables.list
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
+00000001-00000001-RANDOM1.log
+00000002-00000002-RANDOM2.ref
+00000003-00000003-RANDOM3.ref
 ....
 
 Readers must read `$GIT_DIR/reftable/tables.list` to determine which
@@ -940,7 +934,7 @@ new reftable and atomically appending it to the stack:
 3.  Select `update_index` to be most recent file's
 `max_update_index + 1`.
 4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
-5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
 6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
 7.  Rename `tables.list.lock` to `tables.list`.
 
@@ -993,7 +987,7 @@ prevents other processes from trying to compact these files.
 should always be the case, assuming that other processes are adhering to
 the locking protocol.
 7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
-`${min_update_index}-${max_update_index}.ref`.
+`${min_update_index}-${max_update_index}-${random}.ref`.
 8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
 with the file from (4).
 9.  Rename `tables.list.lock` to `tables.list`.
@@ -1005,6 +999,22 @@ This strategy permits compactions to proceed independently of updates.
 Each reftable (compacted or not) is uniquely identified by its name, so
 open reftables can be cached by their name.
 
+Windows
+^^^^^^^
+
+On Windows, and other systems that do not allow deleting open files or renaming
+over them, compaction may succeed, but other readers may prevent obsolete tables
+from being deleted.
+
+On these platforms, the following strategy can be followed: on closing a
+reftable stack, reload `tables.list`, and delete any tables no longer mentioned
+in `tables.list`.
+
+Irregular program exit may still leave unused files behind. In this case, a
+cleanup operation can read `tables.list`, note its modification timestamp, and
+delete any unreferenced `*.ref` files that are older than that timestamp.
+
+
 Alternatives considered
 ~~~~~~~~~~~~~~~~~~~~~~~
 

base-commit: 66e871b6647ffea61a77a0f82c7ef3415f1ee79c
-- 
gitgitgadget