Re: [BUG] gitignore documentation inconsistent with actual behaviour
To
dana
Cc
git@vger.kernel.org
Matthieu Moy
Michael Haggerty
From
Ævar Arnfjörð Bjarmason
See Also
Prev
Date
2018-10-11 11:08:06 UTC

On Thu, Oct 11 2018, dana wrote:

> Hello,
>
> I'm a contributor to ripgrep, which is a grep-like tool that supports using
> gitignore files to control which files are searched in a repo (or any other
> directory tree). ripgrep's support for the patterns in these files is based on
> git's official documentation, as seen here:
>
>   https://git-scm.com/docs/gitignore
>
> One of the most common reports on the ripgrep bug tracker is that it does not
> allow patterns like the following real-world examples, where a ** is used along
> with other text within the same path component:
>
>   **/**$$*.java
>   **.orig
>   **local.properties
>   !**.sha1
>
> The reason it doesn't allow them is that the gitignore documentation explicitly
> states that they're invalid:
>
>   A leading "**" followed by a slash means match in all directories...
>
>   A trailing "/**" matches everything inside...
>
>   A slash followed by two consecutive asterisks then a slash matches zero or
>   more directories...
>
>   Other consecutive asterisks are considered invalid.
>
> git itself happily accepts these patterns, however, apparently treating the **
> like a single * without fnmatch(3)'s FNM_PATHNAME flag set (in other words, it
> matches / as a regular character, thus crossing directory boundaries).
>
> ripgrep's developer is loathe to reverse-engineer this undocumented behaviour,
> and so the reports keep coming, both for ripgrep itself and for down-stream
> consumers of it and its ignore crate (including most notably Microsoft's VS Code
> editor).
>
> My request: Assuming that git's actual handling of these patterns is intended,
> would it be possible to make it 'official' and explicitly add it to the
> documentation?
>
> References (the first one is the main bug):
>
> https://github.com/BurntSushi/ripgrep/issues/373
> https://github.com/BurntSushi/ripgrep/issues/507
> https://github.com/BurntSushi/ripgrep/issues/859
> https://github.com/BurntSushi/ripgrep/issues/945
> https://github.com/BurntSushi/ripgrep/issues/1080
> https://github.com/BurntSushi/ripgrep/issues/1082
> https://github.com/Microsoft/vscode/issues/24050

Yeah those docs seem wrong. In general the docs for the matching
function are quite bad. I have on my TODO list to factor this out into
some gitwildmatch manpage, but right now the bit in gitignore is all we
have.

Our matching function comes from rsync originally, whose manpage says:

    use ’**’ to match anything, including slashes.

I believe this is accurate as far as the implementation goes. You can
also see the rather exhaustive tests here:
https://github.com/git/git/blob/master/t/t3070-wildmatch.sh

Note the different behavior with e.g. --glob-pathspecs v.s. the
default. There's also stuff like:

    $ grep diff=perl .gitattributes
    *.perl eol=lf diff=perl
    *.pl eof=lf diff=perl
    *.pm eol=lf diff=perl
    $ git ls-files ":(attr:diff=perl)" | wc -l
    65

And then the exclude syntax. This is not in .gitignore:

    $ git ls-files ":(exclude)*.pm" ":(attr:diff=perl)" | wc -l
    41
    $ git ls-files ":^*.pm" ":(attr:diff=perl)" | wc -l
    41

I.e. we have wildmatch() on one hand and then the pathspec matching.

For an unrelated thing I have been thinking of adding a new plumbing
command who'd get input like this on stdin:

    1 text t/t0202/test.pl\0\0
    2 text perl/Git.pm\0\0
    3 text *.pm\0\0
    4 text :^*.pm"\0:(attr:diff=perl)\0\0
    5 match glob,icase\04\03\0\0
    6 match icase\04\02\0\0
    7 match \04\01\0\0

Which would return (in any order):

    1 OK
    2 OK
    3 OK
    4 OK
    5 NO
    6 NO
    7 YES

Or whatever. I.e. something where you can as a batch feed various
strings into a program in a batch, and then ask how some of them would
match against each other.

The reason for this is to extend something like git-multimail[1] with a
config where users can subscribe to changes to paths as declared by git
pathspecs, and be informed about which of the glob rules they defined
matched their config.

Right now you'd need to e.g. for a "git push" run each match rule for
each user with such config against each changed path in each commit
that's pushed, but plumbing like this would allow for feeding arbitrary
combinations of those in and ask what matches against what.

The reason I'm on this tangent is to ask whether if such a thing
existed, if it's something you can see something like ripgrep
using. I.e. ask git given its .gitignore and .gitattributes what thing
matches one of its pathspecs instead of carrying your own bug-compatible
implementation.

1. https://github.com/git-multimail/git-multimail/