Re: [PATCH v2] gitweb: map names/emails with mailmap
To
Eric Sunshine
Cc
Git List
Jakub Narębski
From
Emma Brooks
See Also
Prev Ref 1 Ref 2
Date
2020-08-10 03:12:00 UTC
On 2020-08-09 20:49:59-0400, Eric Sunshine wrote:
> On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> > Add an option to map names and emails to their canonical forms via a
> > .mailmap file. This is enabled by default, consistent with the behavior
> > of Git itself.
> >
> > Signed-off-by: Emma Brooks <me@pluvano.com>
> > ---
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> > +mailmap::
> > +       Use mailmap to find the canonical name/email for
> > +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> > +       default.
> 
> Is this setting global or per-repository? (I ask because documentation
> for other options in this section document whether they can be set
> per-repository.)

Global. I'll add a note that it cannot be set per-project, or I could
add support for setting it per-project if that's wanted.

> Should there be any sort of support for functionality similar to the
> "mailmap.file" and "mailmap.blob" configuration options in Git itself?
> (Genuine question, not a demand for you to implement such support.)

Yes, that would be useful and should probably be supported.

> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > +# Contents of mailmap stored as a referance to a hash with keys in the format
> 
> s/referance/reference/

OK.

> > +# of "name <email>" or "<email>", and values that are hashes containing a
> > +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> > +# already been read.
> > +my $mailmap;
> > +
> > +sub read_mailmap {
> > +       my %mailmap = ();
> > +       open my $fd, '-|', quote_command(
> > +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> > +               or die_error(500, 'Failed to read mailmap');
> 
> Am I reading this correctly that this will die if the project does not
> have a .mailmap file? If so, that seems like harsh behavior since
> there are many projects in the wild lacking a .mailmap file.

No, this error message is misleading. The die_error is called if there
is a problem executing git cat-file, but not if cat-file returns an
error. I'll revise this message to be more accurate.

> > +       return \%mailmap if eof $fd;
> > +       foreach (split '\n', <$fd>) {
> 
> If the .mailmap has no content, then the 'foreach' loop won't be
> entered, which means the early 'return' above it is unneeded, correct?
> (Not necessarily asking for the early 'return' to be removed, but more
> a case of checking that I'm understanding the logic.)

The early return is intended to catch when there is no mailmap, so $fd
does not get initialized. Without it, you would get an error when you
try to split $fd's content:

    Use of uninitialized value $fd in split at [the foreach]

> > +               next if (/^#/);
> > +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> > +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> > +                       # New Name <new@email> <old@email>
> > +                       # New Name <new@email> Old Name <old@email>
> 
> The first regex is intended to handle a trailing "# comment", whereas
> the second regex is for lines lacking a comment, correct? However,
> because neither of these expressions are anchored, the second regex
> will match both types of lines, thus the first regex is redundant. I'm
> guessing, therefore, that your intent was actually to anchor the
> expressions, perhaps like this:
> 
>     if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {
> 
> Also, if you're matching lines of the form:
> 
>     name1 <email1> [optional-name] <email2>
> 
> in which you expect to see "name1", then is the loose "(.*)\s+"
> desirable? Shouldn't it be tighter "(.+)\s+"? For instance:
> 
>     if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Yes and yes. I'll update those.

> > +                       $mailmap{$3} = ();
> 
> I wonder if you should be doing some sort of whitespace normalization
> on $3 before using it as a hash key. For instance, if someone has a
> .mailmap that looks like this (where I've used "." to represent
> space):
> 
>     name1.<email1>.name2...<email2>
> 
> then $3 will have three spaces between 'name2' and '<email2>' when
> used as a key, and that won't match later when you construct a "name
> <email>" key later in map_author() with a single space.

Yes, I hadn't considered that case.

Thanks.