Re: [PATCH v2] gitweb: map names/emails with mailmap
To
Emma Brooks
Cc
Git List
Jakub Narębski
From
Eric Sunshine
See Also
Prev Ref 1
Date
2020-08-10 00:49:59 UTC
On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
>
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> +mailmap::
> +       Use mailmap to find the canonical name/email for
> +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> +       default.

Is this setting global or per-repository? (I ask because documentation
for other options in this section document whether they can be set
per-repository.)

Should there be any sort of support for functionality similar to the
"mailmap.file" and "mailmap.blob" configuration options in Git itself?
(Genuine question, not a demand for you to implement such support.)

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> +# Contents of mailmap stored as a referance to a hash with keys in the format

s/referance/reference/

> +# of "name <email>" or "<email>", and values that are hashes containing a
> +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> +# already been read.
> +my $mailmap;
> +
> +sub read_mailmap {
> +       my %mailmap = ();
> +       open my $fd, '-|', quote_command(
> +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> +               or die_error(500, 'Failed to read mailmap');

Am I reading this correctly that this will die if the project does not
have a .mailmap file? If so, that seems like harsh behavior since
there are many projects in the wild lacking a .mailmap file.

> +       return \%mailmap if eof $fd;
> +       foreach (split '\n', <$fd>) {

If the .mailmap has no content, then the 'foreach' loop won't be
entered, which means the early 'return' above it is unneeded, correct?
(Not necessarily asking for the early 'return' to be removed, but more
a case of checking that I'm understanding the logic.)

> +               next if (/^#/);
> +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> +                       # New Name <new@email> <old@email>
> +                       # New Name <new@email> Old Name <old@email>

The first regex is intended to handle a trailing "# comment", whereas
the second regex is for lines lacking a comment, correct? However,
because neither of these expressions are anchored, the second regex
will match both types of lines, thus the first regex is redundant. I'm
guessing, therefore, that your intent was actually to anchor the
expressions, perhaps like this:

    if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Also, if you're matching lines of the form:

    name1 <email1> [optional-name] <email2>

in which you expect to see "name1", then is the loose "(.*)\s+"
desirable? Shouldn't it be tighter "(.+)\s+"? For instance:

    if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

> +                       $mailmap{$3} = ();

I wonder if you should be doing some sort of whitespace normalization
on $3 before using it as a hash key. For instance, if someone has a
.mailmap that looks like this (where I've used "." to represent
space):

    name1.<email1>.name2...<email2>

then $3 will have three spaces between 'name2' and '<email2>' when
used as a key, and that won't match later when you construct a "name
<email>" key later in map_author() with a single space.

> +                       ($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
> +                       $mailmap{$3}{email} = $2;
> +               } elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
> +                        /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {

Same comment as above about anchoring these patterns...

> +                       # New Name <old@email>
> +                       # <new@email> <old@email>
> +                       $mailmap{$3} = ();
> +                       if ($1) {
> +                               $mailmap{$3}{email} = $1;
> +                       } else {
> +                               ($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
> +                       }
> +               }
> +       }
> +       return \%mailmap;
> +}