Re: [PATCH v5] gitweb: redacted e-mail addresses feature.
To: Eric Wong
Cc: Georgios Kontaxis via GitGitGadget, git@vger.kernel.org,
    "Ævar Arnfjörð Bjarmason", brian m. carlson
From: Georgios Kontaxis
Date: 2021-03-29 03:17:36 UTC
> Georgios Kontaxis via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> Gitweb extracts content from the Git log and makes it accessible
>> over HTTP. As a result, e-mail addresses found in commits are
>> exposed to web crawlers and they may not respect robots.txt.
>> This can result in unsolicited messages.
>
>> Introduce an 'email-privacy' feature which redacts e-mail addresses
>> from the generated HTML content
>
> A general reply to the topic: have you considered munging
> addresses in a way that is still human readable, but obviously
> obfuscated?
>
> On some other project, I settled on HTML "&#8226;" as a replacement
> for '.' for admins who enable that option.  The $USER@$NO_DOT
> remains as-is for easy identification+recognition of hosts.
>
Thanks for the suggestion.
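For concreteness, here is a minimal sketch of that munging. It assumes the rewrite applies only to the host part and that the result goes straight into HTML. It is written in Python rather than gitweb's Perl, and munge_address is a hypothetical name that is not part of the patch.

```python
import html


def munge_address(addr: str) -> str:
    """Hypothetical helper: obfuscate an address for HTML output.

    Keeps "user@" readable and replaces each '.' in the host part with
    the HTML entity &#8226; (a bullet), so a browser shows something
    like "user@example•org". Every other piece is HTML-escaped.
    """
    user, sep, host = addr.rpartition("@")
    if not sep:
        # No '@' at all: not an address, so only escape it.
        return html.escape(addr)
    # Escape each host label separately, then join them with the bullet entity.
    labels = (html.escape(label) for label in host.split("."))
    return f"{html.escape(user)}@" + "&#8226;".join(labels)
```

The browser decodes the entity, so copying the address yields a bullet instead of '.'. A reader therefore has to fix the host part by hand before using it.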

People have been trying to hinder address harvesting for a while now.
Replacing '@' with "at" and '.' with "dot", adding spaces, etc. was
pretty common at some point, and may still be. I would expect crawlers
to have caught up by now, including with all sorts of character
encodings and Unicode look-alike substitutions.

At the end of the day, we are looking for something that is easy for
humans to read but hard for scripts to parse as an e-mail address (and
that scripts cannot learn to undo with one more regex). Apart from
CAPTCHAs and the like, I'm not aware of anything like that.
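To illustrate the "one more regex" problem, here is a rough sketch of a normalization pass that a harvester could run before an ordinary address regex. The patterns and the harvest function are hypothetical and are not taken from any real crawler. They cover only the munging styles mentioned in this thread: spelled-out "at"/"dot", their bracketed variants, and the &#8226; entity.

```python
import html
import re

# Hypothetical patterns: "[at]" / "(at)" or a spaced " at ", same for "dot".
_AT = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*|\s+at\s+", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*|\s+dot\s+", re.IGNORECASE)
# A deliberately simple address pattern; real harvesters will differ.
_ADDR = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def harvest(text: str) -> list[str]:
    """Undo a few common munging styles, then extract addresses."""
    text = html.unescape(text)          # "&#8226;" -> U+2022 BULLET
    text = text.replace("\u2022", ".")  # bullet back to a plain dot
    text = _AT.sub("@", text)           # " at " / "[at]" -> "@"
    text = _DOT.sub(".", text)          # " dot " / "(dot)" -> "."
    return _ADDR.findall(text)
```

For example, harvest("jane [at] example (dot) com") gives ["jane@example.com"]. Each new munging style costs the harvester only another substitution line.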

> I also considered Unicode homographs which can look identical
> to replacement characters, too; but rejected that idea since
> it would cause grief for legitimate users who would not notice
> the homograph when pasting into their mail client.
>
> Anyways, here's the list of candidates I tried:
>
> homograph∂80x24.org
> homograph@80x24ͺorg
> homograph@80x24·org
> homograph@80x24•org
> homograph＠80x24.org
> homograph﹫80x24.org
>
> https://en.wikipedia.org/wiki/Ano_Teleia#Similar_symbols
> https://en.wikipedia.org/wiki/Enclosed_A
>
> homographⒶ80x24.org
> homograph@80x24 org
> homograph@80x24․org
> homograph@80x24ꓸorg
>
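To make the pasting concern concrete, here is a small sketch using Python's standard unicodedata module. It lists the non-ASCII character in a few of the candidates above. None of them is an ASCII-only address with a single U+0040 '@', so the pasted text is not the address it appears to be. The helper names are hypothetical.

```python
import unicodedata

# A few of the candidates listed above; each swaps '@' or '.' for a look-alike.
CANDIDATES = [
    "homograph∂80x24.org",
    "homograph@80x24·org",
    "homograph@80x24•org",
    "homograph﹫80x24.org",
    "homographⒶ80x24.org",
    "homograph@80x24․org",
]


def lookalikes(addr: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in addr
        if ord(ch) > 0x7F
    ]


def is_plain_ascii_address(addr: str) -> bool:
    # A mail client expects ASCII '@' (U+0040) and '.' (U+002E) here.
    return addr.isascii() and addr.count("@") == 1


if __name__ == "__main__":
    for addr in CANDIDATES:
        # ascii() shows the substituted character as an escape sequence.
        print(ascii(addr), is_plain_ascii_address(addr), lookalikes(addr))
```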