2008-07-28 14:46:19

by [email protected]

Subject: 463 kernel developers missing!

Here's a new .mailmap file for the kernel that cleans up the horrible
mess of names and email addresses in the log. To use it put it at the
root of your kernel tree and type 'git shortlog'. Before the clean up
there were 4,284 developers, after 3,821. There are 5,051 unique
emails.

The mailmap file contains all email addresses that have been used to
submit patches to the kernel. Don't freak out about your email address
being in the file, if it is in the file it is already in Google since
the kernel log is already in Google.

Putting all the email addresses and names into this file allows it to
be used as a basis for future validation. Since I don't know perl, can
someone whip up a patch to checkpatch.pl that validates the emails in
new patches against the ones in mailmap? Then if you aren't in mailmap
part of your commit needs to include a new entry for mailmap.
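
A rough sketch of the kind of check being asked for, written in Python rather than as a checkpatch.pl patch. It assumes the identities to validate are the From: and Signed-off-by: lines of a patch, and that every address in .mailmap sits inside angle brackets. The function names and sample addresses are made up for illustration.

```python
import re

# Matches "From: Name <addr>" and "Signed-off-by: Name <addr>" lines.
TAG_RE = re.compile(r"^(From|Signed-off-by):\s*(.*?)\s*<([^<>\s]+)>\s*$", re.IGNORECASE)
ADDR_RE = re.compile(r"<([^<>\s]+)>")


def load_known_emails(mailmap_lines):
    """Return the lower-cased addresses listed in a .mailmap, ignoring # comments."""
    known = set()
    for line in mailmap_lines:
        line = line.split("#", 1)[0]
        known.update(addr.lower() for addr in ADDR_RE.findall(line))
    return known


def unknown_identities(patch_lines, known_emails):
    """Return (line number, tag, address) for each tagged address not in the mailmap."""
    problems = []
    for lineno, line in enumerate(patch_lines, start=1):
        match = TAG_RE.match(line)
        if match and match.group(3).lower() not in known_emails:
            problems.append((lineno, match.group(1), match.group(3)))
    return problems


if __name__ == "__main__":
    mailmap = ["# hypothetical entry", "Alice Example <alice@example.org>"]
    patch = [
        "From: Alice Example <alice@example.org>",
        "",
        "Fix a thing.",
        "",
        "Signed-off-by: Alice Example <alcie@example.org>",  # transposed letters
    ]
    for lineno, tag, addr in unknown_identities(patch, load_known_emails(mailmap)):
        print(f"line {lineno}: {tag} address {addr} is not in .mailmap")
```

A flagged address is either a typo or a genuinely new contributor; only the second case would call for a new mailmap entry.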

Another useful script would take the output of "git log | grep ^Author
| sort -u" and diff the list of email addresses against the mailmap
file. Any new emails found are new people that need to be added to
mailmap. Only the emails should be checked, not the names.
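
A minimal sketch of that second script, assuming the Author lines and the mailmap lines have already been read into lists. Only the addresses in angle brackets are compared, never the names. Lower-casing the addresses before comparing is an assumption of this sketch, and the sample data is invented.

```python
import re

EMAIL_RE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def emails_in(lines):
    """Collect lower-cased addresses found in angle brackets, skipping comment lines."""
    found = set()
    for line in lines:
        if line.lstrip().startswith("#"):
            continue
        found.update(addr.lower() for addr in EMAIL_RE.findall(line))
    return found


def missing_from_mailmap(author_lines, mailmap_lines):
    """Return the addresses seen in the log that have no mailmap entry, sorted."""
    return sorted(emails_in(author_lines) - emails_in(mailmap_lines))


if __name__ == "__main__":
    authors = [
        "Author: Alice Example <alice@example.org>",
        "Author: Alice Exmaple <alice@example.org>",  # misspelled name, same address
        "Author: Bob Example <bob@example.net>",
    ]
    mailmap = ["# hypothetical entries", "Alice Example <alice@example.org>"]
    print(missing_from_mailmap(authors, mailmap))
```

The misspelled name does not produce a report because its address is already known; only the address without an entry is listed.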

Please excuse any errors I made in the clean up process, a large
portion of it was done manually. After the base file is in we can
patch it to fix the errors. For those of you using a dozen aliases,
you might want to order them so that your current email is the last
one in the list. James Bottomley has the most aliases, 13.

PS It's not a diff because it would be too big to post.

--
Jon Smirl
[email protected]


Attachments:
(No filename) (1.52 kB)
.mailmap.bz2 (67.82 kB)

2008-07-28 15:20:57

by [email protected]

Subject: Re: 463 kernel developers missing!

Some stats using the new mailmap file:

Patches, number of developers
1, 1591
2, 484
3, 254
4, 152
5, 118
more, 1242

Top twenty developers:

Linus Torvalds (4350):
Andrew Morton (1789):
Adrian Bunk (1774):
Al Viro (1735):
Ingo Molnar (1393):
Ralf Baechle (1367):
Jeff Garzik (1291):
Takashi Iwai (1196):
Tejun Heo (1092):
Bartlomiej Zolnierkiewicz (1071):
David S. Miller (1069):
Patrick McHardy (1031):
Stephen Hemminger (1017):
Russell King (985):
Andi Kleen (973):
Thomas Gleixner (953):
Alan Cox (822):
Paul Mundt (813):
Dave Miller (787):

Make your own list:
git shortlog | grep -v '^[[:space:]]' | grep -v '^$' | sort -t "(" -g -k 2 -r


--
Jon Smirl
[email protected]

2008-07-28 15:36:37

by Adrian Bunk

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 11:20:49AM -0400, Jon Smirl wrote:
> Some stats using the new mailmap file:
>
> Patches, number of developers
> 1, 1591
> 2, 484
> 3, 254
> 4, 152
> 5, 118
> more, 1242
>
> Top twenty developers:
>
> Linus Torvalds (4350):

You count merges as patches.

> Andrew Morton (1789):
> Adrian Bunk (1774):
> Al Viro (1735):
> Ingo Molnar (1393):
> Ralf Baechle (1367):
> Jeff Garzik (1291):
> Takashi Iwai (1196):
> Tejun Heo (1092):
> Bartlomiej Zolnierkiewicz (1071):
> David S. Miller (1069):
> Patrick McHardy (1031):
> Stephen Hemminger (1017):
> Russell King (985):
> Andi Kleen (973):
> Thomas Gleixner (953):
> Alan Cox (822):
> Paul Mundt (813):
> Dave Miller (787):

Dave Miller = David S. Miller

In this case your .mailmap made the result worse...

> Make your own list:
> git shortlog | grep -v ^[[:space:]] | grep -v ^$ | sort -t "(" -g -k 2 -r
>
> Jon Smirl

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-28 15:45:39

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Adrian Bunk <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 11:20:49AM -0400, Jon Smirl wrote:
> > Some stats using the new mailmap file:
> >
> > Patches, number of developers
> > 1, 1591
> > 2, 484
> > 3, 254
> > 4, 152
> > 5, 118
> > more, 1242
> >
> > Top twenty developers:
> >
> > Linus Torvalds (4350):
>
>
> You count merges as patches.

I just used the output from git shortlog, is there a better way?

>
>
> > Andrew Morton (1789):
> > Adrian Bunk (1774):
> > Al Viro (1735):
> > Ingo Molnar (1393):
> > Ralf Baechle (1367):
> > Jeff Garzik (1291):
> > Takashi Iwai (1196):
> > Tejun Heo (1092):
> > Bartlomiej Zolnierkiewicz (1071):
> > David S. Miller (1069):
> > Patrick McHardy (1031):
> > Stephen Hemminger (1017):
> > Russell King (985):
> > Andi Kleen (973):
> > Thomas Gleixner (953):
> > Alan Cox (822):
> > Paul Mundt (813):
> > Dave Miller (787):
>
>
> Dave Miller = David S. Miller
>
> In this case your .mailmap made the result worse...

Easily fixed, I just missed one of his aliases.
12% of all name/email pairs have errors in them, I didn't catch them all.

Linus Torvalds (4350):
David S. Miller (1856):
Andrew Morton (1789):
Adrian Bunk (1774):
Al Viro (1735):
Ingo Molnar (1393):
Ralf Baechle (1367):
Jeff Garzik (1291):
Takashi Iwai (1196):
Tejun Heo (1092):
Bartlomiej Zolnierkiewicz (1071):
Patrick McHardy (1031):
Stephen Hemminger (1017):
Russell King (985):
Andi Kleen (973):
Thomas Gleixner (953):
Alan Cox (822):
Paul Mundt (813):
Jean Delvare (771):
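
The effect of adding the missing alias can be sketched in a few lines of Python. This assumes the log is reduced to per-identity commit counts and the mailmap to an email-to-name table. The two counts are the ones from the quoted list; the addresses are placeholders, since the real ones are not shown here.

```python
def merge_counts(raw_counts, email_to_name):
    """Sum commit counts per person.

    raw_counts: {(name, email): commits} as they appear in the log.
    email_to_name: mailmap-style {email: canonical name}.
    """
    merged = {}
    for (name, email), commits in raw_counts.items():
        canonical = email_to_name.get(email.lower(), name)
        merged[canonical] = merged.get(canonical, 0) + commits
    return merged


if __name__ == "__main__":
    # Both addresses are placeholders for illustration.
    raw = {
        ("David S. Miller", "davem@example.org"): 1069,
        ("Dave Miller", "dave@example.net"): 787,
    }
    mailmap = {"davem@example.org": "David S. Miller"}
    print(merge_counts(raw, mailmap))   # alias missing: two separate entries
    mailmap["dave@example.net"] = "David S. Miller"
    print(merge_counts(raw, mailmap))   # alias added: one entry, 1069 + 787 = 1856
```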



>
>
> > Make your own list:
> > git shortlog | grep -v ^[[:space:]] | grep -v ^$ | sort -t "(" -g -k 2 -r
> >
> > Jon Smirl
>
>
> cu
> Adrian
>
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed
>
>


--
Jon Smirl
[email protected]

2008-07-28 15:46:20

by Adrian Bunk

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 10:45:59AM -0400, Jon Smirl wrote:
> Here's a new .mailmap file for the kernel that cleans up the horrible
> mess of names and email addresses in the log. To use it put it at the
> root of your kernel tree and type 'git shortlog'. Before the clean up
> there were 4,284 developers, after 3,821. There are 5,051 unique
> emails.
>
> The mailmap file contains all email addresses that have been used to
> submit patches to the kernel. Don't freak out about your email address
> being in the file, if it is in the file it is already in Google since
> the kernel log is already in Google.
>
> Putting all the email addresses and names into this file allows it to
> be used as a basis for future validation. Since I don't know perl, can
> someone whip up a patch to checkpatch.pl that validates the emails in
> new patches against the ones in mailmap? Then if you aren't in mailmap
> part of your commit needs to include a new entry for mailmap.
>
> Another useful script would take the output of "git log | grep ^Author
> | sort -u" and diff the list of email addresses against the mailmap
> file. Any new emails found are new people that need to be added to
> mailmap. Only the emails should be checked, not the names.
>
> Please excuse any errors I made in the clean up process, a large
> portion of it was done manually. After the base file is in we can
> patch it to fix the errors. For those of you using a dozen aliases,
> you might want to order them so that your current email is the last
> one in the list. James Bottomley has the most aliases, 13.

The charset of the names is pretty random - that should be fixed at some
point.

> PS It's not a diff because it would be too big to post.

200 kB would be OK for linux-kernel (AFAIR the current limit
is 400 kB). But to prevent charset problems a compressed attachment
might make sense...

> Jon Smirl

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-28 15:53:20

by Adrian Bunk

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 11:45:29AM -0400, Jon Smirl wrote:
> On 7/28/08, Adrian Bunk <[email protected]> wrote:
> > On Mon, Jul 28, 2008 at 11:20:49AM -0400, Jon Smirl wrote:
> > > Some stats using the new mailmap file:
> > >
> > > Patches, number of developers
> > > 1, 1591
> > > 2, 484
> > > 3, 254
> > > 4, 152
> > > 5, 118
> > > more, 1242
> > >
> > > Top twenty developers:
> > >
> > > Linus Torvalds (4350):
> >
> >
> > You count merges as patches.
>
> I just used the output from git shortlog, is there a better way?

I use "cg-log --summary", but there's most likely also some easy way
without cogito.

> > > Andrew Morton (1789):
> > > Adrian Bunk (1774):
> > > Al Viro (1735):
> > > Ingo Molnar (1393):
> > > Ralf Baechle (1367):
> > > Jeff Garzik (1291):
> > > Takashi Iwai (1196):
> > > Tejun Heo (1092):
> > > Bartlomiej Zolnierkiewicz (1071):
> > > David S. Miller (1069):
> > > Patrick McHardy (1031):
> > > Stephen Hemminger (1017):
> > > Russell King (985):
> > > Andi Kleen (973):
> > > Thomas Gleixner (953):
> > > Alan Cox (822):
> > > Paul Mundt (813):
> > > Dave Miller (787):
> >
> >
> > Dave Miller = David S. Miller
> >
> > In this case your .mailmap made the result worse...
>
> Easily fixed, I just missed one of his aliases.

There were two:
Dave Miller <[email protected]>
David Miller <[email protected]>

>...
> Jon Smirl

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-28 15:55:18

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Adrian Bunk <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 10:45:59AM -0400, Jon Smirl wrote:
> > Here's a new .mailmap file for the kernel that cleans up the horrible
> > mess of names and email addresses in the log. To use it put it at the
> > root of your kernel tree and type 'git shortlog'. Before the clean up
> > there were 4,284 developers, after 3,821. There are 5,051 unique
> > emails.
> >
> > The mailmap file contains all email addresses that have been used to
> > submit patches to the kernel. Don't freak out about your email address
> > being in the file, if it is in the file it is already in Google since
> > the kernel log is already in Google.
> >
> > Putting all the email addresses and names into this file allows it to
> > be used as a basis for future validation. Since I don't know perl, can
> > someone whip up a patch to checkpatch.pl that validates the emails in
> > new patches against the ones in mailmap? Then if you aren't in mailmap
> > part of your commit needs to include a new entry for mailmap.
> >
> > Another useful script would take the output of "git log | grep ^Author
> > | sort -u" and diff the list of email addresses against the mailmap
> > file. Any new emails found are new people that need to be added to
> > mailmap. Only the emails should be checked, not the names.
> >
> > Please excuse any errors I made in the clean up process, a large
> > portion of it was done manually. After the base file is in we can
> > patch it to fix the errors. For those of you using a dozen aliases,
> > you might want to order them so that your current email is the last
> > one in the list. James Bottomley has the most aliases, 13.
>
>
> The charset of the names is pretty random - that should be fixed at some
> point.

Follow on patches can fix the charset issues, right now they are
simply copied from the log messages. I've tried to preserve them as
best as I can but they have been mangled pretty badly.

>
>
> > PS It's not a diff because it would be too big to post.
>
>
> 200 kB would be OK for linux-kernel (AFAIR the current limit
> is 400 kB). But to prevent charset problems a compressed attachment
> might make sense...

It also saved the mail server from sending out a couple hundred GB of mail.

The main change is including every email in the mailmap and not just
the exceptions. By putting all emails into the file it becomes
possible to use the file for validation. And we need validation, the
current log has a 12% error rate.

I'll send it in patch form to whoever is going to send it upstream.
Who would that be?

>
>
> > Jon Smirl
>
> cu
> Adrian
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed
>
>


--
Jon Smirl
[email protected]

2008-07-28 15:58:30

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Adrian Bunk <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 11:45:29AM -0400, Jon Smirl wrote:
> > On 7/28/08, Adrian Bunk <[email protected]> wrote:
> > > On Mon, Jul 28, 2008 at 11:20:49AM -0400, Jon Smirl wrote:
> > > > Some stats using the new mailmap file:
> > > >
> > > > Patches, number of developers
> > > > 1, 1591
> > > > 2, 484
> > > > 3, 254
> > > > 4, 152
> > > > 5, 118
> > > > more, 1242
> > > >
> > > > Top twenty developers:
> > > >
> > > > Linus Torvalds (4350):
> > >
> > >
> > > You count merges as patches.
> >
> > I just used the output from git shortlog, is there a better way?
>
>
> I use "cg-log --summary", but there's most likely also some easy way
> without cogito.
>
>
> > > > Andrew Morton (1789):
> > > > Adrian Bunk (1774):
> > > > Al Viro (1735):
> > > > Ingo Molnar (1393):
> > > > Ralf Baechle (1367):
> > > > Jeff Garzik (1291):
> > > > Takashi Iwai (1196):
> > > > Tejun Heo (1092):
> > > > Bartlomiej Zolnierkiewicz (1071):
> > > > David S. Miller (1069):
> > > > Patrick McHardy (1031):
> > > > Stephen Hemminger (1017):
> > > > Russell King (985):
> > > > Andi Kleen (973):
> > > > Thomas Gleixner (953):
> > > > Alan Cox (822):
> > > > Paul Mundt (813):
> > > > Dave Miller (787):
> > >
> > >
> > > Dave Miller = David S. Miller
> > >
> > > In this case your .mailmap made the result worse...
> >
> > Easily fixed, I just missed one of his aliases.
>
>
> There were two:
> Dave Miller <[email protected]>
> David Miller <[email protected]>

Added that one, now he has 11 aliases.
Two more developers disappear.

Linus Torvalds (4350):
David S. Miller (1858):
Andrew Morton (1789):
Adrian Bunk (1774):
Al Viro (1735):
Ingo Molnar (1393):
Ralf Baechle (1367):
Jeff Garzik (1291):
Takashi Iwai (1196):
Tejun Heo (1092):
Bartlomiej Zolnierkiewicz (1071):
Patrick McHardy (1031):
Stephen Hemminger (1017):
Russell King (985):
Andi Kleen (973):
Thomas Gleixner (953):
Alan Cox (822):
Paul Mundt (813):
Jean Delvare (771):



>
> >...
>
> > Jon Smirl
>
> cu
> Adrian
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed
>
>


--
Jon Smirl
[email protected]

2008-07-28 16:13:15

by [email protected]

Subject: Re: 463 kernel developers missing!

Andrew was getting credit for Len Brown's patches, new top 20.

Of course the list has errors in it, but they are easy to find and
fix. Once they are fixed they will never change since they reflect the
existing history in the tree.

Let's get the main patch in and then everyone can send in little
patches to fix the errors. I've removed all of the errors I can via
scripts, now it is a manual process to spot the aliases.

Linus Torvalds (4350):
David S. Miller (1858):
Adrian Bunk (1774):
Al Viro (1735):
Ingo Molnar (1393):
Ralf Baechle (1367):
Jeff Garzik (1291):
Andrew Morton (1280):
Takashi Iwai (1196):
Tejun Heo (1092):
Bartlomiej Zolnierkiewicz (1071):
Patrick McHardy (1031):
Stephen Hemminger (1017):
Russell King (985):
Andi Kleen (973):
Thomas Gleixner (953):
Alan Cox (822):
Paul Mundt (813):
Jean Delvare (771):


--
Jon Smirl
[email protected]

2008-07-28 16:20:31

by Joe Perches

Subject: Re: 463 kernel developers missing!

On Mon, 2008-07-28 at 10:45 -0400, Jon Smirl wrote:
> if you aren't in mailmap part of your commit
> needs to include a new entry for mailmap.

Why?

2008-07-28 16:50:58

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Joe Perches <[email protected]> wrote:
> On Mon, 2008-07-28 at 10:45 -0400, Jon Smirl wrote:
> > if you aren't in mailmap part of your commit
> > needs to include a new entry for mailmap.
>
> Why?

So that we can validate against future misspellings of your name and
email address. It's all about doing validation and stopping the
errors. The current log has over 1,000 typos in the names and emails.
Getting rid of the typos makes it possible to generate clean
statistics.

If you're paranoid get a new gmail address for each commit. From
reading the list of authors I'd say we have about 10 paranoid people
out of 3,840.

>
>


--
Jon Smirl
[email protected]

2008-07-28 16:54:52

by Simon Arlott

Subject: Re: 463 kernel developers missing!

On 28/07/08 15:45, Jon Smirl wrote:
> Here's a new .mailmap file for the kernel that cleans up the horrible
> mess of names and email addresses in the log. To use it put it at the
> root of your kernel tree and type 'git shortlog'. Before the clean up
> there were 4,284 developers, after 3,821. There are 5,051 unique
> emails.
>
> The mailmap file contains all email addresses that have been used to
> submit patches to the kernel. Don't freak out about your email address
> being in the file, if it is in the file it is already in Google since
> the kernel log is already in Google.

Just because anyone can grep the kernel log for email addresses [to
send spam to], doesn't mean that you need to do it for them.

Please read git-shortlog(1) and then remove me from this file because
it won't change anything.

--
Simon Arlott

2008-07-28 17:06:03

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Simon Arlott <[email protected]> wrote:
> On 28/07/08 15:45, Jon Smirl wrote:
>
> > Here's a new .mailmap file for the kernel that cleans up the horrible
> > mess of names and email addresses in the log. To use it put it at the
> > root of your kernel tree and type 'git shortlog'. Before the clean up
> > there were 4,284 developers, after 3,821. There are 5,051 unique
> > emails.
> >
> > The mailmap file contains all email addresses that have been used to
> > submit patches to the kernel. Don't freak out about your email address
> > being in the file, if it is in the file it is already in Google since
> > the kernel log is already in Google.
> >
>
> Just because anyone can grep the kernel log for email addresses [to send
> spam to], doesn't mean that you need to do it for them.

You need to be in the file since you have submitted patches using three aliases.

> Please read git-shortlog(1) and then remove me from this file because it
> won't change anything.
>
> --
> Simon Arlott
>


--
Jon Smirl
[email protected]

2008-07-28 17:08:21

by Stefan Richter

Subject: Re: 463 kernel developers missing!

Jon Smirl wrote:
> Getting rid of the typos makes it possible to generate clean
> statistics.

It actually doesn't.
--
Stefan Richter
-=====-==--- -=== ===--
http://arcgraph.de/sr/

2008-07-28 17:10:58

by Simon Arlott

Subject: Re: 463 kernel developers missing!

On 28/07/08 18:05, Jon Smirl wrote:
> On 7/28/08, Simon Arlott <[email protected]> wrote:
>> On 28/07/08 15:45, Jon Smirl wrote:
>>
>> > Here's a new .mailmap file for the kernel that cleans up the horrible
>> > mess of names and email addresses in the log. To use it put it at the
>> > root of your kernel tree and type 'git shortlog'. Before the clean up
>> > there were 4,284 developers, after 3,821. There are 5,051 unique
>> > emails.
>> >
>> > The mailmap file contains all email addresses that have been used to
>> > submit patches to the kernel. Don't freak out about your email address
>> > being in the file, if it is in the file it is already in Google since
>> > the kernel log is already in Google.
>> >
>>
>> Just because anyone can grep the kernel log for email addresses [to send
>> spam to], doesn't mean that you need to do it for them.
>
> You need to be in the file since you have submitted patches using three aliases.

No, I've submitted patches using three email addresses (well, two - one is
a typo).

>> Please read git-shortlog(1) and then remove me from this file because it
>> won't change anything.

Try running "git shortlog" too, you'll see I only appear once using the
existing 99-line .mailmap file.

--
Simon Arlott

2008-07-28 17:15:52

by Christian Borntraeger

Subject: Re: 463 kernel developers missing!

Am Montag, 28. Juli 2008 schrieb Jon Smirl:
> > You count merges as patches.
>
> I just used the output from git shortlog, is there a better way?

git-shortlog --no-merges

You can also avoid the grepping:

git-shortlog -n -s --no-merges
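
With the summary form, the per-developer counts are easy to post-process. Below is a small Python sketch that turns that output into the "patches, number of developers" table from earlier in the thread. It assumes each line is a count followed by a name; the function names are invented, and the single-digit sample rows are made up.

```python
from collections import Counter


def parse_summary(lines):
    """Parse "<count><whitespace><name>" lines into {name: count}."""
    counts = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1]] = counts.get(parts[1], 0) + int(parts[0])
    return counts


def bucket_by_patch_count(counts, limit=5):
    """Count developers with exactly 1..limit patches; everyone else goes under "more"."""
    buckets = Counter()
    for patches in counts.values():
        buckets[patches if patches <= limit else "more"] += 1
    return buckets


if __name__ == "__main__":
    sample = [
        "  1770\tAdrian Bunk",
        "  1735\tAl Viro",
        "     3\tThird Example",
        "     1\tFirst Example",
        "     1\tSecond Example",
    ]
    buckets = bucket_by_patch_count(parse_summary(sample))
    for key in [1, 2, 3, 4, 5, "more"]:
        print(f"{key}, {buckets[key]}")
```

When the command is run from a script rather than a terminal, passing an explicit revision such as HEAD should keep git shortlog from waiting for a log on standard input.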

2008-07-28 17:22:17

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Simon Arlott <[email protected]> wrote:
> On 28/07/08 18:05, Jon Smirl wrote:
>
> > On 7/28/08, Simon Arlott <[email protected]> wrote:
> >
> > > On 28/07/08 15:45, Jon Smirl wrote:
> > >
> > > > Here's a new .mailmap file for the kernel that cleans up the horrible
> > > > mess of names and email addresses in the log. To use it put it at the
> > > > root of your kernel tree and type 'git shortlog'. Before the clean up
> > > > there were 4,284 developers, after 3,821. There are 5,051 unique
> > > > emails.
> > > >
> > > > The mailmap file contains all email addresses that have been used to
> > > > submit patches to the kernel. Don't freak out about your email address
> > > > being in the file, if it is in the file it is already in Google since
> > > > the kernel log is already in Google.
> > > >
> > >
> > > Just because anyone can grep the kernel log for email addresses [to send
> > > spam to], doesn't mean that you need to do it for them.
> > >
> >
> > You need to be in the file since you have submitted patches using three
> > aliases.
> >
>
> No, I've submitted patches using three email addresses (well, two - one is
> a typo).

That's the whole point of this list. When you submit patches in the
future we can check your name/email against the list and flag it if it
isn't there. That will alert you that you've made a typo.

A later version of this list could separate the valid current
names/addresses from the entries that are fixing typos or that have
old emails. That would improve the validation. But I don't have an
automated way to tell me which alias is the current one. Access to the
current LKML subscriber list would supply the needed info as to which
one to pick.

> >
> > > Please read git-shortlog(1) and then remove me from this file because it
> > > won't change anything.
> > >
> >
>
> Try running "git shortlog" too, you'll see I only appear once using the
> existing 99-line .mailmap file.
>
> --
> Simon Arlott
>


--
Jon Smirl
[email protected]

2008-07-28 17:26:16

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Christian Borntraeger <[email protected]> wrote:
> Am Montag, 28. Juli 2008 schrieb Jon Smirl:
>
> > > You count merges as patches.
> >
> > I just used the output from git shortlog, is there a better way?
>
>
> git-shortlog --no-merges
>
> You can also avoid the grepping:
>
> git-shortlog -n -s --no-merges
>

Top 20 without the merges.
Apparently Linus really doesn't write code.

1770 Adrian Bunk
1735 Al Viro
1716 David S. Miller
1367 Ralf Baechle
1280 Andrew Morton
1277 Ingo Molnar
1196 Takashi Iwai
1091 Tejun Heo
1071 Bartlomiej Zolnierkiewicz
1031 Patrick McHardy
1016 Stephen Hemminger
969 Andi Kleen
945 Thomas Gleixner
904 Russell King
822 Alan Cox
809 Paul Mundt
771 Jean Delvare
727 Trond Myklebust
721 Michael Krufky



--
Jon Smirl
[email protected]

2008-07-28 18:00:50

by Michael Ira Krufky

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 1:22 PM, Jon Smirl <[email protected]> wrote:
> On 7/28/08, Simon Arlott <[email protected]> wrote:
> A later version of this list could separate the valid current
> names/addresses from the entries that are fixing typos or that have
> old emails. That would improve the validation. But I don't have an
> automated way to tell me which alias is the current one. Access to the
> current LKML subscriber list would supply the needed info as to which
> one to pick.

Please don't use LKML subscriptions as the authority of one's preferred
email address.

I, for instance, am *only* subscribed to LKML using my gmail account.
I prefer that nobody ever email my gmail account directly -- I use my
gmail account as a filter -- gmail filters my mails and fwd's specific
mails to my other specific email addresses -- I rarely read gmail
directly, and I am unlikely to ever read an email addressed to my
gmail box.

I favor my "at linuxtv dot org" account for my kernel work, and I hope
that is the email that shows up as primary for me (I would only guess
that I have one or two aliases in this .mailmap file.)

Regards,

Mike

2008-07-28 18:11:12

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Michael Krufky <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 1:22 PM, Jon Smirl <[email protected]> wrote:
> > On 7/28/08, Simon Arlott <[email protected]> wrote:
>
> > A later version of this list could separate the valid current
> > names/addresses from the entries that are fixing typos or that have
> > old emails. That would improve the validation. But I don't have an
> > automated way to tell me which alias is the current one. Access to the
> > current LKML subscriber list would supply the needed info as to which
> > one to pick.
>
>
> Please don't use LKML subscriptions as the authority of one's preferred
> email address.

I was only going to use it to help decide which alias was the right
alias, not to generate new entries. In your case it wouldn't help.

>
> I, for instance, am *only* subscribed to LKML using my gmail account.
> I prefer that nobody ever email my gmail account directly -- I use my
> gmail account as a filter -- gmail filters my mails and fwd's specific
> mails to my other specific email addresses -- I rarely read gmail
> directly, and I am unlikely to ever read an email addressed to my
> gmail box.
>
> I favor my "at linuxtv dot org" account for my kernel work, and I hope
> that is the email that shows up as primary for me (I would only guess
> that I have one or two aliases in this .mailmap file.)

You have used four aliases: krufky, infradead, linuxtv, m1k.

>
> Regards,
>
> Mike
>


--
Jon Smirl
[email protected]

2008-07-28 18:11:35

by Simon Arlott

Subject: Re: 463 kernel developers missing!

On 28/07/08 18:22, Jon Smirl wrote:
> On 7/28/08, Simon Arlott <[email protected]> wrote:
>> On 28/07/08 18:05, Jon Smirl wrote:
>> > On 7/28/08, Simon Arlott <[email protected]> wrote:
>> > > Just because anyone can grep the kernel log for email addresses [to send
>> > > spam to], doesn't mean that you need to do it for them.
>> > >
>> >
>> > You need to be in the file since you have submitted patches using three
>> > aliases.
>> >
>>
>> No, I've submitted patches using three email addresses (well, two - one is
>> a typo).
>
> That's the whole point of this list. When you submit patches in the
> future we can check your name/email against the list and flag it if it
> isn't there. That will alert you that you've made a typo.

I don't make typos in my name or email address.

> A later version of this list could separate the valid current
> names/addresses from the entries that are fixing typos or that have
> old emails. That would improve the validation. But I don't have an
> automated way to tell me which alias is the current one. Access to the
> current LKML subscriber list would supply the needed info as to which
> one to pick.

Validation of what?
The current alias is the most recent active one.

I'm not subscribed to the LKML with a public address.
Please Cc: me if you submit a patch for this so I can add a Nacked-By:
and/or clean up the list by removing redundant entries. Even if git-shortlog
is changed to distinguish between people who share the same name, it would
still be a list of exceptions rather than everyone.

--
Simon Arlott

2008-07-28 18:20:11

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Simon Arlott <[email protected]> wrote:
> On 28/07/08 18:22, Jon Smirl wrote:
>
> > On 7/28/08, Simon Arlott <[email protected]> wrote:
> >
> > > On 28/07/08 18:05, Jon Smirl wrote:
> > > > On 7/28/08, Simon Arlott <[email protected]> wrote:
> > > > > Just because anyone can grep the kernel log for email addresses [to send
> > > > > spam to], doesn't mean that you need to do it for them.
> > > > >
> > > >
> > > > You need to be in the file since you have submitted patches using
> > > > three aliases.
> > > >
> > >
> > > No, I've submitted patches using three email addresses (well, two - one is
> > > a typo).
> > >
> >
> > That's the whole point of this list. When you submit patches in the
> > future we can check your name/email against the list and flag it if it
> > isn't there. That will alert you that you've made a typo.
> >
>
> I don't make typos in my name or email address.

You just admitted six lines earlier that you made a typo.

> > A later version of this list could separate the valid current
> > names/addresses from the entries that are fixing typos or that have
> > old emails. That would improve the validation. But I don't have an
> > automated way to tell me which alias is the current one. Access to the
> > current LKML subscriber list would supply the needed info as to which
> > one to pick.
> >
>
> Validation of what?
> The current alias is the most recent active one.
>
> I'm not subscribed to the LKML with a public address.
> Please Cc: me if you submit a patch for this so I can add a Nacked-By:
> and/or clean up the list by removing redundant entries. Even if git-shortlog
> is changed to distinguish between people who share the same name, it would
> still be a list of exceptions rather than everyone.

Comparing to LKML subscribers would not be used to generate new email
addresses. It would only help to identify which existing alias is your
current one. But we don't have to do it that way, I can use the email
of your most recent commit as your current address and you can fix it
if it is wrong. Only email addresses that appear in the kernel log
should appear in the mailmap file.
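
As a sketch of the "most recent commit wins" rule, assume the log has been dumped as tab-separated lines of author timestamp, name, and email. Something like git log --format='%at%x09%aN%x09%ae' should produce that shape, but the exact format string is an assumption here. The helper names and sample records are invented.

```python
def parse_log_lines(lines):
    """Yield (timestamp, name, email) from "<unix time>\t<name>\t<email>" lines."""
    for line in lines:
        timestamp, name, email = line.rstrip("\n").split("\t")
        yield int(timestamp), name, email


def current_addresses(records):
    """Return {name: email used on that person's most recent commit}."""
    latest = {}
    for timestamp, name, email in records:
        if name not in latest or timestamp > latest[name][0]:
            latest[name] = (timestamp, email)
    return {name: email for name, (_, email) in latest.items()}


if __name__ == "__main__":
    log = [
        "1199145600\tAlice Example\talice@old.example.org",
        "1217203200\tAlice Example\talice@example.org",
        "1209600000\tBob Example\tbob@example.net",
    ]
    print(current_addresses(parse_log_lines(log)))
```

Grouping by name only works once the aliases have been folded together, so this would run after the mailmap is applied; the owner of an entry can still correct a wrong pick by hand.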


>
> --
> Simon Arlott
>


--
Jon Smirl
[email protected]

2008-07-28 18:35:00

by Simon Arlott

Subject: Re: 463 kernel developers missing!

On 28/07/08 19:19, Jon Smirl wrote:
> On 7/28/08, Simon Arlott <[email protected]> wrote:
>> On 28/07/08 18:22, Jon Smirl wrote:
>> > On 7/28/08, Simon Arlott <[email protected]> wrote:
>> > > No, I've submitted patches using three email addresses (well, two - one
>> > > is a typo).
>> > >
>> >
>> > That's the whole point of this list. When you submit patches in the
>> > future we can check your name/email against the list and flag it if it
>> > isn't there. That will alert you that you've made a typo.
>> >
>>
>> I don't make typos in my name or email address.
>
> You just admitted six lines earlier that you made a typo.

No, all I said was that one is a typo. I did not make it.

--
Simon Arlott

2008-07-28 19:00:27

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Simon Arlott <[email protected]> wrote:
> On 28/07/08 19:19, Jon Smirl wrote:
>
> > On 7/28/08, Simon Arlott <[email protected]> wrote:
> >
> > > On 28/07/08 18:22, Jon Smirl wrote:
> > > > On 7/28/08, Simon Arlott <[email protected]> wrote:
> > > > > No, I've submitted patches using three email addresses (well, two - one
> > > > > is a typo).
> > > > >
> > > >
> > > > That's the whole point of this list. When you submit patches in the
> > > > future we can check your name/email against the list and flag it if it
> > > > isn't there. That will alert you that you've made a typo.
> > > >
> > >
> > > I don't make typos in my name or email address.
> > >
> >
> > You just admitted six lines earlier that you made a typo.
> >
>
> No, all I said was that one is a typo. I did not make it.

Other people aren't perfect, I've found over 1,000 typos in those
names and emails. We need a validation mechanism.

--
Jon Smirl
[email protected]

2008-07-28 20:23:15

by Theodore Ts'o

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> Other people aren't perfect, I've found over 1,000 typos in those
> names and emails. We need a validation mechanism.
>

You keep using the word "need"; I do not think it means what you think
it does. :-)

Seriously, why is it so important? It's a nice to have, and I
recognize that you've spent a bunch of time on it. But if the goal is
to get better statistics, and in exchange we forcibly map all Mark
Browns to one e-mail address, and/or force them to all adopt middle
initials (what if there are two Dan Smiths who don't have middle
initials) just for the convenience of your statistics gathering, I
would gently suggest to you that you've forgotten which is the tail,
and which is the dog.

Regards,

- Ted

2008-07-28 20:38:26

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Theodore Tso <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > Other people aren't perfect, I've found over 1,000 typos in those
> > names and emails. We need a validation mechanism.
> >
>
>
> You keep using the word "need"; I do not think it means what you think
> it does. :-)
>
> Seriously, why is it so important? It's a nice to have, and I
> recognize that you've spent a bunch of time on it. But if the goal is
> to get better statistics, and in exchange we forcibly map all Mark
> Browns to one e-mail address, and/or force them to all adopt middle
> initials (what if there are two Dan Smith's that don't have middle
> initials) just for the convenience of your statistics gathering, I
> would gently suggest to you that you've forgotten which is the tail,
> and which is the dog.

There are over 1,000 typos in the logs. No validation is being done on
the names/addresses in the logs. Many email addresses aren't
syntactically valid. Why not put some checks in place to try and clean
this up? Signed-off-by is worthless if it is full of garbage.

There are two Mark Browns in the file:
Mark Brown <[email protected]>
Mark Brown <[email protected]>

I don't know if these are two different people or one person with two
emails. But the file doesn't force that decision. It's git shortlog
that is combining them.

The file serves two purposes:
Map people using multiple email aliases to a single human name. It can be
any name they choose. The existing file already does this, but the list is
not complete.
Enumerate all email addresses used in the log so that it is possible
to tell when a new address is encountered. This allows simple validation
to be implemented.

In its current form it doesn't indicate which alias is the
developer's currently active one.
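
A rough sketch of the second purpose, in Python, in case it helps. It
assumes every mailmap line has the simple shape "Name <email>" and that
'#' starts a comment. The function names load_mailmap and unknown_authors
and the example.org addresses are made up for illustration; nothing like
this exists in the tree.

```python
import re

# Assumed line shape: "Proper Name <email>"; a leading '#' marks a comment.
ENTRY = re.compile(r"^\s*(?P<name>[^<#]*?)\s*<(?P<email>[^>]*)>")


def load_mailmap(lines):
    """Return {email: name} for every 'Name <email>' line."""
    known = {}
    for line in lines:
        m = ENTRY.match(line)
        if m:
            known[m.group("email")] = m.group("name")
    return known


def unknown_authors(author_lines, known):
    """Return the sorted emails from 'Author: Name <email>' lines
    that have no entry in the mailmap. Only emails are compared,
    not names."""
    unseen = set()
    for line in author_lines:
        m = re.search(r"<([^>]*)>", line)
        if m and m.group(1) not in known:
            unseen.add(m.group(1))
    return sorted(unseen)


if __name__ == "__main__":
    # Hypothetical data, not taken from the real kernel log.
    mailmap = [
        "# comment line",
        "Jane Doe <jane@example.org>",
        "Jane Doe <jd@example.com>",
    ]
    authors = [
        "Author: Jane Doe <jane@example.org>",
        "Author: New Person <new@example.net>",
    ]
    print(unknown_authors(authors, load_mailmap(mailmap)))
```

The author lines would come from something like "git log | grep ^Author
| sort -u". Anything the second function returns is an address that has
not been seen before.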

--
Jon Smirl
[email protected]

2008-07-28 20:53:26

by Dave Jones

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 04:22:36PM -0400, Theodore Tso wrote:
> On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > Other people aren't perfect, I've found over 1,000 typos in those
> > names and emails. We need a validation mechanism.
> >
>
> You keep using the word "need"; I do not think it means what you think
> it does. :-)
>
> Seriously, why is it so important? It's a nice to have, and I
> recognize that you've spent a bunch of time on it. But if the goal is
> to get better statistics, and in exchange we forcibly map all Mark
> Browns to one e-mail address, and/or force them to all adopt middle
> initials (what if there are two Dan Smith's that don't have middle
> initials) just for the convenience of your statistics gathering, I
> would gently suggest to you that you've forgotten which is the tail,
> and which is the dog.

I'm beginning to question just how useful the continued measuring
of things like Signed-off-by's is. Last week at OLS, I overheard
a conversation where someone was talking about the "top 10" lists
that Greg has been talking about at various conferences.
The conversation went along the lines of "my manager really wants
to see us on that list, at any cost".
Whilst the naive may think 'more patches == more better', this isn't
necessarily the case given we have nowhere near enough review bandwidth
*now*, and flooding with a zillion trivial patches really isn't going
to make that job any easier.

Getting patches into the tree is easy, we've proven that.
As things stand now, it's also fairly easy to 'game' the system
by committing something in 10 changesets when it could be done
just as easily in 2-3.

How about we start measuring things that actually matter, like..

"How many patches were reviewed before they went in"
"How many patches were directly responsible for a bug"
"How many patches actually fixed something anyone cares about"
"How many patches are responsible for just 'churn'"

Dave

--
http://www.codemonkey.org.uk

2008-07-28 21:23:36

by Randy Dunlap

Subject: Re: 463 kernel developers missing!

On Mon, 28 Jul 2008 16:46:24 -0400 Dave Jones wrote:

> On Mon, Jul 28, 2008 at 04:22:36PM -0400, Theodore Tso wrote:
> > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > > Other people aren't perfect, I've found over 1,000 typos in those
> > > names and emails. We need a validation mechanism.
> > >
> >
> > You keep using the word "need"; I do not think it means what you think
> > it does. :-)
> >
> > Seriously, why is it so important? It's a nice to have, and I
> > recognize that you've spent a bunch of time on it. But if the goal is
> > to get better statistics, and in exchange we forcibly map all Mark
> > Browns to one e-mail address, and/or force them to all adopt middle
> > initials (what if there are two Dan Smith's that don't have middle
> > initials) just for the convenience of your statistics gathering, I
> > would gently suggest to you that you've forgotten which is the tail,
> > and which is the dog.
>
> I'm beginning to question just how useful the continued measuring
> of things like Signed-off-by's is. Last week at OLS, I overheard
> a conversation where someone was talking about the "top 10" lists
> that Greg has been talking about at various conferences.
> The conversation went along the lines of "my manager really wants
> to see us on that list, at any cost".
> Whilst the naive may think 'more patches == more better', this isn't
> necessarily the case given we have nowhere near enough review bandwidth
> *now*, and flooding with a zillion trivial patches really isn't going
> to make that job any easier.
>
> Getting patches into the tree is easy, we've proven that.
> As things stand now, it's also fairly easy to 'game' the system
> by committing something in 10 changesets when it could be done
> just as easily in 2-3.
>
> How about we start measuring things that actually matter, like..
>
> "How many patches were reviewed before they went in"
> "How many patches were directly responsible for a bug"
> "How many patches actually fixed something anyone cares about"
> "How many patches are responsible for just 'churn'"

It would be Good if we could give more value to Reviewed-by: tag lines also...

IOW, we "need" to do this. :)


---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/

2008-07-28 22:03:37

by James Morris

Subject: Re: 463 kernel developers missing!

On Mon, 28 Jul 2008, Randy Dunlap wrote:

> It would be Good if we could give more value to Reviewed-by: tag lines also...
>
> IOW, we "need" to do this. :)

Also, Tested-by:, to encourage and recognize people who may not be
confident in reviewing code to at least test it, which is immensely
useful if done thoughtfully.

"Measuring programming progress by lines of code is like measuring
aircraft building progress by weight."

If you know who said this, award yourself a cookie :-)


- James
--
James Morris
<[email protected]>

2008-07-28 22:08:43

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Dave Jones <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 04:22:36PM -0400, Theodore Tso wrote:
> > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > > Other people aren't perfect, I've found over 1,000 typos in those
> > > names and emails. We need a validation mechanism.
> > >
> >
> > You keep using the word "need"; I do not think it means what you think
> > it does. :-)
> >
> > Seriously, why is it so important? It's a nice to have, and I
> > recognize that you've spent a bunch of time on it. But if the goal is
> > to get better statistics, and in exchange we forcibly map all Mark
> > Browns to one e-mail address, and/or force them to all adopt middle
> > initials (what if there are two Dan Smith's that don't have middle
> > initials) just for the convenience of your statistics gathering, I
> > would gently suggest to you that you've forgotten which is the tail,
> > and which is the dog.
>
>
> I'm beginning to question just how useful the continued measuring
> of things like Signed-off-by's is. Last week at OLS, I overheard
> a conversation where someone was talking about the "top 10" lists
> that Greg has been talking about at various conferences.
> The conversation went along the lines of "my manager really wants
> to see us on that list, at any cost".

I didn't do this to measure statistics, I did it because I was writing
a script and the script was getting garbage for input. It just had the
side effect of cleaning up the statistics.

> Whilst the naive may think 'more patches == more better', this isn't
> necessarily the case given we have nowhere near enough review bandwidth
> *now*, and flooding with a zillion trivial patches really isn't going
> to make that job any easier.
>
> Getting patches into the tree is easy, we've proven that.
> As things stand now, it's also fairly easy to 'game' the system
> by committing something in 10 changesets when it could be done
> just as easily in 2-3.
>
> How about we start measuring things that actually matter, like..
>
> "How many patches were reviewed before they went in"
> "How many patches were directly responsible for a bug"
> "How many patches actually fixed something anyone cares about"
> "How many patches are responsible for just 'churn'"
>

These are good topics for the Plumbers conference. But to ask these
questions we need to get the data into a format where a computer can
process it. Syntax checking, validation, etc are needed on the log
messages. I'm not going to hunt through 100,000 commits trying to
answer these by hand.

Another fun experiment would be to load an archive of LKML, kernel
bugzilla and the kernel source history into git and then try to link
everything together. The cleaner the data is, the easier it will be to
link things. How about a GUI where each patch is annotated with a link
to the email thread discussing it?

--
Jon Smirl
[email protected]

2008-07-28 22:33:17

by Theodore Ts'o

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 06:08:33PM -0400, Jon Smirl wrote:
> I didn't do this to measure statistics, I did it because I was writing
> a script and the script was getting garbage for input. It just had the
> side effect of cleaning up the statistics.

Out of curiosity, what is your script trying to do?

- Ted

2008-07-28 22:39:41

by Randy Dunlap

Subject: Re: 463 kernel developers missing!

On Mon, 28 Jul 2008 18:32:41 -0400 Theodore Tso wrote:

> On Mon, Jul 28, 2008 at 06:08:33PM -0400, Jon Smirl wrote:
> > I didn't do this to measure statistics, I did it because I was writing
> > a script and the script was getting garbage for input. It just had the
> > side effect of cleaning up the statistics.
>
> Out of curiosity, what is your script trying to do?


Speaking of missing developers, I'd be more interested in whatever
happened to Michal Piotrowski, Satyam Sharma, et al...


---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/

2008-07-28 22:39:54

by Stefan Richter

Subject: Re: 463 kernel developers missing!

Jon Smirl wrote:
> Another fun experiment would be to load an archive of LKML, kernel
> bugzilla and the kernel source history into git and then try to link
> everything together.

Another fun experiment: Fetch 10 open bugs in bugzilla which may affect
your hardware, try to reproduce one of them, fix it.
--
Stefan Richter
-=====-==--- -=== ===-=
http://arcgraph.de/sr/

2008-07-28 22:52:36

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Theodore Tso <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 06:08:33PM -0400, Jon Smirl wrote:
> > I didn't do this to measure statistics, I did it because I was writing
> > a script and the script was getting garbage for input. It just had the
> > side effect of cleaning up the statistics.
>
>
> Out of curiosity, what is your script trying to do?

I was trying to locate my patches in other private trees that were
ready for deletion. I wanted to make sure there wasn't something good
that I had forgotten about. I processed the output from 'git log' and
got tripped up matching the author field because it was full of junk.
My database background kicked in and I found myself on a tangent
cleaning up the data.

I have since learned about the existence of 'git shortlog' which
solved my problem. But I had already cleaned up the data before
finding it.

>
> - Ted
>


--
Jon Smirl
[email protected]

2008-07-28 23:42:42

by Paul Mundt

Subject: Re: 463 kernel developers missing!

On Tue, Jul 29, 2008 at 08:01:09AM +1000, James Morris wrote:
> On Mon, 28 Jul 2008, Randy Dunlap wrote:
>
> > It would be Good if we could give more value to Reviewed-by: tag lines also...
> >
> > IOW, we "need" to do this. :)
>
> Also, Tested-by:, to encourage and recognize people who may not be
> confident in reviewing code to at least test it, which is immensely
> useful if done thoughtfully.
>
> "Measuring programming progress by lines of code is like measuring
> aircraft building progress by weight."
>
> If you know who said this, award yourself a cookie :-)
>
Or just filter on "-by:", which seems to get anything relevant, including
people that shamelessly make up their own tags. Converting something
from a Cc: to a *-by: requires manual effort at least, which
ought to be sufficient for recognition.
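
Something along these lines would do the filtering. This is only a
sketch in Python: it assumes the tags sit on their own lines in the
commit message, as "git log" prints them, and the name count_by_tags and
the sample addresses are invented for the example.

```python
import re
from collections import Counter

# Matches any "<Something>-by: <who>" line, including made-up tags.
# "Cc:" lines do not match because they lack the "-by" suffix.
TAG = re.compile(r"^\s*([A-Za-z][A-Za-z-]*-by):\s*(.+?)\s*$")


def count_by_tags(log_text):
    """Count (tag, identity) pairs; tags are lowercased so that
    'Signed-off-by' and 'signed-off-by' land in the same bucket."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TAG.match(line)
        if m:
            counts[(m.group(1).lower(), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt.
    sample = (
        "    Signed-off-by: A Dev <a@example.org>\n"
        "    Reviewed-by: B Dev <b@example.org>\n"
        "    Cc: C Dev <c@example.org>\n"
        "    signed-off-by: A Dev <a@example.org>\n"
    )
    for (tag, who), n in sorted(count_by_tags(sample).items()):
        print(n, tag, who)
```

Note that the identity is compared as a raw string here, so any typo or
alias in the name or address still splits one person into several.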

If someone was really bored they could probably make a table of tags with
various points to try and balance things slightly more objectively.
Though it seems we now at least have totally different metrics on LWN,
for the kernel summit selection process, and Jon's new script. ;-)

Trying to map all of the names seems pretty pointless though, most
regular contributors contribute in a fairly consistent and sane manner,
with the odd mismatch or typo here or there. It might make sense for
anyone where there's a significant difference, but those are going to be
corner cases.

2008-07-29 00:14:32

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Paul Mundt <[email protected]> wrote:
> On Tue, Jul 29, 2008 at 08:01:09AM +1000, James Morris wrote:
> > On Mon, 28 Jul 2008, Randy Dunlap wrote:
> >
> > > It would be Good if we could give more value to Reviewed-by: tag lines also...
> > >
> > > IOW, we "need" to do this. :)
> >
> > Also, Tested-by:, to encourage and recognize people who may not be
> > confident in reviewing code to at least test it, which is immensely
> > useful if done thoughtfully.
> >
> > "Measuring programming progress by lines of code is like measuring
> > aircraft building progress by weight."
> >
> > If you know who said this, award yourself a cookie :-)
> >
>
> Or just filter on "-by:", which seems to get anything relevant, including
> people that shamelessly make up their own tags. In order for something to
> be converted from a Cc: to a *-by: requires manual effort at least, which
> ought to be sufficient for recognition.
>
> If someone was really bored they could probably make a table of tags with
> various points to try and balance things slightly more objectively.
> Though it seems we now at least have totally different metrics on LWN,
> for the kernel summit selection process, and Jon's new script. ;-)
>
> Trying to map all of the names seems pretty pointless though, most
> regular contributors contribute in a fairly consistent and sane manner,
> with the odd mismatch or typo here or there. It might make sense for
> anyone where there's a significant difference, but those are going to be
> corner cases.

12% of the name/email pairs are messed up. It's not all simple typos.
There is significant mangling of non-ASCII charsets by people's tools
in the maintainer's chain of processing. Half of the time I don't
believe what the author is submitting is what is ending up in the log
due to mangling. It's a larger source of noise than typos.

All of these variations on email names are in the log. Humans can
identify these problems, it is much harder for a machine.

For example, where are these backslashes coming from?
Auke-Jan H Kok <[email protected]>
Auke-Jan H Kok <auke\[email protected]>
Auke-Jan H Kok <auke\\[email protected]>
Auke-Jan H Kok <auke\\\[email protected]>
Auke-Jan H Kok <[email protected]>

Are the tools case sensitive or insensitive on email addresses? Some
are, some aren't, so I need these cases...
Al Viro <[email protected]>
Al Viro <[email protected]>
Al Viro <[email protected]>

Another problem is internal machine names...
David S. Miller <[email protected]>
David S. Miller <[email protected]>
David S. Miller <[email protected]>
David S. Miller <[email protected]>
David S. Miller <[email protected]>
David S. Miller <[email protected]>
David S. Miller <[email protected]>
David S. Miller <[email protected]>

Or varying the email name...
Alexey Starikovskiy <[email protected]>
Alexey Starikovskiy <[email protected]>
Alexey Starikovskiy <[email protected]>

Why do these all end in (none)?
Craig Hughes <[email protected].(none)>
Dave Neuer <[email protected].(none)>
David Brownell <[email protected].(none)>
David Woodhouse <[email protected].(none)>
Deepak Saxena <[email protected].(none)>
Enrico Scholz <[email protected].(none)>
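
A few of these variations (stray backslashes, case differences, a
trailing ".(none)") could be folded together mechanically before
comparing addresses. Here is a rough sketch in Python, assuming those
three rules are safe to apply; the names normalize_email and
group_aliases and the example.org addresses are made up. The result is
only a grouping key, not a deliverable address, and it does nothing for
internal machine names or renamed mailboxes, which still need a human.

```python
from collections import defaultdict

NONE_SUFFIX = ".(none)"


def normalize_email(addr):
    """Return a comparison key for an address as found in the log.

    Assumptions (not verified against any tool): backslashes are
    escaping debris, a trailing '.(none)' carries no information,
    and case can be ignored for matching purposes."""
    key = addr.strip().replace("\\", "")
    if key.endswith(NONE_SUFFIX):
        key = key[: -len(NONE_SUFFIX)]
    return key.lower()


def group_aliases(addresses):
    """Map each normalized key to the sorted raw spellings seen for it."""
    groups = defaultdict(set)
    for addr in addresses:
        groups[normalize_email(addr)].add(addr)
    return {key: sorted(raw) for key, raw in groups.items()}


if __name__ == "__main__":
    # Hypothetical spellings of one address.
    seen = ["a.dev@example.org", "A.Dev@Example.ORG", "a.\\dev@example.org"]
    print(group_aliases(seen))
```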

--
Jon Smirl
[email protected]

2008-07-29 00:27:10

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 29-07-08 02:14, Jon Smirl wrote:

> Why do these all end in (none)?
> Craig Hughes <[email protected].(none)>
> Dave Neuer <[email protected].(none)>
> David Brownell <[email protected].(none)>
> David Woodhouse <[email protected].(none)>
> Deepak Saxena <[email protected].(none)>
> Enrico Scholz <[email protected].(none)>

Because rmk rewrites addresses to comply with privacy laws. Another good
example of why this nonsense of yours is exactly that.

I checked and am personally in there three times, once even without any
valid email address listed. And any time there's anything other than my
gmail address in some submission it at least recently means that someone
_else_ took my from: address and stuck it on there and while I don't
terribly mind that generally, I find it really annoying to see even
those mistakes harvested into your hugely google-accessible resource.

This is just yet another example of the senseless robotic crap people
just insist is "needed" and "valuable", but which is neither.

Nonsense it is.

Rene.

2008-07-29 00:35:00

by Paul Mundt

Subject: Re: 463 kernel developers missing!

On Tue, Jul 29, 2008 at 02:29:01AM +0200, Rene Herman wrote:
> On 29-07-08 02:14, Jon Smirl wrote:
>
> >Why do these all end in (none)?
> >Craig Hughes <[email protected].(none)>
> >Dave Neuer <[email protected].(none)>
> >David Brownell <[email protected].(none)>
> >David Woodhouse <[email protected].(none)>
> >Deepak Saxena <[email protected].(none)>
> >Enrico Scholz <[email protected].(none)>
>
> This is just yet another example of the senseless robotic crap people
> just insist is "needed" and "valuable", but which is neither.
>
Speaking of which, lk-changelog did the same sort of thing back in the BK
days, which was at least useful for generating a pretty short log.
Perhaps it makes more sense to start from that if someone really wants to
waste their time on this. I'm still not sure what the point is though.

2008-07-29 00:50:52

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Rene Herman <[email protected]> wrote:
> On 29-07-08 02:14, Jon Smirl wrote:
>
>
> > Why do these all end in (none)?
> > Craig Hughes <[email protected].(none)>
> > Dave Neuer <[email protected].(none)>
> > David Brownell <[email protected].(none)>
> > David Woodhouse <[email protected].(none)>
> > Deepak Saxena <[email protected].(none)>
> > Enrico Scholz <[email protected].(none)>
> >
>
> Because rmk rewrites addresses to comply with privacy laws. Another good
> example of why this nonsense of yours is exactly that.
>
> I checked and am personally in there three times, once even without any
> valid email address listed. And any time there's anything other than my
> gmail address in some submission it at least recently means that someone
> _else_ took my from: address and stuck it on there and while I don't
> terribly mind that generally, I find it really annoying to see even those
> mistakes harvested into your hugely google-accessible resource.

The emails in the list are extracted from the commit log. I did not
touch the emails. If your email is in there wrong it is in a log
message wrong. That doesn't necessarily mean you are the person who
put it into the log wrong, patches can get mangled when being passed
along the maintainer chain. The point of this file is to turn the
mistake back into something useful. Think of these as reverse
mappings, they convert errors back to usable names.

As for privacy, if you don't want your email address in a file like
this don't put it into a GPL'd public project. Generate a random name
and email for each patch you submit. Of course I'm having trouble with
a Signed-off-by: that can't be turned back into a person.
Signed-off-by is there to track the responsibility chain for a patch
and if the chain has been obfuscated what good is it?

> This is just yet another example of the senseless robotic crap people
> just insist is "needed" and "valuable", but which is neither.
>
> Nonsense it is.
>
> Rene.
>


--
Jon Smirl
[email protected]

2008-07-29 01:15:37

by Al Viro

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:

> Other people aren't perfect, I've found over 1,000 typos in those
> names and emails. We need a validation mechanism.

Who's "we", luser, and why would I possibly give a damn for your needs?

2008-07-29 01:25:54

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Al Viro <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
>
>
> > Other people aren't perfect, I've found over 1,000 typos in those
> > names and emails. We need a validation mechanism.
>
> Who's "we", luser, and why would I possibly give a damn for your needs?

Let's drop the whole Signed-off-by mechanism. If we can't be bothered to
clean up the junk in Signed-off-by why should we bother recording
them? Sign every patch Mickey Mouse, it has the same effect.

--
Jon Smirl
[email protected]

2008-07-29 01:37:27

by Al Viro

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 09:25:39PM -0400, Jon Smirl wrote:
> On 7/28/08, Al Viro <[email protected]> wrote:
> > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> >
> >
> > > Other people aren't perfect, I've found over 1,000 typos in those
> > > names and emails. We need a validation mechanism.
> >
> > Who's "we", luser, and why would I possibly give a damn for your needs?
>
> Let's drop the whole Sign-off-by mechanism. If we can't be bothered to
> clean up the junk in Signed-off-by why should we bother recording
> them? Sign every patch Mickey Mouse, it has the same effect.

That still doesn't answer either of my questions. As for your question, the
point is to have them good enough to make an individual changeset feasible
to track.

2008-07-29 02:01:20

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Al Viro <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 09:25:39PM -0400, Jon Smirl wrote:
> > On 7/28/08, Al Viro <[email protected]> wrote:
> > > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > >
> > >
> > > > Other people aren't perfect, I've found over 1,000 typos in those
> > > > names and emails. We need a validation mechanism.
> > >
> > > Who's "we", luser, and why would I possibly give a damn for your needs?
> >
> > Let's drop the whole Sign-off-by mechanism. If we can't be bothered to
> > clean up the junk in Signed-off-by why should we bother recording
> > them? Sign every patch Mickey Mouse, it has the same effect.
>
> That still doesn't answer either of my questions. As for your question, the
> point is to have them good enough to make an individual changeset feasible
> to track.

The file lets you convert the mess that exists in the log file xx-by:
fields back into something reasonable. The messed up email addresses
are verbatim extracted from the log. There is one entry in the file
for each email address that appears in the log. The real names have
been fixed by script and by hand to associate a real name with the
extracted emails.

Now we will differ on the definition of feasible and whether we should
work to prevent more messed up emails/names from getting into the log.
That's the central question here, how much are you allowed to
obfuscate (on purpose or accidentally) your identity in an xx-by?

I should also point out that external information (Google) was needed
to identify several hundred names, there was insufficient information
in the log or kernel source. If we have to reconstruct this mapping
ten years from now for some random lawsuit, the external information
may not be there.

--
Jon Smirl
[email protected]

2008-07-29 02:50:32

by Theodore Ts'o

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 10:01:06PM -0400, Jon Smirl wrote:
> I should also point out that external information (Google) was needed
> to identify several hundred names, there was insufficient information
> in the log or kernel source. If we have to reconstruct this mapping
> ten years from now for some random lawsuit, the external information
> may not be there.

Jon,

The reality is ten years from now, many e-mail addresses won't
be accurate anyway. We will have to track people down by hand, if it
ever comes down to that. The signed-off-by needs to be enough so we
can track down someone (very likely only a small set of people); a
manual method is quite acceptable. I don't think it is really
necessary to try to force-fit the signed-off-by just so we can collect
better mode.

It should also be noted that the Developer's Certificate of
Origin 1.1 has language that was designed to make it legal to collect
the DCO lines even in the European Union. So what rmk is doing is
strictly speaking not necessary.

- Ted

2008-07-29 03:23:42

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/28/08, Theodore Tso <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 10:01:06PM -0400, Jon Smirl wrote:
> > I should also point out that external information (Google) was needed
> > to identify several hundred names, there was insufficient information
> > in the log or kernel source. If we have to reconstruct this mapping
> > ten years from now for some random lawsuit, the external information
> > may not be there.
>
>
> Jon,
>
> The reality is ten years from now, many e-mail addresses won't
> be accurate anyway. We will have to track people down by hand, if it
> ever comes down to that. The signed-off-by needs to be enough so we
> can track down someone (very likely only a few set of people); via a
> manual method is quite acceptable. I don't think it is really
> necessary to try force fit the signed-off-by just so we can collect
> better mode.

The kernel already has a mailmap file, but it is not complete. So I
should just take this work that makes the mailmap file a lot better
and throw it away? The policy is that the log file should be messed up
enough so that a computer can't process it and that a human can
recover it only with several days' effort? That's a really hard line
to define and we'll probably lose the identity of a bunch of
contributors. I'll follow up with a patch that deletes the current
.mailmap

--
Jon Smirl
[email protected]

2008-07-29 04:13:52

by Theodore Ts'o

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 11:23:31PM -0400, Jon Smirl wrote:
> The kernel already has a mailmap file, but it is not complete. So I
> should just take this work that makes the mailmap file a lot better
> and throw it away? The policy is that the log file should be messed up
> enough so that a computer can't process it and that a human can
> recover it only with several day's effort? That's a really hard line
> to define and we'll probably lose the identity of a bunch of
> contributors. I'll follow up with a patch that deletes the current
> .mailmap

Personally, I have no objection to the mailmap file as it's on the
whole an improvement; if it's been automatically generated and it
falsely maps multiple people to a single person, that would be highly
unfortunate, but maybe it fixes more problems than it creates.

I think the part most people are seriously objecting to is the
supposition that Linus and some of his top lieutenants should be
enforcing some arbitrary rule that rejects commits if they come from
addresses outside of your .mailmap file (unless they first send a
patch to add their e-mail address to the .mailmap file), in some kind
of misguided attempt to enforce validation. The main justification
for that apparently is so that you and others can run some
statistical analysis, and there seems to be some dispute over whether
encouraging people to compete to get into the top 20
signed-off-by by splitting up commits into 100 different micro-patches
should be considered a desirable side effect of said statistical
analysis.

As I said earlier, the moment you started advocating enforcing
validation, you may have started to confuse which is the tail and
which is the dog. People should be supplying patches to improve the
kernel; not to provide accurate fodder for statistical analysis.

- Ted

2008-07-29 04:15:51

by Theodore Ts'o

Subject: Re: 463 kernel developers missing!

On Tue, Jul 29, 2008 at 12:13:37AM -0400, Theodore Tso wrote:
> Personally, I have no objection to the mailmap file as it's on the
> whole an improvement; if it's been automatically generated and it
> falsely maps multiple people to a single person, that would be highly
> unfortunate, but maybe it fixes more problems than it creates.

Typo correction. The first part of that sentence should read:

"Personally, I have no objection to the mailmap file IF on the
whole it's an improvement...."

> I think the part most people are seriously objecting to is the
> supposition that Linus and some of his top lieutenants should be
> enforcing some arbitrary rule that rejects commits if they come from
> addresses outside of your .mailmap file (unless they first send a
> patch to add their e-mail address to the .mailmap file), in some kind
> of misguided attempt to enforce validation, which apparently the main
> justification for which is so that you and others can run some
> statistical analysis, of which there seems to be some dispute whether
> or not encouraging people to compete to get into the top 20
> signed-off-by by splitting up commits into 100 different micro-patches
> should be considered a desirable side effect of said statistical
> analysis.
>
> As I said earlier, the moment you started advocating enforcing
> validation, you may have started to confuse which is the tail and
> which is the dog. People should be supplying patches to improve the
> kernel; not to provide accurate fodder for statistical analysis.
>
> - Ted

2008-07-29 05:05:43

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/29/08, Theodore Tso <[email protected]> wrote:
> On Mon, Jul 28, 2008 at 11:23:31PM -0400, Jon Smirl wrote:
> > The kernel already has a mailmap file, but it is not complete. So I
> > should just take this work that makes the mailmap file a lot better
> > and throw it away? The policy is that the log file should be messed up
> > enough so that a computer can't process it and that a human can
> > > recover it only with several days' effort? That's a really hard line
> > to define and we'll probably lose the identity of a bunch of
> > contributors. I'll follow up with a patch that deletes the current
> > .mailmap
>
>
> Personally, I have no objection to the mailmap file as it's on the
> whole an improvement; if it's been automatically generated and it
> falsely maps multiple people to a single person, that would be highly
> unfortunate, but maybe it fixes more problems than it creates.

The problem of mapping multiple people to a single person was always
there; the new mailmap file doesn't alter it. There simply isn't
enough information in the kernel source to tell whether there are one
or two Mark Browns. The file would need to be extended to encode more
information.

Mark Brown <[email protected]>
Mark Brown <[email protected]>

If the Marks want to separate themselves they will need to alter the
mailmap. With the new mailmap this is easily done. With the old one
you would have needed to identify all of the aliases first.

It's the higher level tools that are combining these into a single person.

> I think the part most people are seriously objecting to is the
> supposition that Linus and some of his top lieutenants should be
> enforcing some arbitrary rule that rejects commits if they come from
> addresses outside of your .mailmap file (unless they first send a
> patch to add their e-mail address to the .mailmap file), in some kind
> of misguided attempt to enforce validation, which apparently the main
> justification for which is so that you and others can run some
> statistical analysis, of which there seems to be some dispute whether
> or not encouraging people to compete to get into the top 20
> signed-off-by by splitting up commits into 100 different micro-patches
> should be considered a desirable side effect of said statistical
> analysis.

That whole thread was pointless, the scripts for doing validation
don't exist. The stat tools are helpful in finding errors in the
mailmap file. I never cared about the stat results, I already know who
the top developers are. Let's drop the whole validation concept too
since it is obviously upsetting people.

There are two types of entries in the file. Ones that alter the names
associated with an email and ones that don't. You could argue that the
ones that don't alter the names aren't needed. They're in there to
make maintenance on the file easier.

Putting all emails in the file lets you do maintenance by extracting
the complete list of emails from the log and then removing the ones
already in the file. Now you only have to manually check these new
emails. If the unchanged entries were removed from the file they'd get
mixed in with the new emails. Each time you updated mailmap you'd have
a couple thousand emails to check.
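
As a rough illustration of that maintenance step (a sketch only, not an
existing kernel script): the Python below collects every address already
listed in .mailmap and reports the addresses from the log that are
missing. The function names and the regular expression are made up for
this example, and lowercasing whole addresses before comparing them is a
simplifying assumption.

```python
import re

# Anything between angle brackets is treated as an address (assumption).
EMAIL_RE = re.compile(r"<([^<>]*)>")


def mailmap_emails(mailmap_text):
    """Collect every address that appears in angle brackets in a .mailmap."""
    emails = set()
    for line in mailmap_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        emails.update(e.lower() for e in EMAIL_RE.findall(line))
    return emails


def new_emails(author_lines, known):
    """Return the log addresses that are not yet in the mailmap, sorted."""
    seen = set()
    for line in author_lines:
        match = EMAIL_RE.search(line)
        if match:
            seen.add(match.group(1).lower())
    return sorted(seen - known)
```

The author lines could come from something like
git log --format='%an <%ae>'; only the addresses are compared, not the
names, so the result is the short list that still needs a manual check.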

Putting the unchanged entries in the file also makes it very easy for
people who want to alter their name entry. Just edit the mailmap file.
Everything is there and sorted by name. Change the name for all of
your aliases to whatever you want. Just make sure the names are all
identical on the aliases.

> As I said earlier, the moment you started advocating enforcing
> validation, you may have started to confuse which is the tail and
> which is the dog. People should be supplying patches to improve the
> kernel; not to provide accurate fodder for statistical analysis.

These addresses have more purposes than statistical analysis. They
also record the responsibility chain of who submitted the patch. It
seems prudent to me that we should make some effort to attempt to keep
that chain in a reasonably clean state.

I believe that people can get their name/email right in a patch 99% of
the time. The bulk of the 12% error rate appears to be coming from
maintainer tools mangling the patches and exposed internal mail server
names. The real message is that there are some tools that need to be
fixed.

--
Jon Smirl
[email protected]

2008-07-29 05:29:20

by Willy Tarreau

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 11:45:29AM -0400, Jon Smirl wrote:
> On 7/28/08, Adrian Bunk <[email protected]> wrote:
> > You count merges as patches.
>
> I just used the output from git shortlog, is there a better way?

git shortlog --no-merges

Willy

2008-07-29 09:59:24

by Nick Piggin

Subject: Re: 463 kernel developers missing!

On Tuesday 29 July 2008 06:46, Dave Jones wrote:
> On Mon, Jul 28, 2008 at 04:22:36PM -0400, Theodore Tso wrote:
> > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > > Other people aren't perfect, I've found over 1,000 typos in those
> > > names and emails. We need a validation mechanism.
> >
> > You keep using the word "need"; I do not think it means what you think
> > it does. :-)
> >
> > Seriously, why is it so important? It's a nice to have, and I
> > recognize that you've spent a bunch of time on it. But if the goal is
> > to get better statistics, and in exchange we forcibly map all Mark
> > Browns to one e-mail address, and/or force them to all adopt middle
> > initials (what if there are two Dan Smith's that don't have middle
> > initials) just for the convenience of your statistics gathering, I
> > would gently suggest to you that you've forgotten which is the tail,
> > and which is the dog.
>
> I'm beginning to question just how useful the continued measuring
> of things like Signed-off-by's is. Last week at OLS, I overheard
> a conversation where someone was talking about the "top 10" lists
> that Greg has been talking about at various conferences.
> The conversation went along the lines of "my manager really wants
> to see us on that list, at any cost".
> Whilst the naive may think 'more patches == more better', this isn't
> necessarily the case given we have nowhere near enough review bandwidth
> *now*

This is one way of looking at "the problem". The other way to look at
it is that things are merged too quickly / without enough review, etc.

That is the problem kernel maintainers can actually do something about.
Or, they can just whine about "not enough review bandwidth".

There has been this complaining from lots of people about not enough
review bandwidth for quite a few years now. So I doubt it is going to
magically get better by making more noise.

Consider that there is probably a virtually limitless amount of crap that
people want to try to merge, so there is always going to be a lack of
review bandwidth if the aim is to merge as much as we possibly can as
fast as we can.

The answer is to not make the problem worse by merging stuff faster
than can be reviewed. When that happens, developers and companies
should eventually assign a higher value to patch review.

2008-07-29 10:38:45

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 29-07-08 02:50, Jon Smirl wrote:

>>> Why do these all end in (none)?
>>> Craig Hughes <[email protected].(none)>
>>> Dave Neuer <[email protected].(none)>
>>> David Brownell <[email protected].(none)>
>>> David Woodhouse <[email protected].(none)>
>>> Deepak Saxena <[email protected].(none)>
>>> Enrico Scholz <[email protected].(none)>
>>>
>> Because rmk rewrites addresses to comply with privacy laws. Another good
>> example of why this nonsense of yours is exactly that.
>>
>> I checked and am personally in there three times, once even without any
>> valid email address listed. And any time there's anything other than my
>> gmail address in some submission it at least recently means that someone
>> _else_ took my from: address and stuck it on there and while I don't
>> terribly mind that generally, I find it really annoying to see even those
>> mistakes harvested into your hugely google-accessible resource.

[ .. ]

> As for privacy, if you don't want your email address in a file like
> this don't put it into a GPL'd public project.

Like I told you, I don't. Others do. And while that's not a huge issue
in itself, you harvesting it into your nicely formatted google and
spam-base MAKES it an issue. Just stop this crap. Be away.

Rene.

2008-07-29 13:44:42

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/29/08, Rene Herman <[email protected]> wrote:
> On 29-07-08 02:50, Jon Smirl wrote:
>
>
> >
> > >
> > > > Why do these all end in (none)?
> > > > Craig Hughes <[email protected].(none)>
> > > > Dave Neuer <[email protected].(none)>
> > > > David Brownell <[email protected].(none)>
> > > > David Woodhouse <[email protected].(none)>
> > > > Deepak Saxena <[email protected].(none)>
> > > > Enrico Scholz <[email protected].(none)>
> > > >
> > > >
> > > Because rmk rewrites addresses to comply with privacy laws. Another good
> > > example of why this nonsense of yours is exactly that.
> > >
> > > I checked and am personally in there three times, once even without any
> > > valid email address listed. And any time there's anything other than my
> > > gmail address in some submission it at least recently means that someone
> > > _else_ took my from: address and stuck it on there and while I don't
> > > terribly mind that generally, I find it really annoying to see even those
> > > mistakes harvested into your hugely google-accessible resource.
> > >
> >
>
> [ .. ]
>
>
> > As for privacy, if you don't want your email address in a file like
> > this don't put it into a GPL'd public project.
> >
>
> Like I told you, I don't. Others do. And while that's not a huge issue in
> itself, you harvesting it into your nicely formatted google and spam-base
> MAKES it an issue. Just stop this crap. Be away.

Google got the list the second it was mailed on LKML. Why haven't you
told Google to remove the 1,054 pages that contain your email?

http://www.google.com/support/webmasters/bin/answer.py?answer=508&topic=13511

If you really want to spam kernel developers there is a much easier
way, just send the message to LKML.


--
Jon Smirl
[email protected]

2008-07-29 14:23:00

by Pekka Enberg

Subject: Re: 463 kernel developers missing!

Hi Jon,

On Tue, Jul 29, 2008 at 4:44 PM, Jon Smirl <[email protected]> wrote:
>> Like I told you, I don't. Others do. And while that's not a huge issue in
>> itself, you harvesting it into your nicely formatted google and spam-base
>> MAKES it an issue. Just stop this crap. Be away.
>
> Google got the list the second it was mailed on LKML. Why haven't you
> told Google to remove the 1,054 pages that contain your email?
>
> http://www.google.com/support/webmasters/bin/answer.py?answer=508&topic=13511
>
> If you really want to spam kernel developers there is a much easier
> way, just send the message to LKML.

Why does any of this matter? Rene asked you to drop his email from
your list and refusing to do so is somewhat rude, isn't it?

Pekka

2008-07-29 14:27:23

by [email protected]

Subject: Re: 463 kernel developers missing!

On 7/29/08, Pekka Enberg <[email protected]> wrote:
> Hi Jon,
>
>
> On Tue, Jul 29, 2008 at 4:44 PM, Jon Smirl <[email protected]> wrote:
> >> Like I told you, I don't. Others do. And while that's not a huge issue in
> >> itself, you harvesting it into your nicely formatted google and spam-base
> >> MAKES it an issue. Just stop this crap. Be away.
> >
> > Google got the list the second it was mailed on LKML. Why haven't you
> > told Google to remove the 1,054 pages that contain your email?
> >
> > http://www.google.com/support/webmasters/bin/answer.py?answer=508&topic=13511
> >
> > If you really want to spam kernel developers there is a much easier
> > way, just send the message to LKML.
>
>
> Why does any of this matter? Rene asked you to drop his email from
> your list and refusing to do so is somewhat rude, isn't it?

Rene used his email in the immutable log of a public GPL'd project.
It has become part of the public domain and can't be removed. So new
users of the log are supposed to start editing history to remove
actions from the past?

If you want your email kept private don't use it to submit patches to
a GPL'd project.

--
Jon Smirl
[email protected]

2008-07-29 14:32:47

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 29-07-08 15:44, Jon Smirl wrote:

> Google got the list the second it was mailed on LKML. Why haven't you
> told Google to remove the 1,054 pages that contain your email?
>
> http://www.google.com/support/webmasters/bin/answer.py?answer=508&topic=13511
>
> If you really want to spam kernel developers there is a much easier
> way, just send the message to LKML.

Right, so you say that google got it the first time you fucked it up.
How exactly do you consider that to be a reason for continuing to fuck
it up and putting it in a few hundred nicely fully indexed linux kernel
trees out there on the web making the fuck up rank at number 1 in the
results?

Now fortunately, from the discussion it seems that most sensible people
will be ignoring you anyway so I guess I can and should stop bothering
with this but please...

That which is not white is not black and my keyaccess.nl address being
public already anyway is NOT the same as it being veryveryvery public.

Rene.

2008-07-29 14:35:08

by Pekka Enberg

Subject: Re: 463 kernel developers missing!

Hi Jon,

On Tue, Jul 29, 2008 at 5:27 PM, Jon Smirl <[email protected]> wrote:
> Rene used his email in the immutable log of a public GPL'd project.
> It has become part of the public domain and can't be removed. So new
> users of the log are supposed to start editing history to remove
> actions from the past?
>
> If you want your email kept private don't use it to submit patches to
> a GPL'd project.

OK, I'm not interested in arguing about this. I just don't understand
what you're trying to accomplish by pissing off kernel contributors,
that's all. (Not that I'm happy about being on your list either, I
just don't care enough to argue it.)

Pekka

2008-07-29 14:38:25

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 29-07-08 16:27, Jon Smirl wrote:

> On 7/29/08, Pekka Enberg <[email protected]> wrote:

>> On Tue, Jul 29, 2008 at 4:44 PM, Jon Smirl <[email protected]> wrote:
>> >> Like I told you, I don't. Others do. And while that's not a huge issue in
>> >> itself, you harvesting it into your nicely formatted google and spam-base
>> >> MAKES it an issue. Just stop this crap. Be away.
>> >
>> > Google got the list the second it was mailed on LKML. Why haven't you
>> > told Google to remove the 1,054 pages that contain your email?
>> >
>> > http://www.google.com/support/webmasters/bin/answer.py?answer=508&topic=13511
>> >
>> > If you really want to spam kernel developers there is a much easier
>> > way, just send the message to LKML.
>>
>>
>> Why does any of this matter? Rene asked you to drop his email from
>> your list and refusing to do so is somewhat rude, isn't it?
>
> Rene used his email in the immutable log of a public GPL'd project.
> It has become part of the public domain and can't be removed. So new
> users of the log are supposed to start editing history to remove
> actions from the past?
>
> If you want your email kept private don't use it to submit patches to
> a GPL'd project.

Jon, fuck off. I told you three times now -- I DO NOT, OTHERS DO. And it
is only your bureaucrat attitude which is turning it into a problem. Go
apply for a job at IBM if you love IT bureaucracy.

Rene.

2008-07-30 07:25:47

by Adrian Bunk

Subject: Re: 463 kernel developers missing!

On Tue, Jul 29, 2008 at 12:39:51PM +0200, Rene Herman wrote:
> On 29-07-08 02:50, Jon Smirl wrote:
>
>>>> Why do these all end in (none)?
>>>> Craig Hughes <[email protected].(none)>
>>>> Dave Neuer <[email protected].(none)>
>>>> David Brownell <[email protected].(none)>
>>>> David Woodhouse <[email protected].(none)>
>>>> Deepak Saxena <[email protected].(none)>
>>>> Enrico Scholz <[email protected].(none)>
>>>>
>>> Because rmk rewrites addresses to comply with privacy laws. Another good
>>> example of why this nonsense of yours is exactly that.
>>>
>>> I checked and am personally in there three times, once even without any
>>> valid email address listed. And any time there's anything other than my
>>> gmail address in some submission it at least recently means that someone
>>> _else_ took my from: address and stuck it on there and while I don't
>>> terribly mind that generally, I find it really annoying to see even those
>>> mistakes harvested into your hugely google-accessible resource.
>
> [ .. ]
>
>> As for privacy, if you don't want your email address in a file like
>> this don't put it into a GPL'd public project.
>
> Like I told you, I don't. Others do. And while that's not a huge issue
> in itself, you harvesting it into your nicely formatted google and
> spam-base MAKES it an issue. Just stop this crap. Be away.

Whether Jon's patch is a good idea one might discuss, but as soon as
someone puts an email address into a kernel commit Google will anyway
find it:

The ChangeLog-* files at http://ftp.kernel.org/pub/linux/kernel/v2.6/
also contain all addresses in Jon's list, and Google harvests them.
The same goes for mailing list archives of git-commits-head.

> Rene.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-30 07:43:32

by Adrian Bunk

Subject: Re: 463 kernel developers missing!

On Mon, Jul 28, 2008 at 04:46:24PM -0400, Dave Jones wrote:
> On Mon, Jul 28, 2008 at 04:22:36PM -0400, Theodore Tso wrote:
> > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > > Other people aren't perfect, I've found over 1,000 typos in those
> > > names and emails. We need a validation mechanism.
> > >
> >
> > You keep using the word "need"; I do not think it means what you think
> > it does. :-)
> >
> > Seriously, why is it so important? It's a nice to have, and I
> > recognize that you've spent a bunch of time on it. But if the goal is
> > to get better statistics, and in exchange we forcibly map all Mark
> > Browns to one e-mail address, and/or force them to all adopt middle
> > initials (what if there are two Dan Smith's that don't have middle
> > initials) just for the convenience of your statistics gathering, I
> > would gently suggest to you that you've forgotten which is the tail,
> > and which is the dog.
>
> I'm beginning to question just how useful the continued measuring
> of things like Signed-off-by's is. Last week at OLS, I overheard
> a conversation where someone was talking about the "top 10" lists
> that Greg has been talking about at various conferences.
> The conversation went along the lines of "my manager really wants
> to see us on that list, at any cost".
> Whilst the naive may think 'more patches == more better', this isn't
> necessarily the case given we have nowhere near enough review bandwidth
> *now*, and flooding with a zillion trivial patches really isn't going
> to make that job any easier.
>
> Getting patches into the tree is easy, we've proven that.
> As things stand now, it's also fairly easy to 'game' the system
> by committing something in 10 changesets when it could be done
> just as easily in 2-3.
>
> How about we start measuring things that actually matter, like..
>
> "How many patches were reviewed before they went in"
> "How many patches were directly responsible for a bug"
> "How many patches actually fixed something anyone cares about"
> "How many patches are responsible for just 'churn'"

How do you want to measure such stuff?

And with measuring I'm not talking about estimates but about exact data.

Authorship information was already available in the commits, which is
why people were able to develop scripts to harvest them.

For getting any meaningful statistics you have to either enforce the
usage of additional tags in the commits or someone has to work full-time
on generating statistics.

> Dave

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-30 08:38:37

by Stefan Richter

Subject: Re: 463 kernel developers missing!

Adrian Bunk wrote:
> Whether Jon's patch is a good idea one might discuss,

There isn't a lot to discuss. From a purely technical standpoint,
duplicating SCM metadata into a source file and aiming to be
comprehensive and up to date is naive at best.

> but as soon as someone puts an email address into a kernel commit
> Google will anyway find it:

This doesn't justify what Jon did though.

Jon created a new database out of formerly disparate datasets, even
though we didn't provide him these datasets for this purpose. The fact
that the means to create this database are rather trivial and cheap does
not mean that we implicitly agreed to what he did or that it wouldn't
matter whether we agree to it or not.

Jon even suggested that his database then be combined with
further databases (bugzilla accounts, mailing list archives). Again, the
fact that something like this is possible without great difficulties
doesn't make it right.
--
Stefan Richter
-=====-==--- -=== ====-
http://arcgraph.de/sr/

2008-07-30 11:37:23

by Bodo Eggert

Subject: Re: 463 kernel developers missing!

Jon Smirl <[email protected]> wrote:

> Are the tools case sensitive or insensitive on email addresses? Some
> are, some aren't, so I need these cases...
> Al Viro <[email protected]>
> Al Viro <[email protected]>
> Al Viro <[email protected]>

Domain names are case-insensitive, local parts may be case sensitive.
You can lowercase all domains and the localparts of some known domains.

> Another problem is internal machine names...
> David S. Miller <[email protected]>
> David S. Miller <[email protected]>

If it's a common problem, you should have a function doing
$domain =~ s/.*\.davemloft\.net$/davemloft.net/; (not that simple, of course)
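
A minimal sketch of those two rules in Python, assuming the caller
supplies the list of domains whose local parts are safe to lowercase and
the list of base domains whose internal host names should be collapsed.
The function name, its parameters and the example.net addresses are
hypothetical; nothing like this exists in the kernel tree.

```python
def normalize_email(address, base_domains=(), fold_local_for=()):
    """Normalize an address for comparison purposes only.

    base_domains:   lowercase domains whose internal host names are
                    collapsed, e.g. host.example.net -> example.net.
    fold_local_for: lowercase domains known to treat the local part
                    case-insensitively.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" found, leave the string untouched

    # Domain names are case-insensitive, so this is always safe.
    domain = domain.lower()

    # Collapse internal machine names onto the public domain.
    for base in base_domains:
        if domain.endswith("." + base):
            domain = base
            break

    # Local parts may be case-sensitive; fold only for known domains.
    if domain in fold_local_for:
        local = local.lower()

    return local + "@" + domain
```

With base_domains=("example.net",), an address such as
davem@sunset.example.net would be reduced to davem@example.net, while the
local part is left alone unless its domain is listed in fold_local_for.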

2008-07-30 12:41:47

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 30-07-08 09:24, Adrian Bunk wrote:

> Whether Jon's patch is a good idea one might discuss

<discussion>it is not</discussion>

> but as soon as someone puts an email address into a kernel commit
> Google will anyway find it:

It will and note this is not a privacy issue "as such" at least for me
(for rmk, rewriting addresses is a privacy issue, directly or via law,
whether or not it is needed in this specific example).

Google finds lots of things, most of which do not end up at the top of
the search results. This address I'm now posting with is definitely
public (or I wouldn't be posting with it) but given that it shouldn't
even exist at the moment I have been careful for some time to put a
relay address into anything which I intend to be long lived.

Since outside its non-existence it's the best address I have available I
do still use it though. This is not a problem, since all mailing list
archives go to great trouble to obscure addresses anyway and my gmail
address will feature as the "most public" from it being in _content_.
Sometimes others use this address in content as well but given that they
can't be expected to know about any of my peculiar mail fetishes I'm not
going to whine about it and it's not a practical problem anyway.

Then Jon comes along, puts _all_ addresses in content inside a hugely
publicized, widely web-indexed tree and fucks it up.

Anyways... yesterday I had to turn the fan on my monitor to keep it from
damage in this bloody furnace while today it's some 5 degrees cooler and
the fan's aimed at me again so I'll stop cursing and shouting now. But
still a damn bad idea.

Rene.

2008-07-30 12:47:50

by Adrian Bunk

Subject: Re: 463 kernel developers missing!

On Wed, Jul 30, 2008 at 10:37:34AM +0200, Stefan Richter wrote:
> Adrian Bunk wrote:
>...
> > but as soon as someone puts an email address into a kernel commit
> > Google will anyway find it:
>
> This doesn't justify what Jon did though.
>
> Jon created a new database out of formerly disparate datasets, even
> though we didn't provide him these datasets for this purpose. The fact
> that the means to create this database are rather trivial and cheap does
> not mean that we implicitly agreed to what he did or that it wouldn't
> matter whether we agree to it or not.
>...

You certified:

I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.


And if you think this doesn't cover Jon's patch you should also
complain to LWN and the Linux Foundation who published data
generated from the same datasets as Jon's patch.


> Stefan Richter

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-30 12:52:38

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 30-07-08 14:46, Adrian Bunk wrote:

> On Wed, Jul 30, 2008 at 10:37:34AM +0200, Stefan Richter wrote:
>> Adrian Bunk wrote:
>> ...
>>> but as soon as someone puts an email address into a kernel commit
>>> Google will anyway find it:
>> This doesn't justify what Jon did though.
>>
>> Jon created a new database out of formerly disparate datasets, even
>> though we didn't provide him these datasets for this purpose. The fact
>> that the means to create this database are rather trivial and cheap does
>> not mean that we implicitly agreed to what he did or that it wouldn't
>> matter whether we agree to it or not.
>> ...
>
> You certified:

You only certify anything when _you_ put your address in. Given that
it's a very common occurrence that not you but _others_ do, this does not
mean a _single_ thing. Tested-by, Bisected-by, what have you...

But let us leave this discussion be. It's not going anywhere anyway.

Rene.

2008-07-30 13:03:29

by Rene Herman

Subject: Re: 463 kernel developers missing!

On 30-07-08 14:58, Paul Rolland wrote:

> And you dare [ .. ]

*plonk*

Rene

2008-07-30 13:08:00

by Paul Rolland

Subject: Re: 463 kernel developers missing!

Hello,

First, please note that my name and addresses are _in_ the list published
by Jon.

On Wed, 30 Jul 2008 14:43:12 +0200
Rene Herman <[email protected]> wrote:

> On 30-07-08 09:24, Adrian Bunk wrote:
>
> > Whether Jon's patch is a good idea one might discuss
>
> <discussion>it is not</discussion>

And you dare use <discussion>? Where is this a discussion?

> > but as soon as someone puts an email address into a kernel commit
> > Google will anyway find it:
>
> It will and note this is not a privacy issue "as such" at least for me
> (for rmk, rewriting addresses is a privacy issue, directly or via law,
> whether or not it is needed in this specific example)
>
[...]
>
> Then Jon comes along, puts _all_ addresses in content inside a hugely
> publicized, widely web-indexed tree and fucks it up.
>
> Anyways... yesterday I had to turn the fan on my monitor to keep it from
> damage in this bloody furnace while today it's some 5 degrees cooler and
> the fan's aimed at me again so I'll stop cursing and shouting now. But
> still a damn bad idea.

Sorry, I don't agree. First, because using Google to collect a list of emails
is damn easy, and whether this list is handy or not changes nothing for people
using it for spam.
Second, because it takes just a few seconds to extract a list nearly as complete
as Jon's version from git: git log | grep Author: | sort | uniq -c
gives something very useful: about 5800 emails. So let's not pretend Jon is
saving people searching for this list a complicated job.

Linux is an open project. Everything that's related to it is open, and public.
If you don't want your name/email to be associated with it, that's another
issue.

We could blame Jon for publishing his list on the mailing list without prior
notice, but not for creating it.
And I certainly would like to see the .mailmap appear at my next git pull ;)

Regards,
Paul

--
Paul Rolland E-Mail : rol(at)witbe.net
CTO - Witbe.net SA Tel. +33 (0)1 47 67 77 77
Les Collines de l'Arche Fax. +33 (0)1 47 67 77 99
F-92057 Paris La Defense RIPE : PR12-RIPE

Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur
"Some people dream of success... while others wake up and work hard at it"

"I worry about my child and the Internet all the time, even though she's too
young to have logged on yet. Here's what I worry about. I worry that 10 or 15
years from now, she will come to me and say 'Daddy, where were you when they
took freedom of the press away from the Internet?'"
--Mike Godwin, Electronic Frontier Foundation

2008-07-30 15:03:56

by Stefan Richter

Subject: Re: 463 kernel developers missing!

Adrian Bunk wrote:
> On Wed, Jul 30, 2008 at 10:37:34AM +0200, Stefan Richter wrote:
[That I signed off -> on a contribution <- does...]
>> not mean that we implicitly agreed to what he did or that it wouldn't
>> matter whether we agree to it or not.
>>...
>
> You certified:
>
> I understand and agree that this project and the contribution
> are public and that a record of the contribution (including all
> personal information I submit with it, including my sign-off) is
> maintained indefinitely and may be redistributed consistent with
> this project or the open source license(s) involved.

Yes.

Copyright doesn't have a lot to do with personality rights though.
And then there is also ethics besides laws.
--
Stefan Richter
-=====-==--- -=== ====-
http://arcgraph.de/sr/

2008-07-30 15:06:19

by David Schwartz

Subject: RE: 463 kernel developers missing!


Stefan Richter wrote:

> This doesn't justify what Jon did though.

No, but the GPL does.

> Jon created a new database out of formerly disparate datasets, even
> though we didn't provide him these datasets for this purpose. The fact
> that the means to create this database are rather trivial and cheap do
> not mean that we implicitly agreed to what he did or that it wouldn't
> matter whether we agree to it or not.

Yes, it does. If you contribute to a GPL project, you *explicitly* agree to exactly this. Anything you submit may be pieced together, changed, made public, processed, and used for purposes other than you intended.

> Jon even suggested that his database is then used to combine with
> further databases (bugzilla accounts, mailinglist archives). Again, the
> fact that something like this is possible without great difficulties
> doesn't make it right.

No, but the fact that all the submissions were made under the GPL, whose explicit purpose is to allow information to be changed, processed, and reused for other purposes, does.

If you don't want your submissions to be in the public record for all eternity to be used for any lawful purpose, don't make them to a GPL project.

You have no right whatsoever to look at how one person chooses to use them and say "I didn't agree to that". Yes, you did. You gave up the right to approve or reject each use when you made the submission. If you don't like it, submit under some other license.

DS

2008-07-30 15:08:48

by [email protected]

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 7/30/08, Stefan Richter <[email protected]> wrote:
> Adrian Bunk wrote:
> > Whether Jon's patch is a good idea one might discuss,
>
>
> There isn't a lot to discuss. From a purely technical standpoint,
> duplicating SCM metadata into a source file and aiming to be
> comprehensive and up to date is naive at best.

I noticed that the log was full of errors and thought that it might be
nice to have a mechanism to correct them. Since the log is immutable,
error correction needs to be external. It is a different discussion as
to whether we should try and fix the errors in the log.

Assuming that we wanted the data clean I came up with this solution.
Maybe there is a better way.

Kernel log is immutable.
Kernel log contains about 1,000 errors of various classes.
The .mailmap file format was preexisting; it maps email addresses to
people's names. It can be used to map the other direction, but none of
the kernel tools use it that way.

I observed that the unique key in the log is the email address, but
many of those email keys have errors in them. The data item we are
actually interested in is the developer's name.

I then generated a .mailmap file containing all of the unique email
addresses in the log and a guess from the log as to which developer
was associated with the email.

I then used various tools and hand editing to correct the ~1,000
errors and assign the correct developer name to the email in the log.
Correcting all these errors was a lot of work. It exposed the fact that
tools in the maintainers' chain may be the largest source of errors.
Of course the file can be patched as more errors are found.

This new mailmap file now has two types of entries, ones fixing errors
and ones that are just copies of the data from the log.

I chose to leave both types of records in the file to make maintenance
easier. The complete set of email keys from the log is in the mailmap
file. To do maintenance, regenerate the email keys from the log and
diff them against mailmap. Now you only have to inspect the diff for
errors. After the diff is clean, add the new entries to the mailmap.
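
A minimal sketch of that maintenance sweep, in Python rather than anything from the kernel tree. The function names are hypothetical, the log text is assumed to contain `Author: Name <email>` lines as printed by `git log`, and comparing addresses case-insensitively is an assumption of this sketch:

```python
import re

# Matches an address enclosed in angle brackets, e.g. <dev@example.org>.
EMAIL_RE = re.compile(r"<([^<>\s]+)>")


def mailmap_emails(mailmap_text):
    """Return the set of lower-cased addresses listed in a .mailmap body."""
    emails = set()
    for line in mailmap_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        emails.update(addr.lower() for addr in EMAIL_RE.findall(line))
    return emails


def log_emails(log_text):
    """Return the set of lower-cased author addresses found in log output."""
    emails = set()
    for line in log_text.splitlines():
        if line.startswith("Author:"):
            emails.update(addr.lower() for addr in EMAIL_RE.findall(line))
    return emails


def missing_from_mailmap(log_text, mailmap_text):
    """Addresses present in the log but absent from the mailmap, sorted."""
    return sorted(log_emails(log_text) - mailmap_emails(mailmap_text))
```

Any address the function returns appears in the log but not yet in the mailmap, so it is the only part that needs manual inspection.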

If you remove entries from the mailmap file they will get flagged in
every maintenance sweep and need to be removed again. Of course this
will lead you to build a list of people who don't want to be in the
list.

The mailmap file is sorted by name instead of email even though it is
used to convert email to name. This makes it easy for humans to edit
when their name changes (like getting married). Find all of your
aliases and change them to reflect your new name. Output from all of
the tools using mailmap will be updated.

I see now that editing the name provides a mechanism for removing
people from the file: their names can be edited to 'anonymous'. The
email addresses can't be removed since they are keys and have to match
the immutable set in the log. People may not be happy when tools
report that the developer of the patch that is causing them problems is
'anonymous'.

A simplistic validation check would be for checkpatch to look up each
email address in a new patch and print a warning if the address was
not in mailmap. That would be enough to stop many of the common typo
errors.
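
A rough illustration of that check, written in Python; it is not checkpatch.pl code. The function name, the warning text, and the set of header and tag lines scanned are all assumptions made for the example:

```python
import re

# Lines such as "From: ...", "Author: ..." or "Signed-off-by: ..." that end
# in an <address>. Which lines to scan is an assumption of this sketch.
TAG_RE = re.compile(
    r"^(?:From|Author|[A-Za-z-]+-by):.*<([^<>\s]+)>", re.MULTILINE
)


def unknown_addresses(patch_text, known_emails):
    """Return one warning string per address that is not a known email."""
    known = {email.lower() for email in known_emails}
    warnings = []
    for match in TAG_RE.finditer(patch_text):
        addr = match.group(1)
        if addr.lower() not in known:
            warnings.append(f"WARNING: address <{addr}> is not in .mailmap")
    return warnings
```

A transposed letter in a sign-off address would then show up as a warning before the patch is sent, while a genuinely new contributor would see the same warning and add a mailmap entry.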

Assuming we want the log data clean, what's a better solution?


>
> > but as soon as someone puts an email address into a kernel commit
> > Google will anyway find it:
>
>
> This doesn't justify what Jon did though.
>
> Jon created a new database out of formerly disparate datasets, even
> though we didn't provide him these datasets for this purpose. The fact
> that the means to create this database are rather trivial and cheap do
> not mean that we implicitly agreed to what he did or that it wouldn't
> matter whether we agree to it or not.
>
> Jon even suggested that his database is then used to combine with
> further databases (bugzilla accounts, mailinglist archives). Again, the
> fact that something like this is possible without great difficulties
> doesn't make it right.
>
> --
> Stefan Richter
> -=====-==--- -=== ====-
> http://arcgraph.de/sr/
>


--
Jon Smirl
[email protected]

2008-07-30 15:10:28

by Stefan Richter

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

David Schwartz wrote:
> Stefan Richter wrote:
>> This doesn't justify what Jon did though.
>
> No, but the GPL does.

GPL is merely about copyright.
--
Stefan Richter
-=====-==--- -=== ====-
http://arcgraph.de/sr/

2008-07-30 15:11:55

by Alan

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

> No, but that all the submissions were made under the GPL, whose explicit purpose is to allow information to be changed, processed, and reused for other purposes does.

So why hasn't Jon included a copy of the GPL and the sources with his new
data set ?

> If you don't want your submissions to be in the public record for all eternity to be used for any lawful purpose, don't make them to a GPL project.

The GPL doesn't trump data protection law. It can't.

> You have no right whatsoever to look at how one person chooses to use them and say "I didn't agree to that". Yes, you did. You gave up the right to approve or reject each use when you made the submission. If you don't like it, submit under some other license.

Disagree - firstly national law trumps licences, secondly there is the
(regrettably increasingly) small matter of manners.

Alan

2008-07-30 15:23:51

by [email protected]

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 7/30/08, Alan Cox <[email protected]> wrote:
> > No, but that all the submissions were made under the GPL, whose explicit purpose is to allow information to be changed, processed, and reused for other purposes does.
>
>
> So why hasn't Jon included a copy of the GPL and the sources with his new
> data set ?

Bug, obviously the file needs it, it is derived from GPL'd files.
Please edit your local copy, no need to send another couple hundred GB
of email.

> > If you don't want your submissions to be in the public record for all eternity to be used for any lawful purpose, don't make them to a GPL project.
>
>
> The GPL doesn't trump data protection law. It can't.

By making a submission to a GPL'd project didn't you grant a license
for your data to be used? That was Ted's point when he posted the
developer's certification.

>
> > You have no right whatsoever to look at how one person chooses to use them and say "I didn't agree to that". Yes, you did. You gave up the right to approve or reject each use when you made the submission. If you don't like it, submit under some other license.
>
>
> Disagree - firstly national law trumps licences, secondly there is the
> (regrettably increasingly) small matter of manners.
>
>
> Alan
>


--
Jon Smirl
[email protected]

2008-07-30 15:31:54

by Alan

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

> > The GPL doesn't trump data protection law. It can't.
>
> By making a submission to a GPL'd project didn't you grant a license
> for your data to be used? That was Ted's point when he posted the
> developer's certification.

Data protection law trumps the GPL. The fact my address is public does
not give you the rights globally to process it.

2008-07-30 15:33:48

by Adrian Bunk

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On Wed, Jul 30, 2008 at 02:54:05PM +0200, Rene Herman wrote:
> On 30-07-08 14:46, Adrian Bunk wrote:
>
>> On Wed, Jul 30, 2008 at 10:37:34AM +0200, Stefan Richter wrote:
>>> Adrian Bunk wrote:
>>> ...
>>>> but as soon as someone puts an email address into a kernel commit
>>>> Google will anyway find it:
>>> This doesn't justify what Jon did though.
>>>
>>> Jon created a new database out of formerly disparate datasets, even
>>> though we didn't provide him these datasets for this purpose. The fact
>>> that the means to create this database are rather trivial and cheap do
>>> not mean that we implicitly agreed to what he did or that it wouldn't
>>> matter whether we agree to it or not.
>>> ...
>>
>> You certified:
>
> You only certify anything when _you_ put your address in. Given that
> it's a very common occurrence that not you but _others_ do, this does not
> mean a _single_ thing. Tested-by, Bisected-by, what have you...
>
> But let us leave this discussion be. It's not going anywhere anyway.

There's one thing where it might actually go further:

You actually have a good point here, and I'm not disagreeing with it.

I've added Linus to the recipients since stuff like e.g. Tested-by or
Bisected-by tags actually undermine what the DCO 1.1 update should
have accomplished. So if DCO 1.1 (d) is considered to be legally
required for public indefinite storage of name and email address
we have a problem here.

> Rene.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-07-30 15:51:21

by [email protected]

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 7/30/08, Alan Cox <[email protected]> wrote:
> > > The GPL doesn't trump data protection law. It can't.
> >
> > By making a submission to a GPL'd project didn't you grant a license
> > for your data to be used? That was Ted's point when he posted the
> > developer's certification.
>
>
> Data protection law trumps the GPL. The fact my address is public does
> not give you the rights globally to process it.
>

There are a lot of companies (including Google's code database)
indexing the kernel source and processing it into new form. What is
their standing?

--
Jon Smirl
[email protected]

2008-07-30 16:01:46

by Alan Cox

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

> There are a lot of companies (including Google's code database)
> indexing the kernel source and processing it into new form. What is
> their standing?

That would depend on their location and activities.

Alan

2008-07-30 16:31:32

by Rene Herman

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 30-07-08 17:32, Adrian Bunk wrote:

> On Wed, Jul 30, 2008 at 02:54:05PM +0200, Rene Herman wrote:

>> But let us leave this discussion be. It's not going anywhere anyway.
>
> There's one thing where it might actually go further:
>
> You actually have a good point here, and I'm not disagreeing with it.
>
> I've added Linus to the recipients since stuff like e.g. Tested-by or
> Bisected-by tags actually undermine what the DCO 1.1 update should
> have accomplished. So if DCO 1.1 (d) is considered to be legally
> required for public indefinite storage of name and email address
> we have a problem here.

I was afraid you'd conclude that... Me, I conclude that we should just
not do the "harvest all these addresses into a big file" thing that
might make it more than a theoretical problem...

But that's just me.

Rene.

2008-07-30 16:42:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On Wed, Jul 30, 2008 at 04:14:08PM +0100, Alan Cox wrote:
> > > The GPL doesn't trump data protection law. It can't.
> >
> > By making a submission to a GPL'd project didn't you grant a license
> > for your data to be used? That was Ted's point when he posted the
> > developer's certification.
>
> Data protection law trumps the GPL. The fact my address is public does
> not give you the rights globally to process it.

Yes, but when you submit patches using the required Signed-off-by:,
you are agreeing to the following from the Developer's Certificate
of Origin:

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

How this interacts with Europe's Data Protection Law, and whether
correcting spelling errors in e-mail addresses to make it easier to
canonicalize the list is consistent with what is allowed by the GPL,
Europe's Data Protection Law, and the permission given when a
developer signs the DCO, is probably not worth debating on LKML.
Let someone file a complaint with the EU who can try to arrest Jon
Smirl the next time he enters Europe, or get him extradited to the
Hague for violations against international law if they really think
they can justify it, argue about whether they can do so in other
forums; if you put a beer in my hand, maybe I'd even be willing to
debate it in a bar at some future conference. But does it really make
sense to argue about it here?

- Ted

2008-07-30 17:00:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: 463 kernel developers missing!



On Wed, 30 Jul 2008, Adrian Bunk wrote:
>
> I've added Linus to the recipients since stuff like e.g. Tested-by or
> Bisected-by tags actually undermine what the DCO 1.1 update should
> have accomplished. So if DCO 1.1 (d) is considered to be legally
> required for public indefinite storage of name and email address
> we have a problem here.

Quite frankly, since the patches are public anyway, and the code is open
source, I personally think that worry is just silly fear-mongering by
people who take lawyers not just too seriously, but then think that judges
and lawyers are too stupid to think for themselves.

We added the lines to the DCO-1.1 because we wanted to make it _obvious_
that the legal requirements for the sign-off would never clash with any
possible insane reading of things, but it was a "dot the i's" kind of
thing.

The fact is, people who are involved in Linux know it's public. People
make public bug-reports, and they _expect_ to get attributed. I think any
worries about indefinite storage should be the other way around: we should
strive to make sure that the attributions are consistent and correct.

If somebody really doesn't want their name and email known, they can say
that. We won't accept patches from them, but it's certainly no problem to
suppress "tested-by" etc things on request. Not that I have ever seen such
a request that I can remember, nor do I really expect to ever see one
(unless it's as a perverse reaction to this email where people just want
to be silly).

Anyway, normal people talking about obscure and insane readings of some
random law is stupid. You should worry about "doing the right thing", not
about trying to read law as if it was some mindless machine that acted
like the computers you're used to.

Let's face it, _everybody_ breaks laws if you think about them as some
inflexible and absolute rules. Probably every day.

You roll through a STOP-sign (in California, it was almost as if that's
what the sign _meant_). Maybe you take a shortcut when crossing the street
and you don't walk _exactly_ on the zebra-crossing (or against a red light
just because there were obviously no cars within _miles_ of you). Maybe
you drive 58mph in a 55 zone. Maybe you walk around and spit out the
cherry-pits on the street rather than in a garbage can.

Only insane people with OCD cannot understand that things aren't ever that
black-and-white. Use your good _judgement_ for chrissake!

Linus

2008-07-30 17:08:03

by Alan Cox

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

> debate it in a bar at some future conference. But does it really make
> sense to argue about it here?

No but perhaps Jon could simply show some manners when people request him
politely not to do that. He doesn't seem to want to debate manners, just
law.

Alan

2008-07-30 17:09:32

by Stefan Richter

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

Adrian Bunk wrote:
> On Wed, Jul 30, 2008 at 02:54:05PM +0200, Rene Herman wrote:
>> You only certify anything when _you_ put your address in. Given that
>> it's a very common occurrence that not you but _others_ do, this does not
>> mean a _single_ thing. Tested-by, Bisected-by, what have you...
[...]
> stuff like e.g. Tested-by or Bisected-by tags actually undermine what
> the DCO 1.1 update should have accomplished.
[...]

Last time when I read a discussion about these tags, people at least at
this side of the pond seemed to come to the conclusion that we ask
testers for their consent if in doubt, before adding such a tag if they
didn't do so themselves. (That's different from how we handle Acked-by:
from fellow developers which we often imply from an informally given OK.)
--
Stefan Richter
-=====-==--- -=== ====-
http://arcgraph.de/sr/

2008-07-30 17:55:47

by Simon Arlott

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 30/07/08 13:58, Paul Rolland wrote:
> We could blame Jon for publishing his list on the list without prior
> information, but not for creating it.
> And I certainly would like to see the .mailmap appear at my next git pull ;)

The fact still remains that most of the entries in Jon's .mailmap file are
redundant. I may have three email addresses referred to in git, but my name
is the same in all cases. Until "git shortlog" is changed to link an author's
commits without using their name there is no need to include me and several
other people in there. Especially not anyone with only one email address
in the commit log.

(It would also be easy to sha1 hash all the email addresses in the shortlog
mailmap, using a hash lookup to find the name, although it would make it
a bit difficult to find what any one address is without grepping the log.)
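
A sketch of that hashing idea in Python, using the standard `hashlib` module. The hashed map is a hypothetical format that existing tools would not read as-is, and the function names are made up for the example:

```python
import hashlib


def email_digest(email):
    """SHA-1 hex digest of a normalized (trimmed, lower-cased) address."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha1(normalized).hexdigest()


def build_hashed_map(entries):
    """Turn (name, email) pairs into {digest: name}; no plain address is kept."""
    return {email_digest(email): name for name, email in entries}


def lookup_name(hashed_map, email):
    """Return the name recorded for an address, or None if it is unknown."""
    return hashed_map.get(email_digest(email))
```

A tool that already has an address from a commit can still resolve the name, but the map itself no longer lists addresses in readable form. The addresses remain recoverable from the log, as noted above.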

The ChangeLogs do include all email addresses contributing to that release,
but in general I think a big list of them which people can use with no
effort to spam every developer is a bad thing. On the other hand maybe
those people don't know what .files are ;)

--
Simon Arlott

2008-07-30 18:03:22

by David Schwartz

[permalink] [raw]
Subject: RE: 463 kernel developers missing!


> > No, but that all the submissions were made under the GPL, whose
> > explicit purpose is to allow information to be changed,
> > processed, and reused for other purposes does.

> So why hasn't Jon included a copy of the GPL and the sources with his new
> data set ?

I would assume that was either an error or because he believes the
information contains insufficient creative content to be covered by
copyright. It seems to me more like a functional, factual report. But I can
see both sides of that argument.

> > If you don't want your submissions to be in the public record
> > for all eternity to be used for any lawful purpose, don't make
> > them to a GPL project.

> The GPL doesn't trump data protection law. It can't.

No. But the GPL can be used to show the intent to consent to the use of the
information by others. I don't know the data protection law in your country,
of course.

> > You have no right whatsoever to look at how one person chooses
> > to use them and say "I didn't agree to that". Yes, you did. You
> > gave up the right to approve or reject each use when you made the
> > submission. If you don't like it, submit under some other license.

> Disagree - firstly national law trumps licences, secondly there is the
> (regrettably increasingly) small matter of manners.

I think it's terribly bad manners to submit something to a GPL project and
then complain when someone else uses it the way they want to. If you want
the benefit of using and modifying GPL software, you have to let others do
what they want with your contributions. If you don't find that deal fair,
don't make it. But then don't make the deal and then claim others are being
rude when they take what the deal gave them.

As for GPL only being about copyright, I don't think that's true. The GPL is
a copyright license. It grants you rights that the author would otherwise
hold exclusively under copyright. But it doesn't follow that the rights you
give up are only rights under copyright. See, for example, section 7.

If you want the rights GPL grants you under copyright, you have to give up
certain things, and not just copyright. One of the things you have to give
up is *any* legal mechanism that would permit you to restrict other people's
GPL rights.

GPL section 6 clearly prohibits you from using any data protection laws in
your jurisdiction to prevent someone else from modifying and redistributing
information you submitted under the GPL.

The "GPL only affects copyright" argument would mean that I could
redistribute modified GPL'd work with an EULA. Obviously, I can't do that.
Enforcing data protection laws to restrict rights granted under the GPL is
no different from enforcing an EULA to do the same thing.

DS

2008-07-30 18:58:50

by [email protected]

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 7/30/08, Alan Cox <[email protected]> wrote:
> > debate it in a bar at some future conference. But does it really make
> > sense to argue about it here?
>
>
> No but perhaps Jon could simply show some manners when people request him
> politely not to do that. He doesn't seem to want to debate manners, just
> law.

I didn't handle the issue of removing people from the list very well.
I got caught up in the fact that the log immutably records history and
I objected to editing history. I viewed this along the lines of George
Washington asking to be removed from text books. He was the first
president; we can't change that and we have to include him in the list
of presidents.

I still don't have a good solution for how to track the people who
don't want their names to appear without creating yet another list.

--
Jon Smirl
[email protected]

2008-07-30 19:39:27

by Rene Herman

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 30-07-08 18:56, Linus Torvalds wrote:

> The fact is, people who are involved in Linux know it's public.
> People make public bug-reports, and they _expect_ to get attributed.

The problem here is just the _scale_ of publicness. Yes, Adrian's worry
can be shrugged off I'd say but this thread is about Jon Smirl collecting
addresses into a hugely public (because in tree) and hugely accessible
format and while your statement above might be true for 95% of cases
(99, I don't care) the use of people's personalia is just something you
cannot decide on yourself ever. It's theirs.

I'm in this thread because the from address on this message is in Jon's
file and while I've used it myself in the past, any time it's been part
of some Fooed-by tag recently it's because someone else put it there.
While it's the best address I have for these uses (and so I still use
it) it shouldn't work anymore even today, so I've been careful to put a
future proof relay address in when I advertise a contact myself.

As said before, I'm also not going to whine about it when others do put
it in because they shouldn't need to concern themselves with my odd
needs and wants and it's not a real problem anyway as long as the future
proof one is much _more_ public. I am, therefore, just not glad that
it's now being put into a file in the root of your highly publicized
tree of files.

Just a silly example, I know, but it doesn't really matter -- even if
someone tells me he fears cosmic channeling will get the better of him
if his personalia are in some resource I maintain, I jump to attention,
salute, shout "SIR YES SIR!" and remove it. It's his.

So now for example I'm debugging a problem with an ALSA driver with a
few users at least one of which has used different email addresses
during it and if I'm going to attribute any of their testing and effort,
I'm going to have to ask for permission and which address was meant to
be the public one. And sure, sure, I'd probably do that even today
anyway but right now it's mostly a principled thing while with the
addresses in the tree I'd sort of insist that anyone would, what with
them being top google hits for ever more.

So, if you were doing more than responding to Adrian's DCO worry here
(which I do not share) the above is what I have against harvesting the
addresses into a _way_ too public place/format. It's a matter of scale;
as opposed to the SCM metadata, your tree itself is way too public to
put anything in without very definite and explicit approval. I feel.

Rene.

2008-07-30 19:48:21

by Ray Lee

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On Wed, Jul 30, 2008 at 12:41 PM, Rene Herman <[email protected]> wrote:
> So, if you were doing more than responding to Adrian's DCO worry here (which
> I do not share) the above is what I have against harvesting the addresses
> into a _way_ too public place/format.

Er, what? Are you saying that a mailmap file inside a .gz or .bz2 or a
git repository is *more* public than a mailing list? or the already
existing gitweb history of the main tree?

I've noticed correlated (lagged) spikes in my spam volume to the email
address I use for this list whenever I post from it, so please
consider that you are perhaps being penny-wise and pound-foolish here.

2008-07-30 19:50:18

by Stefan Richter

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

David Schwartz wrote:
> I think it's terribly bad manners to submit something to a GPL project and
> then complain when someone else uses it the way they want to.
...
> Enforcing data protection laws to restrict rights granted under the GPL is
> no different from enforcing an EULA to do the same thing.

It's not the same thing, by far. EULA = "end user license agreement";
while "data protection law" is... law. Obviously, licenses (contracts)
must not be unlawful.

PS: You may use a GPL'd program any way you want --- although not for
unlawful purposes. But that's not a matter between you and the
copyright holder, it's between you and the law.

PPS: SCM metadata are not part of the program. The DCoO states that
the personal data submitted along with the contribution may be
redistributed "consistent... with the open source license(s) involved",
but it isn't discussed whether other terms of the licenses, notably
those on modification and derivatives, apply to the data supplied for
the certificate of origin.
--
Stefan Richter
-=====-==--- -=== ====-
http://arcgraph.de/sr/

2008-07-30 19:59:18

by Rene Herman

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 30-07-08 21:47, Ray Lee wrote:

> On Wed, Jul 30, 2008 at 12:41 PM, Rene Herman <[email protected]> wrote:

>> So, if you were doing more than responding to Adrian's DCO worry here (which
>> I do not share) the above is what I have against harvesting the addresses
>> into a _way_ too public place/format.
>
> Er, what? Are you saying that a mailmap file inside a .gz or .bz2 or a
> git repository is *more* public than a mailing list?

Inside a .gz or .bz2? But yes, definitely. Have you ever noticed exactly
how many fully indexed linux source trees there are out there on the
web? And how hardly any mailing list archive does _not_ take the trouble
to obscure addresses?

> or the already existing gitweb history of the main tree?
>
> I've noticed correlated (lagged) spikes in my spam volume to the
> email address I use for this list whenever I post from it, so please
> consider that you are perhaps being penny-wise and pound-foolish
> here.

I'm not talking about spam. Spammers will get anything that's not
private. As said, I'm talking about scale of publicness.

Rene.

2008-07-30 20:22:41

by Rene Herman

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 30-07-08 21:41, Rene Herman wrote:

> So, if you were doing more than responding to Adrian's DCO worry here
> (which I do not share) the above is what I have against harvesting the
> addresses into a _way_ too public place/format. It's a matter of scale;
> as opposed to the SCM metadata, your tree itself is way too public to
> put anything in without very definite and explicit approval. I feel.

This, Jon, by the way also suggests something which I would consider
much better; _keep_ it as SCM metadata in some less-accessible format
under .git/

I doubt anyone's going to come up with an objection then. It's already
in that exact same spot after all.

Rene.

2008-07-30 20:27:57

by [email protected]

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 7/30/08, Rene Herman <[email protected]> wrote:
> On 30-07-08 21:41, Rene Herman wrote:
>
>
> > So, if you were doing more than responding to Adrian's DCO worry here
> > (which I do not share) the above is what I have against harvesting the
> > addresses into a _way_ too public place/format. It's a matter of scale; as
> > opposed to the SCM metadata, your tree itself is way too public to put
> > anything in without very definite and explicit approval. I feel.
> >
>
> This, Jon, by the way also suggests something which I would consider much
> better; _keep_ it as SCM metadata in some less-accessible format under .git/

The decision to keep it in .mailmap format was made before I was
involved. A smaller .mailmap has been in the tree since 2007. Existing
tools will need to be changed if the location is moved.

>
> I doubt anyone's going to come up with an objection then. It's already in
> that exact same spot after all.
>
> Rene.
>


--
Jon Smirl
[email protected]

2008-07-30 20:41:15

by Stefan Richter

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

Jon Smirl wrote:
> The decision to keep it in .mailmap format was made before I was
> involved. A smaller .mailmap has been in the tree since 2007. Existing
> tools will need to be changed if the location is moved.

You were talking about new uses all the time, hence new tools.
--
Stefan Richter
-=====-==--- -=== ====-
http://arcgraph.de/sr/

2008-07-30 21:05:32

by Simon Arlott

[permalink] [raw]
Subject: Re: 463 kernel developers missing!

On 30/07/08 19:58, Jon Smirl wrote:
> On 7/30/08, Alan Cox <[email protected]> wrote:
>> > debate it in a bar at some future conference. But does it really make
>> > sense to argue about it here?
>>
>>
>> No but perhaps Jon could simply show some manners when people request him
>> politely not to do that. He doesn't seem to want to debate manners, just
>> law.
>
> I didn't handle the removing the people from the list issue very well.
> I got caught in the fact the log immutably records history and I
> objected to editing history. I viewed this along the lines of George
> Washington asking to be removed from text books. He was the first
> president; we can't change that and we have to include him in the list
> of presidents.
>
> I still don't have a good solution for how to track the people who
> don't want their names to appear without creating yet another list.
>

It's really simple - if there is actually some benefit to them being
in .mailmap (because their name is spelled differently in some commits),
then contact them. Otherwise, you can assume they don't want to be on it.

That should cut down the number of people considerably... I can only see
one example in the A section (Auke Kok).

--
Simon Arlott

2008-07-30 22:26:14

by David Schwartz

Subject: RE: 463 kernel developers missing!


I'll try to make this my last response on the issue, if possible.

Stefan Richter wrote:

> David Schwartz wrote:

> > I think it's terribly bad manners to submit something to a GPL
> > project and
> > then complain when someone else uses it the way they want to.
> > ...
> > Enforcing data protection laws to restrict rights granted under
> > the GPL is
> no different from enforcing an EULA to do the same thing.

> It's not the same thing, by far. EULA = "end user license agreement";
> while "data protection law" is... law. Obviously, licenses (contracts)
> must not be unlawful.

An EULA itself is not law, but neither would someone's request to be removed
from such a list be itself a law. EULAs operate under law, and so would a
request for data confidentiality. This difference is no difference. Both are
attempts to invoke a law other than copyright to restrict rights guaranteed
by the GPL. You may not use any law or provision to restrict another
person's GPL rights. That's what the GPL says, and it means it.

If a law, any law, permits you to impose restrictions on something the GPL
allows, then you give up the right to use that law in exchange for the
license the GPL grants. This obviously applies to your copyright in
derivative works. But it would also apply to any attempt to use any law to
encumber GPL rights.

As the GPL states, the license grants you permission to copy, distribute,
and/or modify the covered work. This is against any rights the authors might
have to prevent you from doing so.

> PS: You may use a GPL'd program any way you want --- although not for
> unlawful purposes. But that's not a matter between you and the
> copyright holder, it's between you and the law.

Precisely. And others who wish to exercise rights under the GPL forfeit any
legal mechanism (whether copyright, DMCA, contract, data privacy laws, or
whatever theory) to impose "further restrictions" on those who wish to
similarly use GPL works.

Copyright is the carrot the GPL uses to get you to agree to the stick. The
stick is no "further restrictions" of any kind, imposed by any law.
Obviously, you aren't responsible for an operation of law over which you
have no control. But you cannot invoke copyright -- or any other law -- to
restrict someone else's exercise of rights granted by the GPL. You get
copyright, but you give it all up. No "further restrictions", period.

> PPS: SCM metadata are not part of the program. The DCoO states that
> the personal data submitted along with the contribution may be
> redistributed "consistent... with the open source license(s) involved",
> but it isn't discussed whether other terms of the licenses, notably
> those on modification and derivatives, apply to the data supplied for
> the certificate of origin.

When you submit a unit to a GPL project, you place that unit under the GPL.
That is what the DCoO is trying to say. There cannot be some things that
some parts of the GPL apply to and some don't. There is no "sort of GPL,
sort of not" that applies to some parts of some submissions. If something is
part of or all of a submission made under the GPL, then all of the GPL
applies to it.

DS

2008-07-30 22:49:57

by Alan

Subject: Re: 463 kernel developers missing!

> Precisely. And others who wish to exercise rights under the GPL forfeit any
> legal mechanism (whether copyright, DMCA, contract, data privacy laws, or
> whatever theory) to impose "further restrictions" on those who wish to
> similarly use GPL works.

I don't know where you get that particular idea from. Try sending GPL code
from the USA to Cuba. Seems the US government is using GPL code but
imposing further restrictions...

> have no control. But you cannot invoke copyright -- or any other law -- to
> restrict someone else's exercise of rights granted by the GPL. You get
> copyright, but you give it all up. No "further restrictions", period.

Some rights in laws are absolute. I cannot "give up" my right to be
identified as the author of a work I create in many countries. It's an
absolute.

> When you submit a unit to a GPL project, you place that unit under the GPL.
> That is what the DCoO is trying to say. There cannot be some things that
> some parts of the GPL apply to and some don't. There is no "sort of GPL,
> sort of not" that applies to some parts of some submissions. If something is
> part of or all of a submission made under the GPL, then all of the GPL
> applies to it.

The metadata licensing isn't clear in my view.

I think what you are more likely to get sensible results with is arguing
estoppel? That was always the intent of that DCO wording. To ensure that
rights or otherwise you couldn't turn around and say "hey you published
my name and I didn't expect that implied by my actions".

However publishing a name and performing data processing on personal data
databases for other purposes is not the same thing at least in some
jurisdictions. In the EU you collect data "for a purpose".

Alan

2008-07-30 23:34:40

by David Schwartz

Subject: RE: 463 kernel developers missing!


> > Precisely. And others who wish to exercise rights under the GPL
> > forfeit any
> > legal mechanism (whether copyright, DMCA, contract, data
> > privacy laws, or
> > whatever theory) to impose "further restrictions" on those who wish to
> > similarly use GPL works.

> I don't know where you get that particular idea from. Try sending GPL code
> from the USA to Cuba. Seems the US government is using GPL code but
> imposing further restrictions...

I'm not sure how you think this is relevant. I could go to the effort of
explaining in detail why it's irrelevant, but I can't imagine you intended
this comment as a genuine response in good faith.

For one thing, even if this was a violation of the GPL, there would be no
recourse. The only conceivable recourse would be a suit by an author for
copyright infringement. The government has sovereign immunity against such a
claim. The government is immune because it has sovereign immunity. Jon is
immune because the GPL grants him the right he is exercising. (Of course, it
can't make him immune from any laws he violates, but my argument is that
because he has consent he isn't violating any laws.)

> > have no control. But you cannot invoke copyright -- or any
> > other law -- to
> > restrict someone else's exercise of rights granted by the GPL. You get
> > copyright, but you give it all up. No "further restrictions", period.

> Some rights in laws are absolute. I cannot "give up" my right to be
> identified as the author of a work I create in many countries. It's an
> absolute.

Yes, but you can give up your right to pursue that right. And certainly some
terms of the GPL might be unenforceable in some jurisdictions. But the GPL
says Jon can do what he's doing, and it means what it says. As I said, I
don't know the data privacy laws in your jurisdiction, but I do know the GPL
made you give up your right to use them to impose restrictions on Jon's
exercise of his GPL rights.

You may or may not be able to stop some operation of law from happening. You
are not responsible for things outside your control. And some jurisdictions
may find some GPL terms unconscionable when used in this way.

> > When you submit a unit to a GPL project, you place that unit
> > under the GPL.
> > That is what the DCoO is trying to say. There cannot be some things that
> > some parts of the GPL apply to and some don't. There is no "sort of GPL,
> > sort of not" that applies to some parts of some submissions. If
> > something is
> > part of or all of a submission made under the GPL, then all of the GPL
> > applies to it.

> The metadata licensing isn't clear in my view.

Perhaps you can invent some other meaning it might have and then claim it's
unclear because it can mean that. But I don't think it matters. The GPL is
really what matters here, at least in my opinion. The GPL is clearly all of
a piece -- it either applies to something or it doesn't. And if you want to
argue that people must parse GPL submissions to figure out what's really
covered by the GPL and what's not, you can certainly argue that. I find that
argument fairly unconvincing.

> I think what you are more likely to get sensible results with is arguing
> estoppel? That was always the intent of that DCO wording. To ensure that
> rights or otherwise you couldn't turn around and say "hey you published
> my name and I didn't expect that implied by my actions".

> However publishing a name and performing data processing on personal data
> databases for other purposes is not the same thing at least in some
> jurisdictions. In the EU you collect data "for a purpose".

GPL submissions are for the purposes specified in the GPL -- so that other
people may freely redistribute, copy, and modify them. You forfeit the right
to claim you made GPL submissions "for a purpose" as the GPL specifically
requires you to consent to their use for any purpose (save those the GPL
itself prohibits, of course).

DS

2008-07-31 04:20:42

by Kyle Moffett

Subject: Re: 463 kernel developers missing!

STOP!!! This is seriously just getting silly...

If people *really* care about the privacy of information they placed
in publicly accessible databases via agreement with the DCO, then
there is a workaround:

Instead of a "mailmap" file, use a "mailhash" file like this:

[...lines...]
4db83f457ca750b3ed0bb7db2375cfd41846fb43 Kyle Moffett <[email protected]>
[...more lines...]


That SHA1 checksum is of the name-and-email you are mapping *from* and
the value on the right is the string to replace it with. For all the
people who don't like their emails being displayed when somebody looks
at logs, you can just get your entries in that file changed to
"anonymous". Then the people who want useful statistics will ignore
your commits and people who want to look at logs will just use the
newly-added --no-mailhash option to see the
"<[email protected]>" that you happened to put in the
Signed-off-by.

Alternatively people could realize it's not worth it and just go write
real code or something.

PS. Just to show how easy it was, I converted the mailmap file that
was sent out into the above mailhash file with a perl one-liner
(WARNING: probably linewrapped):

perl -MDigest::SHA1=sha1_hex -n -e 'chomp; s/\s+/ /g; s/^ //; s/ $//;
print sha1_hex($_)." $_\n";' <mailmap >mailhash

Once you have the mailhash file, to convert from a "Name <email>" to
the actual desired representation you can run:

value="Name <email>"
sha1="$(printf '%s' "${value}" | sha1sum - | awk '{print $1}')"
line="$(sed -ne "/^${sha1} /{ s/^${sha1} //; p }" mailhash | head -n 1)"

Cheers,
Kyle Moffett

2008-07-31 07:02:14

by Stefan Richter

Subject: Re: 463 kernel developers missing!

David Schwartz wrote:
> because he has consent

Not. See below. Also remember that there are sometimes tags added to
the changelogs without having ensured that the respective person agrees
to this addition.

> if you want to
> argue that people must parse GPL submissions to figure out what's really
> covered by the GPL and what's not, you can certainly argue that.

The metadata (authorship, committership, changelog including sign-off
tags...) are not part of the submitted program source code.

The fact that I agreed to have aspects of my participation in the open
source project documented in the SCM does not imply an agreement that
these data may be copied into databases which serve other purposes.
--
Stefan Richter
-=====-==--- -=== =====
http://arcgraph.de/sr/

2008-07-31 20:59:38

by Willy Tarreau

Subject: Re: 463 kernel developers missing!

On Wed, Jul 30, 2008 at 11:32:29PM +0100, Alan Cox wrote:
> However publishing a name and performing data processing on personal data
> databases for other purposes is not the same thing at least in some
> jurisdictions. In the EU you collect data "for a purpose".

And in some countries (at least France) you need to declare the existence
of a list you constitute from personal data (names, addresses, etc...)
and the persons referenced in your list are always granted a right to
be removed upon a simple request. This right is scrupulously respected,
and I can say that I've successfully used it many times to be removed
from advertisers' lists.

Anyway, I don't care much about Jon's list right now.

Willy