2001-10-05 07:38:36

by Juha Siltala

Subject: Fw: Re: Past CREDITS files


Hi,

I replied to Alan but forgot to cc here. (I'm not on the list so please cc
if you want me to see something.)

Begin forwarded message:

Date: Fri, 5 Oct 2001 10:31:34 +0300
From: Juha Siltala <[email protected]>
To: Alan Cox <[email protected]>
Subject: Re: Past CREDITS files


On Thu, 04 Oct 2001 23:04:32 +0100 (BST)
Alan Cox <[email protected]> wrote:

> > I would like to examine the CREDITS files of all/most kernels released
> > over time. How could I get my hands on these? I want to study the
> > accumulation of contributors over the years. This is part of my masters
> > thesis project.
>
> Download all the kernels. Be aware they are
>   - Wildly inaccurate
>   - Started becoming accurate later on
>   - Were subject to significant external effects (the RH IPO
>     caused people to massively update/send in new CREDIT entries)
>
>
> They still represent a tiny subset of contributors. Especially the
> thousands who send in the odd small patch
>
> > BTW, when was the current twofold stable/devel numbering scheme
> > started?
>
> See my historical mail archive.
> http://www.linux.org.uk/Old-LK/Old-linux-kernel
>
> It's in there somewhere 8)

I roamed the kernel archives and found no CREDITS files in the very old (0.x)
kernels. To keep the work manageable I selected just the major versions from
the stable tree. Grepping those CREDITS files gave some general data to back
up the simple statement that Linux is a collaborative project that has grown
over time. Here's the data:

ver.  date        tar.bz2 size  contributors

1.0   12.03.1994  993 k         80
1.2   06.03.1995  1.8 M         128
2.0   08.06.1996  4.5 M         190
2.2   25.01.1999  10.1 M        269
2.4   04.01.2001  18.9 M        391
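
The grep step behind the contributors column could look roughly like the
sketch below. It assumes the tagged layout used by recent CREDITS files, where
each entry starts with an "N:" name line followed by lines such as "E:"
(email), "D:" (description) and "S:" (snail mail address). The function name
and the sample entries are made up for illustration, and older CREDITS files
may not follow this layout exactly.

```python
import re

def count_contributors(credits_text):
    """Return (count, names) for every line that starts with the 'N:' tag."""
    # [ \t]* rather than \s* so that an empty "N:" line cannot swallow the next line.
    names = re.findall(r"^N:[ \t]*(.+?)[ \t]*$", credits_text, flags=re.MULTILINE)
    return len(names), names

# Hypothetical sample, not taken from any real CREDITS file.
sample_credits = """\
N: Example Hacker
E: hacker@example.org
D: Hypothetical driver work
S: Oulu, Finland

N: Another Person
E: another@example.de
S: London
S: United Kingdom
"""

count, names = count_contributors(sample_credits)
print(count, names)
```

Counting "N:" lines only measures who has an entry in the file, so it inherits
every accuracy problem mentioned in this thread.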

Now this is not too much but a couple of developments are emerging:
checking out the geographical distribution of kernel hackers and some other
analysis based on the info that the files yield. I'm not the one doing this
but Dr. Silvonen ([email protected]). I'm looking for a good way
of extracting names from the kernel sources instead of CREDITS, since Dr
Silvonen seems to be really getting into this and is data hungry now :)

I've been getting a lot of warnings (from Brian Gerst, Horst von Brand, and
Mark Hahn and others) about the data above. For my own purposes, that is,
to just show that Linux is not "written by Linus Torvalds in 1991" like we
hear from the media, the data would do. But if we (Dr. Silvonen and perhaps
I too) are going to elaborate on this, we obviously need something more
reliable. Everyone puts their name in their files and patches right?

I'd think that studying _all_ the kernels would be necessary, along with a
more elaborate name extraction method for the source files (I haven't figured
out how to do it yet, though).
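
One possible starting point for such an extraction is to scan source files for
copyright lines. This is only a sketch: the regular expression, the function
name and the sample header are assumptions made for illustration, and it will
only ever see names that someone actually wrote into a file header.

```python
import re

# Matches lines such as "Copyright (C) 1991, 1992  Some Name".
COPYRIGHT_RE = re.compile(
    r"Copyright[ \t]*(?:\([Cc]\))?[ \t]*"        # "Copyright" and optional "(C)"
    r"(?:\d{4}(?:[ \t]*[-,][ \t]*\d{4})*)?"      # optional year, range or list
    r"[, \t]*(?:by[ \t]+)?"                      # optional separator and "by"
    r"([A-Z][^<\n,(]*)"                          # name, up to "<", ",", "(" or EOL
)

def names_in_source(text):
    """Collect the distinct names found on copyright lines in one file."""
    found = set()
    for match in COPYRIGHT_RE.finditer(text):
        name = match.group(1).strip(" .*/\t")
        if name:
            found.add(name)
    return found

# Hypothetical file header, not taken from the kernel tree.
sample_source = """\
/*
 *  linux/fs/example.c
 *
 *  Copyright (C) 1991, 1992  Example Author
 *  Copyright (C) 1998 Another Person <another@example.org>
 */
"""

print(sorted(names_in_source(sample_source)))
```

Running this over every file of every release and merging the sets would give
a per-version list, but contributors whose patches were merged without a
header line stay invisible to it.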

Thanks for taking the time to point out these weaknesses in my method!
--
| Juha Siltala | Mail:[email protected] |
| Maahisentie 2K A8 | Tel : +358 8 554 3591 |
| 90550 Oulu, Finland | GSM : +358 40 718 4743 |



2001-10-05 08:48:15

by Alexander Viro

Subject: Re: Fw: Re: Past CREDITS files



On Fri, 5 Oct 2001, Juha Siltala wrote:

> reliable. Everyone puts their name in their files and patches right?

Wrong.

2001-10-05 14:37:03

by Horst H. von Brand

Subject: Re: Fw: Re: Past CREDITS files


[...]

> Now this is not too much but a couple of developments are emerging:
> checking out the geographical distribution of kernel hackers and some other
> analysis based on the info that the files yield. I'm not the one doing this
> but Dr. Silvonen ([email protected]). I'm looking for a good way
> of extracting names from the kernel sources instead of CREDITS, since Dr
> Silvonen seems to be really getting into this and is data hungry now :)

Check the list of people on lkml, I think that is a much more accurate list
of "current developers" than CREDITS. Or look at who posted on the list.
You might classify by includes/doesn't include a patch, perhaps (people who
comment on a patch are helping development too... only flamers don't ;)

> I've been getting a lot of warnings (from Brian Gerst, Horst von Brand, and
> Mark Hahn and others) about the data above. For my own purposes, that is,
> to just show that Linux is not "written by Linus Torvalds in 1991" like we
> hear from the media, the data would do. But if we (Dr. Silvonen and perhaps
> I too) are going to elaborate on this, we obviously need something more
> reliable. Everyone puts their name in their files and patches right?

No... I have posted several patches, most of those that did get included
went in without my name (to be fair, they were mostly very small/simple).
Others I sent directly to the maintainers, some went in with my name on a
changelog or in the modified file(s), others without any mention at all.
Policy on this clearly varies from maintainer to maintainer (and perhaps
phase of the moon), so your idea will give data skewed at least by
subsystem.
--
Dr. Horst H. von Brand Usuario #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2001-10-05 14:56:34

by Dave Jones

Subject: Re: Fw: Re: Past CREDITS files

On Fri, 5 Oct 2001, Juha Siltala wrote:

> Now this is not too much but a couple of developments are emerging:
> checking out the geographical distribution of kernel hackers

Two points to be aware of.
- Kernel hackers move sometimes :)
  Be sure to associate the two (or more) addresses of any hacker
  with one person. Automating this may be quite difficult in some
  cases. My data from a year or so ago was completely different
  from my current data. I think every field has changed since then.
  On the other hand, it may be interesting to see the data tracking
  hackers' movements over the past few years :)

- Tracking by snail mail address (where present) is more accurate
  than by email address TLD.
  (Mine states .de, but I'm actually in London, UK, for example)
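
The difference between the two location hints can be sketched as follows,
assuming CREDITS entries are separated by blank lines and use single-letter
tags such as "E:" for email and "S:" for snail mail lines. Treating the last
"S:" line as the country is a guess, and the function names and the sample
entry are hypothetical.

```python
def parse_entries(credits_text):
    """Split CREDITS-style text into entries: dicts mapping tag -> list of values."""
    entries, current = [], {}
    for line in credits_text.splitlines():
        if not line.strip():              # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        tag, sep, value = line.partition(":")
        if sep and len(tag) == 1:         # keep only single-letter tags (N, E, S, ...)
            current.setdefault(tag, []).append(value.strip())
    if current:
        entries.append(current)
    return entries

def location_hints(entry):
    """Return (last snail mail line, email TLD); either may be None."""
    postal = entry.get("S", [])
    emails = entry.get("E", [])
    country = postal[-1] if postal else None
    tld = emails[0].rsplit(".", 1)[-1].lower() if emails else None
    return country, tld

# Hypothetical entry mirroring the ".de address, London resident" case above.
sample_entry = """\
N: Example Hacker
E: hacker@example.de
S: London
S: United Kingdom
"""

for entry in parse_entries(sample_entry):
    print(entry["N"][0], location_hints(entry))
```

Here the TLD hint says "de" while the postal hint says "United Kingdom", which
is the mismatch described above. Merging several addresses that belong to one
person is a separate problem that this sketch does not attempt.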

> Thanks for taking the time to point out these weaknesses in my method!

Happy to help.

regards,

Dave.

--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs

2001-10-05 16:27:45

by Juha Siltala

Subject: Re: Fw: Re: Past CREDITS files

On Fri, 05 Oct 2001 10:35:53 -0400
Horst von Brand <[email protected]> wrote:

(among other things):

> Policy on this clearly varies from maintainer to maintainer (and perhaps
> phase of the moon), so your idea will give data skewed at least by
> subsystem.

This is something I'd be interested in as well. When did Linus start to
give out the maintenance of certain parts of the kernel to others? Does
this coincide with the famous "Linus doesn't scale" debate that Glyn Moody
describes as the moment Linux almost forked? And was this very difficult?
--
| Juha Siltala | Mail:[email protected] |
| Maahisentie 2K A8 | Tel : +358 8 554 3591 |
| 90550 Oulu, Finland | GSM : +358 40 718 4743 |


2001-10-06 09:58:22

by David Woodhouse

Subject: Re: Fw: Re: Past CREDITS files


[email protected] said:
> This is something I'd be interested in as well. When did Linus start
> to give out the maintenance of certain parts of the kernel to others?
> Does this coincide with the famous "Linus doesn't scale" debate that
> Glyn Moody describes as the moment Linux almost forked? And was this
> very difficult?

Individual filesystems/drivers/subsystems/architectures/protocols have had
their own maintainers for years - but those maintainers still have to submit
patches, repeatedly if needs be, to Linus for final approval.

--
dwmw2