2007-06-14 06:16:58

by Oleg Verych

Subject: regression tracking (Re: Linux 2.6.21)

* Newsgroups: gmane.linux.kernel
* Date: Sun, 29 Apr 2007 10:50:22 -0700 (PDT)

* From: Linus Torvalds
>
> On Sun, 29 Apr 2007, Andi Kleen wrote:
>>
>> Besides the primary point of bug tracking is not to be friendly
>> to someone, but to (a) fix the bugs and (b) know how many bugs
>> there for a given release. Any replacement would need to solve
>> this problem too.
>>
>> Email does not solve it as far as I can see.
>
> Email fixes a _lot_ more bugs than bugzilla does.
>
> End of story. I don't think anybody who cannot accept that UNDENIABLE FACT
> should even participate in this discussion. Wake up and look at all the
> bugs we fix - most of them have never been in bugzilla.
>
> That's a FACT.
>
> Don't go around ignoring reality.

I'm seeing this long (198) thread and just have no idea how it has
ended (wiki? hand-mailing?).

Just two general questions to Adrian.

1) You were the maintainer of the woody backports, weren't you[0]? Why didn't
you propose (or use) Debian's BTS as an alternative to bugzilla, and how did
you do your regression tracking?

What exactly doesn't fit? Full control by e-mail, comprehensive
management, ML handling/redirection, tagging, sorting, searching? Finally,
the reportbug tool and the web interface?

2) Your decision to stop your activity in Debian: was that because Sarge was
released with a known security hole in the kernel[1]?

I'm just wondering.

[0] google: "woody backports Adrian Bunk"

[1] Message-ID: <[email protected]>
Xref: news.gmane.org gmane.linux.debian.devel.kernel:27730
[Just take your news readers and have fun with Gmane!]
[For those, who don't know what it is -- web :]
Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/27730>
--*--
Unfortunately this message is from a man who was punished in a very
unfair manner by "fellow developers". I'm not trying to raise this
issue (sorry if i'm trolling), i just want to say that life can be
very unfair when some wrong people are in power...

Message-ID: <[email protected]>
Xref: news.gmane.org gmane.linux.debian.devel.project:12330
Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.project/12330>
____


2007-06-14 15:33:52

by Stefan Richter

Subject: Re: regression tracking (Re: Linux 2.6.21)

Oleg Verych wrote:
> I'm seeing this long (198) thread and just have no idea how it has
> ended (wiki? hand-mailing?).

Direct or indirect results:

- See Michal Piotrowski's periodic posts and
http://kernelnewbies.org/known_regressions .

- Meanwhile, the people who maintain bugzilla.kernel.org seem to work
on improvements. I noticed that (a) each page now has a backlink to
the bugzilla.kernel.org start page, (b) the show_bug.cgi=... page
layout is now an unreadable mess, (c) e-mail integration is still
the same (it's impossible at least for me to send e-mails to bugs).

[...]
> Why you didn't proposed (used) Debian's BTS as alternative to bugzilla,
[...]

BTS has been mentioned in that thread in a few posts; mostly positively
as I recall.
--
Stefan Richter
-=====-=-=== -==- -===-
http://arcgraph.de/sr/

2007-06-14 16:27:18

by Oleg Verych

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Thu, Jun 14, 2007 at 05:33:40PM +0200, Stefan Richter wrote:
[]
> [...]
> > Why you didn't proposed (used) Debian's BTS as alternative to bugzilla,
> [...]
>
> BTS has been mentioned in that thread in a few posts; mostly positively
> as I recall.

I know that most developers here do not work with, or are not familiar
with, what Debian has as its bug-shooting weapon: ``The system is mainly
controlled by e-mail, but the bug reports can be viewed using the WWW.''[0].

I thought somebody who is familiar with it might propose to set it up and
tune it, instead of doing yet another NIH thing, especially from the e-mail
integration POV. I doubt the mozilla guys can think about it without
javascript and/or java servlets :)

[0] <http://www.debian.org/Bugs/>
____

2007-06-14 16:37:44

by Stefan Richter

Subject: Re: regression tracking (Re: Linux 2.6.21)

Oleg Verych wrote:
> I thought somebody, who familiar with that, might propose to setup/tune
> it, but not doing yet another NIH thing,

I may have missed something, but I recall that Adrian's bug tracking,
while it lasted, and now Michal's continuation of it, mostly came into
being because Adrian just started doing it and others soon found it very
useful.
--
Stefan Richter
-=====-=-=== -==- -===-
http://arcgraph.de/sr/

2007-06-14 17:30:45

by Adrian Bunk

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Thu, Jun 14, 2007 at 06:39:23PM +0200, Oleg Verych wrote:
> On Thu, Jun 14, 2007 at 05:33:40PM +0200, Stefan Richter wrote:
> []
> > [...]
> > > Why you didn't proposed (used) Debian's BTS as alternative to bugzilla,
> > [...]
> >
> > BTS has been mentioned in that thread in a few posts; mostly positively
> > as I recall.
>
> I know, that most developers here are not working/familiar with what
> Debian has as its bug shooting weapon ``The system is mainly controlled
> by e-mail, but the bug reports can be viewed using the WWW.''[0].
>
> I thought somebody, who familiar with that, might propose to setup/tune
> it, but not doing yet another NIH thing, especially from e-mail
> integration POV. I doubt mozilla guys can think about it without
> javascript and/or java servlets :)
>...

The problem isn't Bugzilla, and the Debian BTS wouldn't solve any
problem.

What is missing?

We need people who know one or more subsystems and who are willing to
regularly handle bug reports in their area.

And we need a release process that makes debugging, and if possible
fixing, all regressions prior to the release mandatory. You might never
come down to zero regressions and might not be able to handle all
last-minute reported regressions, but the 2.6.21 situation with 3 week
old known regressions not ever being debugged by a kernel developer
before the release has much room for improvements.

Changing the BTS would make sense if some core developers would state
that they would start using the BTS after this change. But otherwise it
doesn't matter which BTS to use.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-14 20:21:24

by Oleg Verych

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Thu, Jun 14, 2007 at 07:30:49PM +0200, Adrian Bunk wrote:
> On Thu, Jun 14, 2007 at 06:39:23PM +0200, Oleg Verych wrote:
[]
> > I know, that most developers here are not working/familiar with what
> > Debian has as its bug shooting weapon ``The system is mainly controlled
> > by e-mail, but the bug reports can be viewed using the WWW.''[0].
> >
> > I thought somebody, who familiar with that, might propose to setup/tune
> > it, but not doing yet another NIH thing, especially from e-mail
> > integration POV. I doubt mozilla guys can think about it without
> > javascript and/or java servlets :)
> >...
>
> The problem isn't Bugzilla, and the Debian BTS wouldn't solve any
> problem.
>
> What is missing?
>
> We need people who know one or more subsystems and who are willing to
> regularly handle bug reports in their area.

I think that if somebody shows by example how it can be handled in a more
convenient way, that will eventually become mainstream. As we know,
nothing comes out of a vacuum just like that, without taking energy and
time. And my question was not about this social problem of acceptance,
support, etc.

Linus spent some time in this thread trying to explain what the problems
are, both from that (social, think scheduler :) POV and from the
"zarro bogs found" one.

Also, after i saw Linus' message about doing mostly tools for the last
couple of years, i wonder why you, Adrian, didn't think about your tools
first, before you started regression tracking? You are not running in front
of a train, unlike you-know-who, plus the bugzilla issues have been known
for years. Luckily the Fedora kernel guys are also upstream developers, thus
lkml and other MLs are under their view.

After having read all that, i asked you my question, as the person
who supposedly used the BTS as a maintainer.

Yes, in its current form it might not be in a suitable configuration (i.e.
it would need kernel sub-systems instead of packages, etc.); anyway, the
main thing is the way the BTS is handled. While i was looking at and
replying to bug reports in the Debian kernel that i saw on lkml, i noticed
just how the guys work with it there. Now they have even come up with
tracking the upstream bugzilla, it seems [0]. I left that activity due to RL
some months ago, but am now trying to catch up on things again.

Thus it's just my curiosity about all this. And the BTS is like, you know,
why not, if it fits on almost all parameters?

[0] Message-ID: <[email protected]>
Xref: news.gmane.org gmane.linux.debian.devel.kernel:29426
Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29426>


> And we need a release process that makes debugging, and if possible
> fixing, all regressions prior to the release mandatory. You might never
> come down to zero regressions and might not be able to handle all
> last-minute reported regressions, but the 2.6.21 situation with 3 week
> old known regressions not ever being debugged by a kernel developer
> before the release has much room for improvements.
>
> Changing the BTS would make sense if some core developers would state
> that they would start using the BTS after this change. But otherwise it
> doesn't matter which BTS to use.

So, as i wrote before: one must give them a pretty-shiny tool, kindly
barking in their inboxes, instead of, for example,

"Guilty: **** ***** <????@****.com>",

as it was at the very beginning.
____

2007-06-14 20:46:13

by Adrian Bunk

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Thu, Jun 14, 2007 at 10:33:38PM +0200, Oleg Verych wrote:
>...
> Also, after i saw Linus' message about doing mostly tools last couple of
> years, i wonder why you, Adrian, didn't think about your tools first,
> before you've started regression tracking? You are not running in front
> of a train, unlike you know who does, plus bugzilla issues are known for
> years. Luckily Fedora kernel guys also upstream developers, thus lkml and
> other MLs under their view.

My tool was a textfile with a text editor.
For a smaller amount of regressions that works fine.

And it's not that Linus started developing the Linux kernel with writing
git, the first 10 years of Linux development were without any SCM.

> After having read all that, i've asked you, my question, as the person
> who supposedly used BTS as a maintainer.
>
> Yes, in current form it might not be in suitable configuration, i.e.
> kernel sub-systems instead of packages etc, anyway main thing is the way
> BTS is handled. While i was looking and replying for bug reports in the
> Debian kernel, that i saw in lkml, i've noticed, just how guys work with
> it there. Now they even came up with tracking upstream bugzilla, it
> seems [0]. I left that activity due to RL some months ago, but now trying
> to catch up things again.
>...

Both the Debian BTS and Bugzilla are usable programs with their own
advantages and disadvantages.

I don't believe switching to the Debian BTS would solve any problem.

> > And we need a release process that makes debugging, and if possible
> > fixing, all regressions prior to the release mandatory. You might never
> > come down to zero regressions and might not be able to handle all
> > last-minute reported regressions, but the 2.6.21 situation with 3 week
> > old known regressions not ever being debugged by a kernel developer
> > before the release has much room for improvements.
> >
> > Changing the BTS would make sense if some core developers would state
> > that they would start using the BTS after this change. But otherwise it
> > doesn't matter which BTS to use.
>
> So, as i've wrote before: one must give them pretty-shiny tool, kindly
> barking in their inboxes, instead of for example
>
> "Guilty: **** ***** <????@****.com>",
>
> as it was on the very beginning.

A pretty-shiny tool wouldn't change anything.

What you need are humans debugging the regressions and humans reminding
other humans that they should debug the regressions.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-15 23:21:43

by Linus Torvalds

Subject: Re: regression tracking (Re: Linux 2.6.21)



On Thu, 14 Jun 2007, Oleg Verych wrote:
>
> I'm seeing this long (198) thread and just have no idea how it has
> ended (wiki? hand-mailing?).

I'm hoping it's not "ended".

IOW, I really don't think we _resolved_ anything, although the work that
Adrian started is continuing through the wiki and other people trying to
track regressions, and that was obviously something good.

But I don't think we really know where we want to take this thing in the
long run. I think everybody wants a better bug-tracking system, but
whether something that makes people satisfied can even be built is open.
It sure doesn't seem to exist right now ;)

Linus

2007-06-15 23:41:57

by Adrian Bunk

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Fri, Jun 15, 2007 at 04:20:57PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 14 Jun 2007, Oleg Verych wrote:
> >
> > I'm seeing this long (198) thread and just have no idea how it has
> > ended (wiki? hand-mailing?).
>
> I'm hoping it's not "ended".
>
> IOW, I really don't think we _resolved_ anything, although the work that
> Adrian started is continuing through the wiki and other people trying to
> track regressions, and that was obviously something good.
>
> But I don't think we really know where we want to take this thing in the
> long run. I think everybody wants a better bug-tracking system, but
> whether something that makes people satisfied can even be built is open.
> It sure doesn't seem to exist right now ;)

The problem is not the bug tracking system, be it manual tracking in a
text file or a Wiki or be it in Bugzilla or any other bug tracking
system.

One problem is the lack of experienced developers willing to debug bug
reports.

But what really annoyed me was the missing integration of regression
tracking into the release process, IOW how _you_ handled the regression
lists.

If we want to offer something less of a disaster than 2.6.21, and if we
want to encourage people to start and continue testing -rc kernels, we
must try to fix as many reported regressions as reasonably possible.

This means going through every single point in the regression list
asking "Have we tried everything possible to solve this regression?".
There are very mysterious regressions and there are regressions that
might simply be reported too late. But if at the time of the final
release 3 week old regressions hadn't been debugged at all there's
definitely room for improvement. And mere mortals like me reminding
people is often not enough, sometimes an email by Linus Torvalds himself
asking a patch author or maintainer to look after a regression might be
required.

And a low hanging fruit to improve the release would be if you could
release one last -rc, wait for 48 hours, and then release either this
-rc unchanged as -final or another -rc (and wait another 48 hours).
There were at least two different regressions people ran into in 2.6.21
who successfully tested -rc7.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-16 01:20:27

by Oleg Verych

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Sat, Jun 16, 2007 at 01:42:02AM +0200, Adrian Bunk wrote:
> On Fri, Jun 15, 2007 at 04:20:57PM -0700, Linus Torvalds wrote:
> >
> >
> > On Thu, 14 Jun 2007, Oleg Verych wrote:
> > >
> > > I'm seeing this long (198) thread and just have no idea how it has
> > > ended (wiki? hand-mailing?).
> >
> > I'm hoping it's not "ended".
> >
> > IOW, I really don't think we _resolved_ anything, although the work that
> > Adrian started is continuing through the wiki and other people trying to
> > track regressions, and that was obviously something good.
> >
> > But I don't think we really know where we want to take this thing in the
> > long run. I think everybody wants a better bug-tracking system, but
> > whether something that makes people satisfied can even be built is open.
> > It sure doesn't seem to exist right now ;)
>
> The problem is not the bug tracking system, be it manual tracking in a
> text file or a Wiki or be it in Bugzilla or any other bug tracking
> system.
>
> One problem is the lack of experienced developers willing to debug bug
> reports.

*debug*

I hope you saw what subject i chose to bring this discussion back.
Yes, "tracking", as the first brick of a big wall.

You've already made your arguments about developers and users, but i
asked different questions, didn't i?

Let's look at a regular automatic report, like this one:

Message-ID: <[email protected]>
Xref: news.gmane.org gmane.linux.debian.devel.general:116248
Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.general/116248>

And what do we see? Basic packages, like ``dpkg'', ``grub'', ``mc'' are
in the list, requesting help. And, as you can see, they have been for quite
some time. And it's *OK*, because the distribution is working and
development is going on. Sometimes slowly, sometimes with delays.

> But what really annoyed me was the missing integration of regression
> tracking into the release process, IOW how _you_ handled the regression
> lists.

So, _tracking_ or _debugging_?

> If we want to offer something less of a disaster than 2.6.21, and if we
> want to encourage people to start and continue testing -rc kernels, we
> must try to fix as many reported regressions as reasonably possible.

*tracking*

Despite its tools, Debian has such a thing as long release cycles, the
so-called ``Debian sickness''. And the reason, as i see it, is what you've
just pointed out: less disaster, zer0 RC bugs. Plus everybody is a
volunteer, and there is a big chunk of bureaucracy-based decisions. And all
this for about 15000 packages.

* + reporting*

One side Linux is facing is hardware, and that kind of thing is very-very
diverse. LKML traffic is huge, yet there's no suitable tracking and
reporting *tool*.

> This means going through every single point in the regression list
> asking "Have we tried everything possible to solve this regression?".
> There are very mysterious regressions and there are regressions that
> might simply be reported too late. But if at the time of the final
> release 3 week old regressions hadn't been debugged at all there's
> definitely room for improvement. And mere mortals like me reminding
> people is often not enough, sometimes an email by Linus Torvalds himself
> asking a patch author or maintainer to look after a regression might be
> required.

*social* (first approximation)

That's a social problem, just like Debian losing good kernel team members.

For example, you feel that you've wasted time. But after all, if you had
come up with some kind of tool, everybody else could take it. No
problem, useful ideas must and will evolve. But _ideally_ this must not be
from ground zero every time. _Ideally_ from a technical, not a personal,
point of view ;).

That's why people in Debian have started *team* maintenance with alioth.

Unfortunately, problems with individuals in a big machine with bad people
who got randomly elected can't be solved (IMHO). Even LKML's rule "patches
are welcome", which is very technical and thus good, doesn't work there.

Finally, read the end of 2.6.21 release message ;)

> And a low hanging fruit to improve the release would be if you could
> release one last -rc, wait for 48 hours, and then release either this
> -rc unchanged as -final or another -rc (and wait another 48 hours).
> There were at least two different regressions people ran into in 2.6.21
> who successfully tested -rc7.

What about Linus' tree being a development tree, and Andrew's being a
"crazy development one" (quoting Linus)?

What about the open (the web page still exists) position of bug manager
at Google?

What about *volunteers* working from both developer's and user's
sides? What about "release is a reward" for everybody?

Balanced eco-system will oscillate. Be it .19(---++), .20(-++++),
.21(----+) *release*. That's natural, unless pushed to extremes.

I think, i can trust Linus in it, and you are welcome too.

*tools*

That's why i'm talking about tools, and started to discuss them.

My last try: reportbug (there's "-ng" one also), Debian BTS.

Adrian, what can/must be done to adopt them? If not, your experience may
provide information about "why?" (re-consider my first mail about
background, please).
____

2007-06-16 02:55:15

by Adrian Bunk

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Sat, Jun 16, 2007 at 03:32:36AM +0200, Oleg Verych wrote:
> On Sat, Jun 16, 2007 at 01:42:02AM +0200, Adrian Bunk wrote:
> > On Fri, Jun 15, 2007 at 04:20:57PM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Thu, 14 Jun 2007, Oleg Verych wrote:
> > > >
> > > > I'm seeing this long (198) thread and just have no idea how it has
> > > > ended (wiki? hand-mailing?).
> > >
> > > I'm hoping it's not "ended".
> > >
> > > IOW, I really don't think we _resolved_ anything, although the work that
> > > Adrian started is continuing through the wiki and other people trying to
> > > track regressions, and that was obviously something good.
> > >
> > > But I don't think we really know where we want to take this thing in the
> > > long run. I think everybody wants a better bug-tracking system, but
> > > whether something that makes people satisfied can even be built is open.
> > > It sure doesn't seem to exist right now ;)
> >
> > The problem is not the bug tracking system, be it manual tracking in a
> > text file or a Wiki or be it in Bugzilla or any other bug tracking
> > system.
> >
> > One problem is the lack of experienced developers willing to debug bug
> > reports.
>
> *debug*
>
> I hope you saw what subject i've chosen to bring this discussion back.
> Yes, "tracking", as the first brick for big wall.

Tracking regressions is not a real problem.
Especially since it doesn't require much technical knowledge.

> Your arguments about developers and users, you've said already, but i've
> asked different questions, have i?
>
> Lets look on regular automatic report, like this one:
>
> Message-ID: <[email protected]>
> Xref: news.gmane.org gmane.linux.debian.devel.general:116248
> Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.general/116248>
>
> And what we see? Basic packages, like ``dpkg'', ``grub'', ``mc'' are
> in the list, requesting help. And as you can see for quite some time.
> And it's *OK*, because distribution is working, development is going on.
> Sometimes slowly, sometimes with delays.

I sent weekly regression reports.
Not automatically generated but manually - but that doesn't matter.

The problem is that sending reports itself does not fix anything.

> > But what really annoyed me was the missing integration of regression
> > tracking into the release process, IOW how _you_ handled the regression
> > lists.
>
> So, _tracking_ or _debugging_?

_debugging_ (can be indirectly by poking other people to do debugging)

> > If we want to offer something less of a disaster than 2.6.21, and if we
> > want to encourage people to start and continue testing -rc kernels, we
> > must try to fix as many reported regressions as reasonably possible.
>
> *tracking*

no, *debugging*

I tracked regressions for the 2.6.21 disaster, and the not debugged
regressions that I had tracked are exactly where we should be better.

>...
> > This means going through every single point in the regression list
> > asking "Have we tried everything possible to solve this regression?".
> > There are very mysterious regressions and there are regressions that
> > might simply be reported too late. But if at the time of the final
> > release 3 week old regressions hadn't been debugged at all there's
> > definitely room for improvement. And mere mortals like me reminding
> > people is often not enough, sometimes an email by Linus Torvalds himself
> > asking a patch author or maintainer to look after a regression might be
> > required.
>
> *social* (first approximation)

Yes.

> That's a social problem, just like Debian losing good kernel team members.

Different social problem.

> For example you feel, that you've wasted time. But after all, if you've
> came up with some kind of tool, everybody else could take it. No
> problems, useful ideas must and will evolve. But _ideally_ this must not be
> from ground zero every time. _Ideally_ from technical, not personal
> point of view ;).

As I expected, someone else has picked up regression tracking.
And contrary to what you claim, this did not require any kind of tool.

> That's why people in Debian have started *team* maintenance with alioth.

The problem for the Linux kernel is that for better bug handling you'd
need people willing to learn other people's code and to do the hard work
of debugging bug reports. E.g. writing a new filesystem is simply much
more fun than learning and debugging other people's code in some old
filesystem.

Talking about "team maintenance" sounds nice, but the problem in the
kernel starts with code that has zero maintainers. And if there's
already a maintainer, it's unlikely that he'll not accept patches from
some new person debugging bug reports. But how to find people who will
become good maintainers?

> Unfortunately problems with individuals in big machine with bad people,
> got randomly elected, can't be solved (IMHO). Even LKML's rule "patches
> are welcome", that is very technical, thus good, doesn't work there.

Debian has its own problems; Linux kernel development at least has a
working structure for getting decisions and regular releases.

> Finally, read the end of 2.6.21 release message ;)
>
> > And a low hanging fruit to improve the release would be if you could
> > release one last -rc, wait for 48 hours, and then release either this
> > -rc unchanged as -final or another -rc (and wait another 48 hours).
> > There were at least two different regressions people ran into in 2.6.21
> > who successfully tested -rc7.
>
> What about Linus' tree is a development tree, Andrew's one is a "crazy
> development one" (quoting Linus)?
>
> What about open (web page still exists) position on bug manager in
> Google?
>
> What about *volunteers* working from both developer's and user's
> sides? What about "release is a reward" for everybody?

People aren't that dumb that some empty words like "release is a reward"
would change anything.

> Balanced eco-system will oscillate. Be it .19(---++), .20(-++++),
> .21(----+) *release*. That's natural, unless pushed to extremes.
>
> I think, i can trust Linus in it, and you are welcome too.
>
> *tools*
>
> That's why i'm talking about tools, and started to discuss them.
>
> My last try: reportbug (there's "-ng" one also), Debian BTS.
>
> Adrian, what can/must be done to adopt them? If not, your experience may
> provide information about "why?" (re-consider my first mail about
> background, please).

Bug tracking for the kernel is more or less working.
The main problem is getting people to debug bug reports.

We need the main problem fixed, not a different tool in an area that is
not the main problem.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-16 04:51:29

by Oleg Verych

Subject: Re: regression tracking (Re: Linux 2.6.21)

[I've added Herbert as a former member of the Debian kernel team (AFAIK);
sorry if i'm wrong and you have no opinion on that, Herbert.]

On Sat, Jun 16, 2007 at 04:55:16AM +0200, Adrian Bunk wrote:
> On Sat, Jun 16, 2007 at 03:32:36AM +0200, Oleg Verych wrote:
> > On Sat, Jun 16, 2007 at 01:42:02AM +0200, Adrian Bunk wrote:
> > > On Fri, Jun 15, 2007 at 04:20:57PM -0700, Linus Torvalds wrote:
> > > >
> > > >
> > > > On Thu, 14 Jun 2007, Oleg Verych wrote:
> > > > >
> > > > > I'm seeing this long (198) thread and just have no idea how it has
> > > > > ended (wiki? hand-mailing?).
> > > >
> > > > I'm hoping it's not "ended".
> > > >
> > > > IOW, I really don't think we _resolved_ anything, although the work that
> > > > Adrian started is continuing through the wiki and other people trying to
> > > > track regressions, and that was obviously something good.
> > > >
> > > > But I don't think we really know where we want to take this thing in the
> > > > long run. I think everybody wants a better bug-tracking system, but
> > > > whether something that makes people satisfied can even be built is open.
> > > > It sure doesn't seem to exist right now ;)
> > >
> > > The problem is not the bug tracking system, be it manual tracking in a
> > > text file or a Wiki or be it in Bugzilla or any other bug tracking
> > > system.
> > >
> > > One problem is the lack of experienced developers willing to debug bug
> > > reports.
> >
> > *debug*
> >
> > I hope you saw what subject i've chosen to bring this discussion back.
> > Yes, "tracking", as the first brick for big wall.
>
> Tracking regressions is not a real problem.
> Especially since it doesn't require much technical knowledge.

I've tried to express a different point of view. We have different ones {0}.
Thus, no comments.

> > Your arguments about developers and users, you've said already, but i've
> > asked different questions, have i?
> >
> > Lets look on regular automatic report, like this one:
> >
> > Message-ID: <[email protected]>
> > Xref: news.gmane.org gmane.linux.debian.devel.general:116248
> > Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.general/116248>
> >
> > And what we see? Basic packages, like ``dpkg'', ``grub'', ``mc'' are
> > in the list, requesting help. And as you can see for quite some time.
> > And it's *OK*, because distribution is working, development is going on.
> > Sometimes slowly, sometimes with delays.
>
> I sent weekly regression reports.
> Not automatically generated but manually - but that doesn't matter.
>
> The problem is that sending reports itself does not fix anything.

...{1}

> > > But what really annoyed me was the missing integration of regression
> > > tracking into the release process, IOW how _you_ handled the regression
> > > lists.
> >
> > So, _tracking_ or _debugging_?
>
> _debugging_ (can be indirectly by poking other people to do debugging)
>
> > > If we want to offer something less of a disaster than 2.6.21, and if we
> > > want to encourage people to start and continue testing -rc kernels, we
> > > must try to fix as many reported regressions as reasonably possible.
> >
> > *tracking*
>
> no, *debugging*
>
> I tracked regressions for the 2.6.21 disaster, and the not debugged
> regressions that I had tracked are exactly where we should be better.

...{2}

> >...
> > > This means going through every single point in the regression list
> > > asking "Have we tried everything possible to solve this regression?".
> > > There are very mysterious regressions and there are regressions that
> > > might simply be reported too late. But if at the time of the final
> > > release 3 week old regressions hadn't been debugged at all there's
> > > definitely room for improvement. And mere mortals like me reminding
> > > people is often not enough, sometimes an email by Linus Torvalds himself
> > > asking a patch author or maintainer to look after a regression might be
> > > required.
> >
> > *social* (first approximation)
>
> Yes.
>
> > That's a social problem, just like Debian losing good kernel team members.
>
> Different social problem.

The term ``like'' here means that people are not able or willing to do work
they could do. And the cause of that is *social*, not technical. {1},{2} are
results of that problem/behavior. But according to {0}, you think
differently.

> > For example you feel, that you've wasted time. But after all, if you've
> > came up with some kind of tool, everybody else could take it. No
> > problems, useful ideas must and will evolve. But _ideally_ this must not be
> > from ground zero every time. _Ideally_ from technical, not personal
> > point of view ;).
>
> As I expected, someone else has picked up regression tracking.
> And contrary to what you claim, this did not require any kind of tool.

So you expected good while doing bad. ``Bad'' means starting a pointless flame
about what everybody should do, without a constructive approach like: "OK,
i can't do it due to my POV{0}, useless manual work. Everybody willing to
bring another way of dealing with it is welcome."

Your first reply:

"And it's not that Linus started developing the Linux kernel with writing
git, the first 10 years of Linux development were without any SCM." {3}

to my note that you are in no hurry to go anywhere, that after all those
years of Open Source and Free Software development you are not trying
to deal with such an important thing as regression/bug tracking in an
***organized way***, is rather pointless.

> > That's why people in Debian have started *team* maintenance with alioth.
>
> The problem for the Linux kernel is that for a better bug handling you'd
> need people willing to learn other people's code and to do the hard work
> of debugging bug reports.

... {0}++

Do you know, for example, why i'm not making my "hacker's career"
doing that?

1. because i ended up with lynx, slrn, mutt, emacs-nox. Including
"zarro bogs found" kind of thing and other "userspace suck" problems.
2. i have no way to know if something is *really* broken, unless it is right
on my hardware

All this is unlike the Debian BTS with reportbug, where you have basic
information, mbox format messages for easy "mutt -f", and other fun
things that real maintainers are aware of (i'm trying to learn about them).

Thus an organized, non brain-damaged way of reporting and tracking is the
key issue.

> E.g. writing a new filesystem is simply much more fun than learning and
> debugging other people's code in some old filesystem.

It's like in {3} -- i don't like it (personally), so i'm going along.

That's wrong and counter productive, as i've tried to explain.

> Talking about "team maintenance" sounds nice, but the problem in the
> kernel starts with code that has zero maintainers.

Floppy went pretty fine, before it was started to be maintained, and
you know that. But you also told that unmaintained and not-working are
different things.

Thus, if it just happens to break, well, reports are welcome, and if
the long run shows that the user count is small, then so be it.
Nothing lasts forever.

The positive side is, i think, obvious, because Linus found an ext2 bug back
under the New Year tree (AFAIK), Thomas did his best on timers, etc.

For more productivity, a score system must be implemented and synchronized[0]
with distributions. Only after that *noise* filter can you claim
importance of problems. Otherwise, you must know how noise affects human
beings.

(In the prev. e-mail i've mentioned such effort from one of the DDs:
[0] Message-ID: <[email protected]>
Xref: news.gmane.org gmane.linux.debian.devel.kernel:29426
Archived-At: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29426>
)

> And if there's already a maintainer, it's unlikely that he'll not
> accept patches from some new person debugging bug reports. But how to
> find people who will become good maintainers?

... {0}++

> > Unfortunately problems with individuals in big machine with bad people,
> > got randomly elected, can't be solved (IMHO). Even LKML's rule "patches
> > are welcome", that is very technical, thus good, doesn't work there.
>
> Debian has its own problems, Linux kernel development at least has a
> working structure for getting decisions and regular releases.
>
> > Finally, read the end of 2.6.21 release message ;)
> >
> > > And a low hanging fruit to improve the release would be if you could
> > > release one last -rc, wait for 48 hours, and then release either this
> > > -rc unchanged as -final or another -rc (and wait another 48 hours).
> > > There were at least two different regressions people ran into in 2.6.21
> > > who successfully tested -rc7.
> >
> > What about Linus' tree is a development tree, Andrew's one is a "crazy
> > development one" (quoting Linus)?
> >
> > What about open (web page still exists) position on bug manager in
> > Google?
> >
> > What about *volunteers* working from both developer's and user's
> > sides? What about "release is a reward" for everybody?
>
> People aren't that dumb that some empty words like "release is a reward"
> would change anything.

So, distro kernel teams, making .21 available to their user are ones?
That's simply pointless.

> > Balanced eco-system will oscillate. Be it .19(---++), .20(-++++),
> > .21(----+) *release*. That's natural, unless pushed to extremes.
> >
> > I think, i can trust Linus in it, and you are welcome too.
> >
> > *tools*
> >
> > That's why i'm talking about tools, and started to discuss them.
> >
> > My last try: reportbug (there's "-ng" one also), Debian BTS.
> >
> > Adrian, what can/must be done to adopt them? If not, your experience may
> > provide information about "why?" (re-consider my first mail about
> > background, please).
>
> Bug tracking for the kernel is more or less working.
> The main problem is getting people to debug bug reports.
>
> We need the main problem fixed, not a different tool in an area that is
> not the main problem.

I see (this repetition). Maybe i've managed to express my POV so that it
can be seen more clearly.
____

2007-06-16 12:24:47

by Stefan Richter

Subject: Re: regression tracking (Re: Linux 2.6.21)

Oleg Verych wrote:
> On Sat, Jun 16, 2007 at 01:42:02AM +0200, Adrian Bunk wrote:
[...]
>> This means going through every single point in the regression list
>> asking "Have we tried everything possible to solve this regression?".
[...]
>> And a low hanging fruit to improve the release would be if you could
>> release one last -rc, wait for 48 hours, and then release either this
>> -rc unchanged as -final or another -rc (and wait another 48 hours).
>> There were at least two different regressions people ran into in 2.6.21
>> who successfully tested -rc7.
>
> What about Linus' tree is a development tree, Andrew's one is a "crazy
> development one" (quoting Linus)?
[...]

Linus also said that Andrew's tree is abused too often for broken stuff.

My goal for the little driver subsystem I'm maintaining is
- everything that Andrew pulls from me builds and runs and doesn't
introduce regressions to my and the submitters' knowledge. I am
slowly expanding my test procedures to catch things that fail that
goal.
- Everything that Linus pulls from me fulfills the above criteria
and, in addition, had reasonable time and publication for test and
review, depending on the kind of patch.

I had a few regressions in Linus' releases. None of them were known
before release. All of them were debugged and fixed rather soon after
report, AFAIR.

So what _I_ need is neither better regression tracking nor more manpower
for debugging of regression reports. What I need is more own spare time
and equipment for tests, more own knowledge and experience, and more
people who run-time test -rc kernels or at least my subsystem updates on
top of older kernels.

(Note, I'm talking only about regressions here, not old bugs.
There my requirements are different; the by far most important
one is more manpower for debugging and fixing.)

Well, if _other_ subsystems would get regressions in Linus' tree fixed
quicker, there might perhaps be more people who would consider to run
-rc kernels and would catch and report "my" regressions.

[Oleg, sorry that I too digressed from the subject of your thread, but
your remark about "[crazy] development tree"s caught my eye. IMO people
should care for quality already in Andrew's tree --- more so than at the
moment.]

[Adrian, I'm not saying "too few users run -rc kernels", I'm saying "too
few FireWire driver users run -rc kernels".]
--
Stefan Richter
-=====-=-=== -==- =----
http://arcgraph.de/sr/

2007-06-16 12:54:29

by Michal Piotrowski

Subject: Re: regression tracking (Re: Linux 2.6.21)

Hi Stefan,

On 16/06/07, Stefan Richter <[email protected]> wrote:
[..]
> Well, if _other_ subsystems would get regressions in Linus' tree fixed
> quicker, there might perhaps be more people who would consider to run
> -rc kernels and would catch and report "my" regressions.
[..]
>
> [Adrian, I'm not saying "too few users run -rc kernels", I'm saying "too
> few FireWire driver users run -rc kernels".]

Rafael is working on translation of "Linux Kernel Tester's Guide"
(it's almost finished). I hope you will get more -rc testers.

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-16 13:25:50

by Adrian Bunk

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Sat, Jun 16, 2007 at 07:03:44AM +0200, Oleg Verych wrote:
>...
> On Sat, Jun 16, 2007 at 04:55:16AM +0200, Adrian Bunk wrote:
> > On Sat, Jun 16, 2007 at 03:32:36AM +0200, Oleg Verych wrote:
> > > On Sat, Jun 16, 2007 at 01:42:02AM +0200, Adrian Bunk wrote:
>...
> > > For example you feel, that you've wasted time. But after all, if you've
> > > came up with some kind of tool, everybody else could take it. No
> > > problems, useful ideas must and will evolve. But _ideally_ this must not be
> > > from ground zero every time. _Ideally_ from technical, not personal
> > > point of view ;).
> >
> > As I expected, someone else has picked up regression tracking.
> > And contrary to what you claim, this did not require any kind of tool.
>
> So you expected good while doing bad. ``Bad'' means starting a pointless flame
> about what everybody should do, without a constructive approach like: "OK,
> i can't do it due to my POV{0}, useless manual work. Everybody willing to
> bring another way of dealing with it is welcome."
>
> Your first reply:
>
> "And it's not that Linus started developing the Linux kernel with writing
> git, the first 10 years of Linux development were without any SCM." {3}
>
> to my note that you are in no hurry to go anywhere, that after all those
> years of Open Source and Free Software development you are not trying
> to deal with such an important thing as regression/bug tracking in an
> ***organized way***, is rather pointless.

I am dealing with bug tracking in the kernel Bugzilla.
I did regression tracking for the kernel.
Michal is currently tracking regressions.
Andrew is doing an enormous amount of work in these areas.

You might not see it because you are not active in this area, but it is
working in an organized way.

What is not working is getting all tracked bugs properly debugged.

> > > That's why people in Debian have started *team* maintenance with alioth.
> >
> > The problem for the Linux kernel is that for a better bug handling you'd
> > need people willing to learn other people's code and to do the hard work
> > of debugging bug reports.
>
> ... {0}++
>
> Do you know, for example, why i'm not making my "hacker's career"
> doing that?
>
> 1. because i ended up with lynx, slrn, mutt, emacs-nox. Including
> "zarro bogs found" kind of thing and other "userspace suck" problems.
> 2. i have no way to know if something is *really* broken, unless it is right
> on my hardware
>
> All this is unlike the Debian BTS with reportbug, where you have basic
> information, mbox format messages for easy "mutt -f", and other fun
> things that real maintainers are aware of (i'm trying to learn about them).
>
> Thus an organized, non brain-damaged way of reporting and tracking is the
> key issue.

Bugzilla is a usable tool for bug and regression reporting and
tracking.

I have been using the Debian BTS for 8 years and I've used many Bugzillas.

Both are usable, and the real problem in kernel development is not
really related to the question which tool to use for bug tracking.

>...
> > Talking about "team maintenance" sounds nice, but the problem in the
> > kernel starts with code that has zero maintainers.
>
> Floppy went pretty fine, before it was started to be maintained, and
> you know that. But you also told that unmaintained and not-working are
> different things.

unmaintained != unused
user != developer
worked != went pretty fine

Stuff can easily get broken and no one looks after bugs if there's no
maintainer both knowing the code and willing to debug bugs.

The floppy driver is actually an example of code that has been broken
quite often by patches simply because no one who completely understands
this driver reviewed patches.

It somehow works and it might work for some years, but there is a
problem.

>...
> I see (this repetition). Maybe i've managed to express my POV so that it
> can be seen more clearly.

IMHO your point of view is simply not related to the real current
quality problems in kernel development.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-17 00:44:31

by Adrian Bunk

Subject: Re: regression tracking (Re: Linux 2.6.21)

On Sat, Jun 16, 2007 at 02:23:25PM +0200, Stefan Richter wrote:
>...
> [Adrian, I'm not saying "too few users run -rc kernels", I'm saying "too
> few FireWire driver users run -rc kernels".]

Getting more people testing -rc kernels might be possible, and I don't
think it would be too hard. And not only FireWire would benefit from
this, remember e.g. that at least 2 out of the last 5 kernels Linus
released contained filesystem corruption regressions.

The problem is that we aren't able to handle the many regression reports
we get today, so asking for more testing and regression reports today
would attack it at the wrong part of the chain.

Additionally, every reported and unhandled regression will frustrate the
reporter - never forget that we have _many_ unhandled bug reports
(including but not limited to regression reports) where the submitter
spent much time and energy in writing a good bug report.

If we somehow gain the missing manpower for debugging regressions we can
actively ask for more testing. Missing manpower (of people knowing some
part of the kernel well) for debugging bug reports is IMHO the one big
source of quality problems in the Linux kernel. If we get this solved,
things like getting more testers for -rc kernels will become low hanging
fruits.

> Stefan Richter

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-17 09:42:05

by Michal Piotrowski

Subject: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

Hi all,

Adrian Bunk wrote:
> On Sat, Jun 16, 2007 at 02:23:25PM +0200, Stefan Richter wrote:
>> ...
>> [Adrian, I'm not saying "too few users run -rc kernels", I'm saying "too
>> few FireWire driver users run -rc kernels".]
>
> Getting more people testing -rc kernels might be possible, and I don't
> think it would be too hard. And not only FireWire would benefit from
> this, remember e.g. that at least 2 out of the last 5 kernels Linus
> released contained filesystem corruption regressions.
>
> The problem is that we aren't able to handle the many regression reports
> we get today, so asking for more testing and regression reports today
> would attack it at the wrong part of the chain.
>
> Additionally, every reported and unhandled regression will frustrate the
> reporter - never forget that we have _many_ unhandled bug reports
> (including but not limited to regression reports) where the submitter
> spent much time and energy in writing a good bug report.
>
> If we somehow gain the missing manpower for debugging regressions we can
> actively ask for more testing. Missing manpower (of people knowing some
> part of the kernel well) for debugging bug reports is IMHO the one big
> source of quality problems in the Linux kernel. If we get this solved,
> things like getting more testers for -rc kernels will become low hanging
> fruits.

Adrian, I agree with _all_ your points.

I bet that developers will hate me for this.

Please consider for 2.6.23

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

Signed-off-by: Michal Piotrowski <[email protected]>

--- linux-work-clean/Documentation/SubmitChecklist 2007-06-17 11:18:37.000000000 +0200
+++ linux-work/Documentation/SubmitChecklist 2007-06-17 11:29:26.000000000 +0200
@@ -90,3 +90,8 @@ kernel patches.
patch style checker prior to submission (scripts/checkpatch.pl).
You should be able to justify all violations that remain in
your patch.
+
+
+
+If the patch introduces a new regression and this regression was not fixed
+in seven days, then the patch will be reverted.

2007-06-17 10:04:55

by Andrew Morton

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, 17 Jun 2007 11:41:36 +0200 Michal Piotrowski <[email protected]> wrote:

> +If the patch introduces a new regression and this regression was not fixed
> +in seven days, then the patch will be reverted.

Those regressions where we know which patch caused them are the easy ones.
Often we don't know which patch (or even which subsystem merge) is at
fault.

I think. How many of the present 2.6.22-rc regressions which you're
presently tracking have such a well-identified cause?
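The objection above can be illustrated with a small sketch: the proposed seven-day rule can only select regressions whose offending patch is already known. Everything below is an assumption made for illustration (the record fields, the ids and the dates are hypothetical, not taken from any real regression list).

```python
from datetime import date, timedelta

def revert_candidates(regressions, today, limit_days=7):
    """Return ids of regressions the proposed rule could act on.

    A regression qualifies only if its culprit patch is identified,
    it is still unfixed, and it was reported more than limit_days ago.
    """
    limit = timedelta(days=limit_days)
    return [
        r["id"]
        for r in regressions
        if r["culprit"] is not None
        and not r["fixed"]
        and today - r["reported"] > limit
    ]

# Hypothetical tracking list, for illustration only.
tracked = [
    {"id": "r1", "reported": date(2007, 6, 1), "culprit": "patch-a", "fixed": False},
    {"id": "r2", "reported": date(2007, 6, 1), "culprit": None, "fixed": False},
    {"id": "r3", "reported": date(2007, 6, 15), "culprit": "patch-b", "fixed": False},
    {"id": "r4", "reported": date(2007, 6, 1), "culprit": "patch-c", "fixed": True},
]

candidates = revert_candidates(tracked, today=date(2007, 6, 17))
print(candidates)
```

In this made-up list, the old regression without an identified culprit ("r2") is never selected, which is exactly the case the rule cannot handle.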

2007-06-17 10:22:37

by Michal Piotrowski

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On 17/06/07, Andrew Morton <[email protected]> wrote:
> On Sun, 17 Jun 2007 11:41:36 +0200 Michal Piotrowski <[email protected]> wrote:
>
> > +If the patch introduces a new regression and this regression was not fixed
> > +in seven days, then the patch will be reverted.
>
> Those regressions where we know which patch caused them are the easy ones.
> Often we don't know which patch (or even which subsystem merge) is at
> fault.
>
> I think. How many of the present 2.6.22-rc regressions which you're
> presently tracking have such a well-identified cause?
>

Here lies the problem.

git-bisect is a killer app, people should start using it.
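For those who have not used it: the idea behind bisecting is a binary search over the commits between a known-good and a known-bad kernel. The following is only a rough sketch of that idea, not how git-bisect is implemented; the function name, the `is_bad` callback and the numbered "commits" are hypothetical, and it assumes that every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    """Binary-search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    that badness never disappears again later in the list.
    Returns (index of first bad commit, number of tests performed).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1                      # one build-and-boot test per step
        if is_bad(commits[mid]):
            bad = mid                   # culprit is at mid or earlier
        else:
            good = mid                  # culprit is after mid
    return bad, steps

# Hypothetical history of 1000 commits where commit 637 broke things.
commits = list(range(1000))
index, steps = first_bad(commits, lambda c: c >= 637)
print(index, steps)
```

With 1000 commits between good and bad, the search needs at most 10 tests, which is why it is practical for a tester who can reproduce the problem reliably.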

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 11:34:55

by Oleg Verych

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, Jun 17, 2007 at 12:22:26PM +0200, Michal Piotrowski wrote:
> On 17/06/07, Andrew Morton <[email protected]> wrote:
> >On Sun, 17 Jun 2007 11:41:36 +0200 Michal Piotrowski
> ><[email protected]> wrote:
> >
> >> +If the patch introduces a new regression and this regression was not
> >fixed
> >> +in seven days, then the patch will be reverted.
> >
> >Those regressions where we know which patch caused them are the easy ones.
> >Often we don't know which patch (or even which subsystem merge) is at
> >fault.
> >
> >I think. How many of the present 2.6.22-rc regressions which you're
> >presently tracking have such a well-identified cause?
> >
>
> Here lies the problem.
>
> git-bisect is a killer app, people should start using it.

It's OK _only_ in case of unknown, hard to find *hardware* bugs.

If you think it's "a good thing" for bad, untested by developer
code, then something is completely wrong.

And if there's no debugger in the mainline kernel, which is a developer's
tool, then why do you think testers must stick with git-bisect as their
debugger-like tool (bandwidth-consuming in most cases and time-consuming in some)?

It's wrong if developers tend to reply with only one thing --
git-bisect.

If things are going to be that bad, then it's better to start dealing with the
cause, not the consequences. In this situation requesting test cases is a
better way, as it influences the developer as the cause of potential
problems. If tests show a *hardware* side of the problem, then, well, some
parts may not be obvious, thus bisecting is a way to continue.

Sorry if i'm from the abnormally different side yet one more time.
____

2007-06-17 11:55:28

by Rafael J. Wysocki

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sunday, 17 June 2007 12:22, Michal Piotrowski wrote:
> On 17/06/07, Andrew Morton <[email protected]> wrote:
> > On Sun, 17 Jun 2007 11:41:36 +0200 Michal Piotrowski <[email protected]> wrote:
> >
> > > +If the patch introduces a new regression and this regression was not fixed
> > > +in seven days, then the patch will be reverted.
> >
> > Those regressions where we know which patch caused them are the easy ones.

Except when the bisection points us to a patch exposing a bug that is present
regardless (see http://lkml.org/lkml/2007/6/13/273 for example).

Besides, if a patch is merged before -rc1 as a bugfix, there are several
patches depending on it and only after -rc5 has been released we find out
that it breaks someone's system, then reverting it is not a solution, IMO.

> > Often we don't know which patch (or even which subsystem merge) is at
> > fault.
> >
> > I think. How many of the present 2.6.22-rc regressions which you're
> > presently tracking have such a well-identified cause?
> >
>
> Here lies the problem.
>
> git-bisect is a killer app, people should start using it.

People should test _all_ of the -rc kernels and report problems. Otherwise, we
may assume that there are no problems and go on.

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-17 12:07:11

by Rafael J. Wysocki

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sunday, 17 June 2007 13:47, Oleg Verych wrote:
> On Sun, Jun 17, 2007 at 12:22:26PM +0200, Michal Piotrowski wrote:
> > On 17/06/07, Andrew Morton <[email protected]> wrote:
> > >On Sun, 17 Jun 2007 11:41:36 +0200 Michal Piotrowski
> > ><[email protected]> wrote:
> > >
> > >> +If the patch introduces a new regression and this regression was not
> > >fixed
> > >> +in seven days, then the patch will be reverted.
> > >
> > >Those regressions where we know which patch caused them are the easy ones.
> > >Often we don't know which patch (or even which subsystem merge) is at
> > >fault.
> > >
> > >I think. How many of the present 2.6.22-rc regressions which you're
> > >presently tracking have such a well-identified cause?
> > >
> >
> > Here lies the problem.
> >
> > git-bisect is a killer app, people should start using it.
>
> It's OK _only_ in case of unknown, hard to find *hardware* bugs.
>
> If you think it's "a good thing" for bad, untested by developer
> code, then something is completely wrong.

Oh, I've just fixed two purely software bugs pointed out by binary searching
in the code that I'm sure has been tested, not only by its developers, but the
bugs only showed up in my configuration (on one out of four test boxes).

There are so many different kernel configurations possible that there's no way
a developer can test them all.
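A back-of-the-envelope sketch of why that is so, under loudly stated assumptions: it treats every option as an independent on/off switch (ignoring tristate options and dependencies between options), and both the option count of 30 and the rate of one build per second are hypothetical numbers chosen to be far more favourable than reality.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def configs(n_options):
    """Number of configurations for n independent boolean options."""
    return 2 ** n_options

def years_to_build_all(n_options, builds_per_second=1.0):
    """Years needed to build every configuration once at the given rate."""
    return configs(n_options) / builds_per_second / SECONDS_PER_YEAR

print(configs(30))
print(years_to_build_all(30))
```

Even with only 30 such options there are over a billion combinations, and building one per second would take more than three decades.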

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-17 12:45:07

by Adrian Bunk

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, Jun 17, 2007 at 11:41:36AM +0200, Michal Piotrowski wrote:
> Hi all,
>
> Adrian Bunk wrote:
>> On Sat, Jun 16, 2007 at 02:23:25PM +0200, Stefan Richter wrote:
>>> ...
>>> [Adrian, I'm not saying "too few users run -rc kernels", I'm saying "too
>>> few FireWire driver users run -rc kernels".]
>> Getting more people testing -rc kernels might be possible, and I don't
>> think it would be too hard. And not only FireWire would benefit from this,
>> remember e.g. that at least 2 out of the last 5 kernels Linus released
>> contained filesystem corruption regressions.
>> The problem is that we aren't able to handle the many regression reports
>> we get today, so asking for more testing and regression reports today
>> would attack it at the wrong part of the chain.
>> Additionally, every reported and unhandled regression will frustrate the
>> reporter - never forget that we have _many_ unhandled bug reports
>> (including but not limited to regression reports) where the submitter
>> spent much time and energy in writing a good bug report.
>> If we somehow gain the missing manpower for debugging regressions we can
>> actively ask for more testing. Missing manpower (of people knowing some
>> part of the kernel well) for debugging bug reports is IMHO the one big
>> source of quality problems in the Linux kernel. If we get this solved,
>> things like getting more testers for -rc kernels will become low hanging
>> fruits.
>
> Adrian, I agree with _all_ your points.
>
> I bet that developers will hate me for this.
>
> Please consider for 2.6.23

Fine with me, but:

There are not so simple cases like big infrastructure patches with
20 other patches in the tree depending on it causing a regression, or
even worse, a big infrastructure patch exposing a latent old bug in some
completely different area of the kernel.

And we should be aware that reverting is only a workaround for the real
problem which lies in our bug handling.

> Regards,
> Michal
>...

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-17 13:18:14

by Michal Piotrowski

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On 17/06/07, Adrian Bunk <[email protected]> wrote:
> On Sun, Jun 17, 2007 at 11:41:36AM +0200, Michal Piotrowski wrote:
> > Hi all,
> >
> > Adrian Bunk wrote:
> >> On Sat, Jun 16, 2007 at 02:23:25PM +0200, Stefan Richter wrote:
> >>> ...
> >>> [Adrian, I'm not saying "too few users run -rc kernels", I'm saying "too
> >>> few FireWire driver users run -rc kernels".]
> >> Getting more people testing -rc kernels might be possible, and I don't
> >> think it would be too hard. And not only FireWire would benefit from this,
> >> remember e.g. that at least 2 out of the last 5 kernels Linus released
> >> contained filesystem corruption regressions.
> >> The problem is that we aren't able to handle the many regression reports
> >> we get today, so asking for more testing and regression reports today
> >> would attack it at the wrong part of the chain.
> >> Additionally, every reported and unhandled regression will frustrate the
> >> reporter - never forget that we have _many_ unhandled bug reports
> >> (including but not limited to regression reports) where the submitter
> >> spent much time and energy in writing a good bug report.
> >> If we somehow gain the missing manpower for debugging regressions we can
> >> actively ask for more testing. Missing manpower (of people knowing some
> >> part of the kernel well) for debugging bug reports is IMHO the one big
> >> source of quality problems in the Linux kernel. If we get this solved,
> >> things like getting more testers for -rc kernels will become low hanging
> >> fruits.
> >
> > Adrian, I agree with _all_ your points.
> >
> > I bet that developers will hate me for this.
> >
> > Please consider for 2.6.23
>
> Fine with me, but:
>
> There are not so simple cases like big infrastructure patches with
> 20 other patches in the tree depending on it causing a regression, or
> even worse, a big infrastructure patch exposing a latent old bug in some
> completely different area of the kernel.

It is different case.

"If the patch introduces a new regression"

introduces != exposes an old bug

Removal of 20 patches will be painful, but sometimes you need to
"choose minor evil to prevent a greater one" [1].

> And we should be aware that reverting is only a workaround for the real
> problem which lies in our bug handling.
>
> > Regards,
> > Michal
> >...
>
> cu
> Adrian
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed
>
>

Regards,
Michal

[1] the quote from "The Last Wish/Minor Evil" by Andrzej Sapkowski :)

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 14:04:10

by Stefan Richter

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

Michal Piotrowski wrote:
> "choose minor evil to prevent a greater one"

The measurement of "evil" is subjective. That's why there are releases
with known regressions.
--
Stefan Richter
-=====-=-=== -==- =---=
http://arcgraph.de/sr/

2007-06-17 14:12:16

by Oleg Verych

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, Jun 17, 2007 at 02:13:39PM +0200, Rafael J. Wysocki wrote:
> On Sunday, 17 June 2007 13:47, Oleg Verych wrote:
[]
> > It's OK _only_ in case of unknown, hard to find *hardware* bugs.
> >
> > If you think it's "a good thing" for bad, untested by developer
> > code, then something is completely wrong.
>
> Oh, I've just fixed two purely software bugs pointed out by binary searching
> in the code that I'm sure has been tested, not only by its developers, but the
> bugs only showed up in my configuration (on one out of four test boxes).
>
> There are so many different kernel configurations possible that there's no way
> a developer can test them all.

With the current state of affairs it's hard not only for developers, but
also for users: <[email protected]>,
<[email protected]>

I'm trying to re-do some kbuild stuff, but i'm getting rather offensive
answers :( <1182020654.8176.398.camel@chaos>

(Even though i'm an academic with free Internet, i doubt i would even have
tried to improve something if i didn't have it, because i wouldn't have known
about the huge lkml traffic, problems, etc.)

Maybe i'm wrong. But reducing the amount of traffic/files and easing
(re-)configuration are not the least of the things to be done for better
testing. All for the speed of getting and compiling the kernel. The latter is
for avoiding bugs and noise due to inconsistent build configuration.

Finally, again: the bug-reporting and tracking tools i've tried to discuss
are major problems out there that, i think, are plainly easy to deal with. One
more example:

<[email protected]>
Xref: news.gmane.org gmane.linux.debian.devel.kernel:28095
<http://permalink.gmane.org/gmane.linux.debian.devel.kernel/28095>
____

2007-06-17 14:29:44

by Adrian Bunk

Subject: How to improve the quality of the kernel?

On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> On 17/06/07, Adrian Bunk <[email protected]> wrote:
>...
>> Fine with me, but:
>>
>> There are not so simple cases like big infrastructure patches with
>> 20 other patches in the tree depending on it causing a regression, or
>> even worse, a big infrastructure patch exposing a latent old bug in some
>> completely different area of the kernel.
>
> It is different case.
>
> "If the patch introduces a new regression"
>
> introduces != exposes an old bug

My remark was meant as a note "this sentence can't handle all
regressions" (and for a user it doesn't matter whether a new
regression is introduced or an old regression exposed).

It could be we simply agree on this one. ;-)

> Removal of 20 patches will be painful, but sometimes you need to
> "choose minor evil to prevent a greater one" [1].
>
>> And we should be aware that reverting is only a workaround for the real
>> problem which lies in our bug handling.
>...

And this is something I want to emphasize again.

How can we make any progress with the real problem and not only the
symptoms?

There's now much money in the Linux market, and the kernel quality
problems might result in real costs in the support of companies like
IBM, SGI, Redhat or Novell (plus it harms the Linux image which might
result in lower revenues).

If [1] this is true, it might even pay off for them to each assign
X man hours per month of experienced kernel developers to upstream
kernel bug handling?

This is just a wild thought and it might be nonsense - better
suggestions for solving our quality problems would be highly welcome...

cu
Adrian

[1] note that this is an "if"

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-17 14:48:48

by Adrian Bunk

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, Jun 17, 2007 at 04:24:30PM +0200, Oleg Verych wrote:
> On Sun, Jun 17, 2007 at 02:13:39PM +0200, Rafael J. Wysocki wrote:
> > On Sunday, 17 June 2007 13:47, Oleg Verych wrote:
> []
> > > It's OK _only_ in case of unknown, hard to find *hardware* bugs.
> > >
> > > If you think it's "a good thing" for bad, untested by developer
> > > code, then something is completely wrong.
> >
> > Oh, I've just fixed two purely software bugs pointed out by binary searching
> > in the code that I'm sure has been tested, not only by its developers, but the
> > bugs only showed up in my configuration (on one out of four test boxes).
> >
> > There are so many different kernel configurations possible that there's no way
> > a developer can test them all.
>
> With current state of affairs it's not only hard for developers, but
> and for users: <[email protected]>,
> <[email protected]>

Uwe has an attitude that made many people (including Linus himself)
set their mail filters to deliver his emails directly to /dev/null.

Parts of the contents of his emails were usable including usable
regression reports - but the way he treats people simply disqualified
him.

> I'm trying to re-do some kbuild stuff, but i'm getting rather offensive
> answers :( <1182020654.8176.398.camel@chaos>
>...

I'm not seeing anything in Thomas' email that could be considered
offensive. He told you in a technical way why he disagrees with you.

If you call this email "rather offensive", you should _really_
unsubscribe from lkml (or even any Debian mailing lists). And this is
not meant against you, it's simply that for the standards of lkml there
is nothing offensive in this email.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-17 16:15:29

by Michal Piotrowski

Subject: Re: How to improve the quality of the kernel?

On 17/06/07, Adrian Bunk <[email protected]> wrote:
> On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> >...
> >> Fine with me, but:
> >>
> >> There are not so simple cases like big infrastructure patches with
> >> 20 other patches in the tree depending on it causing a regression, or
> >> even worse, a big infrastructure patch exposing a latent old bug in some
> >> completely different area of the kernel.
> >
> > It is different case.
> >
> > "If the patch introduces a new regression"
> >
> > introduces != exposes an old bug
>
> My remark was meant as a note "this sentence can't handle all
> regressions" (and for a user it doesn't matter whether a new
> regression is introduced or an old regression exposed).
>
> It could be we simply agree on this one. ;-)
>
> > Removal of 20 patches will be painful, but sometimes you need to
> > "choose minor evil to prevent a greater one" [1].
> >
> >> And we should be aware that reverting is only a workaround for the real
> >> problem which lies in our bug handling.
> >...
>
> And this is something I want to emphasize again.
>
> How can we make any progress with the real problem and not only the
> symptoms?
>
> There's now much money in the Linux market, and the kernel quality
> problems might result in real costs in the support of companies like
> IBM, SGI, Redhat or Novell (plus it harms the Linux image which might
> result in lower revenues).
>
> If [1] this is true, it might even pay pay off for them to each assign
> X man hours per month of experienced kernel developers to upstream
> kernel bug handling?
>
> This is just a wild thought and it might be nonsense - better
> suggestions for solving our quality problems would be highly welcome...

Just one comment.

We don't try to recruit new skilled testers - it's a big problem.
A skilled tester can narrow down the problem, try to fix it, etc. There
are too many "something between 2.6.10 and 2.6.21 broke my laptop"
reports...

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 16:27:56

by Stefan Richter

Subject: Re: How to improve the quality of the kernel?

Adrian Bunk wrote:
>>> And we should be aware that reverting is only a workaround for the real
>>> problem which lies in our bug handling.
>> ...
>
> And this is something I want to emphasize again.
>
> How can we make any progress with the real problem and not only the
> symptoms?
...

Perhaps make lists of

- bug reports which never lead to any debug activity
(no responsible person/team was found, or a seemingly responsible
person/team did not start to debug the report)

- known regressions on release,

- regressions that became known after release,

- subsystems with notable backlogs of old bugs,

- other categories?

Select typical cases from each category, analyze what went wrong in
these cases, and try to identify practicable countermeasures.

Another approach: Figure out areas where quality is exemplary and try
to draw conclusions for areas where quality is lacking.
--
Stefan Richter
-=====-=-=== -==- =---=
http://arcgraph.de/sr/

2007-06-17 16:47:19

by Michal Piotrowski

Subject: Re: How to improve the quality of the kernel?

On 17/06/07, Stefan Richter <[email protected]> wrote:
> Adrian Bunk wrote:
> >>> And we should be aware that reverting is only a workaround for the real
> >>> problem which lies in our bug handling.
> >> ...
> >
> > And this is something I want to emphasize again.
> >
> > How can we make any progress with the real problem and not only the
> > symptoms?
> ...
>
> Perhaps make lists of
>
> - bug reports which never lead to any debug activity
> (no responsible person/team was found, or a seemingly person/team
> did not start to debug the report)
>
> - known regressions on release,
>
> - regressions that became known after release,
>
> - subsystems with notable backlogs of old bugs,
>
> - other categories?

It is unworkable in a wiki.

There is a new regression field in bugzilla, but it is only the first
step towards implementing a regression tracking feature.

>
> Select typical cases from each categories, analyze what went wrong in
> these cases, and try to identify practicable countermeasures.
>
> Another approach: Figure out areas where quality is exemplary and try
> to draw conclusions for areas where quality is lacking.
> --
> Stefan Richter
> -=====-=-=== -==- =---=
> http://arcgraph.de/sr/
>

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 17:24:36

by Rafael J. Wysocki

Subject: Re: How to improve the quality of the kernel?

On Sunday, 17 June 2007 16:29, Adrian Bunk wrote:
> On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> >...
> >> Fine with me, but:
> >>
> >> There are not so simple cases like big infrastructure patches with
> >> 20 other patches in the tree depending on it causing a regression, or
> >> even worse, a big infrastructure patch exposing a latent old bug in some
> >> completely different area of the kernel.
> >
> > It is different case.
> >
> > "If the patch introduces a new regression"
> >
> > introduces != exposes an old bug
>
> My remark was meant as a note "this sentence can't handle all
> regressions" (and for a user it doesn't matter whether a new
> regression is introduced or an old regression exposed).
>
> It could be we simply agree on this one. ;-)
>
> > Removal of 20 patches will be painful, but sometimes you need to
> > "choose minor evil to prevent a greater one" [1].
> >
> >> And we should be aware that reverting is only a workaround for the real
> >> problem which lies in our bug handling.
> >...
>
> And this is something I want to emphasize again.
>
> How can we make any progress with the real problem and not only the
> symptoms?

I think that we can handle bug reports like we handle modifications of code.

Namely, for each subsystem there can be a person (or a team) responsible
for handling bugs, by which I don't mean fixing them, but directing bug reports
at the right developers or subsystem maintainers, following the history of each
bug report etc. [Of course, these people can choose to use the bugzilla or any
other bug tracking system they want, as long as it works for them.]

The email addresses of these people should be known (and even documented),
so that everyone can notify them if need be and so that it's clear who should
handle given bug reports.
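
One way to picture the routing described above is a small lookup table. This is only a sketch: the path prefixes, the example.org addresses and the bug_contact() helper are all made up for illustration, and nothing like this is claimed to exist in the kernel tree.

```python
# Hypothetical table: source path prefix -> address of the person (or team)
# who triages bug reports for that subsystem. All entries are invented.
BUG_CONTACTS = {
    "drivers/ide/": "ide-bugs@example.org",
    "drivers/": "drivers-bugs@example.org",
    "kernel/power/": "pm-bugs@example.org",
}

# Hypothetical catch-all for areas without a dedicated bug handler.
FALLBACK = "bugs@example.org"


def bug_contact(path):
    """Return the triage address for a source path.

    The longest matching prefix wins, so a report against drivers/ide/
    goes to the IDE handler rather than the generic drivers one.
    """
    matches = [prefix for prefix in BUG_CONTACTS if path.startswith(prefix)]
    if not matches:
        return FALLBACK
    return BUG_CONTACTS[max(matches, key=len)]
```

The point of the longest-prefix rule is that a documented, more specific handler always takes precedence, while every report still ends up with somebody.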

Just an idea. :-)

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-17 17:42:49

by Natalie Protasevich

Subject: Re: How to improve the quality of the kernel?

On 6/17/07, Rafael J. Wysocki <[email protected]> wrote:
> On Sunday, 17 June 2007 16:29, Adrian Bunk wrote:
> > On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> > >...
> > >> Fine with me, but:
> > >>
> > >> There are not so simple cases like big infrastructure patches with
> > >> 20 other patches in the tree depending on it causing a regression, or
> > >> even worse, a big infrastructure patch exposing a latent old bug in some
> > >> completely different area of the kernel.
> > >
> > > It is different case.
> > >
> > > "If the patch introduces a new regression"
> > >
> > > introduces != exposes an old bug
> >
> > My remark was meant as a note "this sentence can't handle all
> > regressions" (and for a user it doesn't matter whether a new
> > regression is introduced or an old regression exposed).
> >
> > It could be we simply agree on this one. ;-)
> >
> > > Removal of 20 patches will be painful, but sometimes you need to
> > > "choose minor evil to prevent a greater one" [1].
> > >
> > >> And we should be aware that reverting is only a workaround for the real
> > >> problem which lies in our bug handling.
> > >...
> >
> > And this is something I want to emphasize again.
> >
> > How can we make any progress with the real problem and not only the
> > symptoms?
>
> I think that we can handle bug reports like we handle modifications of code.
>
> Namely, for each subsystem there can be a person (or a team) responsible
> for handling bugs, by which I don't mean fixing them, but directing bug reports
> at the right developers or subsystem maintainers, following the history of each
> bug report etc. [Of course, these people can choose to use the bugzilla or any
> other bug tracking system they want, as long as it works for them.]
>
> The email addresses of these people should be known (and even documented),
> so that everyone can notify them if need be and so that it's clear who should
> handle given bug reports.
>
> Just an idea. :-)
>

Those are very good ideas indeed. The whole development process has come
to the point where everyone realizes that something needs to be done for the
team to balance out new development and old and recent unresolved
issues that are piling up...

I've looked through a number of bugzillas recently and here is my
scoop on shortcomings and some ideas. I am not sure how realistic they
are; they might well fall into the "wishful thinking" category.

The way bugs get tracked and resolved is definitely a "no guarantee",
and main reasons are:

- not enough time for a maintainer to attend to them all

- nobody else (except at best very few busy people) knows about
majority of the problems. Andrew and Adrian and Michal post the most
pressing ones. But there are many many smaller ones that are not
assessed and not being taken care of.

- many problems are not easily reproducible and not easy to verify
because there is no identical system, motherboard, application, etc.
in case if reporter doesn't stick around until the end of the bug's
life.

Maybe along with bugzilla there should be another tracking tool - for
resources and systems that are available to individual developers.
Someone might have the same or a similar system to verify fixes in case
the reporter disappears or "the system is gone now". Requests for
specific hardware can be automatically generated by bugzilla, say.
Those can be posted once in a while for everyone to see and chip in
and acknowledge if they happen to have such hardware and are able to run a
quick test to at least verify the patch. Statistically, such a need
doesn't happen often for each type of hardware, so it shouldn't be a
big burden for owners.

Besides, the database and resources can be useful for developers who
want to test their new patches on a variety of hardware. This might
prevent future regressions, which are often caused by lack of testing as we
all know.

There are problems that require more research and thinking such as
implementing new features or redesigning old ones. Those should be
posted as a wish list I think as invitation for constructive
discussion and as possible project for takers. They also can be
extracted from bugzilla, I ran into several ones in intermission state
like that.

And finally, the most wishful would probably be collecting test tools
that are written by and can be reused by and available to developers.
It's normally possible to find something on the Internet or write a
quick test program - and probably lots of people end up writing little
programs to allocate shared memory and exercise it in certain way or
some affinity tool etc. But sometimes people come up with pretty
elaborate ones - why don't we attempt to sort out those test programs,
have them contributed (and maybe not code reviewed! - just as is, take
it or leave it :) and have them handy for better and faster bug
fixing/testing. And again - there are times we wish for such a tool or
emulator and don't have spare cycles, so those types of requests for
custom test scripts and programs can also be posted.

I also had on mind what to do about maintainers and project teams and
alternative contacts who can handle issues on a particular module or
subsystem. Probably list or database of volunteers can be arranged,
this is something that is really needed. I can relate after trying to
get hold of alternative people myself...

Regards,
--Natalie

2007-06-17 17:45:57

by David Lang

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, 17 Jun 2007, Oleg Verych wrote:

> On Sun, Jun 17, 2007 at 12:22:26PM +0200, Michal Piotrowski wrote:
>> On 17/06/07, Andrew Morton <[email protected]> wrote:
>>> On Sun, 17 Jun 2007 11:41:36 +0200 Michal Piotrowski
>>> <[email protected]> wrote:
>>>
>>>> +If the patch introduces a new regression and this regression was not
>>> fixed
>>>> +in seven days, then the patch will be reverted.
>>>
>>> Those regressions where we know which patch caused them are the easy ones.
>>> Often we don't know which patch (or even which subsystem merge) is at
>>> fault.
>>>
>>> I think. How many of the present 2.6.22-rc regressions which you're
>>> presently tracking have such a well-identified cause?
>>>
>>
>> Here lays the problem.
>>
>> git-bisect is a killer app, people should start using it.
>
> It's OK _only_ in case of unknown, hard to find *hardware* bugs.
>
> If you think it's "a good thing" for bad, untested by developer
> code, then something is completely wrong.
>
> And if there's no debugger in the mainline kernel, which is developer's
> tool, then why do you think testers must stick with git-bisect, as their
> debugger-like tool (bandwidth in most and time consuming in some cases)?
>
> That's wrong if developers are tending to reply only one thing --
> git-bisect.
>
> If things are going to be that bad, then better to start dealing with the
> cause, not consequences. In this situation requesting test-cases is a
> better way, as it's going to influence developer as cause of potential
> problems. If tests will show *hardware* side of problem, then, well some
> parts may be not obvious, thus bisecting is a way to continue.

most people who report bugs don't know enough about what's actually going
wrong to be able to write a test case (those that do can probably just
write a patch to fix it). Along similar lines a debugger wouldn't be of
much use either.

the fact that git-bisect doesn't require any knowledge other than
knowledge the reporter has demonstrated that they already have (the
ability to compile and install their own kernel) puts it within the reach
of testers.

unfortunately, as good as it is, it can take a lot of effort, especially if
the bug takes time to show up. it's not perfect, but it's a huge help.
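
The search that git-bisect automates can be pictured with a toy model. This is a sketch, not git's implementation: it assumes a linear history in which the first commit is known good, the last is known bad, and every commit after the culprit is also bad. The first_bad() helper and the is_bad callback (standing in for "build, boot and test this kernel") are invented names.

```python
def first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that the history
    flips from good to bad exactly once. Returns the index of the first
    bad commit and the number of kernels that had to be tested.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # one build-and-boot cycle for the tester
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return bad, tests
```

Each test halves the remaining range, which is why the number of build-and-boot cycles grows only with the logarithm of the number of commits; it is also why a bug that takes hours to show up makes every one of those cycles expensive.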

and developers aren't always responding with 'do a bisect', sometimes they
respond with 'yes, we know about that' or 'that sounds like X', so it's
still worthwhile for people to report the problem first before going to
the effort of doing a bisect.

David Lang

2007-06-17 18:11:16

by Rafael J. Wysocki

Subject: Re: How to improve the quality of the kernel?

On Sunday, 17 June 2007 19:42, Natalie Protasevich wrote:
> On 6/17/07, Rafael J. Wysocki <[email protected]> wrote:
> > On Sunday, 17 June 2007 16:29, Adrian Bunk wrote:
> > > On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > > > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> > > >...
> > > >> Fine with me, but:
> > > >>
> > > >> There are not so simple cases like big infrastructure patches with
> > > >> 20 other patches in the tree depending on it causing a regression, or
> > > >> even worse, a big infrastructure patch exposing a latent old bug in some
> > > >> completely different area of the kernel.
> > > >
> > > > It is different case.
> > > >
> > > > "If the patch introduces a new regression"
> > > >
> > > > introduces != exposes an old bug
> > >
> > > My remark was meant as a note "this sentence can't handle all
> > > regressions" (and for a user it doesn't matter whether a new
> > > regression is introduced or an old regression exposed).
> > >
> > > It could be we simply agree on this one. ;-)
> > >
> > > > Removal of 20 patches will be painful, but sometimes you need to
> > > > "choose minor evil to prevent a greater one" [1].
> > > >
> > > >> And we should be aware that reverting is only a workaround for the real
> > > >> problem which lies in our bug handling.
> > > >...
> > >
> > > And this is something I want to emphasize again.
> > >
> > > How can we make any progress with the real problem and not only the
> > > symptoms?
> >
> > I think that we can handle bug reports like we handle modifications of code.
> >
> > Namely, for each subsystem there can be a person (or a team) responsible
> > for handling bugs, by which I don't mean fixing them, but directing bug reports
> > at the right developers or subsystem maintainers, following the history of each
> > bug report etc. [Of course, these people can choose to use the bugzilla or any
> > other bug tracking system they want, as long as it works for them.]
> >
> > The email addresses of these people should be known (and even documented),
> > so that everyone can notify them if need be and so that it's clear who should
> > handle given bug reports.
> >
> > Just an idea. :-)
> >
>
> Those are very good ideas indeed. The whole development process came
> to the point when all realize that something needs to be done for the
> team to balance out new development and old and recent unresolved
> issues that are piling up...
>
> I've looked through a number of bugzillas recently and here is my
> scoop on shortcomings and some ideas. I am not sure how realistic they
> are, probably might fall into "wishful thinking" category.
>
> The way bugs get tracked and resolved is definitely a "no guarantee",
> and main reasons are:
> not enough time for a maintainer to attend to them all
> nobody else (except at best very few busy people) knows about
> majority of the problems. Andrew and Adrian and Michal post the most
> pressing ones. But there are many many smaller ones that are not
> assessed and not being taken care of.
> many problems are not easily reproducible and not easy to verify
> because there is no identical system, motherboard, application, etc.
> in case if reporter doesn't stick around until the end of the bug's
> life.

I agree. In addition, there is only a limited time window in which it makes
sense to debug a given problem before the kernel changes too much (that of
course depends on the subsystem in question).

> Maybe along with bugzilla there should be another tracking tool - for
> resources and systems that are available to individual developers.

Yes, that would be very nice to have.

> Someone might have same or similar system to verify fixes in case if
> the reporter disappears or "the system is gone now". Requests for
> specific hardware can be automatically generated by the bugzilla say.
> Those can be posted once in a while for everyone to see and chip in
> and acknowledge if they happen to have such hardware and able to run a
> quick test to at least verify the patch. Statistically, such need
> doesn't happen often for each type of hardware, so it shouldn't be a
> big burden for owners.
>
> Besides, the database and resources can be useful for developers who
> want to test their new patches on variety of hardware. This might
> prevent future regressions which often caused by lack of testing as we
> all know.

For that, I think, some "professional testers" would be needed ...

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-17 18:24:52

by Adrian Bunk

Subject: Re: How to improve the quality of the kernel?

On Sun, Jun 17, 2007 at 06:26:55PM +0200, Stefan Richter wrote:
> Adrian Bunk wrote:
> >>> And we should be aware that reverting is only a workaround for the real
> >>> problem which lies in our bug handling.
> >> ...
> >
> > And this is something I want to emphasize again.
> >
> > How can we make any progress with the real problem and not only the
> > symptoms?
> ...
>
> Perhaps make lists of
>
> - bug reports which never lead to any debug activity
> (no responsible person/team was found, or a seemingly person/team
> did not start to debug the report)
>
> - known regressions on release,
>
> - regressions that became known after release,
>
> - subsystems with notable backlogs of old bugs,
>
> - other categories?
>
> Select typical cases from each categories, analyze what went wrong in
> these cases, and try to identify practicable countermeasures.

No maintainer or no maintainer who is debugging bug reports is the
major problem in all parts of your list.

> Another approach: Figure out areas where quality is exemplary and try
> to draw conclusions for areas where quality is lacking.

ieee1394 has a maintainer who is looking after all bug reports he gets.

Conclusion: We need such maintainers for all parts of the kernel.

> Stefan Richter

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

Subject: Re: How to improve the quality of the kernel?


Hi,

On Sunday 17 June 2007, Adrian Bunk wrote:
> On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> >...
> >> Fine with me, but:
> >>
> >> There are not so simple cases like big infrastructure patches with
> >> 20 other patches in the tree depending on it causing a regression, or
> >> even worse, a big infrastructure patch exposing a latent old bug in some
> >> completely different area of the kernel.
> >
> > It is different case.
> >
> > "If the patch introduces a new regression"
> >
> > introduces != exposes an old bug
>
> My remark was meant as a note "this sentence can't handle all
> regressions" (and for a user it doesn't matter whether a new
> regression is introduced or an old regression exposed).
>
> It could be we simply agree on this one. ;-)
>
> > Removal of 20 patches will be painful, but sometimes you need to
> > "choose minor evil to prevent a greater one" [1].
> >
> >> And we should be aware that reverting is only a workaround for the real
> >> problem which lies in our bug handling.
> >...
>
> And this is something I want to emphasize again.
>
> How can we make any progress with the real problem and not only the
> symptoms?
>
> There's now much money in the Linux market, and the kernel quality
> problems might result in real costs in the support of companies like
> IBM, SGI, Redhat or Novell (plus it harms the Linux image which might
> result in lower revenues).
>
> If [1] this is true, it might even pay pay off for them to each assign
> X man hours per month of experienced kernel developers to upstream
> kernel bug handling?
>
> This is just a wild thought and it might be nonsense - better
> suggestions for solving our quality problems would be highly welcome...

IMO we should concentrate more on preventing regressions than on fixing them.
In the long-term preventing bugs is cheaper than fixing them afterwards.

First let me tell you all a little story...

Over two years ago I reviewed some _cleanup_ patch and noticed three bugs
in it (in other words I potentially prevented three regressions). I also
asked for more thorough verification of the patch as I suspected that it may
have more problems. The author fixed the issues and replied that he hasn't
done the full verification yet but he doesn't suspect any problems...

Fast forward...

A year later I discover that the final version of the patch hit the mainline.
I don't remember ever seeing the final version in my mailbox (there are no
cc: lines in the patch description) and I saw that I'm not credited in the
patch description. However the worse part is that it seems that the full
verification has never been done. The result? Regression in the release
kernel (exactly the issue that I was worried about) which required three
patches and over a month to be fixed completely. It seems that a year
was not enough to get this ~70k _cleanup_ patch fully verified and tested
(it hit -mm soon before being merged)...

From reviewer's POV: I have invested my time into review, discovered real
issues and as a reward I got no credit at all and extra frustration from the
fact that part of my review was forgotten/ignored (the part which resulted in
real regression in the release kernel)... Oh and in the past the said
developer has already been asked (politely in private message) to pay more
attention to his changes (after I silently fixed some other regression caused
by his other patch).

But wait, there is more: I happened to be the maintainer of the subsystem which
got directly hit by the issue and I was getting bugreports from the users about
the problem... :-)

It wasn't my first/last bad experience as a reviewer... finally I just gave up
on reviewing other people's patches unless they are strictly for the IDE subsystem.

The moral of the story is that currently it just doesn't pay off to do
code reviews. From personal POV it pays much more to wait until buggy patch
hits the mainline and then fix the issues yourself (at least you will get
some credit). To change this we should put more emphasis on the importance
of code reviews by "rewarding" people investing their time into reviews
and "rewarding" developers/maintainers taking reviews seriously.

We should credit reviewers more, sometimes it takes more time/knowledge to
review the patch than to make it so getting near to zero credit for review
doesn't sound too attractive. Hmm, wait it can be worse - your review
may be ignored... ;-)

From my side I think I'll start adding less formal "Reviewed-by" to IDE
patches even if the review resulted in no issues being found (in addition to
explicit "Acked-by" tags and crediting people for finding real issues - which
I currently always do as a way for showing my appreciation for their work).

I also encourage other maintainers/developers to pay more attention to
adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
that maintainers will promote changes that have been reviewed by others
by giving them priority over other ones (if the changes are on more-or-less
the same importance level of course, you get the idea).

Now what to do with people who ignore reviews and/or have rather high
regressions/patches ratio?

I think that we should have info about regressions integrated into SCM,
i.e. in git we should have optional "fixes-commit" tag and we should be
able to do some reverse data collection. This feature combined with
"Author:" info after some time should give us some very interesting
statistics (Top Ten "Regressors"). It wouldn't be ideal (ie. we need some
patches threshold to filter out people with 1 patch and >= 1 regression(s),
we need to remember that some code areas are more difficult than the others
and that patches are not equal per se etc.) however I believe that making it
into Top Ten "Regressors" should give the winners some motivation to improve
their work ethic. Well, in the worst case we would just get some extra
trivial/documentation patches. ;-)
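
A rough sketch of the reverse data collection, to make the idea concrete. Everything here is an assumption for illustration: the "fixes-commit: <id>" line in the commit message, the Commit record, the regressor_ranking() helper and the default threshold are invented, and the sketch works on an in-memory list rather than on a real git repository.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical marker line in a commit message naming the commit it fixes.
TAG = "fixes-commit:"


@dataclass
class Commit:
    sha: str
    author: str
    message: str


def regressor_ranking(commits, min_patches=10):
    """Rank authors by (commits later fixed) / (commits authored).

    Authors with fewer than min_patches commits are left out, so that
    somebody with one patch and one regression does not top the list.
    A commit counts once, no matter how many fixes it needed.
    """
    by_sha = {c.sha: c for c in commits}
    patches = Counter(c.author for c in commits)

    broken = set()  # ids of commits that some later commit claims to fix
    for c in commits:
        for line in c.message.splitlines():
            line = line.strip()
            if line.lower().startswith(TAG):
                sha = line[len(TAG):].strip()
                if sha in by_sha:
                    broken.add(sha)

    regressions = Counter(by_sha[sha].author for sha in broken)
    ranking = [(author, regressions[author] / count)
               for author, count in patches.items()
               if count >= min_patches]
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

The ratio says nothing about how hard the touched code is or how big the patches were, which is exactly the caveat made above; it would be a conversation starter rather than a verdict.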

Sorry for a bit chaotic mail but I hope that message is clear.

Thanks,
Bart

2007-06-17 18:46:12

by Stefan Richter

Subject: Re: How to improve the quality of the kernel?

Adrian Bunk wrote:
> On Sun, Jun 17, 2007 at 06:26:55PM +0200, Stefan Richter wrote:
>> Another approach: Figure out areas where quality is exemplary and try
>> to draw conclusions for areas where quality is lacking.
>
> ieee1394 has a maintainer who is looking after all bug reports he gets.

...but doesn't fix them all, and is usually slow with fixes. He should
spend less time conversing on LKML. :-)
--
Stefan Richter
-=====-=-=== -==- =---=
http://arcgraph.de/sr/

2007-06-17 18:50:33

by Natalie Protasevich

Subject: Re: How to improve the quality of the kernel?

On 6/17/07, Adrian Bunk <[email protected]> wrote:
> On Sun, Jun 17, 2007 at 06:26:55PM +0200, Stefan Richter wrote:
> > Adrian Bunk wrote:
> > >>> And we should be aware that reverting is only a workaround for the real
> > >>> problem which lies in our bug handling.
> > >> ...
> > >
> > > And this is something I want to emphasize again.
> > >
> > > How can we make any progress with the real problem and not only the
> > > symptoms?
> > ...
> >
> > Perhaps make lists of
> >
> > - bug reports which never lead to any debug activity
> > (no responsible person/team was found, or a seemingly person/team
> > did not start to debug the report)
> >
> > - known regressions on release,
> >
> > - regressions that became known after release,
> >
> > - subsystems with notable backlogs of old bugs,
> >
> > - other categories?
> >
> > Select typical cases from each categories, analyze what went wrong in
> > these cases, and try to identify practicable countermeasures.
>
> No maintainer or no maintainer who is debugging bug reports is the
> major problem in all parts of your list.
>
> > Another approach: Figure out areas where quality is exemplary and try
> > to draw conclusions for areas where quality is lacking.
>
> ieee1394 has a maintainer who is looking after all bug reports he gets.
>
> Conclusion: We need such maintainers for all parts of the kernel.
>

I noticed some areas are well maintained because there is an awesome
maintainer, or a good and well-coordinated team - and this is mostly in
the "fun" areas ;) But there are "boring" areas that are about to be
deprecated, or where no new development is expected, etc. It will be hard
to get a dedicated person to take care of such. How about having people on
rotation, or jury duty so to speak - for a period of time (completely
voluntary!)? Nice stats on the report about contributions in non-native
areas for a developer would be a great accomplishment and also a good
chance to look into other things! Besides, this way "old parts" will
get attention and be revised and re-implemented sooner. And we can
post a "Temp maintainer needed" list...

--Natalie

> > Stefan Richter
>
> cu
> Adrian

2007-06-17 18:54:12

by Andrew Morton

Subject: Re: How to improve the quality of the kernel?

On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz <[email protected]> wrote:

>
>
> IMO we should concentrate more on preventing regressions than on fixing them.
> In the long-term preventing bugs is cheaper than fixing them afterwards.
>
> First let me tell you all a little story...
>
> Over two years ago I reviewed a _cleanup_ patch and noticed three bugs
> in it (in other words I potentially prevented three regressions). I also
> asked for more thorough verification of the patch as I suspected that it may
> have more problems. The author fixed the issues and replied that he hasn't
> done the full verification yet but he doesn't suspect any problems...
>
> Fast forward...
>
> A year later I discovered that the final version of the patch had hit the mainline.
> I don't remember ever seeing the final version in my mailbox (there are no
> cc: lines in the patch description) and I saw that I'm not credited in the
> patch description. However the worst part is that it seems that the full
> verification has never been done. The result? Regression in the release
> kernel (exactly the issue that I was worried about) which required three
> patches and over a month to be fixed completely. It seems that a year
> was not enough to get this ~70k _cleanup_ patch fully verified and tested
> (it hit -mm soon before being merged)...

crap. Commit ID, please ;)

> From a reviewer's POV: I have invested my time into review, discovered real
> issues and as a reward I got no credit at all and extra frustration from the
> fact that part of my review was forgotten/ignored (the part which resulted in
> real regression in the release kernel)... Oh and in the past the said
> developer has already been asked (politely in private message) to pay more
> attention to his changes (after I silently fixed some other regression caused
> by his other patch).
>
> But wait there is more, I happened to be the maintainer of the subsystem which
> got directly hit by the issue and I was getting bugreports from the users about
> the problem... :-)
>
> It wasn't my first/last bad experience as a reviewer... finally I just gave up
> on reviewing other people's patches unless they are strictly for the IDE subsystem.
>
> The moral of the story is that currently it just doesn't pay off to do
> code reviews.

I dunno. I suspect (hope) that this was an exceptional case, hence one
should not draw general conclusions from it. It certainly sounds very bad.

> From personal POV it pays much more to wait until buggy patch
> hits the mainline and then fix the issues yourself (at least you will get
> some credit). To change this we should put more emphasis on the importance
> of code reviews by "rewarding" people investing their time into reviews
> and "rewarding" developers/maintainers taking reviews seriously.
>
> We should credit reviewers more, sometimes it takes more time/knowledge to
> review the patch than to make it so getting near to zero credit for review
> doesn't sound too attractive. Hmm, wait it can be worse - your review
> may be ignored... ;-)
>
> From my side I think I'll start adding less formal "Reviewed-by" to IDE
> patches even if the review resulted in no issues being found (in addition to
> explicit "Acked-by" tags and crediting people for finding real issues - which
> I currently always do as a way for showing my appreciation for their work).

yup, Reviewed-by: is good and I do think we should start adopting it,
although I haven't thought through exactly how.

On my darker days I consider treating a Reviewed-by: as a prerequisite for
merging. I suspect that would really get the feathers flying.

> I also encourage other maintainers/developers to pay more attention to
> adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
> that maintainers will promote changes that have been reviewed by others
> by giving them priority over other ones (if the changes are on more-or-less
> the same importance level of course, you get the idea).
>
> Now what to do with people who ignore reviews and/or have rather high
> regressions/patches ratio?

Ignoring a review would be a wildly wrong thing to do. It's so unusual
that I'd be suspecting a lost email or an i-sent-the-wrong-patch.

As for high regressions/patches ratio: that'll be hard to calculate and
tends to be dependent upon the code which is being altered rather than who
is doing the altering: some stuff is just fragile, for various reasons.

One ratio which we might want to have a think about is the patches-sent
versus reviews-done ratio ;)

> I think that we should have info about regressions integrated into SCM,
> i.e. in git we should have optional "fixes-commit" tag and we should be
> able to do some reverse data collection. This feature combined with
> "Author:" info after some time should give us some very interesting
> statistics (Top Ten "Regressors"). It wouldn't be ideal (ie. we need some
> patches threshold to filter out people with 1 patch and >= 1 regression(s),
> we need to remember that some code areas are more difficult than the others
> and that patches are not equal per se etc.) however I believe that making it
> into Top Ten "Regressors" should give the winners some motivation to improve
> their work ethic. Well, in the worst case we would just get some extra
> trivial/documentation patches. ;-)

We of course do want to minimise the amount of overhead for each developer.
I'm a strong believer in specialisation: rather than requiring that *every*
developer/maintainer integrate new steps in their processes it would be
better to allow them to proceed in a close-to-usual fashion and to provide
for a specialist person (or team) to do the sorts of things which you're
thinking about.
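
A minimal sketch of the kind of script such a specialist could run,
assuming a hypothetical "Fixes-commit: <sha>" trailer in commit messages.
The thread only proposes the tag, so its exact spelling, the Commit record
and the min_patches threshold below are illustrative assumptions. It
reports, per author, the share of their commits that a later fix points at:

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    message: str


# Hypothetical trailer spelling, taken from the proposal in this thread.
# The referenced sha must match a known commit exactly (no abbreviations).
FIXES_RE = re.compile(r"^Fixes-commit:[ \t]*([0-9a-f]{7,40})[ \t]*$", re.MULTILINE)


def regression_ratios(commits, min_patches=5):
    """Return (author, regressions/patches) pairs, highest ratio first."""
    by_sha = {c.sha: c for c in commits}
    patches = Counter(c.author for c in commits)

    # Collect each commit that some later fix points at, counted once
    # even if several fixes were needed for it.
    regressing = set()
    for c in commits:
        for fixed_sha in FIXES_RE.findall(c.message):
            if fixed_sha in by_sha:
                regressing.add(fixed_sha)

    regressions = Counter(by_sha[sha].author for sha in regressing)

    # Skip authors below the patch threshold to avoid 1-patch outliers.
    ratios = [
        (author, regressions[author] / count)
        for author, count in patches.items()
        if count >= min_patches
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)
```

The threshold mirrors the concern about authors with one patch and one
regression; it does nothing about the objection that some code is simply
more fragile than other code.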

2007-06-17 18:54:45

by Michal Piotrowski

Subject: Re: How to improve the quality of the kernel?

On 17/06/07, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>
> Hi,
>
> On Sunday 17 June 2007, Adrian Bunk wrote:
> > On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> > >...
> > >> Fine with me, but:
> > >>
> > >> There are not so simple cases like big infrastructure patches with
> > >> 20 other patches in the tree depending on it causing a regression, or
> > >> even worse, a big infrastructure patch exposing a latent old bug in some
> > >> completely different area of the kernel.
> > >
> > > It is different case.
> > >
> > > "If the patch introduces a new regression"
> > >
> > > introduces != exposes an old bug
> >
> > My remark was meant as a note "this sentence can't handle all
> > regressions" (and for a user it doesn't matter whether a new
> > regression is introduced or an old regression exposed).
> >
> > It could be we simply agree on this one. ;-)
> >
> > > Removal of 20 patches will be painful, but sometimes you need to
> > > "choose minor evil to prevent a greater one" [1].
> > >
> > >> And we should be aware that reverting is only a workaround for the real
> > >> problem which lies in our bug handling.
> > >...
> >
> > And this is something I want to emphasize again.
> >
> > How can we make any progress with the real problem and not only the
> > symptoms?
> >
> > There's now much money in the Linux market, and the kernel quality
> > problems might result in real costs in the support of companies like
> > IBM, SGI, Redhat or Novell (plus it harms the Linux image which might
> > result in lower revenues).
> >
> > If [1] this is true, it might even pay off for them to each assign
> > X man hours per month of experienced kernel developers to upstream
> > kernel bug handling?
> >
> > This is just a wild thought and it might be nonsense - better
> > suggestions for solving our quality problems would be highly welcome...
>
> IMO we should concentrate more on preventing regressions than on fixing them.
> In the long-term preventing bugs is cheaper than fixing them afterwards.
>
> First let me tell you all a little story...
>
> Over two years ago I reviewed a _cleanup_ patch and noticed three bugs
> in it (in other words I potentially prevented three regressions). I also
> asked for more thorough verification of the patch as I suspected that it may
> have more problems. The author fixed the issues and replied that he hasn't
> done the full verification yet but he doesn't suspect any problems...
>
> Fast forward...
>
> A year later I discovered that the final version of the patch had hit the mainline.
> I don't remember ever seeing the final version in my mailbox (there are no
> cc: lines in the patch description) and I saw that I'm not credited in the
> patch description. However the worst part is that it seems that the full
> verification has never been done. The result? Regression in the release
> kernel (exactly the issue that I was worried about) which required three
> patches and over a month to be fixed completely. It seems that a year
> was not enough to get this ~70k _cleanup_ patch fully verified and tested
> (it hit -mm soon before being merged)...
>
> From a reviewer's POV: I have invested my time into review, discovered real
> issues and as a reward I got no credit at all and extra frustration from the
> fact that part of my review was forgotten/ignored (the part which resulted in
> real regression in the release kernel)... Oh and in the past the said
> developer has already been asked (politely in private message) to pay more
> attention to his changes (after I silently fixed some other regression caused
> by his other patch).
>
> But wait there is more, I happened to be the maintainer of the subsystem which
> got directly hit by the issue and I was getting bugreports from the users about
> the problem... :-)
>
> It wasn't my first/last bad experience as a reviewer... finally I just gave up
> on reviewing other people's patches unless they are strictly for the IDE subsystem.
>
> The moral of the story is that currently it just doesn't pay off to do
> code reviews. From personal POV it pays much more to wait until buggy patch
> hits the mainline and then fix the issues yourself (at least you will get
> some credit). To change this we should put more emphasis on the importance
> of code reviews by "rewarding" people investing their time into reviews
> and "rewarding" developers/maintainers taking reviews seriously.
>
> We should credit reviewers more, sometimes it takes more time/knowledge to
> review the patch than to make it so getting near to zero credit for review
> doesn't sound too attractive. Hmm, wait it can be worse - your review
> may be ignored... ;-)
>
> From my side I think I'll start adding less formal "Reviewed-by" to IDE
> patches even if the review resulted in no issues being found (in addition to
> explicit "Acked-by" tags and crediting people for finding real issues - which
> I currently always do as a way for showing my appreciation for their work).
>
> I also encourage other maintainers/developers to pay more attention to
> adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
> that maintainers will promote changes that have been reviewed by others
> by giving them priority over other ones (if the changes are on more-or-less
> the same importance level of course, you get the idea).

I think that this is a very good idea - especially for large, intrusive patches.
A long {Acked,Reviewed,Signed-off,Tested}-by list will be welcome.

>
> Now what to do with people who ignore reviews and/or have rather high
> regressions/patches ratio?
>
> I think that we should have info about regressions integrated into SCM,
> i.e. in git we should have optional "fixes-commit" tag and we should be
> able to do some reverse data collection. This feature combined with
> "Author:" info after some time should give us some very interesting
> statistics (Top Ten "Regressors"). It wouldn't be ideal (ie. we need some
> patches threshold to filter out people with 1 patch and >= 1 regression(s),
> we need to remember that some code areas are more difficult than the others
> and that patches are not equal per se etc.) however I believe that making it
> into Top Ten "Regressors" should give the winners some motivation to improve
> their work ethic. Well, in the worst case we would just get some extra
> trivial/documentation patches. ;-)
>
> Sorry for the somewhat chaotic mail, but I hope the message is clear.
>
> Thanks,
> Bart
>

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 19:16:45

by Rafael J. Wysocki

Subject: Re: How to improve the quality of the kernel?

On Sunday, 17 June 2007 20:52, Andrew Morton wrote:
> On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>
> >
> >
> > IMO we should concentrate more on preventing regressions than on fixing them.
> > In the long-term preventing bugs is cheaper than fixing them afterwards.
> >
> > First let me tell you all a little story...
> >
> > Over two years ago I reviewed a _cleanup_ patch and noticed three bugs
> > in it (in other words I potentially prevented three regressions). I also
> > asked for more thorough verification of the patch as I suspected that it may
> > have more problems. The author fixed the issues and replied that he hasn't
> > done the full verification yet but he doesn't suspect any problems...
> >
> > Fast forward...
> >
> > A year later I discovered that the final version of the patch had hit the mainline.
> > I don't remember ever seeing the final version in my mailbox (there are no
> > cc: lines in the patch description) and I saw that I'm not credited in the
> > patch description. However the worst part is that it seems that the full
> > verification has never been done. The result? Regression in the release
> > kernel (exactly the issue that I was worried about) which required three
> > patches and over a month to be fixed completely. It seems that a year
> > was not enough to get this ~70k _cleanup_ patch fully verified and tested
> > (it hit -mm soon before being merged)...
>
> crap. Commit ID, please ;)
>
> > From a reviewer's POV: I have invested my time into review, discovered real
> > issues and as a reward I got no credit at all and extra frustration from the
> > fact that part of my review was forgotten/ignored (the part which resulted in
> > real regression in the release kernel)... Oh and in the past the said
> > developer has already been asked (politely in private message) to pay more
> > attention to his changes (after I silently fixed some other regression caused
> > by his other patch).
> >
> > But wait there is more, I happened to be the maintainer of the subsystem which
> > got directly hit by the issue and I was getting bugreports from the users about
> > the problem... :-)
> >
> > It wasn't my first/last bad experience as a reviewer... finally I just gave up
> > on reviewing other people's patches unless they are strictly for the IDE subsystem.
> >
> > The moral of the story is that currently it just doesn't pay off to do
> > code reviews.
>
> I dunno. I suspect (hope) that this was an exceptional case, hence one
> should not draw general conclusions from it. It certainly sounds very bad.
>
> > From personal POV it pays much more to wait until buggy patch
> > hits the mainline and then fix the issues yourself (at least you will get
> > some credit). To change this we should put more emphasis on the importance
> > of code reviews by "rewarding" people investing their time into reviews
> > and "rewarding" developers/maintainers taking reviews seriously.
> >
> > We should credit reviewers more, sometimes it takes more time/knowledge to
> > review the patch than to make it so getting near to zero credit for review
> > doesn't sound too attractive. Hmm, wait it can be worse - your review
> > may be ignored... ;-)
> >
> > From my side I think I'll start adding less formal "Reviewed-by" to IDE
> > patches even if the review resulted in no issues being found (in addition to
> > explicit "Acked-by" tags and crediting people for finding real issues - which
> > I currently always do as a way for showing my appreciation for their work).
>
> yup, Reviewed-by: is good and I do think we should start adopting it,
> although I haven't thought through exactly how.
>
> On my darker days I consider treating a Reviewed-by: as a prerequisite for
> merging. I suspect that would really get the feathers flying.

How about the following "algorithm":

* Step 1: Send a patch as an RFC to the relevant lists/people and only if there
are no negative comments within at least n days, you are allowed to proceed
to the next step. If anyone has reviewed/acked the patch, add their names
and email addresses as "Reviewed-by"/"Acked-by" to the patch in the next
step.
* Step 2: Send the patch as an RC to the relevant lists/people _and_ LKML and
if there are no negative comments within at least n days, you can proceed to
the next step. If anyone has reviewed/acked the patch, add their names
and email addresses as "Reviewed-by"/"Acked-by" to the patch in the next
step.
* Step 3: Submit the patch for merging to the right maintainer (keeping the
previous CC list).

where n is a number that needs to be determined (I think that n could be 3).
Well, "negative comments" should also be defined more precisely. ;-)
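
The three steps above can be sketched as a small state machine. This is
only an illustration under stated assumptions: the Patch record, the
try_advance() helper and the way "negative comments" are counted are all
hypothetical, and n defaults to the value of 3 suggested above.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum


class Stage(Enum):
    RFC = 1      # Step 1: relevant lists/people only
    RC = 2       # Step 2: relevant lists/people and LKML
    SUBMIT = 3   # Step 3: sent to the maintainer for merging


@dataclass
class Patch:
    title: str
    posted: date                    # when the current stage was posted
    stage: Stage = Stage.RFC
    negative_comments: int = 0      # open objections in this stage
    tags: list = field(default_factory=list)

    def add_tag(self, kind, who):
        """Record a "Reviewed-by"/"Acked-by" line to carry forward."""
        self.tags.append(f"{kind}: {who}")


def try_advance(patch, today, n_days=3):
    """Move to the next step if n_days passed without negative comments."""
    if patch.stage is Stage.SUBMIT:
        return False                # already at the last step
    if patch.negative_comments > 0:
        return False                # objections must be resolved first
    if today - patch.posted < timedelta(days=n_days):
        return False                # waiting period not over yet
    patch.stage = Stage(patch.stage.value + 1)
    patch.posted = today            # the waiting period restarts
    return True
```

Collected tags stay on the patch across steps, so the names gathered during
the RFC round are still there when the patch is submitted for merging.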

> > I also encourage other maintainers/developers to pay more attention to
> > adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
> > that maintainers will promote changes that have been reviewed by others
> > by giving them priority over other ones (if the changes are on more-or-less
> > the same importance level of course, you get the idea).
> >
> > Now what to do with people who ignore reviews and/or have rather high
> > regressions/patches ratio?
>
> Ignoring a review would be a wildly wrong thing to do. It's so unusual
> that I'd be suspecting a lost email or an i-sent-the-wrong-patch.
>
> As for high regressions/patches ratio: that'll be hard to calculate and
> tends to be dependent upon the code which is being altered rather than who
> is doing the altering: some stuff is just fragile, for various reasons.
>
> One ratio which we might want to have a think about is the patches-sent
> versus reviews-done ratio ;)
>
> > I think that we should have info about regressions integrated into SCM,
> > i.e. in git we should have optional "fixes-commit" tag and we should be
> > able to do some reverse data collection. This feature combined with
> > "Author:" info after some time should give us some very interesting
> > statistics (Top Ten "Regressors"). It wouldn't be ideal (ie. we need some
> > patches threshold to filter out people with 1 patch and >= 1 regression(s),
> > we need to remember that some code areas are more difficult than the others
> > and that patches are not equal per se etc.) however I believe that making it
> > into Top Ten "Regressors" should give the winners some motivation to improve
> > their work ethic. Well, in the worst case we would just get some extra
> > trivial/documentation patches. ;-)
>
> We of course do want to minimise the amount of overhead for each developer.
> I'm a strong believer in specialisation: rather than requiring that *every*
> developer/maintainer integrate new steps in their processes it would be
> better to allow them to proceed in a close-to-usual fashion and to provide
> for a specialist person (or team) to do the sorts of things which you're
> thinking about.

Still, even very experienced developers make trivial mistakes, so there should
be a way to catch such things before they hit -rc or even -mm kernels.

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-17 19:31:07

by Adrian Bunk

Subject: Re: How to improve the quality of the kernel?

On Sun, Jun 17, 2007 at 07:31:01PM +0200, Rafael J. Wysocki wrote:
> On Sunday, 17 June 2007 16:29, Adrian Bunk wrote:
> > On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > > On 17/06/07, Adrian Bunk <[email protected]> wrote:
> > >...
> > >> Fine with me, but:
> > >>
> > >> There are not so simple cases like big infrastructure patches with
> > >> 20 other patches in the tree depending on it causing a regression, or
> > >> even worse, a big infrastructure patch exposing a latent old bug in some
> > >> completely different area of the kernel.
> > >
> > > It is different case.
> > >
> > > "If the patch introduces a new regression"
> > >
> > > introduces != exposes an old bug
> >
> > My remark was meant as a note "this sentence can't handle all
> > regressions" (and for a user it doesn't matter whether a new
> > regression is introduced or an old regression exposed).
> >
> > It could be we simply agree on this one. ;-)
> >
> > > Removal of 20 patches will be painful, but sometimes you need to
> > > "choose minor evil to prevent a greater one" [1].
> > >
> > >> And we should be aware that reverting is only a workaround for the real
> > >> problem which lies in our bug handling.
> > >...
> >
> > And this is something I want to emphasize again.
> >
> > How can we make any progress with the real problem and not only the
> > symptoms?
>
> I think that we can handle bug reports like we handle modifications of code.
>
> Namely, for each subsystem there can be a person (or a team) responsible
> for handling bugs, by which I don't mean fixing them, but directing bug reports
> at the right developers or subsystem maintainers, following the history of each
> bug report etc. [Of course, these people can choose to use the bugzilla or any
> other bug tracking system they want, as long as it works for them.]
>
> The email addresses of these people should be known (and even documented),
> so that everyone can notify them if need be and so that it's clear who should
> handle given bug reports.

Currently, these people are "Andrew Morton" and the addresses are
[email protected] and http://bugzilla.kernel.org/ - and this
part is working.

Although there is room for improvement in this area, the problem in the
pipeline is really to find developers who know the code in question and
who are willing to debug bug reports.

There are unmaintained parts of the kernel.

And there are parts of the kernel where the maintainers are developing
code, reviewing code and handling patches but are not willing or simply
not capable of looking at bug reports. That's not against these people
and they might do great work, but then there's simply an additional
person missing who would be willing to learn the subsystem in question
and handle bug reports.

All bug handling becomes moot and every request for more information
from the submitter a waste of time if there's no one available for
looking deeper into a bug.

> Just an idea. :-)
>
> Greetings,
> Rafael

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-17 19:34:46

by Carlo Wood

Subject: Re: How to improve the quality of the kernel?

On Sun, Jun 17, 2007 at 09:18:17PM +0200, Rafael J. Wysocki wrote:
> where n is a number that needs to be determined (I think that n could be 3).
> Well, "negative comments" should also be defined more precisely. ;-)

I think that n should be a function of the number of accepted patches
that this person sent in before, and the number of regressions he
caused in the past.

I.e., new developers have to wait a considerable amount of time - while
experienced developers who never caused a regression should be able
to write patches that are immediately applied. Also, if anyone causes
a regression - that would lead to them having to wait longer the
next time before they can apply the patch - a good reason for a
developer to put extra time into making sure there are no regressions.
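
As a rough sketch of such a function: the formula and every constant below
(base_days, patches_per_day_off, penalty_days) are made-up assumptions for
illustration only, not something proposed in this thread.

```python
def review_wait_days(accepted, regressions, base_days=14,
                     patches_per_day_off=10, penalty_days=3):
    """Return the waiting period n (in days) for a submitter.

    accepted    -- number of previously accepted patches
    regressions -- number of regressions the submitter caused so far
    """
    if accepted < 0 or regressions < 0:
        raise ValueError("counts must be non-negative")
    # Every patches_per_day_off accepted patches earn one day off the wait.
    earned = accepted // patches_per_day_off
    # Each past regression adds a fixed penalty on top of what is left.
    return max(0, base_days - earned) + penalty_days * regressions
```

With these defaults a newcomer waits 14 days, while someone with 200
accepted patches and no regressions waits 0 days.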

--
Carlo Wood <[email protected]>

2007-06-17 20:01:22

by Stefan Richter

Subject: Re: How to improve the quality of the kernel?

Carlo Wood wrote:
> On Sun, Jun 17, 2007 at 09:18:17PM +0200, Rafael J. Wysocki wrote:
>> where n is a number that needs to be determined (I think that n could be 3).
>> Well, "negative comments" should also be defined more precisely. ;-)
>
> I think that n should be a function of the number of accepted patches
> that this person sent in before, and the number of regressions he
> caused in the past.

The character of the patch (potential impacts, size...) and availability
of reviewers and testers influence the required review time so much that
other factors, like reputation of the submitter, hardly matter.
--
Stefan Richter
-=====-=-=== -==- =---=
http://arcgraph.de/sr/

2007-06-17 20:11:00

by Michal Piotrowski

Subject: Re: How to improve the quality of the kernel?

On 17/06/07, Stefan Richter <[email protected]> wrote:
> Carlo Wood wrote:
> > On Sun, Jun 17, 2007 at 09:18:17PM +0200, Rafael J. Wysocki wrote:
> >> where n is a number that needs to be determined (I think that n could be 3).
> >> Well, "negative comments" should also be defined more precisely. ;-)
> >
> > I think that n should be a function of the number of accepted patches
> > that this person sent in before, and the number of regressions he
> > caused in the past.
>
> The character of the patch (potential impacts, size...) and availability
> of reviewers and testers influence the required review time so much that
> other factors, like reputation of the submitter, hardly matter.

So we need a bug/regression/patch tracking system based on an MMORPG ;)

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 21:11:35

by Oleg Verych

Subject: Re: [PATCH] (Re: regression tracking (Re: Linux 2.6.21))

On Sun, Jun 17, 2007 at 10:44:39AM -0700, [email protected] wrote:
> On Sun, 17 Jun 2007, Oleg Verych wrote:
[]
> >That's wrong if developers are tending to reply only one thing --
> >git-bisect.
> >
> >If things are going to be that bad, then better to start dealing with the
> >cause, not consequences. In this situation requesting test-cases is a
> >better way, as it's going to influence developer as cause of potential
> >problems. If tests will show *hardware* side of problem, then, well some
> >parts may be not obvious, thus bisecting is a way to continue.
>
> most people who report bugs don't know enough about what's actually going
> wrong to be able to write a test case (those that do can probably just
> write a patch to fix it). Along similar lines a debugger wouldn't be of
> much use either.

Sorry for my English. I meant requesting test cases from the developers, of
course. Or at least results of some kind of testing, so people may run
and check them as well if something is suspected. And this, from my POV,
leads again to an organized way of filtering noise and collecting structured
information easily (yeah, I think it's the BTS with reportbug in Debian).{0}

> the fact that git-bisect doesn't require any knowledge other than
> knowledge the reporter has demonstrated that they already have (the
> ability to compile and install their own kernel) puts it within the reach
> of testers.
>
> unfortunately, as good as it is, it can take a lot of effort, especially if
> the bug takes time to show up. it's not perfect, but it's a huge help.

I think positive feedback from {0} to the LTP may improve that,
especially if the tests are split into parts that are easy to choose from.

> and developers aren't always responding with 'do a bisect', sometimes they
> respond with 'yes, we know about that'

> or 'that sounds like X',

Those two are _exactly_ what the reportbug tool does. That's why I'm
talking about it. And it's *no* wonder to me that developers get bored -- at
some point they may not be able to handle the *noise*.

> so it's still worthwhile for people to report the problem first before
> going to the effort of doing a bisect.
>
> David Lang
__

Bits for Adrian.

*ML*

I *use* Gmane. I'm not subscribed (receiving e-mail to my mbox) to any ML,
except <[email protected]>.

Nearly every e-mail of mine here has Gmane links. You seem to have ignored
all of them. To me that is a result of *your personal*, rather than technical,
activity.

*offense*

I'm not talking about the personal offense you seem to have in mind, but a
technical one. I.e. when the possible benefit might be even greater than NOHZ
on x86 and the like[0], with much less effort. I still think, until I either
develop it or fail, that reducing traffic by one or two orders of magnitude
is possible, as well as improving kbuild/kconfig to reduce the noise of
mis-configurations and the length of testers' .config files. Discouraging
that effort is my source of offense.

(FYI Until Linus checked in my _RFT_ kbuild patches, i realized how
*many* people are willing to understand and try to test kbuild stuff.)

[0] I bet VGA, DRAM, HDD are far more power-hungry room-heaters. Unless
you can substantially lower the frequency, you might have no benefit at
all. Want to know how it's done in a perfectly designed embedded
MCU/CPU? Please see, for instance, the MSP430 from TI (I know, it uses
SRAM, but I'm asking you to look at the processor core design).
____

Subject: Re: How to improve the quality of the kernel?

On Sunday 17 June 2007, Andrew Morton wrote:
> On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>
> >
> >
> > IMO we should concentrate more on preventing regressions than on fixing them.
> > In the long-term preventing bugs is cheaper than fixing them afterwards.
> >
> > First let me tell you all a little story...
> >
> > Over two years ago I reviewed a _cleanup_ patch and noticed three bugs
> > in it (in other words I potentially prevented three regressions). I also
> > asked for more thorough verification of the patch as I suspected that it may
> > have more problems. The author fixed the issues and replied that he hasn't
> > done the full verification yet but he doesn't suspect any problems...
> >
> > Fast forward...
> >
> > A year later I discovered that the final version of the patch had hit the mainline.
> > I don't remember ever seeing the final version in my mailbox (there are no
> > cc: lines in the patch description) and I saw that I'm not credited in the
> > patch description. However the worst part is that it seems that the full
> > verification has never been done. The result? Regression in the release
> > kernel (exactly the issue that I was worried about) which required three
> > patches and over a month to be fixed completely. It seems that a year
> > was not enough to get this ~70k _cleanup_ patch fully verified and tested
> > (it hit -mm soon before being merged)...
>
> crap. Commit ID, please ;)

Will send in pm.

I don't want to reveal the "guilty" person's identity in public.

> > From a reviewer's POV: I have invested my time into review, discovered real
> > issues and as a reward I got no credit at all and extra frustration from the
> > fact that part of my review was forgotten/ignored (the part which resulted in
> > real regression in the release kernel)... Oh and in the past the said
> > developer has already been asked (politely in private message) to pay more
> > attention to his changes (after I silently fixed some other regression caused
> > by his other patch).
> >
> > But wait, there is more: I happened to be the maintainer of the subsystem which
> > got directly hit by the issue and I was getting bug reports from the users about
> > the problem... :-)
> >
> > It wasn't my first/last bad experience as a reviewer... finally I just gave up
> > on reviewing other people's patches unless they are strictly for the IDE subsystem.
> >
> > The moral of the story is that currently it just doesn't pay off to do
> > code reviews.
>
> I dunno. I suspect (hope) that this was an exceptional case, hence one
> should not draw general conclusions from it. It certainly sounds very bad.

I've been around too long not to have learned a few things...

Rule #3 of the successful kernel developer:

Ignore reviewers - fix the bugs but don't credit the reviewers (crediting them
makes you and your patch look less perfect). If they ask questions that
require you to do work (verifying your assumptions etc.), do not
check anything - answer in a misleading way and present your assumptions
as truth written in stone - eventually they will do the verification
themselves.

I really shouldn't be giving these rules out (at least not for free 8) so this
time only #3, but there are many more rules and they are as dead serious as
Linus' advice on Linux kernel management style...

> > From a personal POV it pays much more to wait until a buggy patch
> > hits the mainline and then fix the issues yourself (at least you will get
> > some credit). To change this we should put more emphasis on the importance
> > of code reviews by "rewarding" people investing their time in reviews
> > and "rewarding" developers/maintainers taking reviews seriously.
> >
> > We should credit reviewers more; sometimes it takes more time/knowledge to
> > review a patch than to make it, so getting near-zero credit for a review
> > doesn't sound too attractive. Hmm, wait, it can be worse - your review
> > may be ignored... ;-)
> >
> > From my side I think I'll start adding a less formal "Reviewed-by" to IDE
> > patches even if the review resulted in no issues being found (in addition to
> > explicit "Acked-by" tags and crediting people for finding real issues - which
> > I currently always do as a way of showing my appreciation for their work).
>
> yup, Reviewed-by: is good and I do think we should start adopting it,
> although I haven't thought through exactly how.

Adding Reviewed-by for reviews which highlighted real issues is the obvious case
(with more detailed credit for the noticed problems in the patch description).

Also, when somebody reviewed your patch and the discussion showed
that the patch is valid, the review itself was still valuable, so it would
be appropriate to credit the reviewer by adding Reviewed-by:.

> On my darker days I consider treating a Reviewed-by: as a prerequisite for
> merging. I suspect that would really get the feathers flying.

Easy to work around with friendly "my Reviewed-by: for your Reviewed-by:"
deals (without any _proper_ review actually being done)... ;)

> > I also encourage other maintainers/developers to pay more attention to
> > adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
> > that maintainers will promote changes that have been reviewed by others
> > by giving them priority over other ones (if the changes are on more-or-less
> > the same importance level of course, you get the idea).
> >
> > Now what to do with people who ignore reviews and/or have rather high
> > regressions/patches ratio?
>
> Ignoring a review would be a wildly wrong thing to do. It's so unusual
> that I'd be suspecting a lost email or an i-sent-the-wrong-patch.

It is not unusual at all. I mean patches which affect code in such a way
that it is difficult to prove its (in)correctness without doing a
time-consuming audit.

E.g. let's imagine doing a small patch affecting many drivers - you test
it quickly on your driver/hardware, then you skip the part of verifying
the correctness of the new code in other drivers and just push the patch.

As a patch author you can either assume "works for me" and push the patch
or do the audit (which requires a good understanding of the changed code and can
be time-consuming). It is usually quite easy to find out which approach
the author has chosen - a very sparse patch description combined with
changes in code behavior not mentioned in the patch description should
raise a red flag. :)

As a reviewer with enough knowledge of the code affected by the patch
you can see the potential problems but you can't prove them without doing
the time-consuming part. You may try to NACK the patch if you have enough
power, but you will end up being bypassed because you haven't proven the
patch incorrect (not to mention that the developer will feel bad about you NACKing
his patch). Now the funny thing is that although the audit takes
more time/knowledge than making the patch, you will end up with zero credit
if the patch turns out to be (luckily) correct. Even if you find issues
and report them, you are still at the mercy of the author for being credited, so
from a personal POV you are much better off waiting and fixing issues after they
hit the mainline kernel. You have to choose between being a good citizen and
preventing kernel regressions or being a bastard and getting the credit. ;)

If you happen to be the maintainer of the affected code the choice is similar,
with more pros for letting the patch in, especially if you can't afford the
time to do the audit (and as a maintainer you are guaranteed to be heavily
time constrained).

I hope this makes people see the importance of proper review and proper
recognition of reviewers in preventing kernel regressions.

> As for high regressions/patches ratio: that'll be hard to calculate and
> tends to be dependent upon the code which is being altered rather than who
> is doing the altering: some stuff is just fragile, for various reasons.
>
> One ratio which we might want to have a think about is the patches-sent
> versus reviews-done ratio ;)

Sounds like a good idea.
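
The patches-sent versus reviews-done ratio mentioned above could be estimated
from commit logs once "Reviewed-by:" lines are in use. The following is only a
rough sketch under assumptions not stated in the thread: it expects log text
laid out like the default `git log` output ("Author:" header lines, trailers
in the message body), and the function name, sample log and addresses are all
made up for illustration.

```python
import re
from collections import Counter

# "Author:" header lines as they appear in default `git log` output.
AUTHOR = re.compile(r"^Author:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
# "Reviewed-by:" trailers, usually indented inside the commit message.
REVIEWED = re.compile(r"^[ \t]*Reviewed-by:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def review_balance(log_text):
    """Map each identity to a (patches_sent, reviews_done) pair."""
    sent = Counter(AUTHOR.findall(log_text))
    reviewed = Counter(REVIEWED.findall(log_text))
    # A Counter returns 0 for unknown keys, so people who only review
    # or only send patches are handled without special cases.
    return {who: (sent[who], reviewed[who]) for who in set(sent) | set(reviewed)}


# Hypothetical log excerpt; names and addresses are invented.
SAMPLE_LOG = """\
commit 1111111
Author: A Developer <dev@example.org>

    ide: clean up probing

    Reviewed-by: Careful Reader <reader@example.org>

commit 2222222
Author: A Developer <dev@example.org>

    ide: fix timeout handling

    Reviewed-by: Careful Reader <reader@example.org>

commit 3333333
Author: Careful Reader <reader@example.org>

    doc: describe review tags
"""

if __name__ == "__main__":
    for who, (n_sent, n_reviewed) in sorted(review_balance(SAMPLE_LOG).items()):
        print(f"{who}: sent={n_sent} reviewed={n_reviewed}")
```

The pair is returned instead of a single ratio so that someone with zero
patches or zero reviews does not cause a division by zero; how to weigh the
two numbers is left open, as it is in the discussion.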

> > I think that we should have info about regressions integrated into SCM,
> > i.e. in git we should have an optional "fixes-commit" tag and we should be
> > able to do some reverse data collection. This feature combined with
> > "Author:" info should after some time give us some very interesting
> > statistics (Top Ten "Regressors"). It wouldn't be ideal (i.e. we need some
> > patch-count threshold to filter out people with 1 patch and >= 1 regression(s),
> > we need to remember that some code areas are more difficult than others
> > and that patches are not equal per se etc.); however, I believe that making it
> > into the Top Ten "Regressors" should give the winners some motivation to improve
> > their work ethic. Well, in the worst case we would just get some extra
> > trivial/documentation patches. ;-)
>
> We of course do want to minimise the amount of overhead for each developer.
> I'm a strong believer in specialisation: rather than requiring that *every*
> developer/maintainer integrate new steps in their processes it would be
> better to allow them to proceed in a close-to-usual fashion and to provide
> for a specialist person (or team) to do the sorts of things which you're
> thinking about.

Makes sense... however we need to educate each and every developer about
the importance of code review and proper recognition of reviewers.

Thanks,
Bart
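
The "fixes-commit" statistics proposed in the message above might be computed
roughly as follows. This is a sketch only: no such tag existed in git at the
time of the thread, so the input format (a list of commit id, author and
fixed-commit id), the function name and the default threshold are assumptions
made here for illustration.

```python
from collections import Counter


def top_regressors(commits, min_patches=5, limit=10):
    """Rank authors by regressions per patch.

    commits: iterable of (commit_id, author, fixes_id) tuples, where
    fixes_id is the id named by a hypothetical "fixes-commit" tag, or
    None when the commit fixes nothing.
    """
    commits = list(commits)
    author_of = {cid: author for cid, author, _ in commits}
    patches = Counter(author for _, author, _ in commits)
    # A regression is charged to the author of the commit being fixed.
    regressions = Counter(
        author_of[fixed] for _, _, fixed in commits if fixed in author_of
    )
    ranking = [
        (author, regressions[author], patches[author])
        for author in patches
        # The threshold filters out people with 1 patch and >= 1 regression.
        if patches[author] >= min_patches and regressions[author] > 0
    ]
    ranking.sort(key=lambda row: row[1] / row[2], reverse=True)
    return ranking[:limit]


if __name__ == "__main__":
    # Invented history: c3 fixes c1, c5 fixes c4.
    history = [
        ("c1", "alice", None),
        ("c2", "alice", None),
        ("c3", "bob", "c1"),
        ("c4", "bob", None),
        ("c5", "carol", "c4"),
    ]
    for author, n_reg, n_patches in top_regressors(history, min_patches=2):
        print(author, n_reg, n_patches)
```

The sketch deliberately ignores the caveats raised in the proposal itself:
it does not account for some code areas being harder than others, nor for
patches differing in size and risk.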

2007-06-17 22:41:19

by Al Boldi

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

Bartlomiej Zolnierkiewicz wrote:
> On Sunday 17 June 2007, Andrew Morton wrote:
> > We of course do want to minimise the amount of overhead for each
> > developer. I'm a strong believer in specialisation: rather than
> > requiring that *every* developer/maintainer integrate new steps in their
> > processes it would be better to allow them to proceed in a
> > close-to-usual fashion and to provide for a specialist person (or team)
> > to do the sorts of things which you're thinking about.
>
> Makes sense... however we need to educate each and every developer about
> importance of the code review and proper recognition of reviewers.

That's as easy to manage as is currently done with rc-regressions.

Maybe Adrian can introduce a "Patch Review Tracking" system akin to his
"rc-Regression Tracking" system.


Thanks!

--
Al

2007-06-17 22:55:39

by Michal Piotrowski

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On 18/06/07, Al Boldi <[email protected]> wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > On Sunday 17 June 2007, Andrew Morton wrote:
> > > We of course do want to minimise the amount of overhead for each
> > > developer. I'm a strong believer in specialisation: rather than
> > > requiring that *every* developer/maintainer integrate new steps in their
> > > processes it would be better to allow them to proceed in a
> > > close-to-usual fashion and to provide for a specialist person (or team)
> > > to do the sorts of things which you're thinking about.
> >
> > Makes sense... however we need to educate each and every developer about
> > importance of the code review and proper recognition of reviewers.
>
> That's as easy to manage as is currently done with rc-regressions.

Are you a volunteer?

It's not an easy task; there are more patches than regressions.

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-17 23:09:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On Sunday, 17 June 2007 23:49, Bartlomiej Zolnierkiewicz wrote:
> On Sunday 17 June 2007, Andrew Morton wrote:
> > On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz <[email protected]> wrote:
[--snip--]
> >
> > yup, Reviewed-by: is good and I do think we should start adopting it,
> > although I haven't thought through exactly how.
>
> Adding Reviewed-by for reviews which highlighted real issues is obvious
> (with more detailed credits for noticed problems in the patch description).

Suppose you have modified the patch as a result of a review and you post the
modified version. Is it still right to put "Reviewed-by" into it?
Personally, I don't think so, because that suggests that this particular
version of the patch has been reviewed and not the previous one.

> Also when somebody reviewed your patch but the discussions it turned out
> that the patch is valid - the review itself was still valuable so it would
> be appropriate to credit the reviewer by adding Reviewed-by:.

Yes, IMO in such a case it would be appropriate to do that.

Also, the review need not lead to any negative comments from the reviewer,
but in that case it's also appropriate to add a "Reviewed-by" to the patch.

Generally, if someone comments on my patches, I add his/her address to the next
version's CC list, which sort of documents that the reviewer was involved.
Then, if the reviewer ACKs the patch, that will be recorded.

I think that for "Reviewed-by" to work correctly, we ought to have a two-stage
process of accepting patches, where in the first stage the patch is reviewed
and if there are no objections, the "Reviewed-by" (or "Acked-by") records are
added to it in the next stage (the patch itself remains unmodified).

> > On my darker days I consider treating a Reviewed-by: as a prerequisite for
> > merging. I suspect that would really get the feathers flying.
>
> Easy to workaround by a friendly mine "Reviewed-by:" for yours "Reviewed-by:"
> deals (without any _proper_ review being done in reality)... ;)
>
> > > I also encourage other maintainers/developers to pay more attention to
> > > adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
> > > that maintainers will promote changes that have been reviewed by others
> > > by giving them priority over other ones (if the changes are on more-or-less
> > > the same importance level of course, you get the idea).
> > >
> > > Now what to do with people who ignore reviews and/or have rather high
> > > regressions/patches ratio?
> >
> > Ignoring a review would be a wildly wrong thing to do. It's so unusual
> > that I'd be suspecting a lost email or an i-sent-the-wrong-patch.
>
> It is not unusual at all. I mean patches which affect code in such a way
> that it is difficult to prove its (in)correctness without doing time
> consuming audit.
>
> ie. lets imagine doing a small patch affecting many drivers - you've tested
> it quickly on your driver/hardware, then you skip the part of verifying
> correctness of new code in other drivers and just push the patch
>
> As a patch author you can either assume "works for me" and push the patch
> or do the audit (requires good understanding of the changed code and could
> be time consuming). It is usually quite easy to find out which approach
> the author has chosen - the very sparse patch description combined with
> the changes in code behavior not mentioned in the patch description should
> raise the red flag. :)

First of all, the author should have a good understanding of what he's doing
and why. If there are any doubts with respect to that, the patch is likely to
introduce bugs.

This also depends on who will be handling the bug reports related to the patch.
If that will be the patch author, then so be it. ;-)

> As a reviewer having enough knowledge in the area of code affected by patch
> you can see the potential problems but you can't prove them without doing
> the time consuming part. You may try to NACK the patch if you have enough
> power but you will end up being bypassed by not proving incorrectness of
> the patch (not to mention that developer will feel bad about you NACKing
> his patch).

Well, IMHO, the author of the patch should convince _you_ that the patch is
correct, not the other way around. If you have doubts and make him think
twice of the code and he still can't prove his point, this means that he
doesn't understand what he's doing well enough.

> Now the funny thing is that despite the fact that audit takes
> more time/knowledge than making the patch you will end up with zero credit
> if patch turns out to be (luckily) correct. Even if you find out issues
> and report them you are still on mercy of author for being credited so
> from personal POV you are much better to wait and fix issues after they
> hit mainline kernel. You have to choose between being a good citizen and
> preventing kernel regressions or being bastard and getting the credit. ;)

Unless you are the poor soul having to handle bug reports related to the
problem.

> If you happen to be maintainer of the affected code the choice is similar
> with more pros for letting the patch in especially if you can't afford the
> time to do audit (and by being maintainer you are guaranteed to be heavily
> time constrained).
>
> I hope this makes people see the importance of proper review and proper
> recognition of reviewers in preventing kernel regressions.
>
> > As for high regressions/patches ratio: that'll be hard to calculate and
> > tends to be dependent upon the code which is being altered rather than who
> > is doing the altering: some stuff is just fragile, for various reasons.
> >
> > One ratio which we might want to have a think about is the patches-sent
> > versus reviews-done ratio ;)
>
> Sounds like a good idea.
>
> > > I think that we should have info about regressions integrated into SCM,
> > > i.e. in git we should have optional "fixes-commit" tag and we should be
> > > able to do some reverse data collection. This feature combined with
> > > "Author:" info after some time should give us some very interesting
> > > statistics (Top Ten "Regressors"). It wouldn't be ideal (ie. we need some
> > > patches threshold to filter out people with 1 patch and >= 1 regression(s),
> > > we need to remember that some code areas are more difficult than the others
> > > and that patches are not equal per se etc.) however I believe that making it
> > > into Top Ten "Regressors" should give the winners some motivation to improve
> > > their work ethic. Well, in the worst case we would just get some extra
> > > trivial/documentation patches. ;-)
> >
> > We of course do want to minimise the amount of overhead for each developer.
> > I'm a strong believer in specialisation: rather than requiring that *every*
> > developer/maintainer integrate new steps in their processes it would be
> > better to allow them to proceed in a close-to-usual fashion and to provide
> > for a specialist person (or team) to do the sorts of things which you're
> > thinking about.
>
> Makes sense... however we need to educate each and every developer about
> importance of the code review and proper recognition of reviewers.

I don't think that the education alone will be enough. IMO we need to have a
system that promotes the reviewing of code.

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-17 23:16:18

by Stefan Richter

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

Bartlomiej Zolnierkiewicz wrote:
> despite the fact that audit takes
> more time/knowledge than making the patch you will end up with zero credit
> if patch turns out to be (luckily) correct. Even if you find out issues
> and report them you are still on mercy of author for being credited

If we introduce a "Reviewed-by" with reasonably clear semantics
(different from Signed-off-by; e.g. the reviewer is not a middle-man in
patch forwarding; the reviewer might have had remaining reservations...
very similar to but not entirely the same as "Acked-by" as currently
defined in -mm) --- and also make the already somewhat established
"Tested-by" more official, --- then the maintainers could start to make
it a habit to add Reviewed-by and Tested-by.

Plus, reviewers and testers could formally reply with Reviewed-by and
Tested-by lines to patch postings and even could explicitly ask the
maintainer to add these lines.

> so from personal POV you are much better to wait and fix issues after they
> hit mainline kernel. You have to choose between being a good citizen and
> preventing kernel regressions or being bastard and getting the credit. ;)
>
> If you happen to be maintainer of the affected code the choice is similar
> with more pros for letting the patch in especially if you can't afford the
> time to do audit (and by being maintainer you are guaranteed to be heavily
> time constrained).

I don't think that a maintainer (who signs off on patches after all) can
easily afford to take the "bastard approach". I may be naive.
--
Stefan Richter
-=====-=-=== -==- =--=-
http://arcgraph.de/sr/

Subject: Re: How to improve the quality of the kernel?

On Monday 18 June 2007, Stefan Richter wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > despite the fact that audit takes
> > more time/knowledge than making the patch you will end up with zero credit
> > if patch turns out to be (luckily) correct. Even if you find out issues
> > and report them you are still on mercy of author for being credited
>
> If we introduce a "Reviewed-by" with reasonably clear semantics
> (different from Signed-off-by; e.g. the reviewer is not a middle-man in
> patch forwarding; the reviewer might have had remaining reservations...
> very similar to but not entirely the same as "Acked-by" as currently
> defined in -mm) --- and also make the already somewhat established
> "Tested-by" more official, --- then the maintainers could start to make
> it a habit to add Reviewed-by and Tested-by.
>
> Plus, reviewers and testers could formally reply with Reviewed-by and
> Tested-by lines to patch postings and even could explicitly ask the
> maintainer to add these lines.

Sounds great.

> > so from personal POV you are much better to wait and fix issues after they
> > hit mainline kernel. You have to choose between being a good citizen and
> > preventing kernel regressions or being bastard and getting the credit. ;)
> >
> > If you happen to be maintainer of the affected code the choice is similar
> > with more pros for letting the patch in especially if you can't afford the
> > time to do audit (and by being maintainer you are guaranteed to be heavily
> > time constrained).
>
> I don't think that a maintainer (who signs off on patches after all) can
> easily afford to take the "bastard approach". I may be naive.

Well, I'm not doing it myself but I find it tempting... ;)

For a maintainer, the "bastard approach" is more about not discouraging
developers by holding patches for too long than about getting credit.

Bart

2007-06-18 00:33:19

by Stefan Richter

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

Bartlomiej Zolnierkiewicz wrote:
> In case of being maintainer "bastard approach" is more about not discouraging
> developers by holding patches for too long than about getting credit.

The maintainer who is about to suffocate in newly contributed code is
actually a lucky guy: He can ask his eager contributors to also help
with cross-reviewing and bug fixing, otherwise all the fine work will be
stuck in the clogged pipeline. (E.g. post a subsystem todo-list now and
then, as a subtle hint.)
--
Stefan Richter
-=====-=-=== -==- =--=-
http://arcgraph.de/sr/

Subject: Re: How to improve the quality of the kernel?

On Monday 18 June 2007, Rafael J. Wysocki wrote:
> On Sunday, 17 June 2007 23:49, Bartlomiej Zolnierkiewicz wrote:
> > On Sunday 17 June 2007, Andrew Morton wrote:
> > > On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> [--snip--]
> > >
> > > yup, Reviewed-by: is good and I do think we should start adopting it,
> > > although I haven't thought through exactly how.
> >
> > Adding Reviewed-by for reviews which highlighted real issues is obvious
> > (with more detailed credits for noticed problems in the patch description).
>
> Suppose you have modified the patch as a result of a review and you post the
> modified version. Is that still right to put "Reviewed-by" into it?
> Personally, I don't think so, because that suggests that this particular
> version of the patch has been reviewed and not the previous one.

Well, if you got the "fix issues" part right in the modified version,
it shouldn't really matter. ;)

But yes, we may hold off on adding "Reviewed-by" until the modified patch
has been posted and reviewed.

> > Also when somebody reviewed your patch but the discussions it turned out
> > that the patch is valid - the review itself was still valuable so it would
> > be appropriate to credit the reviewer by adding Reviewed-by:.
>
> Yes, IMO in such a case it would be appropriate to do that.
>
> Also, the review need not lead to any negative comments from the reviewer,
> but in that case it's also appropriate to add a "Reviewed-by" to the patch.
>
> Generally, if someone comments my patches, I add his/her address to the next
> version's CC list, which sort of documents that the reviewer was involved.
> Then, if the reviewer ACKs the patch, that will be recorded.

Same approach here.

> I think that for "Reviewed-by" to work correctly, we ought to have a two-stage
> process of accepting patches, where in the first stage the patch is reviewed
> and if there are no objections, the "Reviewed-by" (or "Acked-by") records are
> added to it in the next stage (the patch itself remains unmodified).
>
> > > On my darker days I consider treating a Reviewed-by: as a prerequisite for
> > > merging. I suspect that would really get the feathers flying.
> >
> > Easy to workaround by a friendly mine "Reviewed-by:" for yours "Reviewed-by:"
> > deals (without any _proper_ review being done in reality)... ;)
> >
> > > > I also encourage other maintainers/developers to pay more attention to
> > > > adding "Acked-by"/"Reviewed-by" tags and crediting reviewers. I hope
> > > > that maintainers will promote changes that have been reviewed by others
> > > > by giving them priority over other ones (if the changes are on more-or-less
> > > > the same importance level of course, you get the idea).
> > > >
> > > > Now what to do with people who ignore reviews and/or have rather high
> > > > regressions/patches ratio?
> > >
> > > Ignoring a review would be a wildly wrong thing to do. It's so unusual
> > > that I'd be suspecting a lost email or an i-sent-the-wrong-patch.
> >
> > > It is not unusual at all. I mean patches which affect code in such a way
> > > that it is difficult to prove its (in)correctness without doing time
> > consuming audit.
> >
> > ie. lets imagine doing a small patch affecting many drivers - you've tested
> > it quickly on your driver/hardware, then you skip the part of verifying
> > correctness of new code in other drivers and just push the patch
> >
> > As a patch author you can either assume "works for me" and push the patch
> > or do the audit (requires good understanding of the changed code and could
> > be time consuming). It is usually quite easy to find out which approach
> > > the author has chosen - the very sparse patch description combined with
> > the changes in code behavior not mentioned in the patch description should
> > raise the red flag. :)
>
> First of all, the author should have a good understanding of what he's doing
> and why. If there are any doubts with respect to that, the patch is likely to
> introduce bugs.
>
> This also depends on who will be handling the bug reports related to the patch.
> If that will be the patch author, then so be it. ;-)

The problem is that usually Andrew/Adrian/Michal would also be involved.

> > As a reviewer having enough knowledge in the area of code affected by patch
> > you can see the potential problems but you can't prove them without doing
> > the time consuming part. You may try to NACK the patch if you have enough
> > power but you will end up being bypassed by not proving incorrectness of
> > the patch (not to mention that developer will feel bad about you NACKing
> > his patch).
>
> Well, IMHO, the author of the patch should convince _you_ that the patch is
> correct, not the other way around. If you have doubts and make him think
> twice of the code and he still can't prove his point, this means that he
> doesn't understand what he's doing well enough.

This is a nice theory; practice differs greatly.

Sometimes you are not in a position to prevent suspicious patches from being
merged and sometimes you just don't want to do it for various reasons (not
discouraging the developer, or avoiding his personal vendetta against you :).

> > Now the funny thing is that despite the fact that audit takes
> > > more time/knowledge than making the patch you will end up with zero credit
> > if patch turns out to be (luckily) correct. Even if you find out issues
> > and report them you are still on mercy of author for being credited so
> > from personal POV you are much better to wait and fix issues after they
> > hit mainline kernel. You have to choose between being a good citizen and
> > preventing kernel regressions or being bastard and getting the credit. ;)
>
> Unless you are the poor soul having to handle bug reports related to the
> problem.
>
> > If you happen to be maintainer of the affected code the choice is similar
> > with more pros for letting the patch in especially if you can't afford the
> > time to do audit (and by being maintainer you are guaranteed to be heavily
> > time constrained).
> >
> > I hope this makes people see the importance of proper review and proper
> > recognition of reviewers in preventing kernel regressions.
> >
> > > As for high regressions/patches ratio: that'll be hard to calculate and
> > > tends to be dependent upon the code which is being altered rather than who
> > > is doing the altering: some stuff is just fragile, for various reasons.
> > >
> > > One ratio which we might want to have a think about is the patches-sent
> > > versus reviews-done ratio ;)
> >
> > Sounds like a good idea.
> >
> > > > I think that we should have info about regressions integrated into SCM,
> > > > i.e. in git we should have optional "fixes-commit" tag and we should be
> > > > > able to do some reverse data collection. This feature combined with
> > > > "Author:" info after some time should give us some very interesting
> > > > statistics (Top Ten "Regressors"). It wouldn't be ideal (ie. we need some
> > > > patches threshold to filter out people with 1 patch and >= 1 regression(s),
> > > > we need to remember that some code areas are more difficult than the others
> > > > > and that patches are not equal per se etc.) however I believe that making it
> > > > into Top Ten "Regressors" should give the winners some motivation to improve
> > > > their work ethic. Well, in the worst case we would just get some extra
> > > > trivial/documentation patches. ;-)
> > >
> > > We of course do want to minimise the amount of overhead for each developer.
> > > I'm a strong believer in specialisation: rather than requiring that *every*
> > > developer/maintainer integrate new steps in their processes it would be
> > > better to allow them to proceed in a close-to-usual fashion and to provide
> > > for a specialist person (or team) to do the sorts of things which you're
> > > thinking about.
> >
> > Makes sense... however we need to educate each and every developer about
> > importance of the code review and proper recognition of reviewers.
>
> I don't think that the education alone will be enough. IMO we need to have a
> system that promotes the reviewing of code.

Sure, we need to start somewhere...

Bart

2007-06-18 03:55:33

by Al Boldi

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

Michal Piotrowski wrote:
> On 18/06/07, Al Boldi <[email protected]> wrote:
> > Bartlomiej Zolnierkiewicz wrote:
> > > On Sunday 17 June 2007, Andrew Morton wrote:
> > > > We of course do want to minimise the amount of overhead for each
> > > > developer. I'm a strong believer in specialisation: rather than
> > > > requiring that *every* developer/maintainer integrate new steps in
> > > > their processes it would be better to allow them to proceed in a
> > > > close-to-usual fashion and to provide for a specialist person (or
> > > > team) to do the sorts of things which you're thinking about.
> > >
> > > Makes sense... however we need to educate each and every developer
> > > about importance of the code review and proper recognition of
> > > reviewers.
> >
> > That's as easy to manage as is currently done with rc-regressions.
>
> Are you a volunteer?

Probably not, as this task requires a real PRO!

> It's not an easy task, there are more patches than regressions.

I didn't say it was an easy task, and it probably involves a lot of stamina.

But the management aspect looks rather straightforward, yet rewarding.


Thanks!

--
Al

2007-06-18 05:11:03

by Andrew Morton

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On Mon, 18 Jun 2007 01:15:15 +0200 Stefan Richter <[email protected]> wrote:

> Tested-by

Tested-by would be good too. Because over time, we will generate a list of
people who own the relevant hardware and who are prepared to test changes.

So if you make changes to random-driver.c you can do
`git-log random-driver.c | grep Tested-by` to find people who can test your
changes for you.

Not that many people are likely to bother. The consequences of being
slack are negligible, hence there is little incentive to do the extra
work.

2007-06-18 13:23:53

by Vincent Fortier

[permalink] [raw]
Subject: RE: How to improve the quality of the kernel?

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of
> Andrew Morton
>
> On Mon, 18 Jun 2007 01:15:15 +0200 Stefan Richter
> <[email protected]> wrote:
>
> > Tested-by
>
> Tested-by would be good too. Because over time, we will
> generate a list of people who own the relevant hardware and
> who are prepared to test changes.

Why not include a user-space tool that, when invoked and if you agree to
send personal info, sends your hardware vs. driver info to a web
database along with your email address (maybe even your .config, etc.)?
When maintainers need help testing new patches, finding a bug, etc.,
they could use your email address to ask for it.

> So if you make changes to random-driver.c you can do `git-log
> random-driver.c|grep Tested-by" to find people who can test
> your changes for you.

You wouldn't even need to search in Git. Maybe whenever a patchset is
proposed, a mail could be sent to an appropriate pseudo-auto-generated
hardware or feature mailing list?

On lkml I mostly try to follow patches/bugs associated with hardware I
use. Why not try to automate the process and get more testers in?

- vin

2007-06-18 22:31:20

by Natalie Protasevich

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On 6/18/07, Fortier,Vincent [Montreal] <[email protected]> wrote:
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On behalf of
> > Andrew Morton
> >
> > On Mon, 18 Jun 2007 01:15:15 +0200 Stefan Richter
> > <[email protected]> wrote:
> >
> > > Tested-by
> >
> > Tested-by would be good too. Because over time, we will
> > generate a list of people who own the relevant hardware and
> > who are prepared to test changes.
>
> Why not include a user-space tool that, when invoked, if you agree to
> send personnal info, sends your hardware vs driver info to a web
> database + your email address (maybie even you .config, etc..) ... In
> case of help for testing new patches/finding a bug/etc.. your email
> could be used by maintainers to ask for help...
>
> > So if you make changes to random-driver.c you can do `git-log
> > random-driver.c|grep Tested-by" to find people who can test
> > your changes for you.
>
> You would'nt even need to search in GIT. Maybie even when ever a
> patchset is being proposed a mail could be sent to appropriate
> hardware/or feature pseudo-auto-generated mailing-list?
>
> On lkml I mostly try to follow patches/bugs associated with hardware I
> use. Why not try to automate the process and get more testers in?
>

I think this is an excellent point. One data point could be a field in
bugzilla to input the hardware information. A simple query could then
select common hardware and platforms. So far this does not work when the
hardware is only mentioned in the free-text part.

--Natalie

> - vin

2007-06-18 22:56:35

by Natalie Protasevich

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On 6/18/07, Martin Bligh <[email protected]> wrote:
> >> > So if you make changes to random-driver.c you can do `git-log
> >> > random-driver.c|grep Tested-by" to find people who can test
> >> > your changes for you.
> >>
> >> You would'nt even need to search in GIT. Maybie even when ever a
> >> patchset is being proposed a mail could be sent to appropriate
> >> hardware/or feature pseudo-auto-generated mailing-list?
> >>
> >> On lkml I mostly try to follow patches/bugs associated with hardware I
> >> use. Why not try to automate the process and get more testers in?
> >>
> >
> > I think this is an excellent point. One data point could be a field in
> > bugzilla to input the hardware information. Simple query can select
> > common hardware and platform. So far it's not working when hardware is
> > just mentioned in the text part.
>
> if it's free text it'll be useless for search ... I suppose we could
> do drop-downs for architecture at least? Not sure much beyond that
> would work ... *possibly* the common drivers, but I don't think
> we'd get enough coverage for it to be of use.
>
How about several buckets for model, BIOS version, chipset, etc., at
least as optional fields? Some will be relevant and some not in
particular cases, but at least people will make an attempt to collect
such data from their systems, boards, etc.

--Natalie

2007-06-18 23:09:18

by Martin Bligh

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

>> > So if you make changes to random-driver.c you can do `git-log
>> > random-driver.c|grep Tested-by" to find people who can test
>> > your changes for you.
>>
>> You would'nt even need to search in GIT. Maybie even when ever a
>> patchset is being proposed a mail could be sent to appropriate
>> hardware/or feature pseudo-auto-generated mailing-list?
>>
>> On lkml I mostly try to follow patches/bugs associated with hardware I
>> use. Why not try to automate the process and get more testers in?
>>
>
> I think this is an excellent point. One data point could be a field in
> bugzilla to input the hardware information. Simple query can select
> common hardware and platform. So far it's not working when hardware is
> just mentioned in the text part.

if it's free text it'll be useless for search ... I suppose we could
do drop-downs for architecture at least? Not sure much beyond that
would work ... *possibly* the common drivers, but I don't think
we'd get enough coverage for it to be of use.

M.

2007-06-18 23:59:35

by Martin Bligh

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

Natalie Protasevich wrote:
> On 6/18/07, Martin Bligh <[email protected]> wrote:
>> >> > So if you make changes to random-driver.c you can do `git-log
>> >> > random-driver.c|grep Tested-by" to find people who can test
>> >> > your changes for you.
>> >>
>> >> You would'nt even need to search in GIT. Maybie even when ever a
>> >> patchset is being proposed a mail could be sent to appropriate
>> >> hardware/or feature pseudo-auto-generated mailing-list?
>> >>
>> >> On lkml I mostly try to follow patches/bugs associated with hardware I
>> >> use. Why not try to automate the process and get more testers in?
>> >>
>> >
>> > I think this is an excellent point. One data point could be a field in
>> > bugzilla to input the hardware information. Simple query can select
>> > common hardware and platform. So far it's not working when hardware is
>> > just mentioned in the text part.
>>
>> if it's free text it'll be useless for search ... I suppose we could
>> do drop-downs for architecture at least? Not sure much beyond that
>> would work ... *possibly* the common drivers, but I don't think
>> we'd get enough coverage for it to be of use.
>
> How about several buckets for model/BIOS version/chipset etc., at
> least optional, and some will be relevant some not for particular
> cases. But at least people will make an attempt to collect such data
> from their system, boards, etc.

Mmm. the problem is that either they're:

1. free text, in which case they're useless, as everyone types
mis-spelled random crud into them. However, free-text search
through the comment fields might work out.

2. Drop downs, in which case someone has to manage the lists
etc., and they're horribly crowded with lots of options. Trying to
do that for model/BIOS version/chipset would be a nightmare.

If they're mandatory, they're a pain in the butt, and often
irrelevant ... if they're optional, nobody will fill them in.
Either way, they clutter the interface ;-(

Sorry to be a wet blanket, but I've seen those sort of things
before, and they just don't seem to work, especially in the
environment we're in with such a massive diversity of hardware.

If we can come up with some very clear, tightly constrained
choices, that's a decent possibility. Nothing other than
kernel architecture (i386 / x86_64 / ia64) or whatever springs
to mind, but perhaps I'm being unimaginative.

Nothing complicated ever seems to work ... even the simple
stuff is difficult ;-(
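
A small sketch of what a "tightly constrained" field could look like, as
opposed to free text. The list below contains only the three
architectures named above as examples; the function name and the
normalisation rules are assumptions for illustration, not anything
bugzilla implements.

```python
# Hypothetical closed list; a real drop-down would need maintaining.
ARCHITECTURES = ("i386", "x86_64", "ia64")


def normalize_arch(raw):
    """Map user input onto the closed list, or return None if it does not fit."""
    key = raw.strip().lower().replace("-", "_")
    return key if key in ARCHITECTURES else None
```

Anything that does not normalise to a known value is rejected rather
than stored, which is what keeps the field searchable.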

M.

2007-06-19 00:11:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?



On Mon, 18 Jun 2007, Martin Bligh wrote:
>
> Sorry to be a wet blanket, but I've seen those sort of things
> before, and they just don't seem to work, especially in the
> environment we're in with such a massive diversity of hardware.

I do agree. It _sounds_ like a great idea to try to control the flow of
patches better, but in the end, it needs to also be easy and painfree to
the people involved, and also make sure that any added workflow doesn't
require even *more* load and expertise on the already often overworked
maintainers..

In many cases, I think it tends to *sound* great to try to avoid
regressions in the first place - but it also sounds like one of those "I
wish the world didn't work the way it did" kind of things. A worthy goal,
but not necessarily one that is compatible with reality.

Linus

2007-06-19 00:25:09

by Natalie Protasevich

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On 6/18/07, Linus Torvalds <[email protected]> wrote:
>
>
> On Mon, 18 Jun 2007, Martin Bligh wrote:
> >
> > Sorry to be a wet blanket, but I've seen those sort of things
> > before, and they just don't seem to work, especially in the
> > environment we're in with such a massive diversity of hardware.
>
> I do agree. It _sounds_ like a great idea to try to control the flow of
> patches better, but in the end, it needs to also be easy and painfree to
> the people involved, and also make sure that any added workflow doesn't
> require even *more* load and expertise on the already often overworked
> maintainers..
>
> In many cases, I think it tends to *sound* great to try to avoid
> regressions in the first place - but it also sounds like one of those "I
> wish the world didn't work the way it did" kind of things. A worthy goal,
> but not necessarily one that is compatible with reality.
>
> Linus
>

Sure, simplicity is key - but most bug reporters are pretty
professional folks (or very well-rounded amateurs :) We can still try,
why not? The worst that can happen is empty fields.

Maybe searching free-text fields can then be implemented. Then every
message exchange in bugzilla can be used for extracting such info -
questions about HW specifics are asked a lot, in almost every bug.
It's a shame we can't use this information. I was once searching for
"VIA" and got "zero bugs found", but in reality there are hundreds!
Probably something that makes sense to bring up with the bugzilla project?

However, I've been working with other bugzillas (have to admit they
were mostly company/corporate), where this was a required field that
didn't seem to cause difficulties. I am planning to do some more
research and get some more ideas from other bugzillas. I suppose we
can have them discussed and revised sometime.

--Natalie

2007-06-19 00:28:25

by Martin Bligh

[permalink] [raw]
Subject: Re: regression tracking (Re: Linux 2.6.21)

Linus Torvalds wrote:
>
> On Thu, 14 Jun 2007, Oleg Verych wrote:
>> I'm seeing this long (198) thread and just have no idea how it has
>> ended (wiki? hand-mailing?).
>
> I'm hoping it's not "ended".
>
> IOW, I really don't think we _resolved_ anything, although the work that
> Adrian started is continuing through the wiki and other people trying to
> track regressions, and that was obviously something good.
>
> But I don't think we really know where we want to take this thing in the
> long run. I think everybody wants a better bug-tracking system, but
> whether something that makes people satisfied can even be built is open.
> It sure doesn't seem to exist right now ;)

I know you hate bugzilla ... but at least I can try to make that bit
of the process work better.

The new version just rolled out does have a simple "regression" checkbox
(and you can search on it), which will hopefully help people keep track
of the ones already in bugzilla more easily.

Thanks to Jon T, Dave J et al. for helping to figure out methods and
implement them.

M.

2007-06-19 00:43:11

by Martin Bligh

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

> Sure, simplicity is a key - but most of reporters on bugs are pretty
> professional folks (or very well rounded amateurs :) We can try still
> why not? the worst that can happen will be empty fields.

mmm. added complexity and interface clutter for little or no benefit
is what I'm trying to avoid - they did that in the IBM bugzilla and
it turned into a big ugly unusable monster. You can call me either
"experienced" or "bitter" depending what mood you're in ;-)

Not sure I'd agree that most of the bug submitters are all that
professional, it's a very mixed bag.

> Maybe searching free text fields can then be implemented. Then every
> message exchange in bugzilla can be used for extracting such info -
> questions about HW specifics are asked a lot, almost in every one.
> It's a shame we cant' use this information. I was once searching for
> "VIA" and got "zero bugs found", but in reality there are hundreds!
> Probably something that makes sense to bring up with bugzilla project?

That should work now ... seems to for me.

http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=VIA&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=REJECTED&bug_status=DEFERRED&bug_status=NEEDINFO&bug_status=CLOSED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Bug+Number&field0-0-0=noop&type0-0-0=noop&value0-0-0=

Produces a metric-buttload of results. Go to the advanced search option
and do "A Comment" contains the string "VIA". By default "Status" is
only set to do open bugs, which you might want to change to all types.
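
The comment search above can also be built programmatically. This is a
sketch that keeps only a subset of the parameters visible in the URL
(query_format, long_desc_type, long_desc, bug_status); whether the
server accepts this shortened form is an assumption, and the function
name is made up for the example.

```python
from urllib.parse import urlencode

BASE = "http://bugzilla.kernel.org/buglist.cgi"

# The status values that appear in the URL above.
ALL_STATUSES = ["NEW", "REOPENED", "ASSIGNED", "RESOLVED", "VERIFIED",
                "REJECTED", "DEFERRED", "NEEDINFO", "CLOSED"]


def comment_search_url(text, statuses=ALL_STATUSES):
    """Build a buglist URL matching bugs whose comments contain `text`."""
    params = [("query_format", "advanced"),
              ("long_desc_type", "substring"),
              ("long_desc", text)]
    # bug_status is repeated once per status, as in the original URL.
    params += [("bug_status", status) for status in statuses]
    return BASE + "?" + urlencode(params)
```

Passing only the open statuses (NEW, REOPENED, ASSIGNED) would mimic the
default "open bugs only" behaviour mentioned above.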

> However, I've been working with other bugzillas (have to admit they
> were mostly company/corporate), where this was a required field that
> didn't seem to cause difficulties. I am planning to do some more
> research and get some more ideas from other bugzillas. I suppose we
> can have them discussed and revised sometime.

Would be good, thanks. I tend to favour keeping things as simple as
possible though, we have very little control over our users, and they're
a very broad base. Making the barrier to entry for use as low as
possible is the design we've been pursuing.

M.

2007-06-19 00:55:46

by Natalie Protasevich

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On 6/18/07, Martin Bligh <[email protected]> wrote:
> > Sure, simplicity is a key - but most of reporters on bugs are pretty
> > professional folks (or very well rounded amateurs :) We can try still
> > why not? the worst that can happen will be empty fields.
>
> mmm. added complexity and interface clutter for little or no benefit
> is what I'm trying to avoid - they did that in the IBM bugzilla and
> it turned into a big ugly unusable monster. You can call me either
> "experienced" or "bitter" depending what mood you're in ;-)
>
> Not sure I'd agree that most of the bug submitters are all that
> professional, it's a very mixed bag.
>
> > Maybe searching free text fields can then be implemented. Then every
> > message exchange in bugzilla can be used for extracting such info -
> > questions about HW specifics are asked a lot, almost in every one.
> > It's a shame we cant' use this information. I was once searching for
> > "VIA" and got "zero bugs found", but in reality there are hundreds!
> > Probably something that makes sense to bring up with bugzilla project?
>
> That should work now ... seems to for me.
>
> http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=VIA&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=REJECTED&bug_status=DEFERRED&bug_status=NEEDINFO&bug_status=CLOSED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Bug+Number&field0-0-0=noop&type0-0-0=noop&value0-0-0=
>
> Produces a metric-buttload of results. Go to the advanced search option
> and do "A Comment" contains the string "VIA". By default "Status" is
> only set to do open bugs, which you might want to change to all types.

Yes it works great! Thanks... I'd say this should really be the default
search, because the first search screen is misleading - it promises to
find everything for any "words" :)

>
> > However, I've been working with other bugzillas (have to admit they
> > were mostly company/corporate), where this was a required field that
> > didn't seem to cause difficulties. I am planning to do some more
> > research and get some more ideas from other bugzillas. I suppose we
> > can have them discussed and revised sometime.
>
> Would be good, thanks. I tend to favour keeping things as simple as
> possible though, we have very little control over our users, and they're
> a very broad base. Making the barrier to entry for use as low as
> possible is the design we've been pursuing.

Actually, as long as search above is possible - it is going to work.

I must say the new bugzilla interface is very nice in general, and
well designed and easy to use.

--Natalie

2007-06-19 01:10:23

by Martin Bligh

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

>> > Maybe searching free text fields can then be implemented. Then every
>> > message exchange in bugzilla can be used for extracting such info -
>> > questions about HW specifics are asked a lot, almost in every one.
>> > It's a shame we cant' use this information. I was once searching for
>> > "VIA" and got "zero bugs found", but in reality there are hundreds!
>> > Probably something that makes sense to bring up with bugzilla project?
>>
>> That should work now ... seems to for me.
>>
>> http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=VIA&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=REJECTED&bug_status=DEFERRED&bug_status=NEEDINFO&bug_status=CLOSED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Bug+Number&field0-0-0=noop&type0-0-0=noop&value0-0-0=
>>
>>
>> Produces a metric-buttload of results. Go to the advanced search option
>> and do "A Comment" contains the string "VIA". By default "Status" is
>> only set to do open bugs, which you might want to change to all types.
>
> Yes it works great! Thanks... I'd say this should be really a default
> search, because first search screen is misleading - it promises find
> all for any "words" :)

OK, or at the very least we can fix the text to indicate it'll
only search summaries (and likely only of open bugs at that ...)

M.

2007-06-19 01:52:29

by Vincent Fortier

[permalink] [raw]
Subject: RE: How to improve the quality of the kernel?

> -----Original Message-----
> From: Natalie Protasevich [mailto:[email protected]]
> Sent: 18 June 2007 18:56
>
> On 6/18/07, Martin Bligh <[email protected]> wrote:
> > >> > So if you make changes to random-driver.c you can do `git-log
> > >> > random-driver.c|grep Tested-by" to find people who can test your
> > >> > changes for you.
> > >>
> > >> You would'nt even need to search in GIT. Maybie even when ever a
> > >> patchset is being proposed a mail could be sent to appropriate
> > >> hardware/or feature pseudo-auto-generated mailing-list?
> > >>
> > >> On lkml I mostly try to follow patches/bugs associated with
> > >> hardware I use. Why not try to automate the process and get more testers in?
> > >>
> > >
> > > I think this is an excellent point. One data point could be a field
> > > in bugzilla to input the hardware information. Simple query can
> > > select common hardware and platform. So far it's not working when
> > > hardware is just mentioned in the text part.
> >
> > if it's free text it'll be useless for search ... I suppose we could
> > do drop-downs for architecture at least? Not sure much beyond that
> > would work ... *possibly* the common drivers, but I don't think we'd
> > get enough coverage for it to be of use.
> >
> How about several buckets for model/BIOS version/chipset
> etc., at least optional, and some will be relevant some not
> for particular cases. But at least people will make an
> attempt to collect such data from their system, boards, etc.

How about an easy way to send multiple hardware profiles to your
bugzilla user account, linked at the same time to an online pciutils
database and/or a hardware list database similar to those on
overclocking web sites, and why not even with a link to the git
repository when possible?

A sort of really useful RHN-style "send your profile" that would link
the driver with the discovered hardware and add you to the appropriate
mailing lists to test patches, help reproduce and solve problems, etc.

In the end, plenty of statistics and a hardware compatibility list could
be made. For example, it would make my life easier to know what level of
compatibility Linux can offer for the old HP9000 K-boxes that we still
have running at the office, and presumably to find people to contact for
help.

- vin

2007-06-19 02:28:04

by Natalie Protasevich

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

On 6/18/07, Fortier,Vincent [Montreal] <[email protected]> wrote:
> > -----Original Message-----
> > From: Natalie Protasevich [mailto:[email protected]]
> > Sent: 18 June 2007 18:56
> >
> > On 6/18/07, Martin Bligh <[email protected]> wrote:
> > > >> > So if you make changes to random-driver.c you can do `git-log
> > > >> > random-driver.c|grep Tested-by" to find people who can test your
> > > >> > changes for you.
> > > >>
> > > >> You would'nt even need to search in GIT. Maybie even when ever a
> > > >> patchset is being proposed a mail could be sent to appropriate
> > > >> hardware/or feature pseudo-auto-generated mailing-list?
> > > >>
> > > >> On lkml I mostly try to follow patches/bugs associated with
> > > >> hardware I use. Why not try to automate the process and get more testers in?
> > > >>
> > > >
> > > > I think this is an excellent point. One data point could be a field
> > > > in bugzilla to input the hardware information. Simple query can
> > > > select common hardware and platform. So far it's not working when
> > > > hardware is just mentioned in the text part.
> > >
> > > if it's free text it'll be useless for search ... I suppose we could
> > > do drop-downs for architecture at least? Not sure much beyond that
> > > would work ... *possibly* the common drivers, but I don't think we'd
> > > get enough coverage for it to be of use.
> > >
> > How about several buckets for model/BIOS version/chipset
> > etc., at least optional, and some will be relevant some not
> > for particular cases. But at least people will make an
> > attempt to collect such data from their system, boards, etc.
>
> How about an easy way to send multiple hardware profiles to your bugzilla user account simultaniously linked to an online pciutils database and/or an hardware list database similar to overclocking web sites and why not even with a link to the git repository when possible?
>
> A some sort of really usefull "send your profile" of RHN that would link the driver with the discovered hardware and add you to appropriate mailing lists to test patches/help reproducing & solving problems/etc.
>
> In the end plenty of statistics and hardware compatibility list could be made. For example, that would make my life easier knowing what level of compatibility Linux can offer for old HP9000 K-boxes that we still have running at the office and presumably get people to contact to get help?

This is definitely something that can (and should) be done - well,
especially having the ability to search by certain criteria - then all
sorts of statistics and databases can be created.
Everything that makes it easier to work on a patch and to test it
should be done, to make bug fixing easier or even possible at all.
Oftentimes the most knowledgeable people are not able to make a quick
fix just because there is no way to reproduce the case or get access
to the HW.

--Natalie

2007-06-19 03:54:27

by Oleg Verych

[permalink] [raw]
Subject: This is [Re:] How to improve the quality of the kernel[?].

[Dear Debbugs developers, I hope your ideas will be useful.]

* From: Linus Torvalds
* Newsgroups: gmane.linux.kernel
* Date: Mon, 18 Jun 2007 17:09:37 -0700 (PDT)
>
> On Mon, 18 Jun 2007, Martin Bligh wrote:
>>
>> Sorry to be a wet blanket, but I've seen those sort of things
>> before, and they just don't seem to work, especially in the
>> environment we're in with such a massive diversity of hardware.
>
> I do agree. It _sounds_ like a great idea to try to control the flow of
> patches better,

There were some ideas; I will try to summarize:

* New patches (or sets) need tracking, review, testing.

Step zero (tracking) is done by sending (To, or Bcc) an [RFC] patch to
something like [email protected] (like the BTS now). Notifications
will be sent to interested maintainers (if the meta-information is OK)
or testers (see below).

The first step (review) is mostly done by maintainers or interested
individuals. The result is "Acked-by" and "Cc" entries in the next patch
sent. Due to the lack of tracking, these things are done manually and
generally on trust. But bad things like <[email protected]>
sometimes happen.

When a patch is sent to this PTS, your lovely
checkpatch/check-whatever-crap-has-been-sent tools can be set up as
gatekeepers, thus making impossible stupid errors with MIME encoding,
line wrapping, or whatever style you've come up with now for checking
sent crap.

* Tracking results of review (Acked-by).

This can be mostly e-mail exchange with comments and agreements.
"Acked-by" semantics may be implemented in the form of a control message
to the tracking system, and this system will generate an e-mail
confirmation to the patch author in the form of
"Acked-by: Developer's Name <message-id of e-mail with acked-by>"

Thus, the next patch will have this entry. And if testing of this
version or a regression happens, there's info about who is/was
interested/involved.

* Testing.

Mainly the same for "Tested-by"
(newly suggested by Stefan <[email protected]>)

|-*- Feature Needed -*-
In addition, what is needed is the hardware the user tested on, has,
had, or used. Currently the ``reportbug'' tool includes package-specific
and system-specific additions, automatically gathered and inserted into
the e-mail sent to the BTS.
(e.g. <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29740>)

Format of that hardware profile (like system information in reportbug):
. arch
. chipset
. hdd
. vga
...
in meaningful fields, and not just lspci -v[vv]. If additional info
(-vvv) or something else is required, the profile can be extended.

For the kernel's sub-system information (like package info):
. subsystem/driver/kernel version (or similar)
. maintainers must know what more they need to see here
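
The hardware profile described above could be sketched as a small record
with named fields that renders to a block of "field: value" lines for
inclusion in a report e-mail. The class name, the method name and the
output layout are assumptions; only the four field names (arch, chipset,
hdd, vga) come from the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class HardwareProfile:
    # Field names follow the list in the message; all are optional.
    arch: str = ""
    chipset: str = ""
    hdd: str = ""
    vga: str = ""

    def as_report_block(self):
        """Render the non-empty fields as "name: value" lines."""
        lines = [f"{f.name}: {getattr(self, f.name)}"
                 for f in fields(self) if getattr(self, f.name)]
        return "\n".join(lines)
```

Extending the profile then means adding a field to the class, which
keeps every report in the same machine-readable shape.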

|-*- back to patches -*-

Last but not least, test cases that everyone might come up with.

All these formats can be agreed on (or implemented and updated later)
and inserted automatically.

* Commit.
The Id is recorded, the patch archived. But any additions are welcome;
regressions will pop this patch up again (reopen in the BTS).

> but in the end, it needs to also be easy and painfree to the people
> involved, and also make sure that any added workflow doesn't require
> even *more* load and expertise on the already often overworked
> maintainers..

Experienced BTS users and developers, please correct me if I'm wrong.
At least the e-mail part of Debian's BTS, and the whole idea of it, is
*exactly* what is needed. Bugzilla fans, you can still use your useless
pet, because Debian developers have done things to track and e-mail the
states of bugs: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29736>

> In many cases, I think it tends to *sound* great to try to avoid
> regressions in the first place - but it also sounds like one of those "I
> wish the world didn't work the way it did" kind of things. A worthy goal,
> but not necessarily one that is compatible with reality.

I wish the perl hackers out there would join this new effort. I know
there are many of them out there, writing kilobytes of checkfile and
checkpatch (I've written that in a few lines of ``sed'').

The BTS is written in perl, but any interoperability interface, like
stdin/stdout for python or shell hackers, is worth thinking about.

Please see more and make useful follow-ups: http://bugs.debian.org/

Please, do not (<[email protected]>)
""" I know you hate bugzilla ... but at least I can try to make that bit
of the process work better.
""" [here's you fancy checkbox...]

Thanks.
____

2007-06-19 11:07:04

by Stefan Richter

[permalink] [raw]
Subject: Re: How to improve the quality of the kernel?

Natalie Protasevich wrote:
> On 6/18/07, Fortier,Vincent [Montreal] <[email protected]> wrote:
[...]
>> In the end plenty of statistics and hardware compatibility list
>> could be made. For example, that would make my life easier knowing
>> what level of compatibility Linux can offer for old HP9000 K-boxes
>> that we still have running at the office and presumably get people
>> to contact to get help?
>
> This is definitely something that can be done (and should) - well,
> especially having ability search by certain criteria - then all sorts
> of statistics and databases can be created.

Hardware Compatibility Lists/Databases already exist, for driver
subsystems, for distributions...

Some issues with those databases are:

- Users typically can only test one specific combination of a
hardware collection and software collection, at one or a few points
in time.

- Users have difficulties or don't have the means to identify chip
revisions, used protocols etc.

- The databases are typically not conceived to serve additional
purposes like bidirectional contact between developer and user.

These issues notwithstanding, these databases are already highly useful
both for endusers and for developers. That's why they exist.

> Everything that helps to find a way to work on a patch and to test
> easier should be done to make bug fixing easier and even possible.
> Often times the most knowledgeable people are not able to make quick
> fix just because there is no way to reproduce the case or get access
> to HW.

As has been mentioned elsewhere in the thread,

- bug---hardware associations are sometimes difficult or impossible
to make. For example, the x86-64 platform maintainers are bothered
with "x86-64 bugs" which turn out to be driver bugs on all
platforms.

(We want detailed descriptions of the hardware environment in a bug
report, but this means we must be able to handle the flood of
false positives in bug---hardware associations, i.e. successively
narrow down which parts of the hardware/software combo are actually
affected, and what other combinations could be affected too.)

- Patch---hardware associations, especially for preemptive regression
tests, are virtually impossible to make. Murphy says that the
regression will hit hardware which the patch submitter or forwarder
thought could never be affected by the patch.

Of course, /sensible/ patch---hardware associations are (1) to try out
fixes for known issues with a specific hardware, (2) to test that a
cleanup patch or refactoring patch or API changing patch to a driver of
very specific hardware ( = a single type or few types with little
variance) does not introduce regressions for this hardware.
--
Stefan Richter
-=====-=-=== -==- =--==
http://arcgraph.de/sr/

2007-06-19 12:48:48

by Adrian Bunk

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Tue, Jun 19, 2007 at 06:06:47AM +0200, Oleg Verych wrote:
> [Dear Debbug developers, i wish your ideas will be useful.]
>
> * From: Linus Torvalds
> * Newsgroups: gmane.linux.kernel
> * Date: Mon, 18 Jun 2007 17:09:37 -0700 (PDT)
> >
> > On Mon, 18 Jun 2007, Martin Bligh wrote:
> >>
> >> Sorry to be a wet blanket, but I've seen those sort of things
> >> before, and they just don't seem to work, especially in the
> >> environment we're in with such a massive diversity of hardware.
> >
> > I do agree. It _sounds_ like a great idea to try to control the flow of
> > patches better,
>
> There were some ideas, i will try to summarize:
>
> * New Patches (or sets) need tracking, review, testing
>
> Zero (tracking) done by sending (To, or bcc) [RFC] patch to something
> like [email protected] (like BTS now). Notifications will
> be sent to interested maintainers (if meta-information is ok) or testers
> (see below)
>
> First is mostly done by maintainers or interested individuals.
> Result is "Acked-by" and "Cc" entries in the next patch sent. Due to
> lack of tracking this things are done manually, are generally in
> trusted manner. But bad like <[email protected]>
> sometimes happens.

The goal is to get all patches for a maintained subsystem submitted to
Linus by the maintainer.

> When patch in sent to this PTS, your lovely
> checkpatch/check-whatever-crap-has-being-sent tools can be set up as
> gatekeepers, thus making impossible stupid errors with MIME coding,
> line wrapping, whatever style you've came up with now in checking
> sent crap.

The -mm kernel already implements what your proposed PTS would do.

Plus it gives testers more or less all patches currently pending
inclusion into Linus' tree in one kernel they can test.

The problems are more social ones, like patches Andrew has never heard
of before getting into Linus' tree during the merge window.

>...
> |-*- Feature Needed -*-
> Addition, needed is hardware user tested have/had/used. Currently
> ``reportbug'' tool includes packed specific and system specific
> additions automatically gathered and inserted to e-mail sent to BTS.
> (e.g. <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29740>)

The problem is that most problems don't occur on one well-defined
kind of hardware - patches often break in exactly the areas the patch
author expected no problems in.

And in many cases a patch for a device driver was written due to a bug
report - in such cases a tester with the hardware in question is already
available.

>...
> > but in the end, it needs to also be easy and painfree to the people
> > involved, and also make sure that any added workflow doesn't require
> > even *more* load and expertise on the already often overworked
> > maintainers..
>
> Experienced BTS users and developers. Please, correct me if i'm wrong.
> At least e-mail part of Debian's BTS and whole idea of it is *exactly*
> what is needed. Bugzilla fans, you can still use you useless pet,
> because Debian developers have done things, to track and e-mail states
> of bugs: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29736>
>...

"useless pet"?
Be serious.
How many open source projects use Bugzilla and how many use the Debian BTS?
And then start thinking about why the "useless pet" has so many more
users...

The Debian BTS requires you to either write emails with control messages
or generate control messages with external tools.

In Bugzilla the same works through a web interface.

I know both the Debian BTS and Bugzilla and although they are quite
different they both are reasonable tools for their purpose.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-19 13:43:03

by Don Armstrong

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Tue, 19 Jun 2007, Oleg Verych wrote:
> * From: Linus Torvalds
> * Newsgroups: gmane.linux.kernel
> * Date: Mon, 18 Jun 2007 17:09:37 -0700 (PDT)
>
> > I do agree. It _sounds_ like a great idea to try to control the
> > flow of patches better,
>
> There were some ideas, i will try to summarize:
>
> * New Patches (or sets) need tracking, review, testing
>
> Zero (tracking) done by sending (To, or bcc) [RFC] patch to something
> like [email protected] (like BTS now). Notifications will
> be sent to interested maintainers (if meta-information is ok) or testers
> (see below)

The BTS, while fairly good at tracking issues for distributions made
up of thousands of packages (like Debian), is rather suboptimal for
dealing with the workflow of a single (relatively) monolithic entity
like the linux kernel.

Since the ultimate goal is presumably to apply a patch to a git tree,
some sort of system which is built directly on top of git (or
intimately intertwined) is probably required. Some of the features that
the BTS offers, like the ability to control bugs easily by mail, may be
useful to incorporate, but I'd be rather surprised if the BTS itself could
be made to work with the kernel developers' workflow as it exists now.

It may be useful for whoever ends up designing the patch system to
take a glimpse at how it's done in debbugs, but since I don't know how
the workflow works now, and how people want to have it work in the
end, I can't tell you what features from debbugs would be useful to
use.

Finally, at the end of the day, my own time and effort (and the
primary direction of debbugs development) is aimed at supporting the
primary user of debbugs, the Debian project. People who understand (or
want to understand) the linux kernel team's workflow are the ones who
are going to need to do the heavy lifting here.


Don Armstrong

--
N: Why should I believe that?"
B: Because it's a fact."
N: Fact?"
B: F, A, C, T... fact"
N: So you're saying that I should believe it because it's true.
That's your argument?
B: It IS true.
-- "Ploy" http://www.mediacampaign.org/multimedia/Ploy.MPG

http://www.donarmstrong.com http://rzlab.ucr.edu

2007-06-19 13:52:49

by Oleg Verych

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

[Dropping noise for Debbugs, because interested people may join via Gmane]

On Tue, Jun 19, 2007 at 02:48:55PM +0200, Adrian Bunk wrote:
> On Tue, Jun 19, 2007 at 06:06:47AM +0200, Oleg Verych wrote:
> > [Dear Debbug developers, i wish your ideas will be useful.]
> >
> > * From: Linus Torvalds
> > * Newsgroups: gmane.linux.kernel
> > * Date: Mon, 18 Jun 2007 17:09:37 -0700 (PDT)
> > >
> > > On Mon, 18 Jun 2007, Martin Bligh wrote:
> > >>
> > >> Sorry to be a wet blanket, but I've seen those sort of things
> > >> before, and they just don't seem to work, especially in the
> > >> environment we're in with such a massive diversity of hardware.
> > >
> > > I do agree. It _sounds_ like a great idea to try to control the flow of
> > > patches better,
> >
> > There were some ideas, i will try to summarize:
> >
> > * New Patches (or sets) need tracking, review, testing
> >
> > Zero (tracking) done by sending (To, or bcc) [RFC] patch to something
> > like [email protected] (like BTS now). Notifications will
> > be sent to interested maintainers (if meta-information is ok) or testers
> > (see below)
> >
> > First is mostly done by maintainers or interested individuals.
> > Result is "Acked-by" and "Cc" entries in the next patch sent. Due to
> > lack of tracking this things are done manually, are generally in
> > trusted manner. But bad like <[email protected]>
> > sometimes happens.
>
> The goal is to get all patches for a maintained subsystem submitted to
> Linus by the maintainer.
>
> > When patch in sent to this PTS, your lovely
> > checkpatch/check-whatever-crap-has-being-sent tools can be set up as
> > gatekeepers, thus making impossible stupid errors with MIME coding,
> > line wrapping, whatever style you've came up with now in checking
> > sent crap.
>
> The -mm kernel already implements what your proposed PTS would do.

Having an all-in-one patchset like -mm is an easy way to provide
interested parties with "you know what you have -- crazy development".

However, a [P]TS is a tracking, recording, organizing tool. {1} Andrew's cron
daemon can easily run a script to check the status of a particular patch (cc,
tested-by, acked-by). If a patch has no TS ID, Andrew's watchdog
barks back at the patch originator (politely asking to send the patch with:

* TS as "To:" target
* patch author as "Cc:" target, this is useful to require:
. author can check that copy himself with text-only pager program
(to see any MIME coding crap)
. preventing SPAM
* maybe somebody else in Cc or Bcc.)
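
A minimal sketch of such a watchdog script, under loudly stated
assumptions: the header name `X-TS-ID` is hypothetical (no such convention
exists in the thread), the trailer check is a plain line match, and the
reply texts are placeholders. It uses only Python's standard `email`
module.

```python
import email

TS_ID_HEADER = "X-TS-ID"  # hypothetical header carrying the tracking ID
TRAILERS = ("acked-by", "tested-by", "cc")


def patch_status(raw_mail):
    """Return the tracking ID, trailer counts and a plain-text flag."""
    msg = email.message_from_string(raw_mail)
    plain = not msg.is_multipart() and msg.get_content_type() == "text/plain"
    counts = {name: 0 for name in TRAILERS}
    if plain:
        for line in msg.get_payload().splitlines():
            key, sep, value = line.partition(":")
            name = key.strip().lower()
            if sep and value.strip() and name in counts:
                counts[name] += 1
    return {"ts_id": msg.get(TS_ID_HEADER), "plain_text": plain,
            "trailers": counts}


def watchdog_reply(raw_mail):
    """Return a polite bounce text, or None if the patch may pass."""
    status = patch_status(raw_mail)
    if status["ts_id"] is None:
        return ("Please resend with the tracking system in To: "
                "and yourself in Cc:.")
    if not status["plain_text"]:
        return "Please resend as plain text, without MIME parts."
    return None


if __name__ == "__main__":
    mail = ("From: author@example.org\n"
            "Subject: [PATCH] fix foo\n\n"
            "Fix foo.\n\nAcked-by: Some Maintainer <m@example.org>\n")
    print(watchdog_reply(mail))
```

Run from cron over a mailbox, a script of this kind would only report
status and bounce untracked patches; review and testing stay with people.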

> Plus it gives testers more or less all patches currently pending
> inclusion into Linus' tree in one kernel they can test.

Crazy development{0}. Somebody knows, that comprehensively testing
hibernation is their thing. I don't care about it, i care about foo, bar.
Thus i can apply for example lguest patches and implement and test new
asm-offset replacement, *easily*. Somebody, as you know, likes new fancy
file system, and no-way other. Let them be happy testing that thing
*easily*. Because another fancy NO_MHz will hang their testing bench
right after best ever speed results were recorded.

> The problems are more social ones, like patches Andrew has never heard
> of before getting into Linus' tree during the merge window.

Linus' watchdog, likewise, asks for a valid patch ID, or just doesn't
care (in a similar manner to Linus :).

So far no human is involved in social things. Do you agree?

Human effort is valuable and needed in discussing and testing particular
patches, with the tracking system participating (by Cc, acking, test-ok
*e-mails*).

> >...
> > |-*- Feature Needed -*-
> > Addition, needed is hardware user tested have/had/used. Currently
> > ``reportbug'' tool includes packed specific and system specific
> > additions automatically gathered and inserted to e-mail sent to BTS.
> > (e.g. <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29740>)
>
> The problem is that most problems don't occur on one well-defined
> kind of hardware - patches often break in exactly the areas the patch
> author expected no problems in.

I tried to test that new fancy FS, and couldn't boot because of
yet-another ACPI crap. See theme{0}?

Overall testing, like Andrew does, is doubtless a brave thing, but he would
have more time after {1}, wouldn't he?

> And in many cases a patch for a device driver was written due to a bug
> report - in such cases a tester with the hardware in question is already
> available.

Tracking all possible testers (for next driver update, for example) is
in question.

>
> >...
> > > but in the end, it needs to also be easy and painfree to the people
> > > involved, and also make sure that any added workflow doesn't require
> > > even *more* load and expertise on the already often overworked
> > > maintainers..
> >
> > Experienced BTS users and developers. Please, correct me if i'm wrong.
> > At least e-mail part of Debian's BTS and whole idea of it is *exactly*
> > what is needed. Bugzilla fans, you can still use you useless pet,
> > because Debian developers have done things, to track and e-mail states
> > of bugs: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29736>
> >...
>
> "useless pet"?
> Be serious.
> How many open source projects use Bugzilla and how many use the Debian BTS?
> And then start thinking about why the "useless pet" has so many more
> users...

I might be stupid, but i faced it. On my amd64 512M laptop i *cannot* run
mozillka any more, for example! And i don't care, because Linus said his
opinion and i fully agree with him.

> The Debian BTS requires you to either write emails with control messages
> or generate control messages with external tools.

Awesome!!! Did you write this reply to me by

> In Bugzilla the same works through a web interface.

web interface? If you did .........</dev/random dd bs=1 count=13.....
Actually you didn't ;)

> I know both the Debian BTS and Bugzilla and although they are quite
> different they both are reasonable tools for their purpose.

As you just might have seen here, i was talking about organizing,
tracking, hopefully saving and redirecting useful manpower. And i don't
bother search e-mails i saw about Bugzilla's BD from many other prominent
developers. I just know that, not from my dream or physical vacuum.

Basic concept of Debian BTS is what i've discovered after many useless
hours i spent in Bugzilla. And this is mainly because of one basic
important thing, that nobody acknowledged (for newbies, like me):

* E-Mail with useful MUAs, after it got reliable delivery MTAs with qmail
(or exim) is the main communication toolset.

Can't you see that from Linux's patch sending policy?

I also want to reply to myself about why LKML was established and
USENET (news) was abandoned.

To control and to keep running *your* _main communication toolset_
(read as "your user,developer feedback").

I just couldn't realize that, because i grew up in free web e-mail, after
having set up my own server with MTA and real e-mail and after
discovering Gmane (really mind-blowing evolution of the e-mail system!)

> cu
> Adrian
>
> --
____

2007-06-19 14:27:29

by Stefan Richter

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On 6/19/2007 4:05 PM, Oleg Verych wrote:
> On Tue, Jun 19, 2007 at 02:48:55PM +0200, Adrian Bunk wrote:
>> The Debian BTS requires you to either write emails with control messages
>> or generate control messages with external tools.
...
>> In Bugzilla the same works through a web interface.
...
> Basic concept of Debian BTS is what i've discovered after many useless
> hours i spent in Bugzilla. And this is mainly because of one basic
> important thing, that nobody acknowledged (for newbies, like me):
>
> * E-Mail with useful MUAs, after it got reliable delivery MTAs with qmail
> (or exim) is the main communication toolset.
>
> Can't you see that from Linux's patch sending policy?

That's for developers, not for users.

There are different people involved in
- patch handling,
- bug handling (bugs are reported by end-users),
therefore don't forget that PTS and BTS have different requirements.
--
Stefan Richter
-=====-=-=== -==- =--==
http://arcgraph.de/sr/

2007-06-19 14:43:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: regression tracking (Re: Linux 2.6.21)

On Tuesday, 19 June 2007 02:28, Martin Bligh wrote:
> Linus Torvalds wrote:
> >
> > On Thu, 14 Jun 2007, Oleg Verych wrote:
> >> I'm seeing this long (198) thread and just have no idea how it has
> >> ended (wiki? hand-mailing?).
> >
> > I'm hoping it's not "ended".
> >
> > IOW, I really don't think we _resolved_ anything, although the work that
> > Adrian started is continuing through the wiki and other people trying to
> > track regressions, and that was obviously something good.
> >
> > But I don't think we really know where we want to take this thing in the
> > long run. I think everybody wants a better bug-tracking system, but
> > whether something that makes people satisfied can even be built is open.
> > It sure doesn't seem to exist right now ;)
>
> I know you hate bugzilla ... but at least I can try to make that bit
> of the process work better.
>
> The new version just rolled out does have a simple "regression" checkbox
> (and you can search on it), which will hopefully help people keep track
> of the ones already in bugzilla more easily.
>
> Thanks to Jon T, Dave J et al. for helping to figure out methods and
> implement them.

Yes, good work, thanks a lot for it! The new interface is much better and more
useful.

Greetings,
Rafael


PS
BTW, would it be possible to create the "Hibernation/Suspend" subcategory
of "Power Management" that I asked for some time ago, please? :-)

--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-19 15:04:28

by Adrian Bunk

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Tue, Jun 19, 2007 at 04:05:12PM +0200, Oleg Verych wrote:
>...
> On Tue, Jun 19, 2007 at 02:48:55PM +0200, Adrian Bunk wrote:
> > On Tue, Jun 19, 2007 at 06:06:47AM +0200, Oleg Verych wrote:
>...
> > > When patch in sent to this PTS, your lovely
> > > checkpatch/check-whatever-crap-has-being-sent tools can be set up as
> > > gatekeepers, thus making impossible stupid errors with MIME coding,
> > > line wrapping, whatever style you've came up with now in checking
> > > sent crap.
> >
> > The -mm kernel already implements what your proposed PTS would do.
>
> Having all-in-one patchset, like -mm is easy thing to provide
> interested parties with "you know what you have -- crazy development"
>
> However [P]TS is tracking, recording, organizing tool. {1} Andrew's cron
> daemon easily can run script to check status of particular patch (cc,
> tested-by, acked-by). If patch have no TS ID, Andrew's watchdog is
> barking back to patch originator (with polite asking to send patch as:
>
> * TS as "To:" target
> * patch author as "Cc:" target, this is useful to require:
> . author can check that copy himself with text-only pager program
> (to see any MIME coding crap)
> . preventing SPAM
> * maybe somebody else in Cc or Bcc.)

Quite a big part of -mm are git trees of maintainers.
Where are they in your tool?

And I still don't think your tool would make sense.
But hey, simply try it - that's the only way for you to prove me wrong.
People said similar things about the 2.6.16 kernel or my regression
tracking, and I simply did it.

> > Plus it gives testers more or less all patches currently pending
> > inclusion into Linus' tree in one kernel they can test.
>
> Crazy development{0}. Somebody knows, that comprehensively testing
> hibernation is their thing. I don't care about it, i care about foo, bar.
> Thus i can apply for example lguest patches and implement and test new
> asm-offset replacement, *easily*. Somebody, as you know, likes new fancy
> file system, and no-way other. Let them be happy testing that thing
> *easily*. Because another fancy NO_MHz will hang their testing bench
> right after best ever speed results were recorded.

Patch dependencies and patch conflicts will be the interesting parts
when you implement this.

E.g. new fancy filesystem patch in -mm might depend on some VFS change
that requires changes to all other filesystems.
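
To make the dependency point concrete, here is a minimal sketch of what
"give me only the patches for my area" would have to compute. The patch
names and the `depends_on` table are hypothetical, and the sketch covers
only declared forward dependencies; it says nothing about conflicts or
about other code that a prerequisite change forces to be adapted. It
relies on `graphlib.TopologicalSorter` from the Python standard library
(3.9 or later).

```python
from graphlib import TopologicalSorter


def patchkit(wanted, depends_on):
    """Return the wanted patches plus all prerequisites, in apply order.

    `depends_on` maps a patch name to the patches it needs applied first.
    """
    needed, stack = set(), list(wanted)
    while stack:
        patch = stack.pop()
        if patch not in needed:
            needed.add(patch)
            stack.extend(depends_on.get(patch, ()))
    graph = {p: set(depends_on.get(p, ())) for p in needed}
    # static_order() yields every prerequisite before the patch needing it.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    # Hypothetical dependency table for illustration only.
    depends_on = {
        "new-fs": ["vfs-change"],
        "other-fs-adaptation": ["vfs-change"],
        "lguest": [],
    }
    print(patchkit(["new-fs"], depends_on))
```

Even in this toy table, a tester who asks only for "new-fs" also gets
"vfs-change", and without "other-fs-adaptation" the resulting tree may no
longer build for the other filesystems.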

I'm really looking forward to seeing how you will implement this for
something like -mm with > 1000 patches (many of them git trees that
themselves contain many different patches) without offloading all the
additional work to the kernel developers.

> > The problems are more social ones, like patches Andrew has never heard
> > of before getting into Linus' tree during the merge window.
>
> Linus' watchdog, as well, asking for valid patch id, or just doesn't
> care (in similar manner Linus does :).
>
> So far no human is involved in social things. Do you agree?

No.

Forcing people to use some tool (no matter whether it's Bugzilla or
the PTS you want to implement) is 100% a social problem involving humans.

> Human power is worth and needed in particular patch discussion and
> testing under the participation (by Cc, acking, test-ok *e-mails*) of
> tracking system.

For getting people to use your tool, you will have to convince them that
using your tool will bring them real benefits.

> > >...
> > > |-*- Feature Needed -*-
> > > Addition, needed is hardware user tested have/had/used. Currently
> > > ``reportbug'' tool includes packed specific and system specific
> > > additions automatically gathered and inserted to e-mail sent to BTS.
> > > (e.g. <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29740>)
> >
> > The problem is that most problems don't occur on one well-defined
> > kind of hardware - patches often break in exactly the areas the patch
> > author expected no problems in.
>
> I tried to test that new fancy FS, and couldn't boot because of
> yet-another ACPI crap. See theme{0}?
>
> Overall testing, like Andrew does, is doubtless brave thing, but he have
> more time after {1}, isn't it?

I doubt that placing some Acked-by: tags in patches is really what
is killing Andrew's time.

How does Andrew check the status of 1500 patches in -mm in your PTS?

And how do you implement the use case that Andrew forwards a batch of
200 patches to Linus? How does the information from your tool come into git?

But hey, write your tool and convince Andrew of its advantages if you
don't believe me.

> > And in many cases a patch for a device driver was written due to a bug
> > report - in such cases a tester with the hardware in question is already
> > available.
>
> Tracking all possible testers (for next driver update, for example) is
> in question.

Spamming people who have some hardware with information about patches
won't bring you anything. You need people willing to test patches that
won't bring them any benefit - and if you have such people they are
usually as well willing to simply regularly test -rc kernels.

> > >...
> > > > but in the end, it needs to also be easy and painfree to the people
> > > > involved, and also make sure that any added workflow doesn't require
> > > > even *more* load and expertise on the already often overworked
> > > > maintainers..
> > >
> > > Experienced BTS users and developers. Please, correct me if i'm wrong.
> > > At least e-mail part of Debian's BTS and whole idea of it is *exactly*
> > > what is needed. Bugzilla fans, you can still use you useless pet,
> > > because Debian developers have done things, to track and e-mail states
> > > of bugs: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29736>
> > >...
> >
> > "useless pet"?
> > Be serious.
> > How many open source projects use Bugzilla and how many use the Debian BTS?
> > And then start thinking about why the "useless pet" has so many more
> > users...
>
> I might be stupid, but i faced it. On my amd64 512M laptop i *cannot* run
> mozillka any more, for example!

Why not?

> And i don't care, because Linus said his
> opinion and i fully agree with him.

Did Linus state he would actually actively use a Debian BTS?
If not, then there's no advantage.

> > The Debian BTS requires you to either write emails with control messages
> > or generate control messages with external tools.
>
> Awesome!!! Are you wrote this reply to me by
>
> > In Bugzilla the same works through a web interface.
>
> web interface? If you did .........</dev/random dd bs=1 count=13.....
> Actually you didn't ;)

There's a difference between a discussion email and a control message in
a fixed format.
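
To illustrate the difference, here is a minimal sketch of a parser for a
simplified subset of the control-message format (one command per line, a
bug number as the first argument, processing stops at "thanks"). The
command subset, the strict error handling and the bug numbers are
simplifications chosen for illustration, not the BTS's actual behaviour.

```python
KNOWN = {"severity", "tags", "retitle", "reassign", "close", "reopen"}
STOP = {"quit", "stop", "--"}


def parse_control(body):
    """Parse control commands into (verb, bug_number, arguments) tuples.

    Raises ValueError on the first line that is not a known command,
    which is what free-form discussion text would trigger.
    """
    commands = []
    for line in body.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        verb = words[0].lower()
        if verb in STOP or verb.startswith("thank"):
            break  # everything after this is ignored
        if verb not in KNOWN or len(words) < 2 or not words[1].isdigit():
            raise ValueError(f"not a control command: {line!r}")
        commands.append((verb, int(words[1]), words[2:]))
    return commands


if __name__ == "__main__":
    # Bug number 123456 is a placeholder.
    message = "severity 123456 important\ntags 123456 + patch\nthanks\n"
    for command in parse_control(message):
        print(command)
```

A discussion email has no such grammar: fed to the parser above, its
first sentence is rejected, which is why writing one is not evidence that
writing the other is equally convenient.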

> > I know both the Debian BTS and Bugzilla and although they are quite
> > different they both are reasonable tools for their purpose.
>
> As you just might have seen here, i was talking about organizing,
> tracking, hopefully saving and redirecting useful main power. And i don't
> bother search e-mails i saw about Bugzilla's BD from many other prominent
> developers. I just know that, not from my dream or physical vacuum.
>
> Basic concept of Debian BTS is what i've discovered after many useless
> hours i spent in Bugzilla. And this is mainly because of one basic
> important thing, that nobody acknowledged (for newbies, like me):
>
> * E-Mail with useful MUAs, after it got reliable delivery MTAs with qmail
> (or exim) is the main communication toolset.
>...

How do people sell and buy goods at eBay?
eBay has a "do everything through the web interface plus notification
emails" model quite similar to Bugzilla's.

Or Wikis?
Or Blogs?

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-19 15:04:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].



On Tue, 19 Jun 2007, Adrian Bunk wrote:
>
> The goal is to get all patches for a maintained subsystem submitted to
> Linus by the maintainer.

Well, to be honest, I've actually over the years tried to have a policy of
*never* really having black-and-white policies.

The fact is, some maintainers are excellent. All the relevant patches
*already* effectively go through them.

But at the same time, other maintainers are less than active, and some
areas aren't clearly maintained at all.

Also, being a maintainer often means that you are busy and spend a lot of
time talking to *people* - it doesn't necessarily mean that you actually
have the hardware and can test things, nor does it necessarily mean that
you know every detail.

So I point out in Documentation/ManagementStyle (which is written very
much tongue-in-cheek, but at the same time it's really *true*) that
maintainership is often about recognizing people who just know *better*
than you!

> The -mm kernel already implements what your proposed PTS would do.
>
> Plus it gives testers more or less all patches currently pending
> inclusion into Linus' tree in one kernel they can test.
>
> The problems are more social ones, like patches Andrew has never heard
> of before getting into Linus' tree during the merge window.

Not really. The "problem" boils down to this:

[torvalds@woody linux]$ git-rev-list --all --since=100.days.ago | wc -l
7147
[torvalds@woody linux]$ git-rev-list --no-merges --all --since=100.days.ago | wc -l
6768

ie over the last hundred days, we have averaged over 70 changes per day,
and even ignoring merges and only looking at "pure patches" we have more
than an average of 65 patches per day. Every day. Day in and day out.

That translates to five hundred commits a week, two _thousand_ commits per
month, and 25 thousand commits per year. As a fairly constant stream.
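
The rates above follow directly from the two counts. As a quick check,
here is a small sketch that redoes the arithmetic; the function name
`rates` and the 30-day month are choices made here, and the inputs are
the two numbers printed by the commands above.

```python
def rates(count, days):
    """Average rate per day, week, 30-day month and year."""
    per_day = count / days
    return {
        "day": per_day,
        "week": per_day * 7,
        "month": per_day * 30,
        "year": per_day * 365,
    }


all_commits = rates(7147, 100)   # git-rev-list --all
pure_patches = rates(6768, 100)  # same, with --no-merges

if __name__ == "__main__":
    for label, r in (("all commits", all_commits),
                     ("non-merge commits", pure_patches)):
        print(f"{label}: {r['day']:.1f}/day, {r['week']:.0f}/week, "
              f"{r['month']:.0f}/month, {r['year']:.0f}/year")
```

The exact yearly figure comes out slightly above the rounded 25 thousand
quoted in the text, which does not change the argument.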

Will mistakes happen? Hell *yes*.

And I'd argue that any flow that tries to "guarantee" that mistakes don't
happen is broken. It's a sure-fire way to just frustrate people, simply
because it assumes a level of perfection in maintainers and developers
that isn't possible.

The accepted industry standard for bug counts is basically one bug per
thousand lines of code. And that's for released, *debugged* code.

Yes, we should aim higher. Obviously. Let's say that we aim for 0.1 bugs
per KLOC, and that we actually aim for that not just in _released_ code,
but in patches.

What does that mean?

Do the math:

git log -M -p --all --since=100.days.ago | grep '^+' | wc -l

That basically takes the last one hundred days of development, shows it
all as patches, and just counts the "new" lines. It takes about ten
seconds to run, and returns 517252 for me right now.

That's *over*half*a*million* lines added or changed!

And even with the expectation that we do ten times better than what is
often quoted as an industry average, and even with the expectation that
this is already fully debugged code, that's at least 50 bugs in the last
one hundred days.

Yeah, we can be even more stringent, and actually subtract the number of
lines _removed_ (274930), and assume that only *new* code contains bugs,
and that's still just under a quarter million purely *added* lines, and
maybe we'd expect just 24 new bugs in the last 100 days.

[ Argument: some of the old code also contained bugs, so the lines added
to replace it balance out. Counter-argument: new code is less well
tested by *definition* than old code, so.. Counter-counter-argument: the
new code was often added to _fix_ a bug, so the code removed had an even
_higher_ bug rate than normal code..

End result? We don't know. This is all just food for thought. ]

So here's the deal: even by the most *stringent* reasonable rules, we add
a new bug every four days. That's just something that people need to
accept. The people who say "we must never introduce a regression" aren't
living on planet earth, they are living in some wonderful world of
Blarney, where mistakes don't happen, developers are perfect, hardware is
perfect, and maintainers always catch things.
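
The estimate can be reproduced with a few lines of arithmetic. This is
only a sketch of the calculation in the text: the line counts and the
0.1 bugs/KLOC target are taken from above, while the function and
variable names are made up here.

```python
def expected_bugs(lines, bugs_per_kloc):
    """Expected number of bugs in `lines` lines at the given defect rate."""
    return lines / 1000 * bugs_per_kloc


DAYS = 100
ADDED_OR_CHANGED = 517252  # '+' lines counted by the git log command above
REMOVED = 274930           # removed lines, as quoted in the text
TARGET_RATE = 0.1          # bugs per KLOC, ten times better than 1/KLOC

net_added = ADDED_OR_CHANGED - REMOVED
bugs_loose = expected_bugs(ADDED_OR_CHANGED, TARGET_RATE)
bugs_strict = expected_bugs(net_added, TARGET_RATE)
days_per_bug = DAYS / bugs_strict

if __name__ == "__main__":
    print(f"net added lines:        {net_added}")
    print(f"bugs, all '+' lines:    {bugs_loose:.1f}")
    print(f"bugs, net added lines:  {bugs_strict:.1f}")
    print(f"days per new bug:       {days_per_bug:.1f}")
```

Changing `TARGET_RATE` shows how sensitive the "one bug every four days"
figure is to the assumed defect rate: at the quoted industry rate of one
bug per KLOC the interval shrinks tenfold.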

> The problem is that most problems don't occur on one well-defined
> kind of hardware - patches often break in exactly the areas the patch
> author expected no problems in.

Note that the industry-standard 1-bug-per-kloc thing has nothing to do
with hardware. Somebody earlier in this thread (or one of the related
ones) said that "git bisect is only valid for bugs that happen due to
hardware issues", which is just totally *ludicrous*.

Yes, hardware makes it harder to test, but even *without* any hardware-
specific issues, bugs happen. The developer just didn't happen to trigger
the condition, or didn't happen to notice it when he *did* trigger it.

So don't go overboard about "hardware". Yes, hardware-specific issues have
their own set of problems, and yes, drivers have a much higher incidence
of bugs per KLOC, but in the end, even *without* that, you'd still have to
face the music. Even for stuff that isn't drivers.

So this whole *notion* that you can get it right the first time is
*insane*.

We should aim for doing well, yes.

But quite frankly, anybody who aims for "perfect" without taking reality
into account is just not realistic. And if that's part of the goal of some
"new process", then I'm not even interested in listening to people discuss
it.

If this plan cannot take reality into account, please stop Cc'ing me. I'm
simply not interested.

Any process that tries to "guarantee" that regressions don't happen is
crap. Any process that tries to "guarantee" that we release only kernels
without bugs can go screw itself. There's one thing I _can_ guarantee, and
that's as long as we add a quarter million new lines per 100 days (and
change another quarter million lines), we will have new bugs.

No ifs, buts or maybe's about it.

The process should aim for making them *fewer*. But any process that aims
for total eradication of new bugs will result in one thing, and one thing
only: we won't be getting any actual work done.

The only way to guarantee no regressions is to make no progress.

Linus

2007-06-19 15:08:34

by Stefan Richter

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

Oleg Verych wrote:
> On Tue, Jun 19, 2007 at 02:48:55PM +0200, Adrian Bunk wrote:
>> The -mm kernel already implements what your proposed PTS would do.
...
>> Plus it gives testers more or less all patches currently pending
>> inclusion into Linus' tree in one kernel they can test.
>
> Crazy development{0}. Somebody knows, that comprehensively testing
> hibernation is their thing. I don't care about it, i care about foo, bar.
> Thus i can apply for example lguest patches and implement and test new
> asm-offset replacement, *easily*.

That's right. But the production of subsystem test patchkits is
volunteer work which will be hard to unify.

I'm not saying it's impossible to reach some degree of organized
production of test patchkits; after all we already have some
standardization regarding patch submission which is volunteer work too.
--
Stefan Richter
-=====-=-=== -==- =--==
http://arcgraph.de/sr/

2007-06-19 15:35:04

by Oleg Verych

[permalink] [raw]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Tue, Jun 19, 2007 at 04:27:15PM +0200, Stefan Richter wrote:
> On 6/19/2007 4:05 PM, Oleg Verych wrote:
> > On Tue, Jun 19, 2007 at 02:48:55PM +0200, Adrian Bunk wrote:
> >> The Debian BTS requires you to either write emails with control messages
> >> or generate control messages with external tools.
> ...
> >> In Bugzilla the same works through a web interface.
> ...
> > Basic concept of Debian BTS is what i've discovered after many useless
> > hours i spent in Bugzilla. And this is mainly because of one basic
> > important thing, that nobody acknowledged (for newbies, like me):
> >
> > * E-Mail with useful MUAs, after it got reliable delivery MTAs with qmail
> > (or exim) is the main communication toolset.
> >
> > Can't you see that from Linux's patch sending policy?
>
> That's for developers, not for users.
>
> There are different people involved in
> - patch handling,
> - bug handling (bugs are reported by end-users),
> therefore don't forget that PTS and BTS have different requirements.

Sure. But if tracking was done and possible bugs were killed, and a user's
bug report then seems to depend on that patch (by bisecting), why not have a
linkage here? It is useful for a developer (through the sub-system
association) to see next time what went wrong and to check the test cases;
users might be interested in running them too before crying (again) about a
broken system. A bug report can become part of the (reopened) patch
discussion (as i wrote before). Until then, as a bug candidate without an
identified patch, it can be associated with some particular sub-system or
with an abstract bug category {1}.

Reversed time. As "do-bisection" shows, problems do not happen simply
because of something abstract. If a problem is worth solving, eventually
there will be a patch trying to solve it, in both cases:

* when the breaking patch (found by bisection) is actually correct, but a
hardware (or similar independent) problem arises.
* something different, like a feature request.

So these guys are candidates for a patch, and can have an ID numerically from
the same domain as the patch ID, but with a different prefix, like "i'm just
a candidate for a patch". Bugs {1} are obviously in this category.
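
The ID idea above can be sketched as follows (an illustration only, not code from this thread). It assumes one shared counter for both kinds of entry, so that only the prefix tells a patch from a patch candidate. The `Tracker` class, its method names, and the `P`/`C` prefixes are all hypothetical.

```python
import itertools


class Tracker:
    """Hands out IDs from one numeric domain shared by patches and candidates."""

    def __init__(self):
        self._numbers = itertools.count(1)  # single counter for both kinds
        self.entries = {}                   # number -> (prefix, title)

    def _add(self, prefix, title):
        number = next(self._numbers)
        self.entries[number] = (prefix, title)
        return f"{prefix}{number}"

    def add_patch(self, title):
        return self._add("P", title)

    def add_candidate(self, title):
        # A bug report or feature request with no identified patch yet.
        return self._add("C", title)

    def promote(self, candidate_id):
        """Turn a candidate into a patch; the number stays, the prefix changes."""
        number = int(candidate_id[1:])
        prefix, title = self.entries[number]
        if prefix != "C":
            raise ValueError(f"{candidate_id} is not a candidate")
        self.entries[number] = ("P", title)
        return f"P{number}"
```

Because the number never changes, a reference to the entry from a bug report would still resolve after a candidate has been promoted to a patch.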

Current identification of problems and patch association
have completely zero level of tracking or automation, while Bugzilla is
believed by somebody to have positive efficiency in bug tracking.

Those two (patch/bug tracking) aren't perpendicular to each other at
all.

Eventually the unification might be so complete that bug tracking becomes
obsolete, because of good tracking of patches/added features and what
they did/do.

In any case, i would like to ask the mentors to write at least something
like a technical specification, if what i'm saying makes sense to you,
because your experience is a treasure that must be preserved and
possibly automated/organized.
____

2007-06-19 16:40:36

by Oleg Verych

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

Linus,

On Tue, Jun 19, 2007 at 08:01:19AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 19 Jun 2007, Adrian Bunk wrote:
> >
> > The goal is to get all patches for a maintained subsystem submitted to
> > Linus by the maintainer.

Nice quote. I'm trying to make a proposal to (and convince) Adrian, who is
in opposition, but the whole thread just ends up obeying his extreme POV...

> But quite frankly, anybody who aims for "perfect" without taking reality
> into account is just not realistic. And if that's part of the goal of some
> "new process", then I'm not even interested in listening to people discuss
> it.

I'm proposing kind of smart tracking, summarized before. I'm not an
idealist, doing manual work. Making tools -- is what i've picked up from
one of your mails. Thus hope of having more opinions on that.

> If this plan cannot take reality into account, please stop Cc'ing me. I'm
> simply not interested.

This one is the last, at least from me. Sorry for taking your time.
____

2007-06-19 17:02:19

by Oleg Verych

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

* Date: Tue, 19 Jun 2007 17:08:13 +0200
>
>> Crazy development{0}. Somebody knows, that comprehensively testing
>> hibernation is their thing. I don't care about it, i care about foo, bar.
>> Thus i can apply for example lguest patches and implement and test new
>> asm-offset replacement, *easily*.
>
> That's right. But the production of subsystem test patchkits is
> volunteer work which will be hard to unify.
>
> I'm not saying it's impossible to reach some degree of organized
> production of test patchkits; after all we already have some
> standardization regarding patch submission which is volunteer work too.

But there is still no single opinion about which tree to base a patch
against. For some it's Linus's mainline, for others it's bleeding-edge
-mm. And there never will be one.

Thus, a particular patch entry might have both -mm and Linus-rebased
versions, or (as Adrian noted) VFS.asof02-07-2007 FANCYFS. For example,
Rusty did that after somebody asked him to provide more than just an -mm
version of lguest. So, for a really intrusive feature/patch (and not one
in mid-development, Adrian) the author can have a version (with a git
branch, patch directory or something).

Counter-example: scheduler patches are extraordinary, with large
threads of replies, but that is (one of) the classic cases of release
early and often. The proposed bureaucracy doesn't apply ;)
____

2007-06-19 17:06:51

by Linus Torvalds

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].



On Tue, 19 Jun 2007, Oleg Verych wrote:
>
> I'm proposing kind of smart tracking, summarized before. I'm not an
> idealist, doing manual work. Making tools -- is what i've picked up from
> one of your mails. Thus hope of having more opinions on that.

Don't get me wrong, I wasn't actually responding to you personally, I was
actually responding mostly to the tone of this thread.

So I was responding to things like the example from Bartlomiej about
missed opportunity for taking developer review into account (and btw, I
think a little public shaming might not be a bad idea - I believe more in
*social* rules than in *technical* rules), and I'm responding to some of
the commentary by Adrian and others about "no regressions *ever*".

These are things we can *wish* for. But the fact that we might wish for
them doesn't actually mean that they are really good ideas to aim for in
practice.

Let me put it another way: a few weeks ago there was this big news story
in the New York Times about how "forgetting" is a very essential part
about remembering, and people passed this around as if it was a big
revelation. People think that people with good memories have a "good
thing".

And personally, I was like "Duh".

Good memory is not about remembering everything. Good memory is about
forgetting the irrelevant, so that the important stuff stands out and you
*can* remember it. But the big deal is that yes, you have to forget stuff,
and that means that you *will* miss details - but you'll hopefully miss
the stuff you don't care for. The keyword being "hopefully". It works most
of the time, but we all know we've sometimes been able to forget a detail
that turned out to be crucial after all.

So the *stupid* response to that is "we should remember everything". It
misses the point. Yes, we sometimes forget even important details, but
it's *so* important to forget details, that the fact that our brains
occasionally forget things we later ended up needing is still *much*
preferable to trying to remember everything.

The same tends to be true of bug hunting, and regression tracking.

There's a lot of "noise" there. We'll never get perfect, and I'll argue
that if we don't have a system that tries to actively *remove* noise,
we'll just be overwhelmed. But that _inevitably_ means that sometimes
there was actually a signal in the noise that we ended up removing,
because nobody saw it as anything but noise.

So I think people should concentrate on turning "noise" into "clear
signal", but at the same time remember that that inevitably is a "lossy"
transformation, and just accept the fact that it will mean that we
occasionally make "mistakes".

This is why I've been advocating bugzilla "forget" stuff, for example. I
tend to see bugzilla as a place where noise accumulates, rather than a
place where noise is made into a signal.

Which gets me to the real issue I have: the notion of having a process for
_tracking_ all the information is actually totally counter-productive, if
a big part of the process isn't also about throwing noise away.

We don't want to "save" all the crud. I don't want "smart tracking" to
keep track of everything. I want "smart forgetting", so that we are only
left with the major signal - the stuff that matters.

Linus

2007-06-19 17:27:24

by Martin Bligh

Subject: Re: regression tracking (Re: Linux 2.6.21)


> Yes, good work, thanks a lot for it! The new interface is much better and more
> useful.
>
> Greetings,
> Rafael
>
>
> PS
> BTW, would that be possible to create the "Hibernation/Suspend" subcategory
> of "Power Management" that I asked for some time ago, please? :-)
>

Oops. Sorry. Done.

M.

2007-06-19 17:37:36

by Natalie Protasevich

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On 6/19/07, Linus Torvalds <[email protected]> wrote:
>
>
> On Tue, 19 Jun 2007, Oleg Verych wrote:
> >
> > I'm proposing kind of smart tracking, summarized before. I'm not an
> > idealist, doing manual work. Making tools -- is what i've picked up from
> > one of your mails. Thus hope of having more opinions on that.
>
> Don't get me wrong, I wasn't actually responding to you personally, I was
> actually responding mostly to the tone of this thread.
>
> So I was responding to things like the example from Bartlomiej about
> missed opportunity for taking developer review into account (and btw, I
> think a little public shaming might not be a bad idea - I believe more in
> *social* rules than in *technical* rules), and I'm responding to some of
> the commentary by Adrian and others about "no regressions *ever*".
>
> These are things we can *wish* for. But the fact that we might wish for
> them doesn't actually mean that they are really good ideas to aim for in
> practice.
>
> Let me put it another way: a few weeks ago there was this big news story
> in the New York Times about how "forgetting" is a very essential part
> about remembering, and people passed this around as if it was a big
> revelation. People think that people with good memories have a "good
> thing".
>
> And personally, I was like "Duh".
>
> Good memory is not about remembering everything. Good memory is about
> forgetting the irrelevant, so that the important stuff stands out and you
> *can* remember it. But the big deal is that yes, you have to forget stuff,
> and that means that you *will* miss details - but you'll hopefully miss
> the stuff you don't care for. The keyword being "hopefully". It works most
> of the time, but we all know we've sometimes been able to forget a detail
> that turned out to be crucial after all.
>
> So the *stupid* response to that is "we should remember everything". It
> misses the point. Yes, we sometimes forget even important details, but
> it's *so* important to forget details, that the fact that our brains
> occasionally forget things we later ended up needing is still *much*
> preferable to trying to remember everything.
>
> The same tends to be true of bug hunting, and regression tracking.
>
> There's a lot of "noise" there. We'll never get perfect, and I'll argue
> that if we don't have a system that tries to actively *remove* noise,
> we'll just be overwhelmed. But that _inevitably_ means that sometimes
> there was actually a signal in the noise that we ended up removing,
> because nobody saw it as anything but noise.
>
> So I think people should concentrate on turning "noise" into "clear
> signal", but at the same time remember that that inevitably is a "lossy"
> transformation, and just accept the fact that it will mean that we
> occasionally make "mistakes".

This is the most crucial point so far, in my opinion.
Not only are people who report bugs smart - they are curious,
enthusiastic, and passionate about their system, job, hobby -
whatever linux means to them. They often do their own investigations, give
lots of detail, and often others jump in with "me too" and give even
more detail (and more noise). But the real detail that would help in bug
assessment is not there, and needs to be requested in lengthy
exchanges (time wise, since every request takes hours, days,
months...).
I think it would help to make some attempt to lead them to giving out
what's important. Cold and impersonal upfront fields and drop-down
menus take a lot of noise and heat off the actual report.
Another observation - "me too" comments should be encouraged to
become separate reports, because generally only the maintainer and people
who work directly on the module can sort out whether it is the same
problem, and in fact real problems get lost and are not accounted for when
they land in the wrong buckets this way.
--Natalie
>
> This is why I've been advocating bugzilla "forget" stuff, for example. I
> tend to see bugzilla as a place where noise accumulates, rather than a
> place where noise is made into a signal.
>
> Which gets me to the real issue I have: the notion of having a process for
> _tracking_ all the information is actually totally counter-productive, if
> a big part of the process isn't also about throwing noise away.
>
> We don't want to "save" all the crud. I don't want "smart tracking" to
> keep track of everything. I want "smart forgetting", so that we are only
> left with the major signal - the stuff that matters.
>
> Linus
>

2007-06-19 17:38:55

by Oleg Verych

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

* Date: Tue, 19 Jun 2007 10:04:58 -0700 (PDT)
>
> On Tue, 19 Jun 2007, Oleg Verych wrote:
>>
>> I'm proposing kind of smart tracking, summarized before. I'm not an
>> idealist, doing manual work. Making tools -- is what i've picked up from
>> one of your mails. Thus hope of having more opinions on that.
>
> Don't get me wrong, I wasn't actually responding to you personally, I was
> actually responding mostly to the tone of this thread.

By reading only known persons[1]? Fine, it is OK.

But i hope i made useful statements. In fact, the noise reduction stuff WRT
bug reports came up before, in my analysis of Adrian's POV here (reportbug
tool). It also showed up again when i wrote about traces, where testers
(bug reporters) can find test cases before they cry (again) about
some issues. I see this; an example is bugzilla @ mozilla -- known history.

[1] Noise filtering -- that's obvious for me, after all :)

Rather than flaming further, i'm just going to try to implement something.
Hopefully my next patch will be usefully smart tracked.

Thanks!
____

2007-06-19 17:51:28

by Stefan Richter

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

Oleg Verych wrote:
> On Tue, Jun 19, 2007 at 04:27:15PM +0200, Stefan Richter wrote:
>> There are different people involved in
>> - patch handling,
>> - bug handling (bugs are reported by end-users),
>> therefore don't forget that PTS and BTS have different requirements.
>
> Sure. But if tracking was done and possible bugs were killed, and a user's
> bug report then seems to depend on that patch (by bisecting), why not have a
> linkage here?

Of course there are certain links between bugs and patches, and thus
there are certain links between bug tracking and patch tracking.

[...]
> Current identification of problems and patch association
> have completely zero level of tracking or automation, while Bugzilla is
> believed by somebody to have positive efficiency in bug tracking.

I, as maintainer of a small subsystem, can personally track bug--patch
relationships with bugzilla just fine, on its near-zero level of
automation and integration.

Nevertheless, would a more integrated bug/patch tracking system help me
improve quality of my output? ---
a) Would it save me more time than it costs me to fit into the system
(time that can be invested in actual debugging)?
This can only be answered after trying it.
b) Would it help me to spot mistakes in patches before I submit?
No.
c) Would I get quicker feedback from testers?
That depends on whether such a system attracts testers and helps
testers to work efficiently. This is also something that can only be
speculated about without trying it.

The potential testers that I deal with are mostly either very
non-technical persons, or persons which are experienced in their
hardware/application area but *not* in kernel internals and kernel
development procedures.
--
Stefan Richter
-=====-=-=== -==- =--==
http://arcgraph.de/sr/

2007-06-19 18:44:27

by Oleg Verych

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].


* Date: Tue, 19 Jun 2007 19:50:48 +0200
>
> [...]
>> Current identification of problems and patch association
>> have completely zero level of tracking or automation, while Bugzilla is
>> believed by somebody to have positive efficiency in bug tracking.
>
> I, as maintainer of a small subsystem, can personally track bug--patch
> relationships with bugzilla just fine, on its near-zero level of
> automation and integration.
>
> Nevertheless, would a more integrated bug/patch tracking system help me
> improve quality of my output? ---
> a) Would it save me more time than it costs me to fit into the system
> (time that can be invested in actual debugging)?
> This can only be answered after trying it.

I'm not a wizard, if i will answer now: "No." [1:]

[1:] Your User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.4) Gecko/20070509 SeaMonkey/1.1.2

> b) Would it help me to spot mistakes in patches before I submit?
> No.

If you have ever tried to report a bug with the reportbug tool in Debian,
you may understand what i meant. Nothing can substitute for intelligence.
Something can reduce the impact of laziness (in searching for relevant bug
reports).

> c) Would I get quicker feedback from testers?
> That depends on whether such a system attracts testers and helps
> testers to work efficiently. This is also something that can only be
> speculated about without trying it.
>
> The potential testers that I deal with are mostly either very
> non-technical persons, or persons which are experienced in their
> hardware/application area but *not* in kernel internals and kernel
> development procedures.

They also don't bother subscribing to mailing lists and like to write
blogs. I'm not sure about the hw databases you talked about; i will talk
about gathering information from testers.

Debian has experimental and unstable branches; people wanting new stuff
are likely to run these, and not testing or stable. The BTS just
collects bug reports <http://bugs.debian.org/>. If the kernel team uploads a
new kernel (a release or, recently, even an rc), interested people will use
it after the next upgrade. Bug reports get collected, but the main answer
will be: try to reproduce on the most recent kernel.org one. Here, what i
have proposed may play the role you expect. Mis-configuration/malfunction
and programmer's errors (as Linus noted), handled in an organized manner,
may easily bring the reporting person into kernel.org's testing. On the
driver or small sub-system level this may work. Again, it's all about
information, not intelligence.
____

2007-06-19 19:23:32

by Stefan Richter

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

Oleg Verych wrote:
[I wrote]
>> a) Would it save me more time than it costs me to fit into the system
>> (time that can be invested in actual debugging)?
>> This can only be answered after trying it.
>
> I'm not a wizard, if i will answer now: "No." [1:]
>
> [1:] Your User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.4) Gecko/20070509 SeaMonkey/1.1.2

Seamonkey isn't interoperable with Debian's BTS?
Lucky me that I frequently use other MUAs too.
--
Stefan Richter
-=====-=-=== -==- =--==
http://arcgraph.de/sr/

2007-06-20 21:34:17

by Adrian Bunk

Subject: Re: How to improve the quality of the kernel?

On Mon, Jun 18, 2007 at 06:55:47AM +0300, Al Boldi wrote:
> Michal Piotrowski wrote:
> > On 18/06/07, Al Boldi <[email protected]> wrote:
> > > Bartlomiej Zolnierkiewicz wrote:
> > > > On Sunday 17 June 2007, Andrew Morton wrote:
> > > > > We of course do want to minimise the amount of overhead for each
> > > > > developer. I'm a strong believer in specialisation: rather than
> > > > > requiring that *every* developer/maintainer integrate new steps in
> > > > > their processes it would be better to allow them to proceed in a
> > > > > close-to-usual fashion and to provide for a specialist person (or
> > > > > team) to do the sorts of things which you're thinking about.
> > > >
> > > > Makes sense... however we need to educate each and every developer
> > > > about importance of the code review and proper recognition of
> > > > reviewers.
> > >
> > > That's as easy to manage as is currently done with rc-regressions.
> >
> > Are you a volunteer?
>
> Probably not, as this task requires a real PRO!
>...

That's wrong.

We are talking about _tracking_.

I'm not sure whether it makes much sense, and it would cost an enormous
amount of time, but tracking patches should be possible without any
knowledge of the kernel.

> Thanks!
> Al

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-21 03:27:40

by Al Boldi

Subject: Re: How to improve the quality of the kernel?

Adrian Bunk wrote:
> On Mon, Jun 18, 2007 at 06:55:47AM +0300, Al Boldi wrote:
> > Michal Piotrowski wrote:
> > > On 18/06/07, Al Boldi <[email protected]> wrote:
> > > > Bartlomiej Zolnierkiewicz wrote:
> > > > > On Sunday 17 June 2007, Andrew Morton wrote:
> > > > > > We of course do want to minimise the amount of overhead for each
> > > > > > developer. I'm a strong believer in specialisation: rather than
> > > > > > requiring that *every* developer/maintainer integrate new steps
> > > > > > in their processes it would be better to allow them to proceed
> > > > > > in a close-to-usual fashion and to provide for a specialist
> > > > > > person (or team) to do the sorts of things which you're thinking
> > > > > > about.
> > > > >
> > > > > Makes sense... however we need to educate each and every developer
> > > > > about importance of the code review and proper recognition of
> > > > > reviewers.
> > > >
> > > > That's as easy to manage as is currently done with rc-regressions.
> > >
> > > Are you a volunteer?
> >
> > Probably not, as this task requires a real PRO!
> >...
>
> That's wrong.
>
> We are talking about _tracking_.
>
> I'm not sure whether it makes much sense, and it would cost an enormous
> amount of time, but tracking patches should be possible without any
> knowledge of the kernel.

If that's really true, which I can't imagine, then the proper way forward
would probably involve a fully automated system.


Thanks!

--
Al

2007-06-21 13:07:29

by Adrian Bunk

Subject: Re: How to improve the quality of the kernel?

On Thu, Jun 21, 2007 at 06:26:20AM +0300, Al Boldi wrote:
> Adrian Bunk wrote:
> > On Mon, Jun 18, 2007 at 06:55:47AM +0300, Al Boldi wrote:
> > > Michal Piotrowski wrote:
> > > > On 18/06/07, Al Boldi <[email protected]> wrote:
> > > > > Bartlomiej Zolnierkiewicz wrote:
> > > > > > On Sunday 17 June 2007, Andrew Morton wrote:
> > > > > > > We of course do want to minimise the amount of overhead for each
> > > > > > > developer. I'm a strong believer in specialisation: rather than
> > > > > > > requiring that *every* developer/maintainer integrate new steps
> > > > > > > in their processes it would be better to allow them to proceed
> > > > > > > in a close-to-usual fashion and to provide for a specialist
> > > > > > > person (or team) to do the sorts of things which you're thinking
> > > > > > > about.
> > > > > >
> > > > > > Makes sense... however we need to educate each and every developer
> > > > > > about importance of the code review and proper recognition of
> > > > > > reviewers.
> > > > >
> > > > > That's as easy to manage as is currently done with rc-regressions.
> > > >
> > > > Are you a volunteer?
> > >
> > > Probably not, as this task requires a real PRO!
> > >...
> >
> > That's wrong.
> >
> > We are talking about _tracking_.
> >
> > I'm not sure whether it makes much sense, and it would cost an enormous
> > amount of time, but tracking patches should be possible without any
> > knowledge of the kernel.
>
> If that's really true, which I can't imagine, then the proper way forward
> would probably involve a fully automated system.

If you consider any kind of patch tracking valuable, you should either
do it yourself or write the tool yourself. In both cases, the
interesting parts would be how to integrate it into the workflow of
kernel development without creating extra work for anyone and how to get
the information into the git commits.

"requires a real PRO" and "would probably involve" sound like cheap
phrases for avoiding doing any work yourself.

Talk is cheap, but unless YOU will do it your emails will only be a
waste of bandwidth.

> Thanks!
> Al

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-21 13:42:18

by Al Boldi

Subject: Re: How to improve the quality of the kernel?

Adrian Bunk wrote:
> On Thu, Jun 21, 2007 at 06:26:20AM +0300, Al Boldi wrote:
> > Adrian Bunk wrote:
> > > We are talking about _tracking_.
> > >
> > > I'm not sure whether it makes much sense, and it would cost an
> > > enormous amount of time, but tracking patches should be possible
> > > without any knowledge of the kernel.
> >
> > If that's really true, which I can't imagine, then the proper way
> > forward would probably involve a fully automated system.
>
> If you consider any kind of patch tracking valuable, you should either
> do it yourself or write the tool yourself. In both cases, the
> interesting parts would be how to integrate it into the workflow of
> kernel development without creating extra work for anyone and how to get
> the information into the git commits.

Integration is the easy part, really. Just filter all the patches from the
mailing list into a patch-bin, then sort, categorize, and prioritize them,
responding with a validation status to all parties involved.
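
As a rough illustration of the filter-and-categorize steps (not code from this thread, and covering neither prioritizing nor the status replies), here is a minimal Python sketch. It assumes that patch mails carry a `[PATCH ...]` tag in the subject, optionally followed by a `subsystem:` prefix; the function name `build_patch_bin` and the `uncategorized` bucket are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: "[PATCH ...]" in the subject, then an optional
# "subsystem:" prefix before the one-line summary.
PATCH_RE = re.compile(
    r"\[PATCH[^\]]*\]\s*(?:(?P<subsystem>[\w/.-]+):)?", re.IGNORECASE
)


def build_patch_bin(subjects):
    """Group patch subjects by subsystem; non-patch mails are dropped."""
    patch_bin = defaultdict(list)
    for subject in subjects:
        if subject.lower().startswith("re:"):
            continue  # replies are discussion, not new patches
        match = PATCH_RE.search(subject)
        if match is None:
            continue  # not a patch mail
        subsystem = match.group("subsystem") or "uncategorized"
        patch_bin[subsystem].append(subject)
    return {name: sorted(items) for name, items in sorted(patch_bin.items())}
```

Subjects alone say nothing about whether a patch applies, builds, or has been reviewed, so a real validation status would need far more than this.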

And after that comes the Tracking part.

> "requires a real PRO" and "would probably involve" sound like cheap
> phrases for avoiding doing any work yourself.

I have learned from this list that premature involvement is
counterproductive.

> Talk is cheap, but unless YOU will do it your emails will only be a
> waste of bandwidth.

Thanks, and good luck with involving people with this kind of response!

--
Al

2007-06-21 13:57:36

by Adrian Bunk

Subject: Re: How to improve the quality of the kernel?

On Thu, Jun 21, 2007 at 04:41:28PM +0300, Al Boldi wrote:
> Adrian Bunk wrote:
> > On Thu, Jun 21, 2007 at 06:26:20AM +0300, Al Boldi wrote:
> > > Adrian Bunk wrote:
> > > > We are talking about _tracking_.
> > > >
> > > > I'm not sure whether it makes much sense, and it would cost an
> > > > enormous amount of time, but tracking patches should be possible
> > > > without any knowledge of the kernel.
> > >
> > > If that's really true, which I can't imagine, then the proper way
> > > forward would probably involve a fully automated system.
> >
> > If you consider any kind of patch tracking valuable, you should either
> > do it yourself or write the tool yourself. In both cases, the
> > interesting parts would be how to integrate it into the workflow of
> > kernel development without creating extra work for anyone and how to get
> > the information into the git commits.
>
> Integration is the easy part, really. Just filter all the patches from the
> mailing list into a patch-bin, then sort, categorize, and prioritize them,
> responding with a validation status to all parties involved.
>
> And after that comes the Tracking part.

Tracking shouldn't be much more than seeing what different threads are
about the same patch and then do more or less the same as what you
called "the easy part".
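
One naive way to see which threads are about the same patch (an illustration only, not an existing tool) is to normalize subject lines and group on the result. The sketch below assumes that resends differ only in reply prefixes and bracketed tags such as `[PATCH v2 1/3]`; the names `patch_key` and `group_threads` are hypothetical.

```python
import re
from collections import defaultdict

_REPLY = re.compile(r"^\s*(?:(?:re|fwd?):\s*)+", re.IGNORECASE)
_TAG = re.compile(r"\[[^\]]*\]\s*")


def patch_key(subject):
    """Normalize a subject so that resends and replies map to the same key."""
    subject = _REPLY.sub("", subject)  # drop leading "Re:" / "Fwd:" chains
    subject = _TAG.sub("", subject)    # drop [PATCH], [PATCH v2 1/3], [RFC] ...
    return " ".join(subject.lower().split())


def group_threads(threads):
    """threads: iterable of (thread_id, subject) -> {key: [thread_id, ...]}"""
    groups = defaultdict(list)
    for thread_id, subject in threads:
        groups[patch_key(subject)].append(thread_id)
    return dict(groups)
```

A retitled patch would land in a different group, so subject matching can only be a first pass.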

> > "requires a real PRO" and "would probably involve" sound like cheap
> > phrases for avoiding doing any work yourself.
>
> I have learned from this list that premature involvement is
> counterproductive.
>
> > Talk is cheap, but unless YOU will do it your emails will only be a
> > waste of bandwidth.
>
> Thanks, and good luck with involving people with this kind of response!

It's simply how kernel development works - not by talking but by doing
the work.

Many people thought long-term maintenance for 2.6.16 wouldn't make sense.
And I didn't start long discussions whether we need regression tracking -
I simply did it.

These are things that simply happened because I thought they were
important - and because I got my ass up to do them myself.

Don't expect anyone to jump up to do it only because of your talk.
YOU must offer something, and it will work if it's then accepted by
people.

If you think what you have in mind is both doable and important just do it.
You will find out where the problems lie yourself.
You might be able to prove me and all other people who think it would
not work wrong.
You might fail, e.g. because people will not adopt whatever you have in
mind because they don't like it for some reason, but that's part of how
development works, and you'll never know unless you try it.

> Al

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-21 15:49:17

by Stefan Richter

Subject: Re: How to improve the quality of the kernel?

Al Boldi wrote:
> Adrian Bunk wrote:
>> On Mon, Jun 18, 2007 at 06:55:47AM +0300, Al Boldi wrote:
>> > Michal Piotrowski wrote:
>> > > On 18/06/07, Al Boldi <[email protected]> wrote:
>> > > > Bartlomiej Zolnierkiewicz wrote:

[on the tracking of review status of patches]

>> > > > > however we need to educate each and every developer
>> > > > > about importance of the code review and proper recognition of
>> > > > > reviewers.
>> > > >
>> > > > That's as easy to manage as is currently done with rc-regressions.
>> > >
>> > > Are you a volunteer?
>> >
>> > Probably not, as this task requires a real PRO!
>> >...
>>
>> That's wrong.
>>
>> We are talking about _tracking_.
>>
>> I'm not sure whether it makes much sense, and it would cost an enormous
>> amount of time, but tracking patches should be possible without any
>> knowledge of the kernel.

I suspect you are...

> If that's really true, which I can't imagine, then the proper way forward
> would probably involve a fully automated system.

...both wrong --- because patches have varying requirements WRT review
and testing.

What you discuss here under the label "patch tracking" blends into, how
shall I call it, "patch handling" as done by maintainers. Neither a
layman nor an automaton is able to
1. measure required vs. accomplished review and testing of a patch,
2. recruit reviewers and testers.

And IMO *these* two are the points where we typically fail. We
occasionally underestimate the required amount of review and testing,
but more importantly, we are chronically short of reviewers and
partially of testers. (Hmm, I think Adrian and one or another guy
already said as much.)

A "Reviewed-by" tag in a patch is not a simple hard fact. Neither a
layman nor an automaton can draw appropriate conclusions from it. That
doesn't mean I'm against such tags, on the contrary. They may help us
to (a) look harder for review, (b) have a better picture of actual lack
of review, patch by patch, subsystem by subsystem, and (c) get more
volunteer reviewers by emphasizing the merits of code review. Alas,
experience and broad knowledge in kernel development are certainly
prerequisites to become a good reviewer, so don't get high hopes that
reviewers will suddenly come in droves when we appropriately credit
their work.
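
Point (b) could start from something as simple as counting tags per
subsystem. The Python sketch below is only an illustration: the function
name, the (subsystem, message) input format and the sample messages are
invented here, and a tag count still says nothing about how thorough a
review was.

```python
import re
from collections import defaultdict

# A "Reviewed-by:" trailer at the start of a line, followed by some text.
REVIEWED_BY = re.compile(r"^Reviewed-by:[ \t]*\S", re.MULTILINE | re.IGNORECASE)


def review_coverage(commits):
    """Return {subsystem: (total_commits, commits_with_reviewed_by)}.

    `commits` is an iterable of (subsystem, commit_message) pairs; how the
    subsystem is derived from a commit is left open here (assumption).
    """
    totals = defaultdict(lambda: [0, 0])
    for subsystem, message in commits:
        totals[subsystem][0] += 1
        if REVIEWED_BY.search(message):
            totals[subsystem][1] += 1
    return {name: (total, reviewed) for name, (total, reviewed) in totals.items()}


# Made-up sample input, not real commits.
SAMPLE = [
    ("ieee1394", "ieee1394: fix something\n\nSigned-off-by: A <a@example.org>\n"
                 "Reviewed-by: B <b@example.org>\n"),
    ("ieee1394", "ieee1394: cleanup\n\nSigned-off-by: A <a@example.org>\n"),
    ("cifs", "cifs: fix oops\n\nSigned-off-by: C <c@example.org>\n"),
]

if __name__ == "__main__":
    for name, (total, reviewed) in sorted(review_coverage(SAMPLE).items()):
        print(f"{name}: {reviewed} of {total} commits carry a Reviewed-by tag")
```

Such a count only shows where tags are missing; judging whether the
review behind a tag was sufficient still needs a maintainer.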
--
Stefan Richter
-=====-=-=== -==- =-=-=
http://arcgraph.de/sr/

2007-06-21 21:34:08

by Al Boldi

Subject: Re: How to improve the quality of the kernel?

Adrian Bunk wrote:
> On Thu, Jun 21, 2007 at 04:41:28PM +0300, Al Boldi wrote:
> > Adrian Bunk wrote:
> > > Talk is cheap, but unless YOU will do it your emails will only be a
> > > waste of bandwidth.
> >
> > Thanks, and good luck with involving people with this kind of response!
>
> It's simply how kernel development works - not by talking but by doing
> the work.

This sounds like a brute-force approach, akin to hacking.

I think it's much more productive to analyze the problem and then design a
solution accordingly.

> Many people thought long-term maintenance for 2.6.16 wouldn't make sense.
> And I didn't start long discussions whether we need regression tracking -
> I simply did it.

Maybe because you are a PRO.

> These are things that simply happened because I thought they were
> important - and because I got my ass up to do them myself.
>
> Don't expect anyone to jump up to do it only because of your talk.
> YOU must offer something, and it will work if it's then accepted by
> people.
>
> If you think what you have in mind is both doable and important just do
> it. You will find out where the problems lie yourself.
> You might be able to prove me and all other people who think it would
> not work wrong.
> You might fail, e.g. because people will not adopt whatever you have in
> mind because they don't like it for some reason, but that's part of how
> development works, and you'll never know unless you try it.


Thanks!

--
Al

2007-06-21 23:48:48

by Adrian Bunk

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Tue, Jun 19, 2007 at 08:01:19AM -0700, Linus Torvalds wrote:
> On Tue, 19 Jun 2007, Adrian Bunk wrote:
>...
> > The -mm kernel already implements what your proposed PTS would do.
> >
> > Plus it gives testers more or less all patches currently pending
> > inclusion into Linus' tree in one kernel they can test.
> >
> > The problems are more social ones, like patches Andrew has never heard
> > of before getting into Linus' tree during the merge window.
>
> Not really. The "problem" boils down to this:
>
> [torvalds@woody linux]$ git-rev-list --all --since=100.days.ago | wc -l
> 7147
> [torvalds@woody linux]$ git-rev-list --no-merges --all --since=100.days.ago | wc -l
> 6768
>
> ie over the last hundred days, we have averaged over 70 changes per day,
> and even ignoring merges and only looking at "pure patches" we have more
> than an average of 65 patches per day. Every day. Day in and day out.
>
> That translates to five hundred commits a week, two _thousand_ commits per
> month, and 25 thousand commits per year. As a fairly constant stream.
>
> Will mistakes happen? Hell *yes*.
>
> And I'd argue that any flow that tries to "guarantee" that mistakes don't
> happen is broken. It's a sure-fire way to just frustrate people, simply
> because it assumes a level of perfection in maintainers and developers
> that isn't possible.
>
> The accepted industry standard for bug counts is basically one bug per a
> thousand lines of code. And that's for released, *debugged* code.
>
> Yes, we should aim higher. Obviously. Let's say that we aim for 0.1 bugs
> per KLOC, and that we actually aim for that not just in _released_ code,
> but in patches.
>
> What does that mean?
>
> Do the math:
>
> git log -M -p --all --since=100.days.ago | grep '^+' | wc -l
>
> That basically takes the last one hundred days of development, shows it
> all as patches, and just counts the "new" lines. It takes about ten
> seconds to run, and returns 517252 for me right now.
>
> That's *over*half*a*million* lines added or changed!
>
> And even with the expectation that we do ten times better than what is
> often quoted as an industry average, and even with the expectation that
> this is already fully debugged code, that's at least 50 bugs in the last
> one hundred days.
>
> Yeah, we can be even more stringent, and actually subtract the number of
> lines _removed_ (274930), and assume that only *new* code contains bugs,
> and that's still just under a quarter million purely *added* lines, and
> maybe we'd expect just 24 new bugs in the last 100 days.
>
> [ Argument: some of the old code also contained bugs, so the lines added
> to replace it balance out. Counter-argument: new code is less well
> tested by *definition* than old code, so.. Counter-counter-argument: the
> new code was often added to _fix_ a bug, so the code removed had an even
> _higher_ bug rate than normal code..
>
> End result? We don't know. This is all just food for thought. ]
>
> So here's the deal: even by the most *stringent* reasonable rules, we add
> a new bug every four days. That's just something that people need to
> accept. The people who say "we must never introduce a regression" aren't
> living on planet earth, they are living in some wonderful world of
> Blarney, where mistakes don't happen, developers are perfect, hardware is
> perfect, and maintainers always catch things.
>...

Exactly: We cannot get a regression free or even bug free kernel.
But we could handle the reported regressions (or even the reported bugs)
better than we do.
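
The arithmetic in the quoted estimate can be retraced with a few lines of
Python. The input numbers are the ones given in the message above; the
function and key names are made up for this sketch.

```python
# Figures quoted above for a 100-day window.
DAYS = 100
COMMITS_ALL = 7147          # git-rev-list --all
COMMITS_NO_MERGES = 6768    # git-rev-list --no-merges --all
LINES_ADDED = 517252        # '+' lines in the patches
LINES_REMOVED = 274930
TARGET_BUGS_PER_KLOC = 0.1  # ten times better than the quoted 1 bug per KLOC


def expected_bugs(lines, bugs_per_kloc=TARGET_BUGS_PER_KLOC):
    """Expected number of bugs in `lines` lines at the given rate."""
    return lines / 1000 * bugs_per_kloc


def summary():
    net_lines = LINES_ADDED - LINES_REMOVED
    net_bugs = expected_bugs(net_lines)
    return {
        "commits_per_day": COMMITS_ALL / DAYS,
        "patches_per_day": COMMITS_NO_MERGES / DAYS,
        "bugs_from_added_lines": expected_bugs(LINES_ADDED),
        "net_new_lines": net_lines,
        "bugs_from_net_lines": net_bugs,
        "days_per_bug": DAYS / net_bugs,
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value:.2f}")
```

This gives roughly 71 commits per day, about 52 expected bugs from all
added or changed lines, and about 24 from the 242322 net new lines, or
one new bug roughly every four days.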

Lesson #6:
Get the data.

Some real life numbers from 2.6.21 development:
- 80 days between 2.6.20 and 2.6.21
- 98 post-2.6.20 regressions were reported before 2.6.21 was released
- 15 open post-2.6.20 regression reports at the time of the 2.6.21 release
- 8 open post-2.6.20 regression reports at the time of the 2.6.21 release
that were reported at least 3 weeks before the 2.6.21 release

This:
- only includes regressions with reasonably usable reports [1] and
- confirmed to be regressions and
- reported by the relatively small number (compared to the complete
number of Linux users) of -rc testers and
- reported before the release of 2.6.21.

We weren't even able to handle all reported recent regressions in
2.6.21, and for other bugs our numbers won't be better.
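
Put as ratios, the 2.6.21 numbers above look as follows. This is a small
Python sketch; the function and key names are invented for illustration,
and only the four input figures come from the list above.

```python
DAYS_BETWEEN_RELEASES = 80   # 2.6.20 -> 2.6.21
REPORTED = 98                # regressions reported before the release
OPEN_AT_RELEASE = 15         # still open when 2.6.21 was released
OPEN_AND_OLD = 8             # open and reported at least 3 weeks earlier


def regression_summary(days, reported, open_at_release, open_and_old):
    """Derive simple rates from the raw regression counts."""
    return {
        "reports_per_day": reported / days,
        "no_longer_open": reported - open_at_release,
        "open_share": open_at_release / reported,
        "old_share_of_open": open_and_old / open_at_release,
    }


if __name__ == "__main__":
    stats = regression_summary(DAYS_BETWEEN_RELEASES, REPORTED,
                               OPEN_AT_RELEASE, OPEN_AND_OLD)
    print(f"reports per day:        {stats['reports_per_day']:.2f}")
    print(f"no longer open:         {stats['no_longer_open']}")
    print(f"open at release:        {stats['open_share']:.1%}")
    print(f"of those, >= 3 weeks:   {stats['old_share_of_open']:.1%}")
```

That is a bit more than one usable regression report per day, with about
15% of them still open at release time and about half of those open
reports at least three weeks old.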

When Dave Jones says that, for the kernel of a new RHEL release based on
a "stable" upstream kernel, they spend 3 months just shaking out bugs in
the kernel, that's IMHO a good description of our "stable" kernels.

I'm not claiming the kernel could become bug-free, but aiming at being
able to handle all incoming bug reports is IMHO a worthwhile and not
completely unrealistic goal with benefits for all Linux users (and the
overall image of Linux).

Currently, we are light years away from this goal.

> Linus

cu
Adrian

[1] submitter has given all information requested

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-21 23:51:13

by Adrian Bunk

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Tue, Jun 19, 2007 at 10:04:58AM -0700, Linus Torvalds wrote:
>...
> This is why I've been advocating bugzilla "forget" stuff, for example. I
> tend to see bugzilla as a place where noise accumulates, rather than a
> place where noise is made into a signal.
>
> Which gets me to the real issue I have: the notion of having a process for
> _tracking_ all the information is actually totally counter-productive, if
> a big part of the process isn't also about throwing noise away.
>
> We don't want to "save" all the crud. I don't want "smart tracking" to
> keep track of everything. I want "smart forgetting", so that we are only
> left with the major signal - the stuff that matters.

Even generating the perfect signal is a complete waste of time if
there's no recipient for the signal...

> Linus

cu
Adrian

2007-06-22 00:01:37

by Linus Torvalds

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].



On Fri, 22 Jun 2007, Adrian Bunk wrote:
> On Tue, Jun 19, 2007 at 10:04:58AM -0700, Linus Torvalds wrote:
> >...
> > This is why I've been advocating bugzilla "forget" stuff, for example. I
> > tend to see bugzilla as a place where noise accumulates, rather than a
> > place where noise is made into a signal.
> >
> > Which gets me to the real issue I have: the notion of having a process for
> > _tracking_ all the information is actually totally counter-productive, if
> > a big part of the process isn't also about throwing noise away.
> >
> > We don't want to "save" all the crud. I don't want "smart tracking" to
> > keep track of everything. I want "smart forgetting", so that we are only
> > left with the major signal - the stuff that matters.
>
> Even generating the perfect signal is a complete waste of time if
> there's no recipient for the signal...

My argument is that *if* we had "more signal, less noise", we'd probably
get more people looking at it.

In fact, I guarantee that's the case. You may not be 100% happy with the
regression list, but every single maintainer/developer I've talked to has
said they appreciated it and it made it easier (and thus more likely) for
them to actually look at what the outstanding issues were.

Linus

2007-06-22 00:16:25

by Adrian Bunk

Subject: Re: This is [Re:] How to improve the quality of the kernel[?].

On Thu, Jun 21, 2007 at 04:59:39PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 22 Jun 2007, Adrian Bunk wrote:
> > On Tue, Jun 19, 2007 at 10:04:58AM -0700, Linus Torvalds wrote:
> > >...
> > > This is why I've been advocating bugzilla "forget" stuff, for example. I
> > > tend to see bugzilla as a place where noise accumulates, rather than a
> > > place where noise is made into a signal.
> > >
> > > Which gets me to the real issue I have: the notion of having a process for
> > > _tracking_ all the information is actually totally counter-productive, if
> > > a big part of the process isn't also about throwing noise away.
> > >
> > > We don't want to "save" all the crud. I don't want "smart tracking" to
> > > keep track of everything. I want "smart forgetting", so that we are only
> > > left with the major signal - the stuff that matters.
> >
> > Even generating the perfect signal is a complete waste of time if
> > there's no recipient for the signal...
>
> My argument is that *if* we had "more signal, less noise", we'd probably
> get more people looking at it.
>
> In fact, I guarantee that's the case. You may not be 100% happy with the
> regression list, but every single maintainer/developer I've talked to has
> said they appreciated it and it made it easier (and thus more likely) for
> them to actually look at what the outstanding issues were.


The problems are the parts of the kernel without a maintainer, or with a
maintainer who is for whatever reason not able to look after bug
reports.

And you often need someone with a good knowledge of a specific area of
the kernel for getting a bug fixed.


Let me give an example:

During 2.6.16-rc, I reported a bug (not a regression) in CIFS: big writes
to a Samba server reproducibly caused a complete freeze of my computer
(no SysRq possible) after some 100 MB (not a fixed amount of data, but
100% reproducible when transferring 1 GB). And there is nothing more I
(or any other submitter) could have given as information - in fact it
even took me several days to isolate CIFS as the source of these freezes.

Steve French and Dave Kleikamp told me to try some mount option.

With this option, I got an Oops instead of a freeze.

After they fixed the Oops, it turned out the patch also fixed the
freeze. The patch went into 2.6.16, so the bug was fixed in that
release.

That's one important value of maintainers.

In many other parts of the kernel, my bug report wouldn't have had any
effect.


We need more maintainers who look after bugs - but where do we find them?
They don't seem to grow on trees.


> Linus

cu
Adrian

2007-06-22 11:24:26

by Adrian Bunk

Subject: Re: How to improve the quality of the kernel?

On Fri, Jun 22, 2007 at 12:33:13AM +0300, Al Boldi wrote:
> Adrian Bunk wrote:
> > On Thu, Jun 21, 2007 at 04:41:28PM +0300, Al Boldi wrote:
> > > Adrian Bunk wrote:
> > > > Talk is cheap, but unless YOU will do it your emails will only be a
> > > > waste of bandwidth.
> > >
> > > Thanks, and good luck with involving people with this kind of response!
> >
> > It's simply how kernel development works - not by talking but by doing
> > the work.
>
> This sounds like a brute-force approach, akin to hacking.
>
> I think it's much more productive to analyze the problem and then design a
> solution accordingly.

Sure, that's part of creating your solution.

But when you've analyzed the problem and designed a solution, YOU have
to implement it.

> > Many people thought long-term maintenance for 2.6.16 wouldn't make sense.
> > And I didn't start long discussions whether we need regression tracking -
> > I simply did it.
>
> Maybe because you are a PRO.
>...

No, simply because I got my ass up.

> Thanks!
> Al

cu
Adrian

2007-06-22 12:02:05

by Markus Rechberger

Subject: Re: How to improve the quality of the kernel?

On 6/17/07, Natalie Protasevich wrote:
> On 6/17/07, Adrian Bunk wrote:
> > On Sun, Jun 17, 2007 at 06:26:55PM +0200, Stefan Richter wrote:
> > > Adrian Bunk wrote:
> > > >>> And we should be aware that reverting is only a workaround for the
> > > >>> real problem which lies in our bug handling.
> > > >> ...
> > > >
> > > > And this is something I want to emphasize again.
> > > >
> > > > How can we make any progress with the real problem and not only the
> > > > symptoms?
> > > ...
> > >
> > > Perhaps make lists of
> > >
> > > - bug reports which never lead to any debug activity
> > > (no responsible person/team was found, or a seemingly responsible
> > > person/team did not start to debug the report)
> > >
> > > - known regressions on release,
> > >
> > > - regressions that became known after release,
> > >
> > > - subsystems with notable backlogs of old bugs,
> > >
> > > - other categories?
> > >
> > > Select typical cases from each categories, analyze what went wrong in
> > > these cases, and try to identify practicable countermeasures.
> >
> > No maintainer or no maintainer who is debugging bug reports is the
> > major problem in all parts of your list.
> >
> > > Another approach: Figure out areas where quality is exemplary and try
> > > to draw conclusions for areas where quality is lacking.
> >
> > ieee1394 has a maintainer who is looking after all bug reports he gets.
> >
> > Conclusion: We need such maintainers for all parts of the kernel.
> >
>
> I noticed some areas are well maintained because there is an awesome
> maintainer, or good and well coordinated team - and this is mostly in
> the "fun" areas ;) But there are "boring" areas that are about to be
> deprecated or no new development expected etc. It will be hard to get
> a dedicated person to take care of such. How about having people on
> rotation, or jury duty so to speak - for a period of time (completely
> voluntary!) Nice stats on the report about contributions in non-native
> areas for a developer would be great accomplishment and also good
> chance to look into other things! Besides, this way "old parts" will
> get attention to be revised and re-implemented sooner. And we can
> post "Temp maintainer needed" list...
>

I'd vote for that. I've already seen a lot of very bad code within some
subsystems, and critical problems which have simply been ignored by some
maintainers.
It mostly helps if some volunteers read through existing code and
state their concerns about implementations which they don't like.

I just grep'ed some examples I noticed (note that I do not want to step
on anyone's toes here, just give some examples):

(sn9c102_ov7660.c)
...
err += sn9c102_i2c_write(cam, 0x12, 0x80);
err += sn9c102_i2c_write(cam, 0x11, 0x09);
err += sn9c102_i2c_write(cam, 0x00, 0x0A);
err += sn9c102_i2c_write(cam, 0x01, 0x80);
err += sn9c102_i2c_write(cam, 0x02, 0x80);
err += sn9c102_i2c_write(cam, 0x03, 0x00);
... (around 150 lines like this follow one another, each doing a write
and adding the error value to a variable. I don't understand why the code
adds up the errors but still goes on to send 150 more updates; what if
one write fails while the others succeed for some reason?)
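
One common alternative is to keep the register/value pairs in a table and
stop at the first failed write. The sketch below is in Python only to
show the control flow (the driver itself is C). `write_sequence`,
`I2CWriteError` and `make_fake_bus` are hypothetical names, the fake bus
stands in for the real i2c write, and the -5 error code is an arbitrary
placeholder.

```python
class I2CWriteError(Exception):
    """Raised when one register write in a sequence fails."""

    def __init__(self, index, reg, value, code):
        super().__init__(
            f"write #{index} (reg 0x{reg:02x} = 0x{value:02x}) failed: {code}")
        self.index, self.reg, self.value, self.code = index, reg, value, code


# The six pairs visible in the excerpt above; the driver has many more.
INIT_SEQUENCE = [(0x12, 0x80), (0x11, 0x09), (0x00, 0x0A),
                 (0x01, 0x80), (0x02, 0x80), (0x03, 0x00)]


def write_sequence(write, sequence):
    """Call write(reg, value) for each pair, stopping at the first error.

    `write` is assumed to return 0 on success and non-zero on failure.
    Returns the number of writes performed.
    """
    for index, (reg, value) in enumerate(sequence):
        code = write(reg, value)
        if code:
            raise I2CWriteError(index, reg, value, code)
    return len(sequence)


def make_fake_bus(fail_at=None):
    """Return (write, log); the write with index `fail_at` reports an error."""
    log = []

    def write(reg, value):
        if fail_at is not None and len(log) == fail_at:
            return -5  # placeholder error code
        log.append((reg, value))
        return 0

    return write, log


if __name__ == "__main__":
    write, log = make_fake_bus(fail_at=2)
    try:
        write_sequence(write, INIT_SEQUENCE)
    except I2CWriteError as exc:
        print(exc, "after", len(log), "successful writes")
```

With this shape the caller knows exactly which write failed and no
further registers are touched after an error.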

(tvp5150.c)
static int tvp5150_read(struct i2c_client *c, unsigned char addr)
{
	unsigned char buffer[1];
	int rc;

	buffer[0] = addr;
	if (1 != (rc = i2c_master_send(c, buffer, 1)))
		tvp5150_dbg(0, "i2c i/o error: rc == %d (should be 1)\n", rc);

	msleep(10);

	if (1 != (rc = i2c_master_recv(c, buffer, 1)))
		tvp5150_dbg(0, "i2c i/o error: rc == %d (should be 1)\n", rc);

	tvp5150_dbg(2, "tvp5150: read 0x%02x = 0x%02x\n", addr, buffer[0]);

	return (buffer[0]);
}

(i2c issues within some driver)
/* This code detects calls by card attach_inform */
if (NULL == t->i2c.dev.driver) {
	tuner_dbg ("tuner 0x%02x: called during i2c_client register by adapter's attach_inform\n", c->addr);

	return;
}
... that code doesn't even work anymore since the i2c.dev.driver is
always initialized.

just reading through it and cleaning up some code isn't so difficult
and can be done by many people - if they're allowed/wanted to do so.

Markus

2007-06-22 14:19:43

by Stefan Richter

Subject: Re: How to improve the quality of the kernel?

Markus Rechberger wrote:
> just reading through it and cleaning up some code isn't so difficult
> and can be done by many people -

Doing cleanups is a good way to get into the matter, to become able to
eventually fix bugs of the difficult type.

> if they're allowed/wanted to do so.

Everybody is allowed to submit. But there is a certain degree of both
persistence and adaptability required to get one's first submissions
upstream. However, these qualities are also required to fix difficult bugs.
--
Stefan Richter
-=====-=-=== -==- =-==-
http://arcgraph.de/sr/

2007-06-22 15:13:08

by Oleg Verych

Subject: Re: How to improve the quality of the kernel?

On Fri, Jun 22, 2007 at 04:19:34PM +0200, Stefan Richter wrote:
> Markus Rechberger wrote:
> > just reading through it and cleaning up some code isn't so difficult
> > and can be done by many people -
>
> Doing cleanups is a good way to get into the matter, to become able to
> eventually fix bugs of the difficult type.
>
> > if they're allowed/wanted to do so.
>
> Everybody is allowed to submit. But there is a certain degree of both
> persistence and adaptability required to get one's first submissions
> upstream. However, these qualities are also required to fix difficult bugs.

Deja-kernel.

Just two messages:
<http://permalink.gmane.org/gmane.linux.debian.devel.general/116453>
<http://permalink.gmane.org/gmane.linux.debian.devel.general/116463>

Tell me I'm wrong if a similar thing cannot be implemented here.
Again, the key word is a _tracking_ system...

I am just trying to draw attention to the fact that the time of ignorance
and manual work must end. A new time must come, a time of *tracking* and
*counting* opinions and any kernel work anybody wants to contribute. I just
got bored with the repeated talk about work, code, etc. being no fun. The
manager who will do that no-fun work is an automated tracking system,
based on e-mail, with additional tools like

* ``reportbug''-- reporting (improved REPORTING-BUGS,
EVERY-WORK-IS-APPRECIATED-THANK-YOU)

* ``bts''-- command line interface, etc.

I want to change that, and I will try to work on it. The important thing
is that sitting in the corner *alone*, even with a good open source example
system such as the Debian BTS, is not going to work.

WRT this, the opinions and actions of people in this thread, who have
spent much more time in Linux development than I have, are just
counterproductive (fine, fine, but I have a right to a different, wrong
opinion on that :).

--
Frenzy
-o--=O`C
#oo'L O
<___=E M