2006-11-08 22:09:12

by Jesper Juhl

Subject: A proposal; making 2.6.20 a bugfix only version.

Greetings,

I have a suggestion. Why don't we make 2.6.20 a "bug fixes only" kernel version?

I think it would be a good idea to dedicate just one kernel cycle to
pure bug fixes and cleanups. Why? I'll tell you :)

We keep merging new features and destabilizing things all the time,
and while that's more or less just the new 2.6 model working (and
working well) it does have some problems.

There's no shortage of issues that need fixing, but since we keep
merging new stuff, a lot of bugfixing energy gets spent on the new
cool stuff instead of fixing up any other issues we have.
Also, regressions seem to show up with every new kernel version, and
while they usually get fixed it's not always so (Adrian's list of
known regressions seems to help though).

So, what are all these bugs I'm talking about? Well, let's see ...

Coverity has, as of this writing, identified 728 issues in the current
kernel. Sure, some of those have already been identified as false or
ignorable issues, but many are flagged as actual bugs and still more
are as yet uninspected.

The kernel bugzilla has many many entries that are real bugs, some
even have patches.

Many bugreports are made to LKML weekly and while some of the issues
get picked up and fixed, many also get lost.
(many patches also get posted and subsequently ignored - which is a shame).

Building current kernels shows up tons of warnings (and sometimes
errors) that should be investigated/fixed - some of them are real
bugs.

The kernel janitors have a long list of issues that need to be
investigated and cleaned up/fixed.

Adrian Bunk has his list of known regressions and, I'll bet, also some
patches in the trivial queue for small issues.

There are many bug fixes in -mm and other trees that we ought to
dedicate some time to merging.

There are many parts of the kernel that are not documented.

I'm sure most distributions have a bunch of bug fixing patches lying
about that they could push.


What I'm trying to say is that, maybe we should resist the temptation
to merge new cool features for just a single kernel cycle and instead
dedicate it to fixing as many of our known issues as possible - we
have plenty...

Let's dedicate a cycle to bug fixing only.
Trivial bug fixes, involved bug fixes, new docs, fixes to existing
docs, obviously correct cleanups - all OK.
What's not OK is stuff that introduces new functionality/features, adds
support for new hardware (unless trivial such as just adding a new PCI
ID), breaks currently working behaviour, etc.


There are a few other reasons, besides the many lists of known bugs,
that inspired me to make this suggestion, a few are listed below.

- I've personally felt a greater and greater need to test kernel.org
kernels recently before putting them into production use, both at home
and at work. In my subjective opinion, quality of releases seems to
be a lot more uncertain than it used to be. Can't put my finger on when
this started to happen, just a subjective feeling over time (as well
as seeing my home box and servers at work have problems with new
kernels more often than they used to).

- A while back, akpm made some statements about being worried that the
2.6 kernel is getting buggier
(http://news.zdnet.com/2100-3513_22-6069363.html).

- The need for the -stable tree and the (relatively large) number of
-stable releases between each new major release clearly shows that we
are leaving lots of regressions in our wake.


In the long term I think it might be a good idea to do something like
this every once in a while (perhaps every .20, .30, .40 etc), we'll
see if that makes sense, but doing it at least once won't do any harm
(except delaying new features a few months). Let's try it.

Let's make a public statement that 2.6.20 will be a "bug fixes and
stabilization only" release.
Let us invite all distributions to submit their internal bugfixes.
Let us encourage people to work on known issues instead of new stuff
for just this one release (there are enough bugs to choose from that
there should be something worthwhile to do for both newbie and
experienced hacker alike).
Let us comb the mailing list archives and dig up all the lost bug fix patches.
Let us get all pending bug fix patches from the various trees merged,
but just the fixes.
Let us encourage everyone to postpone new stuff to 2.6.21 and re-base
it on top of the 2.6.20 -rc kernels.

What do you say - could it hurt?
I think it would do us a lot of good.

Fixing bugs makes users happy.
Fixing bugs provides a more stable base going forward.
Fixing bugs inspires confidence in the product we provide.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html


2006-11-08 22:22:17

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.


> There's no shortage of issues that need fixing, but since we keep
> merging new stuff, a lot of bugfixing energy gets spent on the new
> cool stuff instead of fixing up any other issues we have.

but if you do this you just end up with a bigger backlog so that the
next one will be even more unstable due to an extremely high change rate.


> Coverity has, as of this writing, identified 728 issues in the current
> kernel. Sure, some of those have already been identified as false or
> ignorable issues, but many are flagged as actual bugs and still more
> are as yet uninspected.

most are mostly false. And the rest is getting looked at. What's the
problem?

> Adrian Bunk has his list of known regressions and, I'll bet, also some
> patches in the trivial queue for small issues.

and all this fixing is happening AS WELL as new features. What makes you
think suddenly even more fixing will happen?

> There are many parts of the kernel that are not documented.

this is where the OSDL Documentation Person will help a lot; a full time
person.



> I'm sure most distributions have a bunch of bug fixing patches lying
> about that they could push.

I doubt it; most have gotten real good at avoiding getting a huge patch
backlog since that is just incredibly expensive ;)

> - A while back, akpm made some statements about being worried that the
> 2.6 kernel is getting buggier
> (http://news.zdnet.com/2100-3513_22-6069363.html).

and at this year's Kernel Summit actual data and general consensus showed
this was unfounded fear; the bugrates are more or less stable, but with
many more users.

>
> - The need for the -stable tree and the (relatively large) number of
> -stable releases between each new major release clearly shows that we
> are leaving lots of regressions in our wake.

No it shows that bugs are getting fixed and delivered to you
IMMEDIATELY. Many many of the -stable things fixed are not in new
things. Is there anything in the -stable process that is not working for
you?




2006-11-08 22:40:32

by Jesper Juhl

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On 08/11/06, Arjan van de Ven <[email protected]> wrote:
>
> > There's no shortage of issues that need fixing, but since we keep
> > merging new stuff, a lot of bugfixing energy gets spent on the new
> > cool stuff instead of fixing up any other issues we have.
>
> but if you do this you just end up with a bigger backlog so that the
> next one will be even more unstable due to an extremely high change rate.
>
Only if people continue to work on new stuff during the "bug fixing only" cycle.
If we manage to get everyone focused on bug fixing only for the entire
cycle the backlog won't be growing (much).

>
> > Coverity has, as of this writing, identified 728 issues in the current
> > kernel. Sure, some of those have already been identified as false or
> > ignorable issues, but many are flagged as actual bugs and still more
> > are as yet uninspected.
>
> most are mostly false. And the rest is getting looked at. What's the
> problem?
>
Yes, MANY are false, and I know the rest are getting worked on; I work
on some myself when time permits.
I mentioned it simply as an indicator (one amongst many) that we have
a lot of known unfixed issues.


> > Adrian Bunk has his list of known regressions and, I'll bet, also some
> > patches in the trivial queue for small issues.
>
> and all this fixing is happening AS WELL as new features. What makes you
> think suddenly even more fixing will happen?
>
My point was "get people to suspend their work on new features and
focus entirely on bugfixes for a single cycle", so we get more
manpower working on fixing all those known issues we have before we
move on with new features.


> > There are many parts of the kernel that are not documented.
>
> this is where the OSDL Documentation Person will help a lot; a full time
> person.
>
True. I'm looking forward to that.


>
> > I'm sure most distributions have a bunch of bug fixing patches lying
> > about that they could push.
>
> I doubt it; most have gotten real good at avoiding getting a huge patch
> backlog since that is just incredibly expensive ;)
>
Ok, maybe I was wrong there.


> > - A while back, akpm made some statements about being worried that the
> > 2.6 kernel is getting buggier
> > (http://news.zdnet.com/2100-3513_22-6069363.html).
>
> and at this year's Kernel Summit actual data and general consensus showed
> this was unfounded fear; the bugrates are more or less stable, but with
> many more users.
>
Ok, I may be on thin ice here, but that contradicts my personal
experience. I see a lot of very nice improvements in recent 2.6
kernels, but I also see a greater need for careful testing before
deploying on the systems I'm responsible for - I feel that I'm running
into more "whoops that kernel wasn't quite fully baked" situations
recently (and yes, I do try to report and/or fix those issues when I
encounter them).


> >
> > - The need for the -stable tree and the (relatively large) number of
> > -stable releases between each new major release clearly shows that we
> > are leaving lots of regressions in our wake.
>
> No it shows that bugs are getting fixed and delivered to you
> IMMEDIATELY. Many many of the -stable things fixed are not in new
> things. Is there anything in the -stable process that is not working for
> you?
>
Let me make one very clear statement first: -stable is a GREAT thing
and it is working VERY well.
That being said, many of the fixes I see going into -stable are
regression fixes. Maybe not the majority, but still, regression fixes
going into -stable tells me that the kernel should have seen more
testing/bugfixing before being declared a stable release.

All I'm trying to say is that we have a number of known bugs, a number
of known regressions, a number of known inefficiencies. Maybe, just
maybe, we should focus a little more on that and a little less on new
features, new hardware support etc. Not permanently - overall I think
the 2.6 model is working great - but just for a single kernel release
cycle every now and then. And why not just try it once as an
experiment? We'll never know if it's a good idea if we don't try it at
least once.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-11-08 22:55:21

by Andrew Morton

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Wed, 08 Nov 2006 23:22:11 +0100
Arjan van de Ven <[email protected]> wrote:

> > - A while back, akpm made some statements about being worried that the
> > 2.6 kernel is getting buggier
> > (http://news.zdnet.com/2100-3513_22-6069363.html).
>
> and at this year's Kernel Summit actual data

Not true. 70% of surveyed users had hit a new kernel bug. Of those bugs,
30% remained unresolved. I don't know what our quality targets are, but I
suggest they're a little higher than that.

> and general consensus showed
> this was unfounded fear;

There you finger the problem.

2006-11-08 23:05:33

by Andreas Mohr

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Hi,

On Wed, Nov 08, 2006 at 11:40:27PM +0100, Jesper Juhl wrote:
> Let me make one very clear statement first: -stable is a GREAT thing
> and it is working VERY well.
> That being said, many of the fixes I see going into -stable are
> regression fixes. Maybe not the majority, but still, regression fixes
> going into -stable tells me that the kernel should have seen more
> testing/bugfixing before being declared a stable release.

Nice theory, but of course I'm pretty sure that it wouldn't work
(as has been said numerous times before by other people).

You cannot do endless testing/bugfixing, it's a psychological issue.
If you do that, then you end up with -preXX (or worse, -preXXX)
version numbers, which would cause too many people to wait and wait
and wait with upgrading until "that next stable" kernel version
finally becomes available.
IOW, your tester base erodes, awfully, and development progress stalls.

You *have* to release a new ""stable"" version rather fast (the .0 one)
so that people will have that "new shiny version, get it while it's hot!"
feeling and will realize rather quickly that that new version
is all crap again after all and report their unhappiness.
That will lead to lots of -stable bug fixes which will then result in
a very stable actual version once you reach x.y.z.15 or so.

Capito? :)

(well, that's at least how I'm seeing it; correct me if I'm wrong)

Andreas Mohr

2006-11-08 23:28:24

by Diego Calleja

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

El Wed, 08 Nov 2006 23:22:11 +0100,
Arjan van de Ven <[email protected]> escribió:

> > There are many parts of the kernel that are not documented.
>
> this is where the OSDL Documentation Person will help a lot; a full time
> person.

Maybe it's just me, but wouldn't this be fixed by just asking developers
to document their code? I maintain the LinuxChanges page at kernelnewbies
and very often I see things merged with zero documentation that I can't
understand even after reading the code, so I need some googling.
For example, in 2.6.19 there're several "UTS namespace" patches that I
just don't really know exactly what they do...

One of the biggest problems I see when looking at Documentation/ (I
tried to update and fix the sysctl documentation; someone probably fed
me some drugs) is that out-of-code documentation that tries to explain
what the code does, like sysctls, just gets outdated (and that's if the
feature is lucky enough to get documented :)

The "in-code" documentation using kernel-doc seems to incite developers
to document their code and update it. I think that it should be possible
to document things like sysctls or sysfs. Sysfs really needs something
like that, there's a lot of things in sysfs that aren't documented at all
and the few ones that are documented in Documentation/ are documented
in separate files that _will_ get outdated just like sysctls did. Not
that a "documentation guy" is a bad idea, but I think that getting the
developers involved in the documentation process would be a better first
step :)

2006-11-09 00:00:29

by Jan Engelhardt

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

>
>You *have* to release a new ""stable"" version rather fast (the .0 one)
>so that people will have that "new shiny version, get it while it's hot!"
>feeling and will realize rather quickly that that new version
>is all crap again after all and report their unhappiness.

Or they just silently revert to a kernel that worked.


-`J'
--

2006-11-09 04:55:04

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Andreas Mohr wrote:
> On Wed, Nov 08, 2006 at 11:40:27PM +0100, Jesper Juhl wrote:
> > Let me make one very clear statement first: -stable is a GREAT thing
> > and it is working VERY well.
> > That being said, many of the fixes I see going into -stable are
> > regression fixes. Maybe not the majority, but still, regression fixes
> > going into -stable tells me that the kernel should have seen more
> > testing/bugfixing before being declared a stable release.
>
> Nice theory, but of course I'm pretty sure that it wouldn't work

Agreed.

> (as has been said numerous times before by other people).
>
> You cannot do endless testing/bugfixing, it's a psychological issue.

Agreed.

> If you do that, then you end up with -preXX (or worse, -preXXX)
> version numbers, which would cause too many people to wait and wait
> and wait with upgrading until "that next stable" kernel version
> finally becomes available.
> IOW, your tester base erodes, awfully, and development progress stalls.

IMHO, the psycho-problem is that you cannot intertwine development and stable
in the same cycle. In that respect, the 2.6 development cycle is a real
flop, as it does not allow for focus.

And focus is needed to achieve stability.

Think catch22...


Thanks!

--
Al

2006-11-09 06:48:26

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, 2006-11-09 at 00:28 +0100, Diego Calleja wrote:
> El Wed, 08 Nov 2006 23:22:11 +0100,
> Arjan van de Ven <[email protected]> escribió:
>
> > > There are many parts of the kernel that are not documented.
> >
> > this is where the OSDL Documentation Person will help a lot; a full time
> > person.
>
> Maybe it's just me, but wouldn't this be fixed by just asking developers
> to document their code?

it's a matter of skills. Someone can be awesome at coding a feature but
his English and writing skills may be waaaaay down there.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-09 09:26:45

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Wed, 2006-11-08 at 14:51 -0800, Andrew Morton wrote:
> On Wed, 08 Nov 2006 23:22:11 +0100
> Arjan van de Ven <[email protected]> wrote:
>
> > > - A while back, akpm made some statements about being worried that the
> > > 2.6 kernel is getting buggier
> > > (http://news.zdnet.com/2100-3513_22-6069363.html).
> >
> > and at this year's Kernel Summit actual data
>
> Not true. 70% of surveyed users had hit a new kernel bug.

70% of surveyed users hit ANY kernel bug. Not "new bugs"
Including "my new wizzbang hardware doesn't work" and "I'll try
something new, oh looky a 4 year old bug" and "this new feature isn't
quite mature yet now that I try it".

One of the things that happened was during early 2.6 udev broke left and
right ABI wise. We've gotten a lot better at that, and that's the kind
of bug that hits a really wide audience.

Statistics can be misleading ... bigtime.
83% of the people also said things were not getting less reliable in
2.6.



--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-09 09:40:09

by Andrew Morton

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, 09 Nov 2006 10:26:41 +0100
Arjan van de Ven <[email protected]> wrote:

> On Wed, 2006-11-08 at 14:51 -0800, Andrew Morton wrote:
> > On Wed, 08 Nov 2006 23:22:11 +0100
> > Arjan van de Ven <[email protected]> wrote:
> >
> > > > - A while back, akpm made some statements about being worried that the
> > > > 2.6 kernel is getting buggier
> > > > (http://news.zdnet.com/2100-3513_22-6069363.html).
> > >
> > > and at this year's Kernel Summit actual data
> >
> > Not true. 70% of surveyed users had hit a new kernel bug.

<funny, I could have sworn I had some additional text in here. Where'd it go?>

> 70% of surveyed users hit ANY kernel bug. Not "new bugs"
> Including "my new wizzbang hardware doesn't work" and "I'll try
> something new, oh looky a 4 year old bug" and "this new feature isn't
> quite mature yet now that I try it".
>
> One of the things that happened was during early 2.6 udev broke left and
> right ABI wise. We've gotten a lot better at that, and that's the kind
> of bug that hits a really wide audience.
>
> Statistics can be misleading ... bigtime.
> 83% of the people also said things were not getting less reliable in
> 2.6.
>

70% hit a bug
1/7th think it's deteriorating
1/4th think lkml response is inadequate
3/5ths think bugzilla response is inadequate
2/5ths think we have features-vs-stability wrong
2/3rds hit a bug. Of those, 1/3rd remain unfixed
1/5th of users are presently impacted by a kernel bug

Happy with that?
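
As a rough cross-check, the last two figures in the list can be combined: if
2/3rds of respondents hit a bug and 1/3rd of those bugs remain unfixed, then
about 2/9ths of respondents still have an open bug, which is close to the
1/5th listed as presently impacted. A minimal sketch in C, assuming both
fractions refer to the same survey population (the function name is
hypothetical and not part of the survey):

```c
/*
 * Share of all respondents who still have an unfixed bug.
 *
 * Assumption (not stated in the survey figures): "hit a bug" and
 * "remain unfixed" are measured over the same set of respondents,
 * so the two fractions can simply be multiplied.
 */
double unfixed_share(double hit_bug, double unfixed_of_hit)
{
	return hit_bug * unfixed_of_hit;
}
```

Calling unfixed_share(2.0 / 3.0, 1.0 / 3.0) gives 2/9, or roughly 22%, to be
compared with the 20% in the last line of the list.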

2006-11-09 09:52:06

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.


>
> 70% hit a bug

this part I consider meaningless personally. If it was "70% hit a
regression" or even x% hit a regression I would be a lot more worried.

> 1/7th think it's deteriorating
> 1/4th think lkml response is inadequate
> 3/5ths think bugzilla response is inadequate
> 2/5ths think we have features-vs-stability wrong
after lots of press.
> 2/3rds hit a bug. Of those, 1/3rd remain unfixed
> 1/5th of users are presently impacted by a kernel bug
>
> Happy with that?

I'm not saying things are perfect. Far from that.
What I care about is if things are getting worse or not. My personal
impression is that while things were flakey on the ABI front during
early 2.6 (before 2.6.12 or so), that got fixed because every single bug
is a major annoyance to a large group of people. (and most bugs in the
survey were from before that).

The counter argument to your "doom" data is that bugrates for acpi for
example have been mostly steady, while the number of users has been
increasing quite a bit.

I don't have the impression things are getting worse personally. I do
hit bugs, in -mm and in -rc kernels, but that is because I'm testing
kernels intended for testing. (another thing that the 70% figure didn't
separate out)

We've gotten better. Adrian started tracking regressions, and that is
helping to make sure that those don't slip through the cracks as much as
they used to (some are unavoidable, especially performance ones or ones
with really obscure hardware that is showing hard to reproduce things).
The -stable series is working out well to fix security and other
annoying bugs quickly post release (because yes things don't get tested
fully until you release), but even -stable is not nearly getting massive
infloods of serious regressions. Sure they are fixing more stuff now,
but that's more a sign that the process is working, and that they are
now picking up less critical stuff as well, than that it is a sign that
things are getting worse.

I'd love if bug responses were better. At the same time, declaring
"bugfix only kernel" isn't going to improve that much; it just creates a
larger flood of stuff for the kernel after that. Do you have the
impression that high quality bug reports on lkml (with this I mean ones
where there is sufficient information, which are not a request for
support and where the reporter actually answers questions that are asked
him) are not getting reasonable attention?


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-09 12:46:09

by Rolf Eike Beer

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Arjan van de Ven wrote:
> On Thu, 2006-11-09 at 00:28 +0100, Diego Calleja wrote:
> > El Wed, 08 Nov 2006 23:22:11 +0100,
> >
> > Arjan van de Ven <[email protected]> escribió:
> > > > There are many parts of the kernel that are not documented.
> > >
> > > this is where the OSDL Documentation Person will help a lot; a full
> > > time person.
> >
> > Maybe it's just me, but wouldn't this be fixed by just asking developers
> > to document their code?
>
> it's a matter of skills. Someone can be awesome at coding a feature but
> his English and writing skills may be waaaaay down there.

Yes, that's maybe part of the problem. Nevertheless I think we should reject
every patch that adds new functions of global use (everything that might get
called from outside this module) without proper kerneldoc comments on it. At
least everything that comes with EXPORT_SYMBOL_*.

I just remember that digging out all this cdev_* stuff from inside the code
was just a pain. If your new feature is _that_ cool that it has to be
immediately merged, then there will surely be someone out there to help you
with the documentation if your English is a bit poor. Someone has to review
that code anyway. If you can give him hints even in bad English what is going
on it will surely help him (or her) to understand what you're doing, review
your code and write up some nice comments to make life for the next one to
touch it a _lot_ easier.
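
For illustration, a kerneldoc comment on a globally visible function could
look roughly like the sketch below. The function clamp_len() is hypothetical
and not taken from the kernel tree; the snippet is plain C so that it compiles
outside the kernel, and the export line is only mentioned in a comment.

```c
#include <stddef.h>

/**
 * clamp_len() - Limit a requested length to a buffer's capacity.
 * @requested: Number of bytes the caller asked for.
 * @capacity: Number of bytes the destination buffer can hold.
 *
 * Hypothetical helper, used here only to show the comment layout:
 * a one-line summary, one line per parameter, a free-form
 * description, and a note on the return value.
 *
 * Return: The smaller of @requested and @capacity.
 */
size_t clamp_len(size_t requested, size_t capacity)
{
	return requested < capacity ? requested : capacity;
}

/* In a kernel module this would be followed by an EXPORT_SYMBOL_* line. */
```

Even a short comment like this tells the next reader what the arguments mean
and what comes back, without digging through the callers.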

Eike



2006-11-09 17:05:34

by Stephen Hemminger

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, 9 Nov 2006 07:57:48 +0300
Al Boldi <[email protected]> wrote:

> Andreas Mohr wrote:
> > On Wed, Nov 08, 2006 at 11:40:27PM +0100, Jesper Juhl wrote:
> > > Let me make one very clear statement first: -stable is a GREAT thing
> > > and it is working VERY well.
> > > That being said, many of the fixes I see going into -stable are
> > > regression fixes. Maybe not the majority, but still, regression fixes
> > > going into -stable tells me that the kernel should have seen more
> > > testing/bugfixing before being declared a stable release.
> >
> > Nice theory, but of course I'm pretty sure that it wouldn't work
>
> Agreed.
>
> > (as has been said numerous times before by other people).
> >
> > You cannot do endless testing/bugfixing, it's a psychological issue.
>
> Agreed.
>
> > If you do that, then you end up with -preXX (or worse, -preXXX)
> > version numbers, which would cause too many people to wait and wait
> > and wait with upgrading until "that next stable" kernel version
> > finally becomes available.
> > IOW, your tester base erodes, awfully, and development progress stalls.
>
> IMHO, the psycho-problem is that you cannot intertwine development and stable
> in the same cycle. In that respect, the 2.6 development cycle is a real
> flop, as it does not allow for focus.
>
> And focus is needed to achieve stability.
>
> Think catch22...
>
>
> Thanks!
>
> --
> Al

There are bugfixes which are too big for stable or -rc releases, that are
queued for 2.6.20. "Bugfix only" is a relative statement. Do you include
new hardware support, new security APIs, performance fixes? It gets to
be real hard to decide, because these are the changes that often cause
regressions; often one major bug fix causes two minor bugs.

Interestingly, adding a new feature often causes no bugs in the rest
of the code, but it does increase the possible bug surface so most of
the problems related to feature X are bugs in feature X.


--
Stephen Hemminger <[email protected]>

2006-11-09 19:15:38

by Andrew Morton

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, 09 Nov 2006 10:52:00 +0100
Arjan van de Ven <[email protected]> wrote:

> Do you have the
> impression that high quality bug reports on lkml (with this I mean ones
> where there is sufficient information, which are not a request for
> support and where the reporter actually answers questions that are asked
> him) are not getting reasonable attention?

Yes.

And why does the report quality matter? If there's insufficient info you
just ask for more.

But we all know that and nothing's going to happen so there's really not
much point in discussing it. I have 270 saved-up-lkml-bug-reports to
process.

2006-11-09 19:22:08

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, 2006-11-09 at 11:12 -0800, Andrew Morton wrote:
> On Thu, 09 Nov 2006 10:52:00 +0100
> Arjan van de Ven <[email protected]> wrote:
>
> > Do you have the
> > impression that high quality bug reports on lkml (with this I mean ones
> > where there is sufficient information, which are not a request for
> > support and where the reporter actually answers questions that are asked
> > him) are not getting reasonable attention?
>
> Yes.
>
> And why does the report quality matter?

because it matters where people spend their time. And if you count
bugreports that are actually distro support questions and then say "but
these aren't looked at" it's not fair either.

> If there's insufficient info you
> just ask for more.

and that does happen. And half the time people just remain silent :(
I know I look at a whole bunch of bugreports in areas that I work on. I
see a lot of other people doing something similar. That doesn't mean
nothing slips through. I'm sure stuff does slip through. I would HOPE
it's really obscure things only; but I fear it's also cases where the
reporter didn't put the right people on the CC as well ;(


> But we all know that and nothing's going to happen so there's really not
> much point in discussing it. I have 270 saved-up-lkml-bug-reports to
> process.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-09 21:11:19

by Adrian Bunk

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, Nov 09, 2006 at 08:21:55PM +0100, Arjan van de Ven wrote:
> On Thu, 2006-11-09 at 11:12 -0800, Andrew Morton wrote:
> > On Thu, 09 Nov 2006 10:52:00 +0100
> > Arjan van de Ven <[email protected]> wrote:
> >
> > > Do you have the
> > > impression that high quality bug reports on lkml (with this I mean ones
> > > where there is sufficient information, which are not a request for
> > > support and where the reporter actually answers questions that are asked
> > > him) are not getting reasonable attention?
> >
> > Yes.
> >
> > And why does the report quality matter?
>
> because it matters where people spend their time. And if you count
> bugreports that are actually distro support questions and then say "but
> these aren't looked at" it's not fair either.
>
> > If there's insufficient info you
> > just ask for more.
>
> and that does happen. And half the time people just remain silent :(
> I know I look at a whole bunch of bugreports in areas that I work on. I
> see a lot of other people doing something similar. That doesn't mean
> nothing slips through. I'm sure stuff does slip through. I would HOPE
> it's really obscure things only; but I fear it's also cases where the
> reporter didn't put the right people on the CC as well ;(
>...

There are bad bug reports, but not all bug reports are that bad.

What if the quality of the bug report is good and the submitter is
responsive, and there's still zero reaction?

Let's make an example:

Since the first list I sent immediately after 2.6.19-rc1 was released,
kernel Bugzilla #7255 is part of my list of 2.6.19-rc regressions but
has gotten exactly zero developer responses.

What exactly were the mistakes of the submitter resulting in no one
caring about Bugzilla #7255?

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-11-09 21:32:00

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.


> What if the quality of the bug report is good and the submitter is
> responsive, and there's still zero reaction?
>
> Let's make an example:

>
> Since the first list I sent immediately after 2.6.19-rc1 was released,
> kernel Bugzilla #7255 is part of my list of 2.6.19-rc regressions but
> has gotten exactly zero developer responses.

where was the lkml mail for this?

>
> What exactly were the mistakes of the submitter resulting in no one
> caring about Bugzilla #7255?

he didn't post to lkml?


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-09 23:54:43

by Thomas Gleixner

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Thu, 2006-11-09 at 22:31 +0100, Arjan van de Ven wrote:
> > Since the first list I sent immediately after 2.6.19-rc1 was released,
> > kernel Bugzilla #7255 is part of my list of 2.6.19-rc regressions but
> > has gotten exactly zero developer responses.
>
> where was the lkml mail for this?
>
> >
> > What exactly were the mistakes of the submitter resulting in no one
> > caring about Bugzilla #7255?
>
> he didn't post to lkml?

That's no excuse, as Adrian has been pointing it out on LKML for weeks.

Also the kernel.org bugzilla has a real flaw:

There is no way to get informed of new entries automatically and
filtered by Category and Component. At least I did not find a way and
[email protected] seems to be a black hole.

The result is that you have to go to bugzilla on a regular basis instead
of getting automatic notifications of new entries. I do it once in a
while, but it is really ineffective.

tglx


2006-11-10 00:21:55

by Andrew Morton

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Fri, 10 Nov 2006 00:56:58 +0100
Thomas Gleixner <[email protected]> wrote:

> On Thu, 2006-11-09 at 22:31 +0100, Arjan van de Ven wrote:
> > > Since the first list I sent immediately after 2.6.19-rc1 was released,
> > > kernel Bugzilla #7255 is part of my list of 2.6.19-rc regressions but
> > > has gotten exactly zero developer responses.
> >
> > where was the lkml mail for this?
> >
> > >
> > > What exactly were the mistakes of the submitter resulting in no one
> > > caring about Bugzilla #7255?
> >
> > he didn't post to lkml?
>
> That's no excuse, as Adrian has been pointing it out on LKML for weeks.
>
> Also the kernel.org bugzilla has a real flaw:
>
> There is no way to get informed of new entries automatically and
> filtered by Category and Component. At least I did not find a way and
> [email protected] seems to be a black hole.
>
> The result is that you have to go to bugzilla on a regular basis instead
> of getting automatic notifications of new entries. I do it once in a
> while, but it is really ineffective.
>

I screen all bugzilla reports and I ensure that any of them which look like
they're real and which have a breathing maintainer are brought to that
maintainer's attention.

So no, I think the number of bugs in bugzilla which the relevant maintainer
didn't hear about is vanishingly small.

2006-11-10 15:16:30

by Pavel Machek

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Hi!

> >but if you do this you just end up with a bigger backlog so that the
> >next one will be even more unstable due to an extremely high change rate.
> >
> Only if people continue to work on new stuff during the "bug fixing only"
> cycle. If we manage to get everyone focused on bug fixing only for the
> entire cycle the backlog won't be growing (much).

But neither you, nor Andrew, nor Linus has the power to stop development
like that... (nor would it be a good idea)

Pavel
--
Thanks for all the (sleeping) penguins.

2006-11-10 15:49:41

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Stephen Hemminger wrote:
> Al Boldi <[email protected]> wrote:
> > Andreas Mohr wrote:
> > > On Wed, Nov 08, 2006 at 11:40:27PM +0100, Jesper Juhl wrote:
> > > > Let me make one very clear statement first: -stable is a GREAT thing
> > > > and it is working VERY well.
> > > > That being said, many of the fixes I see going into -stable are
> > > > regression fixes. Maybe not the majority, but still, regression
> > > > fixes going into -stable tells me that the kernel should have seen
> > > > more testing/bugfixing before being declared a stable release.
> > >
> > > Nice theory, but of course I'm pretty sure that it wouldn't work
> >
> > Agreed.
> >
> > > (as has been said numerous times before by other people).
> > >
> > > You cannot do endless testing/bugfixing, it's a psychological issue.
> >
> > Agreed.
> >
> > > If you do that, then you end up with -preXX (or worse, -preXXX)
> > > version numbers, which would cause too many people to wait and wait
> > > and wait with upgrading until "that next stable" kernel version
> > > finally becomes available.
> > > IOW, your tester base erodes, awfully, and development progress
> > > stalls.
> >
> > IMHO, the psycho-problem is that you cannot intertwine development and
> > stable in the same cycle. In that respect, the 2.6 development cycle is
> > a real flop, as it does not allow for focus.
> >
> > And focus is needed to achieve stability.
> >
> > Think catch22...
> >
> >
> > Thanks!
> >
> > --
> > Al
>
> There are bugfixes which are too big for stable or -rc releases, that are
> queued for 2.6.20. "Bugfix only" is a relative statement. Do you include
> new hardware support, new security APIs, performance fixes? It gets to
> be real hard to decide, because these are the changes that often cause
> regressions; often one major bug fix causes two minor bugs.

That's exactly the point I'm trying to get across; the 2.6 dev model tries to
be two cycles in one, dev and stable, which yields an awkward catch22
situation.

The only sane way forward in such a situation is to realize the mistake and
return to the focused dev-only / stable-only model.

This would probably involve pushing the current 2.6 kernel into 2.8 and
starting 2.9 as a dev-cycle only, once 2.8 has structurally stabilized.


Thanks!

--
Al

2006-11-10 16:16:17

by Jesper Juhl

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On 10/11/06, Al Boldi <[email protected]> wrote:
> Stephen Hemminger wrote:
[...]
> > There are bugfixes which are too big for stable or -rc releases, that are
> > queued for 2.6.20. "Bugfix only" is a relative statement. Do you include
> > new hardware support, new security APIs, performance fixes? It gets to
> > be real hard to decide, because these are the changes that often cause
> > regressions; often one major bug fix causes two minor bugs.
>
> That's exactly the point I'm trying to get across; the 2.6 dev model tries to
> be two cycles in one, dev and stable, which yields an awkward catch22
> situation.
>
> The only sane way forward in such a situation is to realize the mistake and
> return to the focused dev-only / stable-only model.
>
> This would probably involve pushing the current 2.6 kernel into 2.8 and
> starting 2.9 as a dev-cycle only, once 2.8 has structurally stabilized.
>

That was not what I was arguing for in the initial mail at all.
I think the 2.6 model works very well in general. All I was pushing
for was a single cycle focused mainly on bug fixes once in a while.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-11-10 16:43:21

by Stephen Hemminger

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Jesper Juhl wrote:
> On 10/11/06, Al Boldi <[email protected]> wrote:
>> Stephen Hemminger wrote:
> [...]
>> > There are bugfixes which are too big for stable or -rc releases, that are
>> > queued for 2.6.20. "Bugfix only" is a relative statement. Do you include
>> > new hardware support, new security APIs, performance fixes? It gets to
>> > be real hard to decide, because these are the changes that often cause
>> > regressions; often one major bug fix causes two minor bugs.
>>
>> That's exactly the point I'm trying to get across; the 2.6 dev model tries to
>> be two cycles in one, dev and stable, which yields an awkward catch22
>> situation.
>>
>> The only sane way forward in such a situation is to realize the mistake and
>> return to the focused dev-only / stable-only model.
>>
>> This would probably involve pushing the current 2.6 kernel into 2.8 and
>> starting 2.9 as a dev-cycle only, once 2.8 has structurally stabilized.
>>
>
> That was not what I was arguing for in the initial mail at all.
> I think the 2.6 model works very well in general. All I was pushing
> for was a single cycle focused mainly on bug fixes once in a while.
>
I like the current model fine. From a developer point of view:
* More branches means having to fix and retest a bug in more places.
  Workload goes up geometrically with the number of versions.
  So most developers end up ignoring fixes for more than 2 versions;
  anything more than -current and -stable is ignored.
* Holding off the tide of changes doesn't work. It just leads to
  massive integration headaches.
* Many bugs don't show up until the kernel is run on a wide range of
  hardware, but the kernel doesn't get exposed to a wide range of hardware
  and applications until after it is declared stable. It is a Catch-22.

The current stability range of
  -subtree ... -mm ... 2.6.X ... 2.6.X.Y ... 2.6.vendor
works well for most people. The people it doesn't work for are trying
to get something for nothing. They want stability and the latest kernel
at the same time.

There are some things that do need working on:
* Old bugs die; the bugzilla database needs a 6-month prune-out.

* Bugzilla.kernel.org is underutilized and is only a small sample of the
  real problems. Not sure if it is a training, user, or behaviour issue,
  or just that bugzilla is crap.

* Vendor bugs (that could be fixed) aren't forwarded to lkml or bugzilla.

* LKML is an overloaded communication channel; do we need:
  [email protected] ?

* Developers can't get (or afford to buy) the new hardware that causes
  a lot of the pain. Just look at the number of bug reports due to new
  flavors of motherboards, chipsets, etc. I spent 3 months on a bug that
  took one day to fix once I got the hardware.

2006-11-10 16:53:08

by Randy Dunlap

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Fri, 10 Nov 2006 08:42:58 -0800 Stephen Hemminger wrote:

> Jesper Juhl wrote:
> > On 10/11/06, Al Boldi <[email protected]> wrote:
> >> Stephen Hemminger wrote:
> > [...]
> >> > There are bugfixes which are too big for stable or -rc releases, that are
> >> > queued for 2.6.20. "Bugfix only" is a relative statement. Do you include
> >> > new hardware support, new security APIs, performance fixes? It gets to
> >> > be real hard to decide, because these are the changes that often cause
> >> > regressions; often one major bug fix causes two minor bugs.
> >>
> >> That's exactly the point I'm trying to get across; the 2.6 dev model tries to
> >> be two cycles in one, dev and stable, which yields an awkward catch22
> >> situation.
> >>
> >> The only sane way forward in such a situation is to realize the mistake and
> >> return to the focused dev-only / stable-only model.
> >>
> >> This would probably involve pushing the current 2.6 kernel into 2.8 and
> >> starting 2.9 as a dev-cycle only, once 2.8 has structurally stabilized.
> >>
> >
> > That was not what I was arguing for in the initial mail at all.
> > I think the 2.6 model works very well in general. All I was pushing
> > for was a single cycle focused mainly on bug fixes once in a while.
> >
> I like the current model fine. From a developer point of view:

I don't think that it's great, but having even/odd stable/development
is even worse.

But I agree with Jesper and Andrew's comments in general, that
we do have stability problems and we have a lack of people
who are working on bugs.

> * More branches means having to fix and retest a bug more places.
> Workload goes up geometrically with number of versions.
> So most developers end up ignoring fixing more than 2 versions;
> anything more than -current and -stable are ignored.
> * Holding off the tide of changes doesn't work. It just leads to
> massive integration headaches.
> * Many bugs don't show up until kernel is run on wide range of hardware,
> but kernel doesn't get exposed to wide range of hardware and
> applications until after it is declared stable. It is a Catch-22.
> The current stability range of
> -subtree ... -mm ... 2.6.X ... 2.6.X.Y... 2.6.vendor
> works well for most people. The people it doesn't work for are trying
> to get something for nothing. They want stability and the latest kernel
> at the same time.
>
> There are some things that do need working on:
> * Old bugs die, the bugzilla database needs a 6mo prune out.
>
> * Bugzilla.kernel.org is underutilized and is only a small sample of the
> real problems. Not sure if it is a training, user, behaviour issue or
> just that bugzilla is crap.

Behavior, ease of use vs. email.

> * Vendor bugs (that could be fixed) aren't forwarded to lkml or bugzilla

ack

> * LKML is an overloaded communication channel, do we need:
> [email protected] ?

Either that or lkml is/remains for bug reporting and we move development
somewhere else. Or my [repeated] preference:

do development on specific mailing lists (although there would
likely need to be a fallback list when it's not clear which mailing
list should be used)

> * Developers can't get (or afford to buy) the new hardware that causes
> a lot of the pain. Just look at the number of bug reports due to new
> flavors of motherboards, chipsets, etc. I spent 3mo on a bug that took
> one day to fix once I got the hardware.

Yep.

---
~Randy

2006-11-10 17:45:54

by Stefan Richter

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Thomas Gleixner wrote:
...
> Also the kernel.org bugzilla has a real flaw:
>
> There is no way to get informed of new entries automatically and
> filtered by Category and Component.
...

There may be ways. I for one configured my bugzilla account to watch the
"user" [email protected]. That way I get notified of
bugs that are filed under category Drivers, component IEEE1394. Here is
a list of pseudo users or real users you could spy on:
http://bugzilla.kernel.org/describeallcomponents.cgi
--
Stefan Richter
-=====-=-==- =-== -=-=-
http://arcgraph.de/sr/

2006-11-10 19:31:30

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Randy Dunlap wrote:
> On Fri, 10 Nov 2006 08:42:58 -0800 Stephen Hemminger wrote:
> > Jesper Juhl wrote:
> > > On 10/11/06, Al Boldi <[email protected]> wrote:
> > >> Stephen Hemminger wrote:
> > >
> > > [...]
> > >
> > >> > There are bugfixes which are too big for stable or -rc releases,
> > >> > that are queued for 2.6.20. "Bugfix only" is a relative statement.
> > >> > Do you include new hardware support, new security APIs,
> > >> > performance fixes? It gets to be real hard to decide, because these
> > >> > are the changes that often cause regressions; often one major bug
> > >> > fix causes two minor bugs.
> > >>
> > >> That's exactly the point I'm trying to get across; the 2.6 dev model
> > >> tries to be two cycles in one, dev and stable, which yields an
> > >> awkward catch22 situation.
> > >>
> > >> The only sane way forward in such a situation is to realize the
> > >> mistake and return to the focused dev-only / stable-only model.
> > >>
> > >> This would probably involve pushing the current 2.6 kernel into 2.8
> > >> and starting 2.9 as a dev-cycle only, once 2.8 has structurally
> > >> stabilized.
> > >
> > > That was not what I was arguing for in the initial mail at all.
> > > I think the 2.6 model works very well in general. All I was pushing
> > > for was a single cycle focused mainly on bug fixes once in a while.

Temporary focusing won't help much, as you are dealing with people who
cannot be turned on and off like machines.

> > I like the current model fine. From a developer point of view:
>
> I don't think that it's great, but having even/odd stable/development
> is even worse.
>
> But I agree with Jesper and Andrew's comments in general, that
> we do have stability problems and we have a lack of people
> who are working on bugs.

The problem is not just simple bugs that surface; it's deeper than that.
Deep structural problems are what plague 2.6.

Only a focused model may deal with such problems.

> > * More branches means having to fix and retest a bug more places.
> > Workload goes up geometrically with number of versions.
> > So most developers end up ignoring fixing more than 2 versions;
> > anything more than -current and -stable are ignored.
> > * Holding off the tide of changes doesn't work. It just leads to
> > massive integration headaches.
> > * Many bugs don't show up until kernel is run on wide range of
> > hardware, but kernel doesn't get exposed to wide range of hardware and
> > applications until after it is declared stable. It is a Catch-22.

No Catch-22 here. You just fix those to achieve stability.

It's when you start to flip-flop dev/stable/dev/stable/... that you get
Catch-22, which inhibits stability.

> > The current stability range of
> > -subtree ... -mm ... 2.6.X ... 2.6.X.Y... 2.6.vendor
> > works well for most people. The people it doesn't work for are
> > trying to get something for nothing. They want stability and the latest
> > kernel at the same time.

That's not the impression I got from Andrew's stats.

> > There are some things that do need working on:
> > * Old bugs die, the bugzilla database needs a 6mo prune out.
> >
> > * Bugzilla.kernel.org is underutilized and is only a small sample of
> > the real problems. Not sure if it is a training, user, behaviour issue
> > or just that bugzilla is crap.
>
> Behavior, ease of use vs. email.

Go email. Maybe even automated.

> > * Vendor bugs (that could be fixed) aren't forwarded to lkml or
> > bugzilla
>
> ack
>
> > * LKML is an overloaded communication channel, do we need:
> > [email protected] ?

No.

> Either that or lkml is/remains for bug reporting and we move development
> somewhere else. Or my [repeated] preference:
>
> do development on specific mailing lists (although there would
> likely need to be a fallback list when it's not clear which mailing
> list should be used)

Yes. Needs more thought, though.

> > * Developers can't get (or afford to buy) the new hardware that
> > causes a lot of the pain. Just look at the number of bug reports due to
> > new flavors of motherboards, chipsets, etc. I spent 3mo on a bug that
> > took one day to fix once I got the hardware.
>
> Yep.


Thanks!

--
Al

2006-11-10 19:49:57

by Arjan van de Ven

Subject: Re: A proposal; making 2.6.20 a bugfix only version.


>
> The problem is not just simple bugs that surface, it's deeper than that.
> Deep structural problems are what plague 2.6.
>
> Only a focused model may deal with such problems.

can you at least provide a list of such structural problems?
In fact, why don't you collect them and mail them out (bi)weekly... that
may already do wonders.
Look at what Adrian is doing with the regressions; although the response
isn't 100%, people DO pay attention to it... so maybe if you post a
"structural problems list" people will actually start working on
things (and of course you can help too ;)

2006-11-10 21:20:51

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Arjan van de Ven wrote:
> > The problem is not just simple bugs that surface, it's deeper than that.
> > Deep structural problems are what plague 2.6.
> >
> > Only a focused model may deal with such problems.
>
> can you at least provide a list of such structural problems?
> In fact, why don't you collect them and mail them out (bi)weekly... that
> may already do wonders.
> Look at what Adrian is doing with the regressions; although the response
> isn't 100% people DO pay attention to it.... so maybe if you post a
> "structural problems list" people will actually start working on
> things.. (and of course you can help too ;)

Ok, things like OOM, scheduling, and block-io.

net looks ok, although I would suggest a redesign for 3.0.


Thanks!

--
Al

2006-11-10 21:31:21

by Stephen Hemminger

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Sat, 11 Nov 2006 00:22:52 +0300
Al Boldi <[email protected]> wrote:

> Arjan van de Ven wrote:
> > > The problem is not just simple bugs that surface, it's deeper than that.
> > > Deep structural problems are what plague 2.6.
> > >
> > > Only a focused model may deal with such problems.
> >
> > can you at least provide a list of such structural problems?
> > In fact, why don't you collect them and mail them out (bi)weekly... that
> > may already do wonders.
> > Look at what Adrian is doing with the regressions; although the response
> > isn't 100% people DO pay attention to it.... so maybe if you post a
> > "structural problems list" people will actually start working on
> > things.. (and of course you can help too ;)
>
> Ok, things like OOM, scheduling, and block-io.

If you want stability, don't change these. But if you think you
have better heuristics, propose them for discussion.

>
> net looks ok, although I would suggest a redesign for 3.0.

Facts, no vague pronouncements please.


--
Stephen Hemminger <[email protected]>

2006-11-11 03:45:46

by Horst H. von Brand

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Jesper Juhl <[email protected]> wrote:
> On 08/11/06, Arjan van de Ven <[email protected]> wrote:
> > > There's no shortage of issues that need fixing, but since we keep
> > > merging new stuff, a lot of bugfixing energy gets spent on the new
> > > cool stuff instead of fixing up any other issues we have.
> >
> > but if you do this you just end up with a bigger backlog so that the
> > next one will be even more unstable due to an extremely high change rate.

> Only if people continue to work on new stuff during the "bug fixing only"
> cycle. If we manage to get everyone focused on bug fixing only for the
> entire cycle the backlog won't be growing (much).

Sorry, won't work. People working on shiny new toys will just put off
sending in their patches for a cycle, and the usual bugfixers will likewise
just go on doing their stuff.

> > > Coverity has, as of this writing, identified 728 issues in the current
> > > kernel. Sure, some of those have already been identified as false or
> > > ignorable issues, but many are flagged as actual bugs and still more
> > > are as yet uninspected.

> > most are mostly false. And the rest is getting looked at. What's the
> > problem?

> Yes, MANY are false, and I know the rest are getting worked at, I work on
> some myself when time permits. I mentioned it simply as an indicator
> (one amongst many) that we have a lot of known unfixed issues.

OK, lead by example: Do put off new work and work just on fixing things for
a while. Collect bug reports and make them useful for would-be-fixers. Etc.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria +56 32 2654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 2797513

2006-11-11 04:15:08

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Stephen Hemminger wrote:
> Al Boldi <[email protected]> wrote:
> > Arjan van de Ven wrote:
> > > > The problem is not just simple bugs that surface, it's deeper than
> > > > that. Deep structural problems are what plague 2.6.
> > > >
> > > > Only a focused model may deal with such problems.
> > >
> > > can you at least provide a list of such structural problems?
> > > In fact, why don't you collect them and mail them out (bi)weekly...
> > > that may already do wonders.
> > > Look at what Adrian is doing with the regressions; although the
> > > response isn't 100% people DO pay attention to it.... so maybe if you
> > > post a "structural problems list" people will actually start working
> > > on things.. (and of course you can help too ;)
> >
> > Ok, things like OOM, scheduling, and block-io.
>
> If you want stability don't change these. But if you think you
> have better heuristics propose them for discussion.

I don't think there is a lack of heuristics, nor is there a lack of
discussion. What is needed is a realization of the problem.

IOW, respective tree-owners need to come to a realization of the state of
their trees, problem or not. If it has a problem, that problem needs to be
fixed or backed out of stable and moved into dev.

> > net looks ok, although I would suggest a redesign for 3.0.
>
> Facts, no vague pronouncements please.

I meant structural OSI compliance.


Thanks!

--
Al

2006-11-11 05:09:32

by Stephen Hemminger

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Sat, 11 Nov 2006 07:15:49 +0300
Al Boldi <[email protected]> wrote:

> Stephen Hemminger wrote:
> > Al Boldi <[email protected]> wrote:
> > > Arjan van de Ven wrote:
> > > > > The problem is not just simple bugs that surface, it's deeper than
> > > > > that. Deep structural problems are what plague 2.6.
> > > > >
> > > > > Only a focused model may deal with such problems.
> > > >
> > > > can you at least provide a list of such structural problems?
> > > > In fact, why don't you collect them and mail them out (bi)weekly...
> > > > that may already do wonders.
> > > > Look at what Adrian is doing with the regressions; although the
> > > > response isn't 100% people DO pay attention to it.... so maybe if you
> > > > post a "structural problems list" people will actually start working
> > > > on things.. (and of course you can help too ;)
> > >
> > > Ok, things like OOM, scheduling, and block-io.
> >
> > If you want stability don't change these. But if you think you
> > have better heuristics propose them for discussion.
>
> I don't think there is a lack of heuristics, nor is there a lack of
> discussion. What is needed, is a realization of the problem.
>
> IOW, respective tree-owners need to come to a realization of the state of
> their trees, problem or not. If it has a problem, that problem needs to be
> fixed or backed out of stable and moved into dev.
>
> > > net looks ok, although I would suggest a redesign for 3.0.
> >
> > Facts, no vague pronouncements please.
>
> I meant structural OSI compliance.

Read the book "Network Algorithmics"; it has a clear discussion
of why building your stack like the protocol specification
is a bad idea.

2006-11-11 06:31:48

by Valdis Klētnieks

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Sat, 11 Nov 2006 07:15:49 +0300, Al Boldi said:
> I don't think there is a lack of heuristics, nor is there a lack of
> discussion. What is needed, is a realization of the problem.
>
> IOW, respective tree-owners need to come to a realization of the state of
> their trees, problem or not. If it has a problem, that problem needs to be
> fixed or backed out of stable and moved into dev.

I keep trying to parse this, and it keeps coming up as "content-free".

For starters, you don't even have a useful definition of "has a problem".
There's a whole *range* of definitions for that, and even skilled and
respected members of the Linux kernel community can disagree about whether
something is "a problem". For example, see the thread about a week ago
about "Remove hotplug cpu crap from cpufreq".

If, given a *specific* feature with high wart quotient, we can't agree on
whether it needs to be fixed or backed out, we're doomed to fail if we
start handwaving about problems "in general". As a group, we suck at
anything that isn't specific, like "Algorithm A is better than B for
case XYZ".



2006-11-11 07:15:50

by Willy Tarreau

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Fri, Nov 10, 2006 at 08:53:11AM -0800, Randy Dunlap wrote:
> Either that or lkml is/remains for bug reporting and we move development
> somewhere else. Or my [repeated] preference:
>
> do development on specific mailing lists (although there would
> likely need to be a fallback list when it's not clear which mailing
> list should be used)

I've been thinking about this too for a while now. Something like half
of the email volume consists of (semi-)automated emails containing
patches moving from one Git tree to another. I think that moving this
to some linux-dev list or something like this would:

1) reduce the noise on LKML so that problem reports are better caught
2) reduce the global email volume because instead of sending all these
emails to 10,000-20,000 people(?), only maybe a thousand will be subscribed.
3) reduce even more the latency between post and publication due to 2.

I don't know if others would be interested, in which case it would be wise
to poll on the subject and include Matti and Davem in the discussion.

Regards,
Willy

2006-11-11 07:23:40

by David Miller

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

From: Stephen Hemminger <[email protected]>
Date: Fri, 10 Nov 2006 21:09:17 -0800

> On Sat, 11 Nov 2006 07:15:49 +0300
> Al Boldi <[email protected]> wrote:
>
> > Stephen Hemminger wrote:
> > > Al Boldi <[email protected]> wrote:
> > I meant structural OSI compliance.
>
> Read the book "Network Algorithmics"; it has a clear discussion
> of why building your stack like the protocol specification
> is a bad idea.

Even Van Jacobson can be quoted as saying (words to the effect) that
layering is how you design protocols, _NOT_ how you implement
them.

2006-11-11 11:01:16

by Martin Bligh

Subject: Re: A proposal; making 2.6.20 a bugfix only version.


> That's no excuse, as Adrian has been pointing it out on LKML for weeks.
>
> Also the kernel.org bugzilla has a real flaw:
>
> There is no way to get informed of new entries automatically and
> filtered by Category and Component. At least I did not find a way and
> [email protected] seems to be a black hole.

There is one list, bugme-new, that gets a copy of all bugs. The category
and component are broken out in headers simply so that you can filter it
yourself in whatever way you like.

Other than that, we can make each category owned by a "virtual user",
(many are already), and then multiple people can do an email watch on
that user.

bugme-admin alias should not be a black hole ... I get a copy of all
emails, as do a few other people. I don't recall seeing email from you
recently, but possibly a problem with spam filtering or something. If
you're having trouble, please send email directly to me if stuck.

M.
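
The header-based filtering described above might look roughly like the sketch below. It is only an illustration: the thread says the category and component are "broken out in headers" but does not name those headers, so `X-Bugzilla-Category` and `X-Bugzilla-Component` are assumed names, and the sample message (including its sender address and bug number) is made up. The category "Drivers" and component "IEEE1394" are taken from Stefan Richter's mail earlier in the thread.

```python
from email import message_from_string
from email.message import Message

# ASSUMPTION: hypothetical header names. The thread does not say what the
# bugme-new headers are actually called; adjust these to the real ones.
CATEGORY_HEADER = "X-Bugzilla-Category"
COMPONENT_HEADER = "X-Bugzilla-Component"


def wanted(msg: Message, category: str, component: str) -> bool:
    """Return True if the mail is tagged with the given category and component."""
    got_category = (msg.get(CATEGORY_HEADER) or "").strip().lower()
    got_component = (msg.get(COMPONENT_HEADER) or "").strip().lower()
    return got_category == category.lower() and got_component == component.lower()


# Made-up sample mail, used only to exercise the filter.
SAMPLE = """\
From: reporter@example.invalid
Subject: [Bug 0000] New: example report
X-Bugzilla-Category: Drivers
X-Bugzilla-Component: IEEE1394

Body of the report.
"""

msg = message_from_string(SAMPLE)
print(wanted(msg, "Drivers", "IEEE1394"))   # matches the sample's headers
print(wanted(msg, "Networking", "IPV4"))    # does not match
```

A filter like this would typically sit in a mail delivery script, so that each subscriber keeps only the categories they maintain; a mail with neither header is simply not matched.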

2006-11-11 11:13:40

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

David Miller wrote:
> From: Stephen Hemminger <[email protected]>
> Date: Fri, 10 Nov 2006 21:09:17 -0800
>
> > On Sat, 11 Nov 2006 07:15:49 +0300
> >
> > Al Boldi <[email protected]> wrote:
> > > Stephen Hemminger wrote:
> > > > Al Boldi <[email protected]> wrote:
> > >
> > > I meant structural OSI compliance.
> >
> > Read the book "Network Algorithmics"; it has a clear discussion
> > of why building your stack like the protocol specification
> > is a bad idea.
>
> Even Van Jacobson can be quoted as saying (to the effect) that
> layering is how you design protocols, _NOT_ how you implement
> them.

The problem is that you let the implementation surface into user-land.


Thanks!

--
Al

2006-11-11 11:25:33

by Al Boldi

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

[email protected] wrote:
> On Sat, 11 Nov 2006 07:15:49 +0300, Al Boldi said:
> > I don't think there is a lack of heuristics, nor is there a lack of
> > discussion. What is needed, is a realization of the problem.
> >
> > IOW, respective tree-owners need to come to a realization of the state
> > of their trees, problem or not. If it has a problem, that problem needs
> > to be fixed or backed out of stable and moved into dev.
>
> I keep trying to parse this, and it keeps coming up as "content-free".

Think denial.

> For starters, you don't even have a useful definition of "has a problem".
> There's a whole *range* of definitions for that, and even skilled and
> respected members of the Linux kernel community can disagree about whether
> something is "a problem". For example, see the thread about a week ago
> about "Remove hotplug cpu crap from cpufreq".
>
> If, given a *specific* feature with high wart quotient, we can't agree on
> whether it needs to be fixed or backed out, we're doomed to fail if we
> start handwaving about problems "in general". As a group, we suck at
> anything that isn't specific, like "Algorithm A is better than B for
> case XYZ".

We don't need to agree whether A is better than B, the mere fact that we
acknowledge the problem is the first step in finding a solution.

So, either fix it, or back out.

OTOH, if there is no problem, then I guess we have blue skies...


Thanks!

--
Al

2006-11-11 12:03:15

by NeilBrown

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Saturday November 11, [email protected] wrote:
> On Fri, Nov 10, 2006 at 08:53:11AM -0800, Randy Dunlap wrote:
> > Either that or lkml is/remains for bug reporting and we move development
> > somewhere else. Or my [repeated] preference:
> >
> > do development on specific mailing lists (although there would
> > likely need to be a fallback list when it's not clear which mailing
> > list should be used)
>
> I've been thinking about this too for a while now. There is something
> like half of the email volume which are (semi-)automated emails
> containing patches moving from a GIT tree to another. I think that
> moving this to some linux-dev or something like this would :
>
> 1) reduce the noise on LKML so that problem reports are better caught
> 2) reduce the global email volume because instead of sending all these
> emails to 10-20000 persons(?), only maybe a thousand will be subscribed.
> 3) reduce even more the latency between post and publication due to 2.
>
> I don't know if others would be interested, in which case it would be wise
> to poll on the subject and include Matti and Davem to the discussion.

I personally don't think the volume on lkml is a particular problem.
I have filters which pick out the items that might be of particular
interest to me (matching on words like 'nfs' 'raid' 'md' in my case)
and the rest goes into a bucket that I glance at occasionally. When
I do, I scan the subject lines and read the things that seem
interesting at the time, and delete the rest unread.

I prefer this to splitting lkml into multiple lists because it makes
it much easier for me to change my areas of interest from time to
time. I don't have to go and subscribe to different lists, I just use a
different pattern for matching (either in my brain or in my filter).
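
A filter of this kind can be sketched in a few lines of Python. This is only an illustration, not the actual filter described above: the keyword list comes from the examples in this mail, and the folder names "interest" and "bucket" are made up for the sketch.

```python
import re
from email import message_from_string

# Keywords from the examples above; changing areas of interest means
# editing this one pattern rather than changing list subscriptions.
PATTERN = re.compile(r"\b(nfs|raid|md)\b", re.IGNORECASE)


def sort_mail(raw_message):
    """Return "interest" if the subject or body mentions a keyword, else "bucket"."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject") or ""
    # Only plain single-part bodies are searched in this sketch.
    body = msg.get_payload() if not msg.is_multipart() else ""
    if PATTERN.search(subject) or PATTERN.search(body):
        return "interest"
    return "bucket"
```

The word boundaries keep short keywords such as 'md' from matching inside longer words like 'cmd'.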

I suspect the main reason that I miss problem reports that are
relevant to me is that people choose poor Subject: lines.
e.g. when I scan the subjects of a thread I might see multiple threads
all with
Subject: Re: 2.6.18-rc5-mm2
and I'll have no idea what they are really about, and I don't have
time to read them all. If reporters simply put
(nfs problem)
at the end of the subject (or whatever is appropriate) then it would
be a LOT easier to avoid missing things (some people do do this. I
see their bug reports promptly when relevant).
Fortunately akpm is very good at reading all of these and forwarding
things as appropriate (thanks Andrew) but that shouldn't really be
necessary. And that situation won't be improved by reducing traffic
on any list or splitting up the list (there are already plenty of
other lists around). It really requires people who submit bug reports
to think about what they are doing, and make it easy for people to
find their bug report.

I guess bugzilla encourages people to think about their bug report and
tag it accordingly. In that sense bugzilla is good. I just hate
using it - I have to use a web browser, and it feels like a closed,
private discussion, which really isn't appropriate.

NeilBrown

2006-11-11 19:15:18

by Adrian Bunk

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On Fri, Nov 10, 2006 at 08:42:58AM -0800, Stephen Hemminger wrote:
>...
> * Old bugs die, the bugzilla database needs a 6mo prune out.

That's not that much of a problem.

There is me and there are some other people who sometimes go through
older bugs asking submitters whether the issue is still present in a
recent kernel.

And if it is not, or the submitter doesn't answer, the bug gets closed a
few weeks later.

The problem with this approach is what to do when the bug is still
present - it's quite unfair to ask a submitter whether a bug is still
present, but having no way to help the submitter if he confirms it's
still present.

A positive example is e.g. sparc: davem doesn't use Bugzilla, but when I
forward bugs to him from Bugzilla I know that there will be an answer.

But for other subsystems like e.g. ext3 I don't know about anyone who
will answer every time I forward a bug that is still present in the
latest kernel.

> * Bugzilla.kernel.org is underutilized and is only a small sample of the
> real problems. Not sure if it is a training, user, behaviour issue or
> just that bugzilla is crap.

At least one positive thing about Bugzilla is that it shows how bad our
bug handling is - bug reports no one took care of are visible...

> * Vendor bugs (that could be fixed) aren't forwarded to lkml or bugzilla

We do already get more bug reports than we can handle.

As an example, until recently people were spreading the fairy tale that no
one would test -rc kernels. So I started a list of reported regressions by
people who did test the -rc kernels, and this shows that we are even far
away from handling recent regressions within one or two weeks - and the
situation with other bugs looks much worse.

> * LKML is an overloaded communication channel, do we need:
> [email protected] ?

The problem is not how to communicate bugs - the problem is who will
look after the bugs.

As an example, Andrew is already doing a great job in forwarding bugs
from Bugzilla and linux-kernel to maintainers. It's not that maintainers
miss bugs because they don't see them.

> * Developers can't get (or afford to buy) the new hardware that causes
> a lot of the pain. Just look at the number of bug reports due to new
> flavors of motherboards, chipsets, etc. I spent 3mo on a bug that took
> one day to fix once I got the hardware.

If only this was the only problem...

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-11-11 19:16:56

by Krzysztof Halasa

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

Randy Dunlap <[email protected]> writes:

>> * LKML is an overloaded communication channel, do we need:
>> [email protected] ?

Not sure if LKML is overloaded - subject filtering etc. And people
expect they will be Ccopied if they are, for example, maintainers.

It's not the list for everybody, though.

> Either that or lkml is/remains for bug reporting and we move development
> somewhere else. Or my [repeated] preference:
>
> do development on specific mailing lists (although there would
> likely need to be a fallback list when it's not clear which mailing
> list should be used)

I don't like this idea as implemented now - it's probably fine for
subsystem maintainers but it's prohibitive for an occasional submitter.


But... specific lists would make sense if - and only if:
- they are always mirrored to something like LKML so one doesn't need
to subscribe to every possible list (as they appear).
- they are open as LKML is (no specific subscription required to post)

This way one could discuss things with/on a specific list (subscribed
by subsystem maintainers and other interested people) and everyone
reading the general list (or more general?) would see it too.

We could have a hierarchy:
- specific lists like lkml-usb-ohci
- more general lkml-usb
- lkml, seeing all mail

This way every mail sent to lkml-usb-ohci would be sent to lkml-usb,
and lkml-usb would be copied to lkml as well (I imagine some duplicate
elimination would be a plus).
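
To make the fan-out concrete, here is a minimal Python sketch. It assumes the hierarchy is encoded purely in the dash-separated list name, which is one possible reading of the scheme above; the function names and the subscription table are hypothetical.

```python
def fan_out(list_name, root="lkml"):
    """Return list_name followed by each more general list, ending at root."""
    parts = list_name.split("-")
    if parts[0] != root:
        raise ValueError("not part of the %s hierarchy: %s" % (root, list_name))
    # Drop one trailing component at a time: a-b-c, a-b, a.
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def recipients(posted_to, subscriptions):
    """Return each subscriber who should get exactly one copy of a mail.

    `posted_to` is the list names the mail was addressed to;
    `subscriptions` maps a subscriber to the set of lists they are on.
    """
    targets = set()
    for name in posted_to:
        targets.update(fan_out(name))
    # Using a set intersection per subscriber gives duplicate elimination:
    # someone on both lkml-usb and lkml still appears only once.
    return sorted(s for s, lists in subscriptions.items() if lists & targets)
```

For example, fan_out("lkml-usb-ohci") yields lkml-usb-ohci, lkml-usb and lkml, in that order, so a mail sent to the most specific list reaches readers of all three.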

Anyone could discuss ohci writing to lkml-usb-ohci, discuss general
USB writing to lkml-usb etc. Anyone subscribed to any of the above
lists would see the traffic they are interested in, without a need
for constant subscribing.

I assume the lists would be hosted on one domain to make it easy
to post and perhaps to enable it to work at all.

If some form of sender address verification/scoring was in use I think
it would be smart to recognize all people subscribed to more general
lists to be "known" for this purpose (i.e., people subscribed to
lkml would be considered "subscribed" while posting to lkml-usb-ohci).


Having written all of the above I now like the idea so be prepared to
fight me hard if you don't :-)
Perhaps I could even help implementing it in some unspecified so
called "free time".
--
Krzysztof Halasa

2006-11-11 21:09:13

by Pavel Machek

Subject: bugzilla (was Re: A proposal; making 2.6.20 a bugfix only version.)

Hi!


> I guess bugzilla encourages people to think about their bug report and
> tag it accordingly. In that sense bugzilla is good. I just hate
> using it - I have to use a web browser, and it feels like a closed,
> private discussion, which really isn't appropriate.

Novell bugzilla is crap, but kernel.org bugzilla works with email. Try
replying to one of its mails.

It works like a small mailing list for each bug, with archive on the
web. Yes, it is _still_ too much web, but it is way better than
others...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-15 21:04:58

by Jesper Juhl

Subject: Re: A proposal; making 2.6.20 a bugfix only version.

On 10/11/06, Horst H. von Brand <[email protected]> wrote:
> Jesper Juhl <[email protected]> wrote:
> > On 08/11/06, Arjan van de Ven <[email protected]> wrote:
> > > > There's no shortage of issues that need fixing, but since we keep
> > > > merging new stuff, a lot of bugfixing energy gets spend on the new
> > > > cool stuff instead of fixing up any other issues we have.
> > >
> > > but if you do this you just end up with a bigger backlog so that the
> > > next one will even be more unstable due to a extreme high change rate.
>
> > Only if people continue to work on new stuff during the "bug fixing only"
> > cycle. If we manage to get everyone focused on bug fixing only for the
> > entire cycle the backlog won't be growing (much).
>
> Sorry, won't work. People working on shiny new toys will just put off
> sending in their patches for a cycle, and the usual bugfixers will likewise
> just go on doing their stuff.
>
> > > > Coverity has, as of this writing, identified 728 issues in the current
> > > > kernel. Sure, some of those have already been identified as false or
> > > > ignorable issues, but many are flagged as actual bugs and still more
> > > > are as yet uninspected.
>
> > > most are mostly false. And the rest is getting looked at. What's the
> > > problem?
>
> > Yes, MANY are false, and I know the rest are getting worked at, I work on
> > some myself when time permits. I mentioned it simply as an indicator
> > (one amongst many) that we have a lot of known unfixed issues.
>
> OK, lead by example: Do put off new work and work just on fixing things for
> a while. Collect bug reports and make them useful for would-be-fixers. Etc.

I try. Unfortunately I don't have nearly as much time as I would like,
to work on the kernel, but when I do have time I try to fix bugs. If
you look through the mailing list archives you will see that I try to
fix bugs/buglets most of the time and I find these by combing through
the coverity database, bugzilla, the mailing list, logs of test builds
of new kernels etc etc...
I don't maintain lists of unfixed bugs. I would love to do so, but I
lack the time to do it properly.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html