2010-07-11 07:18:46

by Martin Steigerwald

Subject: stable? quality assurance?


Hi!

2.6.34 was a disaster for me: bug #15969 - a patch was available before
2.6.34 already - bug #15788, also reported with 2.6.34-rc2 already, as well
as, most importantly, two complete lockups - well, maybe just X.org and radeon
KMS, I didn't start my second laptop to SSH into the locked-up one - on my
ThinkPad T42. I fixed the first one with the patch, but after the lockups I
just downgraded to 2.6.33 again.

I still actually *use* my machines for something other than hunting patches
for kernel bugs, and on kernel.org it says "Latest *Stable* Kernel"
(emphasis mine). I know of the argument that one should use a
distro kernel for machines that are in production use. But frankly, does
that justify delivering crap that is known in advance to the distributors? What
impact do the sometimes grave bugs reported in bugzilla have on the release
decision?

And how about people who have their reasons - mine is TuxOnIce - to
compile their own kernels?

Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
freezes as well. So far so good.

Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
still have some problems like the hang on hibernation reported in

hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1

on this mailing list just a moment ago. But then 2.6.33 did hang with
TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
2.6.34 did not hang with it anymore which was a reason for me to try
2.6.34 earlier.

I am quite a bit worried about the quality of the recent kernels. Some
iterations ago I just compiled them, sometimes even rc ones, which I do
not expect to be stable, and they just worked. But recently the .0,
sometimes even the .1 or .2 versions haven't been stable for me quite a
few times, and thus they had better not be advertised as such on kernel.org, I
think. I am willing to risk some testing and file bug reports, but these are
still production machines, I do not have any spare test machines, and
there needs to be some balance, i.e. the kernels should basically work.
Thus I will surely be more reluctant to upgrade in the future.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2010-07-11 08:39:49

by Eric Dumazet

Subject: Re: stable? quality assurance?

On Sunday 11 July 2010 at 09:18 +0200, Martin Steigerwald wrote:
> Hi!
>
> 2.6.34 was a disaster for me: bug #15969 - a patch was available before
> 2.6.34 already - bug #15788, also reported with 2.6.34-rc2 already, as well
> as, most importantly, two complete lockups - well, maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked-up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something other than hunting patches
> for kernel bugs, and on kernel.org it says "Latest *Stable* Kernel"
> (emphasis mine). I know of the argument that one should use a
> distro kernel for machines that are in production use. But frankly, does
> that justify delivering crap that is known in advance to the distributors? What
> impact do the sometimes grave bugs reported in bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations ago I just compiled them, sometimes even rc ones, which I do
> not expect to be stable, and they just worked. But recently the .0,
> sometimes even the .1 or .2 versions haven't been stable for me quite a
> few times, and thus they had better not be advertised as such on kernel.org, I
> think. I am willing to risk some testing and file bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I will surely be more reluctant to upgrade in the future.
>
> Ciao,

Anybody running the latest kernel on a production machine is living
dangerously. Don't you already know that?

When 2.6.X is released, everybody knows it contains at least 100 bugs.

It was true for all previous values of X; it will be true for all
future values.

If you want to be safer, use a one year old kernel, with all stable
patches in.

Something like 2.6.32.16: it's probably more stable than all 2.6.X
kernels.

If 2.6.33 runs OK on your machine, you are lucky, since 2.6.33.6
contains numerous bug fixes.

2010-07-11 13:56:56

by Lee Mathers

Subject: Re: stable? quality assurance?

Wow!

First question: what is a "disaster"?

Second question: what makes you so important that you feel you can
make demands and comments as you did?

If these are indeed production systems and you are an administrator of
said production systems, I suggest you do a little more homework
to expand your knowledge base.

I would follow Eric's advice. It's sound advice and better yet it was free.

Hope you have better luck in getting your systems running well.

On 7/11/10, Martin Steigerwald <[email protected]> wrote:
>
> Hi!
>
> 2.6.34 was a disaster for me: bug #15969 - a patch was available before
> 2.6.34 already - bug #15788, also reported with 2.6.34-rc2 already, as well
> as, most importantly, two complete lockups - well, maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked-up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something other than hunting patches
> for kernel bugs, and on kernel.org it says "Latest *Stable* Kernel"
> (emphasis mine). I know of the argument that one should use a
> distro kernel for machines that are in production use. But frankly, does
> that justify delivering crap that is known in advance to the distributors? What
> impact do the sometimes grave bugs reported in bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations ago I just compiled them, sometimes even rc ones, which I do
> not expect to be stable, and they just worked. But recently the .0,
> sometimes even the .1 or .2 versions haven't been stable for me quite a
> few times, and thus they had better not be advertised as such on kernel.org, I
> think. I am willing to risk some testing and file bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I will surely be more reluctant to upgrade in the future.
>
> Ciao,
> --
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
>

--
Sent from my mobile device

2010-07-11 14:22:57

by Martin Steigerwald

Subject: Re: stable? quality assurance?

On Sunday 11 July 2010, Eric Dumazet wrote:
> On Sunday 11 July 2010 at 09:18 +0200, Martin Steigerwald wrote:
> > Hi!

Hi Eric,

> > 2.6.34 was a disaster for me: bug #15969 - a patch was available before
> > 2.6.34 already - bug #15788, also reported with 2.6.34-rc2 already, as
> > well as, most importantly, two complete lockups - well, maybe just X.org
> > and radeon KMS, I didn't start my second laptop to SSH into the
> > locked-up one - on my ThinkPad T42. I fixed the first one with the
> > patch, but after the lockups I just downgraded to 2.6.33 again.
> >
> > I still actually *use* my machines for something other than hunting
> > patches for kernel bugs, and on kernel.org it says "Latest
> > *Stable* Kernel" (emphasis mine). I know of the argument that
[...]

> > advertised as such on kernel.org, I think. I am willing to risk some
> > testing and file bug reports, but these are still production machines,
> > I do not have any spare test machines, and there needs to be some
> > balance, i.e. the kernels should basically work. Thus I will surely
> > be more reluctant to upgrade in the future.
> >
> > Ciao,
>
> Anybody running the latest kernel on a production machine is living
> dangerously. Don't you already know that?

Yes, and I indicated as much above. But in my - naturally rather subjective, I
admit - perception, the balance between stable and unstable from about 1 or
2 years ago has been lost. In my personal experience it has gotten much
worse recently - to the extent that I skipped some major kernel
versions completely, for example 2.6.30.

And it's not servers - those use distro kernels.

> When 2.6.X is released, everybody knows it contains at least 100 bugs.

Then why is it still labeled "stable" on kernel.org? It is not. It is at
most beta quality software.

It's no more stable than KDE 4.0 was, but at least they
mentioned that in the release notes.

> It was true for all previous values of X; it will be true for all
> future values.
>
> If you want to be safer, use a one year old kernel, with all stable
> patches in.
>
> Something like 2.6.32.16: it's probably more stable than all 2.6.X
> kernels.
>
> If 2.6.33 runs OK on your machine, you are lucky, since 2.6.33.6
> contains numerous bug fixes.

Actually it was 2.6.33.1 with userspace software suspend and it had pretty
good uptimes above 20 days - only interrupted by installing 2.6.34.

Well, if everybody else takes this for granted, I will just replace that
"stable" on kernel.org with "beta quality" in my mind - from my perception it
does not even have release candidate status in the last iterations - and
be done with it.

And as soon as the kernel contains a performant hibernation infrastructure,
I will probably just use distro kernels and be done with it.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2010-07-11 14:51:47

by Martin Steigerwald

Subject: Re: stable? quality assurance?


Hi Lee,

On Sunday 11 July 2010, Lee Mathers wrote:
> Wow!
>
> First question: what is a "disaster"?

For me: the machine, or at least the complete desktop, freezing randomly, for
example. And actually I said "for me", as you can reread at the bottom of
your top posting.

> Second question: what makes you so important that you feel you can
> make demands and comments as you did?

Since when do I need to be considered important by you or anyone
else to make comments? Actually I think I do not - this is still an open
mailing list, isn't it? And I won't waste my time with proof that I have
contributed to free software here and there - including kernel testing, which
for example Ingo Molnar could testify to from the early CFS times, when I
compiled roughly a kernel a day, and kernel documentation once.

I also do not get why you are attacking me personally. It seems that
you feel personally attacked by me. But I did not attack you. I just questioned
the quality of the kernel and its current quality assurance process. Nobody
is personally at fault if any of that is lacking.

One reason for my demand is best expressed by this question: does
the kernel developer community want to encourage a group of advanced
Linux users - but mostly non-developers - to compile their own vanilla or
near-vanilla kernels, provide wider testing and report a bug now and
then?

I can live with either answer. If not, I will just be much more reluctant
to try out new kernels.

But I have experienced working productively with kernel developers like
Ingo and TuxOnIce developer Nigel, who were quite interested in my usage
of the latest kernels.

I admit my wording could have been friendlier, too, but I was just
frustrated by my recent experiences. What I wanted to achieve was to
raise the question whether kernel quality has actually decreased and, more
importantly, whether something needs to be done to make it more stable again.

Well Linus has at least been a bit more reluctant to take big changes
after rc1 this cycle, so maybe 2.6.35 will be better again.

> If indeed these are production systems and you are an administrator of
> said production systems. I suggest you need to do a little more home
> work to expand your knowledge base.

They are production systems that have some fault tolerance, i.e. not servers,
but laptops and one workstation - not yet all of them. But for me a certain
balance has to be met. I will just downgrade and drop newer kernels, or
even start skipping whole major versions completely on a regular basis, if
that turns out to be the only way to have machines that are stable enough for me.
One approach would be to stick to the stable kernels that Greg and the
stable team maintain for a longer time.

> Hope you have better luck in getting your systems running well.

Thanks. I certainly will - if need be, by downgrading.

I hope that someone answers who actually can take some critique. From the
current replies I perceive a lack of that ability.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2010-07-11 14:52:43

by Martin Steigerwald

Subject: Re: stable? quality assurance?

On Sunday 11 July 2010, Martin Steigerwald wrote:
> worse recently - to the extent that I skipped some major kernel
> versions completely, for example 2.6.30.

Okay, not some, but one.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2010-07-11 16:05:03

by William Pitcock

Subject: Re: stable? quality assurance?


----- "Eric Dumazet" <[email protected]> wrote:

> On Sunday 11 July 2010 at 09:18 +0200, Martin Steigerwald wrote:
> > Hi!
> >
> > 2.6.34 was a disaster for me: bug #15969 - a patch was available before
> > 2.6.34 already - bug #15788, also reported with 2.6.34-rc2 already, as well
> > as, most importantly, two complete lockups - well, maybe just X.org and radeon
> > KMS, I didn't start my second laptop to SSH into the locked-up one - on my
> > ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> > just downgraded to 2.6.33 again.
> >
> > I still actually *use* my machines for something other than hunting patches
> > for kernel bugs, and on kernel.org it says "Latest *Stable* Kernel"
> > (emphasis mine). I know of the argument that one should use a
> > distro kernel for machines that are in production use. But frankly, does
> > that justify delivering crap that is known in advance to the distributors? What
> > impact do the sometimes grave bugs reported in bugzilla have on the release
> > decision?
> >
> > And how about people who have their reasons - mine is TuxOnIce - to
> > compile their own kernels?
> >
> > Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> > freezes as well. So far so good.
> >
> > Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> > website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> > still have some problems like the hang on hibernation reported in
> >
> > hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
> >
> > on this mailing list just a moment ago. But then 2.6.33 did hang with
> > TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> > 2.6.34 did not hang with it anymore which was a reason for me to try
> > 2.6.34 earlier.
> >
> > I am quite a bit worried about the quality of the recent kernels. Some
> > iterations ago I just compiled them, sometimes even rc ones, which I do
> > not expect to be stable, and they just worked. But recently the .0,
> > sometimes even the .1 or .2 versions haven't been stable for me quite a
> > few times, and thus they had better not be advertised as such on kernel.org, I
> > think. I am willing to risk some testing and file bug reports, but these are
> > still production machines, I do not have any spare test machines, and
> > there needs to be some balance, i.e. the kernels should basically work.
> > Thus I will surely be more reluctant to upgrade in the future.
> >
> > Ciao,
>
> Anybody running the latest kernel on a production machine is living
> dangerously. Don't you already know that?
>
> When 2.6.X is released, everybody knows it contains at least 100 bugs.
>
> It was true for all previous values of X; it will be true for all
> future values.
>
> If you want to be safer, use a one year old kernel, with all stable
> patches in.
>
> Something like 2.6.32.16: it's probably more stable than all 2.6.X
> kernels.

2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
what other regressions there are in .16.

William

2010-07-11 16:34:11

by Eric Dumazet

Subject: Re: stable? quality assurance?

On Sunday 11 July 2010 at 19:58 +0400, William Pitcock wrote:
> ----- "Eric Dumazet" <[email protected]> wrote:
> >
> > Something like 2.6.32.16: it's probably more stable than all 2.6.X
> > kernels.
>
> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
> as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
> what other regressions there are in .16.
>

Yeah, strictly speaking, you can be sure no kernel will ever be
bug-free.

This is why I said "probably more stable" ;)


2010-07-11 17:04:41

by Heinz Diehl

Subject: Re: stable? quality assurance?

On 11.07.2010, Eric Dumazet wrote:

> When 2.6.X is released, everybody knows it contains at least 100 bugs.
[....]

http://s5.directupload.net/file/d/2217/ckghonrx_jpg.htm

:-)

2010-07-11 17:22:57

by Willy Tarreau

Subject: Re: stable? quality assurance?

Hi Martin,

On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> I hope that someone answers who actually can take some critique. From the
> current replies I perceive a lack of that ability.

well, I'll try to do so then :-)

There were some threads in the past about kernel releases quality,
where Linus explained why it could not be completely black or white.

Among the things he explained, I remember that one of the primary concerns
was the inability to slow down development. I mean, if he waits 2 more
weeks for things to stabilize, then there will be two more weeks of
crap^H^H^H^Hdevelopment merged in the next merge window, so in fact this
will just shift dates and not quality.

There are also some regressions that get merged with every pre-release.
Thus, assuming he would wait for one more pre-release to merge the
fixes you spotted, 2 or 3 more would appear, so there's a point where
it must be decided when to release.

Right now it's released when he feels it "good enough". This can be
very subjective, but I'd think that "good enough" basically means
that the kernel will be able to live in its stable branch without
major changes and without reverting features.

Also, you have to consider that there are several types of users.
Some of them are developers who will run a latest -git kernel at
some point. Some of them will be enthusiasts waiting for a feature,
and who will run every -rc kernel once the feature is merged, to
ensure it does not break before the release. There are also janitors
and the curious ones who'll basically run a few of the last -rc as
time permits to see if they can spot a few last-minute issues before
the release. There are the brave ones who systematically download
the dot-0 release once Linus announces it and will proudly run it
to show their friends how it's better than the last one. There are
those who need a bit of stability (eg: professional laptop or home
server) and will prefer to wait for a few stable releases to ensure
they won't waste their time on a big stupid issue that all other ones
above will have immediately spotted for them. And there are the ones
who run production servers who will either use distro kernels or
long term stable kernels, with a more or less long qualification
process between upgrades.

It's just an ecosystem where you have to find your place. From your
description, I think you're before the last ones above, you need
something which works, even though it's not critical, so you could
very well wait for 2-3 stable updates before upgrading (that does
not prevent you from testing earlier on other systems if you want
to test performance, new features, regressions, etc...).

It's not really advisable to call dot-0 releases "unstable" because
it will only result in shifting the adoption point between the user
classes above. We need to have enthusiasts who proudly say "hey
look, dot-0 and it's already rock solid". We've all seen some of them
and they're the ones who help reporting issues that get fixed in the
next stable release.

I think that the most reasonable thing to do is to assume your need
for stability and always refrain from running on the latest release.

Speaking for myself, I tend to run rock solid kernels for my data (my
file server was still on 2.4.37.9 till this afternoon, I just upgraded
it to 2.6). The distro's kernel currently is 2.6.33.4 and I'm going to
switch it back to 2.6.32.x or 2.6.27.x because I'd rather have something
fully tested there. My desktop which regularly reaches 50-100 days
uptime runs on whatever looks stable enough for the job when I upgrade.
Usually it's one of Greg's long term stable series. 2.6.27.x or
2.6.32.x, with x >= 10. My work laptop is on similar kernels. My
netbook is generally running experimental code, it does not matter
much. It's where I'd try 2.6.35-rc for instance, or where I test
2.6.32.x-rc when Greg announces them.

You see, there's a kernel for everyone, and for every usage. You just
have to make your choice. And when you don't know or don't want to
guess, stick to the distro's kernel.

Regards,
Willy

2010-07-11 17:48:23

by Theodore Ts'o

Subject: Re: stable? quality assurance?

On Sun, Jul 11, 2010 at 09:18:41AM +0200, Martin Steigerwald wrote:
>
> I still actually *use* my machines for something other than hunting patches
> for kernel bugs, and on kernel.org it says "Latest *Stable* Kernel"
> (emphasis mine). I know of the argument that one should use a
> distro kernel for machines that are in production use. But frankly, does
> that justify delivering crap that is known in advance to the distributors? What
> impact do the sometimes grave bugs reported in bugzilla have on the release
> decision?

So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
I find bugs, I report them and I help fix them. If more people did
that, then the 2.6.X.0 releases would be more stable. But kernel
development is a volunteer effort, so it's up to the volunteers to
test and fix bugs during the rc4, -rc5 and -rc6 time frame. But if
the work tails off, because the developers are busily working on new
features for the new release, then past a certain point, delaying the
release reaches a point of diminishing returns. This is why we do
time-based releases.

It is possible to do other types of release strategies, but look at
Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
if you insist on waiting until all release blockers are fixed (and
even with Debian, past a certain point the release engineer will still
just reclassify bugs as no longer being release blockers --- after the
stable release has slipped for months or years past the original
projected release date.)

So if you and others like you are willing to help, then the quality of
the Linux kernels can continue to improve. But simply complaining
about it is not likely to solve things, since threatening to not be
willing to upgrade kernels is generally not going to motivate many, if
not most, of the volunteers who work on stabilizing the kernel.

> I am willing to risk some testing and file bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.

So you want the latest and greatest new features in a brand-new kernel
release, but you're not willing to pay for test machines, and you're
not willing to pay for distribution support... The fact that you
are willing to do some testing is appreciated, but remember, there's
no such thing as a free lunch. Linux may be a very good bargain (look
at how much Oracle has increased its support contracts for Solaris!),
but it's still not a free lunch. At the end of the day, you get what
you put into it.

Best regards,

- Ted

2010-07-11 18:02:39

by Anca Emanuel

Subject: Re: stable? quality assurance?

Offtopic.

I'm using Ubuntu 10.04 and kernel 2.6.35-rc1 from kernel.ubuntu.com.
It is working fine (stable, but my webcam is still not working).

I am using this https://wiki.ubuntu.com/KernelTeam/GitKernelBuild tutorial
to compile the kernel, but without success (it finishes the compile but
produces no deb packages).
I did it from VirtualBox some weeks ago, and grub could not mount.

Is there any tutorial on how to build the kernel for Ubuntu 10.04?

Please test this yourself (in Ubuntu 10.04):
sudo cfdisk
Result: "Bad primary partition 1" (any kernel, any environment).

2010-07-11 19:49:24

by Stefan Richter

Subject: Re: stable? quality assurance?

Martin Steigerwald wrote:
> One reason for my demand is best expressed by this question: does
> the kernel developer community want to encourage a group of advanced
> Linux users - but mostly non-developers - to compile their own vanilla or
> near-vanilla kernels, provide wider testing and report a bug now and
> then?

Yes, testing is desired --- in order to shake out bugs that are not
manifest on the developer's systems. Remember that the kernel is a
special program in which there are many classes of bugs that can only be
reproduced on special hardware and/or with special workloads.

Alas, there are not only new bugs in new features but also new bugs in
existing features, a.k.a. regressions. And like new bugs, many
regressions cannot be found by the developers themselves on their
test systems.

You mentioned two particular regressions in your initial posting. Do
you have suggestions how they could have been prevented in the first
place? Or how they could have been handled better than they were?

Do you see subsystems of the kernel in which regressions are not taken
as seriously as in other ones?

> Well Linus has at least been a bit more reluctant to take big changes
> after rc1 this cycle, so maybe 2.6.35 will be better again.

2.6.35 will only be better if this (gradual) change of procedure means
that -rc kernels are going to be tested more and new bugs are going to
be found and fixed quicker in the -rc phase than before. And 2.6.36+
will only be better if the stricter post -rc1 merges do not motivate
developers to put even more hastily assembled under-tested crap into
their pre -rc1 pull requests than they already do.

[PS: 2.6.34 works very well for me, as most 2.6.x releases do.]
[PS2: When on lkml, please use reply-to-all, not reply-to-list-only.]
--
Stefan Richter
-=====-==-=- -=== -=-==
http://arcgraph.de/sr/

2010-07-11 21:40:28

by Rafael J. Wysocki

Subject: Re: stable? quality assurance?

On Sunday, July 11, 2010, Willy Tarreau wrote:
> Hi Martin,
>
> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From the
> > current replies I perceive a lack of that ability.
>
> well, I'll try to do so then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
>
> Among the things he explained, I remember that one of the primary concerns
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in the next merge window, so in fact this
> will just shift dates and not quality.
...
> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above.

IMnshO it's not exactly fair to call them "stable" either. I tend to call them
"major releases" which basically reflects what they are - events in the
development process that each start a new merge window. Nothing more, either
way.

Rafael

2010-07-12 04:18:25

by Willy Tarreau

Subject: Re: stable? quality assurance?

Hi Rafael,

On Sun, Jul 11, 2010 at 11:38:28PM +0200, Rafael J. Wysocki wrote:
> > It's not really advisable to call dot-0 releases "unstable" because
> > it will only result in shifting the adoption point between the user
> > classes above.
>
> IMnshO it's not exactly fair to call them "stable" either. I tend to call them
> "major releases" which basically reflects what they are - events in the
> development process that each start a new merge window. Nothing more, either
> way.

Indeed, exactly that. Maybe the confusion comes from the title
"Latest Stable Kernel" on kernel.org, which we could rename to "Latest
Kernel Release" or whatever reflects it best?

Willy

2010-07-12 06:51:40

by David Newall

Subject: Re: stable? quality assurance?

Ted Ts'o wrote:
> It is possible to do other types of release strategies, but look at
> Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
> if you insist on waiting until all release blockers are fixed

I don't know if Ted intended to be snide, but that is how he sounded.
And yet, his comment was a fair reflection of how core developers seem
to feel about stability, namely that a stable kernel is obsolete and
therefore not particularly desirable. (I use the word "stable" in its
common English meaning, not the almost inexplicable Tux variation.)

I think the truth is that Linux kernels are only ever stable as released
by distributions, and then only the more conservative of them. What
comes directly from kernel.org, I mean the kernels called "latest stable", is
an exercise in dissembling. It's stable because someone calls it
stable, even though it crashes and has regressions? That's not stable,
that's just misleading.

Stable kernels *could* be stable. Debian succeeds. If it takes them a
long time, that is only because the core developers fail to release
reasonable quality kernels. Don't sneer at them because they do the
right thing; do the right thing yourself so that they can produce more
timely updates.

I don't expect fair consideration of these comments; why change when
shooting the messenger is so much more satisfying?

2010-07-12 09:56:29

by Martin Steigerwald

Subject: Re: stable? quality assurance?

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
>
> well, I'll try to do so then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Wow! Thanks to you and all the others who provided such constructive
feedback.

I need a bit of time to digest and think through it. I will answer then.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2010-07-12 15:44:11

by Martin Steigerwald

Subject: Re: stable? quality assurance?

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
>
> well, I'll try to do so then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
>
> Among the things he explained, I remember that one of the primary concerns
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in the next merge window, so in fact this
> will just shift dates and not quality.

Would it make that much of a difference? Linus could still say no to
obvious crap, couldn't he?

> There are also some regressions that get merged with every pre-release.
> Thus, assuming he would wait for one more pre-release to merge the
> fixes you spotted, 2 or 3 more would appear, so there's a point where
> it must be decided when to release.

Some sort of bug classification could help here, I think - something that
helps Linus decide whether it is worth doing another release candidate
round or not.

Actually I think the "USB sound card not working after resume" bug I
mentioned (bug #15788) wouldn't warrant a new release candidate round,
especially as it didn't have a patch yet and likely affects only a
minority of users. Still, it would be nice if it were fixed in time. I do
think that the "Radeon KMS does not work after resume" bug (#15969) does
qualify, since it causes loss of data handled by the current X session(s) -
sure, I normally save my stuff before hibernating, but... And it actually
had a patch that had been tested! The desktop freeze bug I mentioned would
slip, because I didn't report it, and apart from a Debian bug report I found,
it wasn't confirmed at all. A reported and confirmed desktop freeze would
qualify IMHO.

Actually I have read postings from Linus saying that he actually reads the
regression list kindly provided by Rafael. 15788 was in there, but IMHO wouldn't
qualify (see the posting "2.6.34-rc5: Reported regressions from 2.6.33"). But
15969 was not - well, it was reported against rc7, so too late for Rafael's
manual report. So yes, I see how it could have slipped.

Maybe an approach would be to dynamically generate the list from all bug
reports marked for 2.6.34 versions and have it posted to the kernel mailing
list after every rc. This way, bug #15969 would at least have been on the
list of known regressions.

The Bugzilla severity and priority fields, or something similar, could be used to
set the importance of a bug report, and the regression list could be sorted
by importance. One important criterion would also be whether someone could
confirm it, reproduce it. Even when I reported those desktop freezes,
unless someone confirmed them it might just be happening to me. Well, a "confirm"
or vote button might be good, so that the number of confirmations could be
counted.

It would need some triaging and classifying and I am willing to help with
that.
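
To make the idea a bit more concrete, here is a rough sketch in Python of the
kind of script I have in mind. The REST endpoint, the query parameters and the
use of a "regression" keyword are assumptions on my part - I do not know how
bugzilla.kernel.org is actually configured - so please treat it as an
illustration only:

import requests

BUGZILLA = "https://bugzilla.kernel.org/rest/bug"   # assumed endpoint, not verified

def fetch_regressions(version):
    params = {
        "keywords": "regression",        # assumed keyword marking regressions
        "version": version,              # e.g. "2.6.34"
        "resolution": "---",             # only bugs that are still open
        "include_fields": "id,summary,severity,priority",
    }
    resp = requests.get(BUGZILLA, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["bugs"]

def format_report(bugs):
    # Put the most severe reports at the top so they are seen first.
    order = {"blocker": 0, "critical": 1, "major": 2, "normal": 3, "minor": 4}
    bugs.sort(key=lambda b: order.get(b["severity"], 9))
    return "\n".join(
        "#%s [%s/%s] %s" % (b["id"], b["severity"], b["priority"], b["summary"])
        for b in bugs
    )

if __name__ == "__main__":
    print(format_report(fetch_regressions("2.6.34")))

Something like this could be run after every -rc and its output posted to the
list automatically.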

> Right now it's released when he feels it "good enough". This can be
> very subjective, but I'd think that "good enough" basically means
> that the kernel will be able to live in its stable branch without
> major changes and without reverting features.

Okay, then that's two different definitions of stable. I mean stable enough
for (adventurous) end users, and here it's more of a development point of
view.

> Also, you have to consider that there are several types of users.
> Some of them are developers who will run a latest -git kernel at
> some point. Some of them will be enthusiasts waiting for a feature,
> and who will run every -rc kernel once the feature is merged, to
> ensure it does not break before the release. There are also janitors
> and the curious ones who'll basically run a few of the last -rc as
> time permits to see if they can spot a few last-minute issues before
> the release. There are the brave ones who systematically download
> the dot-0 release once Linus announces it and will proudly run it
> to show their friends how it's better than the last one. There are
> those who need a bit of stability (eg: professional laptop or home
> server) and will prefer to wait for a few stable releases to ensure
> they won't waste their time on a big stupid issue that all other ones
> above will have immediately spotted for them. And there are the ones
> who run production servers who will either use distro kernels or
> long term stable kernels, with a more or less long qualification
> process between upgrades.

Yes, stable enough for whom? I see.

> It's just an ecosystem where you have to find your place. From your
> description, I think you're before the last ones above, you need
> something which works, even though it's not critical, so you could
> very well wait for 2-3 stable updates before upgrading (that does
> not prevent you from testing earlier on other systems if you want
> to test performance, new features, regressions, etc...).

ACK.

> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above. We need to have enthusiasts who proudly say "hey
> look, dot-0 and it's already rock solid". We've all seen some of them
> and they're the ones who help reporting issues that get fixed in the
> next stable release.

I do think the claim should be honest. "stable" IMHO is not, at least from
a user's point of view. "unstable" isn't either, because a dot-0 kernel is
not guaranteed to be unstable ;). So I agree with Rafael's "major release"
approach.

> I think that the most reasonable thing to do is to assume your need
> for stability and always refrain from running on the latest release.
>
> Speaking for myself, I tend to run rock solid kernels for my data (my
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Yes. As I said already, I will rebalance my decision on which kernel to use.
And I now understand some of the problems better. Thanks.

But beyond that, I do think it's worth thinking about ways to improve the
process of ensuring as much stability as sensibly possible. A dot-0 kernel
won't be error-free - but I find just declaring the current process "the
best we can have" not actually satisfying. And I do think it can be
improved upon. I do not do kernel development, but I am willing to help
with collecting information about the current state of the kernel and to help
with bug triaging as well as I can and as time allows. I do have some
experience with quality management, as I coordinated the beta test of some
AmigaOS versions, but that was in a closed group. Here the scale is
different and I believe it needs somewhat different approaches.

I will reply to the other posts in this thread over the next days.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2010-07-12 15:56:46

by David Newall

Subject: Re: stable? quality assurance?

Marcin,

>> I don't expect fair consideration of these comments; why change when
>> shooting the messenger is so much more satisfying?
>>
Q.E.D.


First, for the sake of brevity, I want it agreed that we're talking
about new kernels, not those which are old, time-tested and patched.

I didn't notice anyone say they want Linux development to slow down;
rather, and not just in this thread but in many threads before, that
kernels released as "stable" fail to meet the common meaning of that
word; and this needs to be improved. Predictably, the common response
sounds a bit like "shut up, go away, you're an idiot, it doesn't happen
to me." These are not useful as they serve not one whit to improve the
situation, but give pause to those who might otherwise want to bring up
a valid issue, once more.

Expectations are key to the problem. When Linus says, "here is a shiny
new, stable kernel", he creates expectations. When that kernel proves
unstable, those expectations are dashed and confidence in Linux
suffers. There's no reason why development methods need to change in
order to reduce the number of flaky "stable" kernels. It would be
sufficient to replace the somewhat deceptive word "stable" with one that
is more accurate; beta or gamma test make sense as they already have
industry acceptance. Clearly "stable" is not appropriate, as implicitly
agreed by others who have advised: "don't use in production"; "wait at
least a year"; and more.

Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I
doubt anybody honestly thinks otherwise.

As to whether other operating systems are stable, well that's a fair
question. I agree that few large bodies of computer code are flawless,
and so stability can be relative. In that spirit I venture to put the
stipulated kernels into order of decreasing reliability: Best is BSD,
Solaris & OS X; then Windows; and then there's Linux. If named
distributions had been included, the list would look better (for us);
they'd go in the first group. Thank goodness for Debian, Red Hat
and Novell (to name just a few) for giving the world something which
does, at least largely, meet expectations.

2010-07-12 17:36:29

by Willy Tarreau

Subject: Re: stable? quality assurance?

Hi Martin,

On Mon, Jul 12, 2010 at 05:43:56PM +0200, Martin Steigerwald wrote:
> > Among the things he explained, I remember that one of the primary concerns
> > was the inability to slow down development. I mean, if he waits 2 more
> > weeks for things to stabilize, then there will be two more weeks of
> > crap^H^H^H^Hdevelopment merged in the next merge window, so in fact this
> > will just shift dates and not quality.
>
> Would it make that much of a difference? Linus could still say no to
> obvious crap, couldn't he?

It's not "obvious" crap, it's that the developers will simply have
advanced two more weeks ahead of their schedule, so their merge will
be larger as it will contain some parts that ought to be in next release
should the kernel be release earlier. And it will not be possible to
delay merging because among them there's always the killer feature
everybody wants. This is the reason for the strict merge window.

> > There are also some regressions that get merged with every pre-release.
> > Thus, assuming he would wait for one more pre-release to merge the
> > fixes you spotted, 2 or 3 more would appear, so there's a point where
> > it must be decided when to release.
>
> Some sort of bug classification could help here, I think - something that
> helps Linus decide whether it is worth doing another release candidate
> round or not.

Maybe sometimes that could indeed help, but that must not be done too
often, otherwise releases slip and patches get even bigger.

(...)
> I do
> think that the "Radeon KMS does not work after resume" bug (#15969) does
> qualify, since it causes loss of data handled by the current X session(s) -
> sure, I normally save my stuff before hibernating, but... And it actually
> had a patch that had been tested!

Then the problem should be checked on that side: why didn't this patch get
merged in time? Maybe the maintainer needed more time to recheck it, maybe
he was on holiday, maybe he was ill on the wrong day, maybe he had already
merged tons of fixes and preferred to keep this one for next time, ... But
even if there are fixes pending, this should not be a reason to *delay*
releases, otherwise we go back to the problem above, along with the problem
of new regressions being reported with tested fixes available...

(...)
> Maybe an approach would be to dynamically generate the list from all bug
> reports marked for 2.6.34 versions and have it posted to the kernel mailing
> list after every rc. This way, bug #15969 would at least have been on the
> list of known regressions.

In fact, Rafael regularly sends out this list, and the respective maintainers
are informed. That tells me there's little hope that you'll get the
maintainers to merge and send a fix they did not manage to get to. What *could*
be improved, though, would be if Linus publicly stated the deadline for last
fixes, as Greg does with the stable branch. That might give some of
them hope to finish a little merge work in time instead of considering it too
late.

> The Bugzilla severity and priority fields, or something similar, could be used to
> set the importance of a bug report, and the regression list could be sorted
> by importance. One important criterion would also be whether someone could
> confirm it, reproduce it. Even when I reported those desktop freezes,
> unless someone confirmed them it might just be happening to me. Well, a "confirm"
> or vote button might be good, so that the number of confirmations could be
> counted.

Maybe that could help, but it will not necessarily be the best solution. Keep
in mind that some issues may be more important but still reported only by one
user. If someone reports FS corruption, you certainly don't want to wait for a few
others to confirm the bug, for instance. Security issues don't need counting
either.

(...)
> > It's not really advisable to call dot-0 releases "unstable" because
> > it will only result in shifting the adoption point between the user
> classes above. We need to have enthusiasts who proudly say "hey
> > look, dot-0 and it's already rock solid". We've all seen some of them
> > and they're the ones who help reporting issues that get fixed in the
> > next stable release.
>
> I do think the claim should be honest. "stable" IMHO is not, at least from
> a user's point of view. "unstable" isn't either, because a dot-0 kernel is
> not guaranteed to be unstable ;). So I agree with Rafael's "major release"
> approach.

But it's also the starting point of the stable branch. And what about the
-stable branch itself? Sometimes an awful bug will prevent the kernel from
even booting for most users, and a single patch will be present in the
stable branch to fix this early. Similarly, if a major security issue gets
discovered at the time of release, it's possible that the stable branch
only contains one patch. That does not make it more stable than
the main branch either, even though it's called "stable". Maybe we should
indicate on http://www.kernel.org that a new release has generally received
little testing but should be good enough for experienced users to test
it, and that stable releases before .3-.4 are not recommended for general
use.

> But beyond that, I do think it's worth thinking about ways to improve the
> process of ensuring as much stability as sensibly possible. A dot-0 kernel
> won't be error-free - but I find just declaring the current process "the
> best we can have" not actually satisfying. And I do think it can be
> improved upon. I do not do kernel development, but I am willing to help
> with collecting information about the current state of the kernel and to help
> with bug triaging as well as I can and as time allows. I do have some
> experience with quality management, as I coordinated the beta test of some
> AmigaOS versions, but that was in a closed group. Here the scale is
> different and I believe it needs somewhat different approaches.

In fact, I think we're at a point where the development process scales
linearly with every brain and every pair of eyeballs. There are two
orthogonal axes to scale, one on the quality and one on the quantity.
Both are required, but the time spent on one is not spent on the other
one. Customers want quantity (features) and expect implicit quality.
It is possible for some people to bring a lot of added value, a lot
more than they would through their share of brain time on code. This is
the case for Rafael and Greg, who noticeably enhance quality, but it's
not limited to them either. Code reviews, bug reviews, the -next branch, etc.
are all geared towards quality. But one thing is sure: there are far
fewer people working on quality than there are working on features, so I
think that if you want to help, there is possibly a way to noticeably
improve quality with one more guy there, though you have to find out how
to spend that time efficiently!

Regards,
Willy

2010-07-12 17:48:08

by Marcin Letyns

Subject: Re: stable? quality assurance?

2010/7/12 David Newall <[email protected]>:
>
> First, for the sake of brevity, I want it agreed that we're talking about
> new kernels, not those which are old, time-tested and patched.
>
> I didn't notice anyone say they want Linux development to slow down; rather,
> and not just in this thread but in many threads before, that kernels
> released as "stable" fail to meet the common meaning of that word; and
> this needs to be improved.

I remember when Greg (correct me if I'm wrong) said something like:
there are no more stable releases; it is the distros that should
choose a 'proper' kernel. This seems to be working well: Ubuntu
usually ships with a kernel that is one release older, the same goes for
Debian (though they're much more restrictive) and some other distros.
Those who want to live on the bleeding edge choose Fedora with the
latest kernel, etc. Personally, I consider the LTS kernel to be the stable
one, and IMHO, like someone said in this thread before, the latest
mainline kernel shouldn't be called stable, but something else.

> Predictably, the common response sounds a bit like
> "shut up, go away, you're an idiot, it doesn't happen to me." ?These are not
> useful as they serve not one whit to improve the situation, but give pause
> to those who might otherwise want to bring up a valid issue, once more.

Yes, I apologize for this. After reading your response now, such
complaints are much clearer to me.

> There's no
> reason why development methods need to change in order to reduce the number
> of flaky "stable" kernels. It would be sufficient to replace the somewhat
> deceptive word "stable" with one that is more accurate; beta or gamma test
> make sense as they already have industry acceptance. Clearly "stable" is
> not appropriate, as implicitly agreed by others who have advised: "don't use
> in production"; "wait at least a year"; and more.
>
> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I doubt
> anybody honestly thinks otherwise.

This is the whole point IMHO. :D Fully agree with you here.

> As to whether other operating systems are stable, well that's a fair
> question. I agree that few large bodies of computer code are flawless, and
> so stability can be relative. In that spirit I venture to put the
> stipulated kernels into order of decreasing reliability: Best is BSD,
> Solaris & OS X; then Windows; and then there's Linux. If named
> distributions had been included, the list would look better (for us); they'd
> go in the first group. Thank goodness for Debian, Red Hat and Novell
> (to name just a few) for giving the world something which does, at least
> largely, meet expectations.
>

In my opinion you shouldn't compare the latest Linux kernel to other
operating systems (such a comparison would only be fair if the latest
Linux kernel were a 'real' stable one); rather, you should just compare
proper Linux distributions: Debian and RHEL to FreeBSD and Solaris,
OpenSuse and Kubuntu to Windows and OS X, etc. Otherwise it's like
comparing some *BSD development branch to Debian.

A situation similar to the one described in this thread arises with
Fedora. There are people (Linux newbies etc.) who consider Fedora just
another ordinary Linux distribution, but they're wrong. Fedora usually
ships with the latest, experimental stuff, and if some newbie (or even a
developer) decides to use Fedora and then discovers that things simply
break, he may conclude that Linux is a mess. Fedora shipped with a KDE
4.0 development release and even Linus was taken in, because he probably
thought it was a stable KDE release. IMHO there should be a notice about
what people have to deal with.

2010-07-12 17:56:36

by Stefan Richter

Subject: Re: stable? quality assurance?

Martin Steigerwald wrote:
> The Bugzilla severity and priority fields, or something similar, could be used to
> set the importance of a bug report, and the regression list could be sorted
> by importance. One important criterion would also be whether someone could
> confirm it, reproduce it. Even when I reported those desktop freezes,
> unless someone confirmed them it might just be happening to me. Well, a "confirm"
> or vote button might be good, so that the number of confirmations could be
> counted.

"I can reproduce it" comments are often very helpful. "It is important
to me (and it should be to you too)" comments perhaps not so much.

If a bug doesn't make any progress, it may be because the cause of the
bug (i.e. which subsystem is at fault or when the bug was introduced) is
not known well enough. In such a case, more reproducers won't really
help (let alone stating that it is important to somebody); then somebody
needs to delve deeper into it and narrow the cause further down.

A bug which can be reproduced by several people is usually a bug that
can be reproduced quite reliably, and hence is a bug whose cause can
likely be found by bisection. A bug report with the git commit ID to be
blamed attached (at least as far as the reporter could determine),
Cc'd to the author and committer of that commit, has a better chance of
getting fixed quickly than others.
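
If the bug can be reproduced by a script, the bisection itself can largely be
automated. The following is only a sketch - the tree location, the good/bad
revisions and the reproducer script are placeholders - but it shows the
general shape; the reproducer must exit 0 when the kernel is good and non-zero
(except 125) when it shows the bug:

import subprocess

KERNEL_TREE = "/usr/src/linux"     # placeholder: path to a git clone of the kernel
GOOD = "v2.6.33"                   # placeholder: last revision known to work
BAD = "v2.6.34"                    # placeholder: first revision known to be broken
TEST = "./reproduce-bug.sh"        # placeholder: reproducer, resolved inside the tree

def git(*args):
    # Run one git command inside the kernel tree and stop on any error.
    subprocess.run(["git", "-C", KERNEL_TREE] + list(args), check=True)

git("bisect", "start", BAD, GOOD)  # mark the known-bad and known-good points
git("bisect", "run", TEST)         # git checks out each candidate, TEST builds/boots and decides
git("bisect", "log")               # the log ends with the first bad commit
git("bisect", "reset")             # put the tree back afterwards

The commit named at the end of the bisect log is then the one whose author and
committer should be Cc'd on the report.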

So, votes don't help IMO; good reports do. And the reports need to be
early enough --- i.e. somebody needs to run -rc kernels --- since coming
up with a fix, validating the fix, and merging it may take time.

If there is little progress on a regression for which at least the
faulty subsystem is known, and the release goes by, the merge window
opens, and you see a pull request for that subsystem, then reply to that
pull request with a friendly reminder that there is still an unresolved
regression in that subsystem waiting for attention.

[...]
> As I said already, I will rebalance my decision on which kernel to use.

If or when you cannot spare resources to test a kernel yourself (be it
Linus' final release, or an -rc, not to mention linux-next), you can
also look out for Rafael's regression lists around the time of a final
release, to get a picture of whether it is a worse or better one.
--
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

2010-07-12 18:01:12

by Stefan Richter

Subject: Re: stable? quality assurance?

David Newall wrote:
> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I
> doubt anybody honestly thinks otherwise.

It works stably for what I use it for.

If it doesn't for you, then I hope you are already in contact with the
respective subsystem developers to get the regressions that you
experience fixed.
--
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

2010-07-12 19:56:28

by Martin Steigerwald

[permalink] [raw]
Subject: Re: stable? quality assurance?

Am Montag 12 Juli 2010 schrieb Willy Tarreau:
> Hi Martin,

Hi Willy,

for now I downgraded to 2.6.33.2 and started a compile of 2.6.33.6. I hit
yet another bug, but that's a TuxOnIce one (nevertheless reported at
bugzilla.kernel.org as #15873). And after booting again because the resume
did not work, the machine just locked up again while merely playing an AVI
file from a photo SD card - I *think* it was that dubious freeze bug I
mentioned before. Since I am holding a Linux training this week I just
decided to downgrade now. Again I didn't try to SSH into the machine, but
it was after eight o'clock after a long work day, it's really hot here, and
I just couldn't stand doing the information-collecting work for the bug,
which might easily have taken two or more hours. Actually I also do not
know what to do with such a random freeze bug. How does one best approach
it without sinking insane amounts of time into it?

The last freeze bug I had was with my ThinkPad T23 when plugging in and
later removing the eSATA PCMCIA card. It worked for quite a few kernel
versions, but from a certain version on it just started to freeze on
removal. Up to 2.6.33, where I last tried it, I think. And there I had at
least found out in what situation it happens.

What do I do with such bugs? Back then I just decided not to use the eSATA
PCMCIA card in that ThinkPad T23 again, which isn't that unreasonable I
think. I didn't even report it, which, granted, might be the reason that
it's not yet fixed.

I am willing to do some testing, but I also like to use Linux. And above a
certain amount it's just too much for me. Frankly said, for me it's all
happening too fast. I experienced it with some KDE 4 versions - later ones
like 4.3 and 4.4! - where I reported so many bugs I easily stumbled upon
that at some point I just gave up reporting anything. Sure, I wanted Radeon
DRM KMS. It's great. But I really hope things will be more stable again
soon. A new feature is great - when it works. That said, I am not sure
whether the recent freeze bug on my ThinkPad T42 is related to Radeon DRM.

I think I will wait for 2.6.34.2 or .3 and then try again. If it then
happens again, hopefully at a moment where I have the nerve to deal with
such bugs, I will fire up my second notebook and try to SSH into the
machine. If that works I could at least look into dmesg and the X.org logs.

That's what I meant: for me personally the balance is lost. The kernel does
not have to be perfect, but I am experiencing just too many issues,
including quite nasty ones, at the moment. 2.6.33.2 with userspace software
suspend was stable, as was 2.6.32 with TuxOnIce. Thus I am trying 2.6.33.6.

> On Mon, Jul 12, 2010 at 05:43:56PM +0200, Martin Steigerwald wrote:
> > > Among the things he explained, I remember that one of primary
> > > concern was the inability to slow down development. I mean, if he
> > > waits 2 more weeks for things to stabilize, then there will be two
> > > more weeks of crap^H^H^H^Hdevelopment merged in next merge window,
> > > so in fact this will just shift dates and not quality.
> >
> > Would it make that much of a difference? Linus could still say no to
> > obvious crap, couldn't he?
>
> It's not "obvious" crap, it's that the developers will simply have
> advanced two more weeks ahead of their schedule, so their merge will
> be larger as it will contain some parts that ought to be in next
> release should the kernel be release earlier. And it will not be
> possible to delay merging because among them there's always the killer
> feature everybody wants. This is the reason for the strict merge
> window.

Hmmm, it could also be used as two more weeks for testing the new stuff
that is about to go in, but that might just be wishful thinking...

Is Linux kernel development really in balance between feature work and
stabilization work? Currently, at least from my personal perception, it is
not. Development goes so fast - can you all cope with that speed? Maybe
it's just time to *slow it down* a bit? Does it really scale? I am
overwhelmed. Several times I just had enough of it. Others had other
experiences. So it might just be me having lots of bad luck. What are the
experiences of others?

Actually I think a bit more of a shift toward quality work couldn't hurt.

> > > There are also some regressions that get merged with every
> > > pre-release. Thus, assuming he would wait for one more pre-release
> > > to merge the fixes you spotted, 2 or 3 more would appear, so
> > > there's a point where it must be decided when to release.
> >
> > Some sort of classifying bugs could help here I think. Something that
> > helps Linus to decide whether it is worth to do another release
> > candidate round or not.
>
> Maybe sometimes that could indeed help, but that must not be done too
> often, otherwise releases slip and patches get even bigger.
>
> (...)
>
> > I do
> > think that the Radeon KMS does not work after resume bug (#15969)
> > does qualify since it causes loss of data handled by the current X
> > session(s) - sure I normally save my stuff before hibernating,
> > but... And it actually had a patch that has been tested!
>
> Then the problem should be checked on this side : why this patch didn't
> get merged in time ? Maybe the maintainer needed more time to recheck
> it, maybe he was on holiday, maybe he was ill on the wrong day, maybe
> he had already merged tons of fixes and preferred to get this one for
> next time, ... But even if there are fixes pending, this should not be
> a reason to *delay* releases, otherwise we go back to the problem
> above, with also the problem of new regressions reported with tested
> fixes available...
>
> (...)

Well, it should only be done for major regressions, I think. I still think
some sorting of the regression list by importance and tested patch
availability could help. I think that the Radeon DRM fix was quite a
low-hanging fruit.

> > Maybe an approach would be to dynamically generate the list from all
> > bug reports marked for 2.6.34 versions and have it posted to kernel
> > mailing list after every rc. This way bug #15969 would at least have
> > been in the list of known regressions.
>
> In fact, Rafael regularly emits this list, and the respective
> maintainers are informed. That means to me that there's little hope
> that you'll get the maintainers to merge and send a fix they did not
> manage to do. What *could* be improved though would be if Linus
> publically states the deadline for last fixes, as Greg does with the
> stable branch. That can give hopes to some of them to finish a little
> merge work in time instead of considering it's too late.

Hmmm, I did not find any regression list after 2.6.34-rc5 but before 2.6.35
on the kernel mailing list here. And the bug and fix came with rc7. What if
the list were generated right after every rc? I wouldn't want to demand
that anyone do it that often, but with some automation and a team of people
triaging and collecting regressions...

> > Bugzilla severity and priority fields or something similar could be
> > used to set the importance of a bug report and the regression list
> > could be sorted by importance. One important criterion also would be
> > whether someone could confirm it, reproduce it. Even when I reported
> > those desktop freezes, unless someone confirmed them it might just
> > happen for me. Well a "confirm" or vote button might be good, so
> > that the amount of confirmations could be counted.
>
> Maybe that could help, but it will not necessarily be the best
> solution. Keep in mind that some issues may be more important but
> still reported only by one user. If one reports FS corruption, you
> certainly don't want to wait for a few other ones to confirm the bug
> for instance. Security issues don't need counting either.

Okay, granted. It would just be an indication.

But a complete freeze or desktop freeze bug could lead to huge data loss,
too, depending on when the user last saved his data. So is it really that
much less important?

> > > It's not really advisable to call dot-0 releases "unstable" because
> > > it will only result in shifting the adoption point between the user
> > > classes above. We need to have enthousiasts who proudly say "hey
> > > look, dot-0 and it's already rock solid". We've all seen some of
> > > them and they're the ones who help reporting issues that get fixed
> > > in the next stable release.
> >
> > I do think the claim should be honest. "stable" IMHO is not, at least
> > from a user's point of view. "unstable" isn't either, cause a dot-0
> > kernel is not guarenteed to be unstable ;). So I agree with the
> > major release kernel approach from Rafael.
>
> But it's also the starting point of the stable branch. And what about
> the -stable branch itself. Sometimes an awful bug will prevent the
> kernel from even booting for most users, and a single patch will be
> present in the stable branch to fix this early. Same if a major
> security issue gets discovered at the time of release, it's possible
> that the stable branch only contains one patch. That does not qualify
> it for more stable than the main branch either, eventhough it's called
> "stable". Maybe we should indicate on http://www.kernel.org that a new
> release has generally received little testing but should be good
> enough for experienced users to test it, and that stable releases
> before .3-.4 are not recommended for general use.

I thought about calling it a "major kernel release" or something like that
from dot-0 on, and then, after the stable patches settle - but on what
criterion to decide that? - "stable". Just .3 or .4? Or when there have
been some dot releases with few patches? But then what if Greg just takes a
bit longer to make the next one and it contains more patches for that reason?

> > But beyond that, I do think its worth thinking about ways to improve
> > the process of ensuring as much stability as sensibly possible. A
> > dot-0 kernel won't be error-free - but I find just claiming the
> > current process as "the best we can have" not actually satisfying.
> > And I do think it can be improved upon. I do not do kernel
> > development, but I am willing to help with collecting information
> > about the current state of the kernel, help with bug triaging as
> > good as I can and manage to take time. I do have some experience
> > with quality management as I coordinated the betatest of some
> > AmigaOS versions, but then this has been in a closed group. Here
> > its a different scale and I believe it needs somewhat different
> > approaches.
>
> In fact, I think we're at a point where the development process scales
> linearly with every brain and every pair of eyeballs. There are two
> orthogonal axes to scale, one on the quality and one on the quantity.
> Both are required, but the time spent on one is not spent on the other
> one. Customers want quantity (features) and expect implicit quality.

Don't customers also want stability? I certainly want it. And so do many
people running servers, in my experience.

> It is possible for some people to bring a lot of added value, a lot
> more than they would through their share of brain time on code. This is
> the case for Rafael and Greg who noticeably enhance quality, but it's
> not limited to them too. Code reviews, bug reviews, -next branch,
> etc... are all geared towards quality. But one thing is sure, there
> are far less people working on quality than there are working on
> features, so I think that if you want to help, there is possibly a way
> to noticeably improve quality with one more guy there, though you have
> to find how to efficiently spend that time !

Yes, and I haven't found that yet. I am not at a point where I can just
read kernel code and actually understand what it does. But I might be able
to start helping with collecting and categorizing bug and regression
information, bug triaging and such. For some bugs at least. I think there
are bugs where I just do not understand enough to do anything helpful.

Last post for today. Enough of computing.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2010-07-12 19:58:21

by David Newall

[permalink] [raw]
Subject: Re: stable? quality assurance?

Stefan Richter wrote:
> David Newall wrote:
>
>> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I
>> doubt anybody honestly thinks otherwise.
>>
>
> It works stable for what I use it for.
>
Mea culpa. I didn't mean that 2.6.34 is unstable, but that the term
"stable" is not appropriate for a newly released kernel; "gamma" should
be used instead.

Merely six months ago 2.6.32 was released; today we're preparing for
2.6.35; a new kernel every two months! Perhaps 2.6.31 is truly the
latest stable kernel; or else 2.6.27 is, which is the other 2.6 on the
front page of kernel.org. I'm pretty sure 2.4 is stable (which might
explain why I see it embedded *much* more frequently than 2.6).

> If it doesn't for you, then I hope you are already in contact with the
> respective subsystem developers to get the regressions that you
> experience fixed.
>
(Segue to a problem which follows from calling bleeding-edge kernels
"stable".)

When reporting bugs, the first response is often, "we're not interested
in such an old kernel; try it with the latest." That's not hugely
useful when the latest kernels are not suitable for production use. If
kernels weren't marked stable until they had earned the moniker, for
example 2.6.27, then the expectation of developers and of users would be
consistent: developers could expect users to try it again with latest
stable kernel, and users could reasonably expect that trying it wouldn't
break their system.

2010-07-12 20:25:06

by Nix

[permalink] [raw]
Subject: Re: stable? quality assurance?

On 11 Jul 2010, Martin Steigerwald said:

> 2.6.34 was a desaster for me: bug #15969 - patch was availble before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as most important two complete lockups - well maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
[...]
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.

To introduce yet more anecdata into this thread, I too had problems with
TuxOnIce-driven suspend/resume from just post-2.6.32 to just pre-2.6.34.
The solution was, surprise surprise, to *raise a bug report*, whereupon
in short order I had a workaround. In 2.6.34, the problem vanished as
mysteriously as it appeared, as did the bug whereby X coredumped and the
screen stayed dark forever upon quitting X. 2.6.34 and 2.6.34.1 have
worked better for me than any kernel I've used since 2.6.30, with no
bugs noticeable on any of my machines (that's a first since 2.6.26).

I speculate that there may be some subtle piece of overwriting inside
the Radeon KMS and/or DRM code, which is obscure enough that it is
relatively easily perturbed by changes elsewhere in the kernel.

But nonetheless, one cannot extrapolate from a single bug in a subsystem
as complex as DRM/KMS to the quality of the entire kernel. This is
doubly true given the degree of difference between different cards
labelled as Radeons: I'd venture to state that most of the Radeon bugs
I've seen flow past over the last year or so only affect a small subset
of cards: but if you add them all up, it's likely that most users have
been bitten by at least one. But the problem here is not the kernel
developers, nor the kernel quality: it's that ATI Radeons are a
horrifically complicated and tangled web of slightly variable hardware.
(In this they are no different from any other modern graphics card.)


Martin, might I suggest considering stable kernels 'experimental' until
at least .1 is out? Before Linus releases a kernel, its only users are
dedicated masochists and developers: after the release, piles of regular
early adopters pour in, and heaps of bug reports head to lkml and fixes
head to -stable. The .1 kernels, with fixes for some of those, are the
first you can really call *stable*, as they've got fixes for bugs
isolated after testing by a much larger userbase of suckers.

-- N., dedicated sucker and masochist

2010-07-12 21:11:42

by Stefan Richter

[permalink] [raw]
Subject: Re: stable? quality assurance?

David Newall wrote:
> Stefan Richter wrote:
>> If it doesn't for you, then I hope you are already in contact with the
>> respective subsystem developers to get the regressions that you
>> experience fixed.
>>
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest."

Because bug fixes are continuously going into the new kernels.

> That's not hugely useful when the latest kernels are not suitable for
> production use.

"I have this bug here." - "It might be fixed in 2.6.mn. Try it." - "I
don't want to because I got burned by 2.6.jk." Well, then don't do it
and keep using the old buggy kernel. Or use a forked kernel where
somebody adds bugfix backports and feature backports as you require
them, if that somebody does a really good job.

> If kernels weren't marked stable until they had earned the moniker,
> for example 2.6.27, then the expectation of developers and of users
> would be consistent:

2.6.27.y is what you call stable exactly because none of the boatloads
of bug fixes and improvements of each subsequent 2.6.x release goes into
it anymore.

That's the nature of the beast. You can't have your cake and eat it too.
Which is why it is important that we keep the regression count in new
kernels low and try to detect and fix regressions as early as possible.
I admit that I do not really help with this myself outside the subsystem
which I maintain. I usually start to run -rc kernels only at later -rc's
(say, -rc5, only sometimes earlier) and don't test them beyond the one
or two configurations that I use personally. There were occasionally
regressions in the subsystem that I maintain, but they were few and
always fixed quickly, and each one was a lesson in how to do better. So,
for that subsystem, the "Latest Stable Kernel" that is advertised on the
front page of kernel.org really and truly /is/ the latest stable release
that is recommended for production use, as far as that subsystem is
concerned.
--
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

2010-07-12 21:40:13

by Martin Steigerwald

[permalink] [raw]
Subject: Re: stable? quality assurance?

Am Montag 12 Juli 2010 schrieb David Newall:
> Stefan Richter wrote:
> > David Newall wrote:
> >> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I
> >> doubt anybody honestly thinks otherwise.
> >
> > It works stable for what I use it for.
>
> Mea culpa. I didn't mean that 2.6.34 is unstable, but that the term
> "stable" is not appropriate for a newly released kernel; "gamma" should
> be used instead.

I indeed think stable should mean "stable for the majority of users". It's
difficult to estimate. But I doubt that every dot-0 release qualified for
that.

> Merely six months ago 2.6.32 was released; today we're preparing for
> 2.6.35; a new kernel every two months! Perhaps 2.6.31 is truly the
> latest stable kernel; or else 2.6.27 does, which is the other 2.6 on
> the front page of kernel.org. I'm pretty sure 2.4 is stable (which
> might explain why I see it embedded *much* more frequently than 2.6.)

I have these metrics:

martin@shambhala:~> uprecords -m 20 | cut -c1-70
# Uptime | System
----------------------------+-----------------------------------------
1 36 days, 09:57:31 | Linux 2.6.32.3-tp42-toi- Tue Jan 12 09:
2 31 days, 01:07:24 | Linux 2.6.26.5-tp42-toi- Tue Sep 30 13:
3 24 days, 13:29:07 | Linux 2.6.33.2-tp42-toi- Mon May 31 22:
4 21 days, 15:08:21 | Linux 2.6.29.2-tp42-toi- Tue Apr 28 22:
5 19 days, 21:22:14 | Linux 2.6.33.2-tp42-toi- Tue May 11 17:
6 19 days, 09:49:05 | Linux 2.6.32.8-tp42-toi- Fri Mar 5 11:
7 18 days, 02:31:41 | Linux 2.6.29.6-tp42-toi- Thu Jul 9 09:
8 17 days, 12:38:36 | Linux 2.6.28.8-tp42-toi- Wed Mar 18 10:
9 16 days, 16:10:28 | Linux 2.6.31-tp42-toi-3. Tue Sep 22 21:
10 15 days, 14:39:26 | Linux 2.6.28.4-tp42-toi- Mon Feb 9 22:
11 15 days, 13:58:12 | Linux 2.6.27.7-tp42-toi- Tue Dec 9 22:
12 13 days, 21:11:06 | Linux 2.6.31-rc7-tp42-to Mon Aug 31 21:
13 13 days, 18:34:00 | Linux 2.6.29.2-tp42-toi- Wed May 27 19:
14 12 days, 21:54:18 | Linux 2.6.26.5-tp42-toi- Fri Oct 31 13:
15 10 days, 22:02:14 | Linux 2.6.28.7-tp42-toi- Thu Feb 26 16:
16 10 days, 16:29:02 | Linux 2.6.33.2-tp42-toi- Fri Jun 25 19:
17 10 days, 08:04:52 | Linux 2.6.26.2-tp42-toi- Thu Sep 18 14:
18 10 days, 03:52:30 | Linux 2.6.31.3-tp42-toi- Thu Oct 15 09:
19 9 days, 22:03:29 | Linux 2.6.31.5-tp42-toi- Tue Nov 3 11:
20 9 days, 00:24:22 | Linux 2.6.29.2-tp42-toi- Thu Jun 25 14:
----------------------------+-----------------------------------------
-> 116 0 days, 00:52:03 | Linux 2.6.33.6-tp42-toi- Mo
----------------------------+-----------------------------------------
1up in 0 days, 00:31:56 | at Mon Jul 12 23:
t10 in 15 days, 13:47:24 | at Wed Jul 28 12:
no1 in 36 days, 09:05:29 | at Wed Aug 18 08:
up 608 days, 02:40:08 | since Thu Sep 18 14:
down 54 days, 06:12:57 | since Thu Sep 18 14:
%up 91.808 | since Thu Sep 18 14:

And there are 228 entries in total since 2.6.26, with

martin@shambhala:~> uprecords -m 300 | cut -c1-70 | grep "0 days" | wc -l
148

entries for uptimes shorter than one day.

Sure, these are not to be read without keeping in mind the experiences I
had and the reasons for rebooting, since sometimes I just messed up some
kernel option and compiled another kernel.

AFAIR 2.6.26 up to 2.6.32 have been fine, except 2.6.30, where TuxOnIce
just didn't work, but I am not yet sure whether this was caused by TuxOnIce
or by some problem with the general hibernation infrastructure. I then just
skipped 2.6.30. Since I only tried 2.6.31 with my T42, I got a whopping
uptime of over 100 days for 2.6.29 on my T23! That's stable. Well, any
kernel that reproducibly reaches more than 15 or 30 days is quite stable
in my own subjective consideration. Most kernels that got that far would
likely have lasted much longer if I hadn't just compiled the next one, be
it a dot release or a major release.

This all without Radeon KMS!

2.6.33.2 was only stable when I used Radeon KMS without TuxOnIce. OK, so it
might be a TuxOnIce problem, but then at least those quite frequent hangs
on hibernation - at the point where the screen goes black for a few seconds
and then comes back - which I had with 2.6.33.2 were gone with 2.6.34.
Maybe they are gone with 2.6.33.6 too, since it carries some more Radeon
DRM fixes.

2.6.34 has not yet reached an uptime of more than 2 or 3 days.

Well, maybe Nix is right and it's just that Radeon KMS has not been
stabilized enough while the rest of the kernel is quite stable.

And if the combination of 2.6.33 - now .6 - and userspace software suspend
works for me - for the first time; often it was TuxOnIce that worked, but
not any in-kernel method I tried from time to time - then so be it for the
time being, even if userspace software suspend is way slower and doesn't
saturate the disk when writing the image.

> > If it doesn't for you, then I hope you are already in contact with
> > the respective subsystem developers to get the regressions that you
> > experience fixed.
>
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest." That's not hugely
> useful when the latest kernels are not suitable for production use. If
> kernels weren't marked stable until they had earned the moniker, for
> example 2.6.27, then the expectation of developers and of users would
> be consistent: developers could expect users to try it again with
> latest stable kernel, and users could reasonably expect that trying it
> wouldn't break their system.

I think that's really a question of how to attract more widespread testing.
For more widespread testing it needs to be stable enough that enough users
will deal with it. But without more widespread testing it might not get
there.

I just dropped 2.6.34 for now and I will wait for more dot releases. Maybe
I am really the only one for whom 2.6.34 doesn't work; maybe other people
just did the same, too frustrated to tell here or in bugzilla.

Maybe providing better ways to report bugs and gather information, even on
freeze bugs, without too much manual setup could help. I certainly think
that the enhanced DrKonqi crash reporter from KDE 4.3 and up helped users
to provide *good bug reports*. Maybe there could be something like that for
the kernel, and an easy option to have the kernel store backtraces even for
hard crashes. Unfortunately there is no reset button on notebooks, so
memory might be the wrong place. Well, one could dedicate some ring buffer
space on the swap partition for that or something like that - that area
should be writable even when no filesystem is working anymore. On the next
reboot the bug report application recovers the crash data from there. It
would impose the risk that on severe memory corruption the kernel writes
crash data elsewhere, where it shouldn't save it. A USB stick comes to
mind, but what if the USB stack doesn't work anymore?

Well, not every bug is a freeze bug, and maybe something could be done for
non-freeze bugs, too. Like an application which records selected data while
the user reproduces the bug, just as the enhanced DrKonqi collects crash
data and even helps the user install the necessary debug packages.

But I think when a kernel behaves too unstably for lots of users, they just
drop it. Some bugs are okay, but freeze bugs especially, and even more so
filesystem corruption bugs, scare away everyone but the die-hard kernel
debuggers who bisect a kernel a day.

Maybe I just had lots of bad luck, so I would love to hear other
experiences; some have already said 2.6.34 works pretty stably for them.

I will leave 2.6.34.1 on my T23, which has a Savage that maybe will never
get KMS, who knows, and on the workstation at work, which doesn't use
Radeon KMS due to its rock solid stable Debian Lenny userspace. Maybe this
at least sheds some light on whether most of my issues have likely been
Radeon KMS related.

As a side note: Ext4 is absolutely rock stable for me! As is XFS on my T23
and even BTRFS for the T23 /home and some work directory on the
workstation (not yet on my production T42).

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2010-07-12 22:44:49

by Stefan Richter

[permalink] [raw]
Subject: Re: stable? quality assurance?

Martin Steigerwald wrote:
> And when the combination of 2.6.33 now .6 and userspace software suspend
> works for me - for the first time, often it was TuxOnIce that worked, but
> not any in kernel method I tried from time to time - so be it for the time
> being, even if userspace software suspend is way slower and doesn't
> satisfy the disk on writing the image.

BTW, relying in the long term on a quite fundamental kernel component that
is not in mainline (for whichever reason) almost guarantees you a lot of
recurring pain, one way or another.
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

2010-07-12 23:04:28

by Stefan Richter

[permalink] [raw]
Subject: Re: stable? quality assurance?

Martin Steigerwald wrote:
> I think I wait for 2.6.34.2 or .3 and then try again. If it then happens
> again, hopefully in a moment where I have nerve to deal with such bugs, I
> fire up my second notebook and try to SSH into the machine. If that works I
> at least could look into dmesg and X.org logs.

netconsole might be required.
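
A minimal netconsole setup could look roughly like this; the addresses,
ports, interface and MAC below are placeholders for your own network:

  # on the machine being debugged:
  modprobe netconsole \
      netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/aa:bb:cc:dd:ee:ff
  # on the second notebook, collect the kernel messages:
  nc -u -l 6666    # or: netcat -u -l -p 6666, depending on the netcat variant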

...
> Is the Linux kernel development really in balance with feature work and
> stabilization work? Currently at least from my personal perception it is
> not. Development goes that fast - can you all cope with that speed? Maybe
> its just time to *slow it down* a bit?

If those who added the regressions are found and asked to debug and
fix them, the balance should be corrected, and perhaps more precautions
will be taken in the future. Alas, finding the point in history at which
the kernel regressed might take a lot more time than actually fixing it
afterwards. In that case, maybe give the author of the bug an estimate of
the volunteered hours that were spent on reporting it, to put its
repercussions into perspective. OTOH I suspect a lack of responsibility
among the developers is not so much an issue here, but rather that the
number of people who take the time for -rc tests (not to mention
linux-next tests) _and_ to file reports is rather low. Plus, a good bug
report often requires experience or good intuition, besides patience and
rigor.

There were discussions in the past on how more enthusiasts who are
willing and able to test prereleases could be attracted. But maybe
(just maybe) there are more ways in which the developers themselves
could perform more extensive/ more systematic tests.
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

2010-07-13 10:30:28

by Martin Steigerwald

[permalink] [raw]
Subject: Re: stable? quality assurance?

Am Dienstag 13 Juli 2010 schrieb Stefan Richter:
> ...
>
> > Is the Linux kernel development really in balance with feature work
> > and stabilization work? Currently at least from my personal
> > perception it is not. Development goes that fast - can you all cope
> > with that speed? Maybe its just time to slow it down a bit?
>
> If those who added the regressions are found out and asked to debug and
> fix them, the balance should be corrected and perhaps more precautions
> being taken in the future. Alas, finding the point in history at which
> the kernel regressed might take a lot more time than to actually fix it
> then. In that case, maybe give the author of the bug an estimate of
> the volunteered hours that were spent on reporting this bug, to put
> the repercussions into it into perspective. OTOH I suspect a lack of
> responsibility at the developers is not so much an issue here, more so
> that the number of people who take the time for -rc tests (not to
> mention linux-next tests) and to file reports is rather low. Plus, a
> good bug report often requires experience or good intuition, besides
> patience and rigor.
>
> There were discussions in the past on how more enthusiasts who are
> willing and able to test prereleases could be attracted. But maybe
> (just maybe) there are more ways in which the developers themselves
> could perform more extensive/ more systematic tests.

Well, I reported it now, although the report does not contain nearly as
much information on how to reproduce it, nor much other debug information
either. I just did not report it before because I didn't find the
information I could provide very helpful, and until yesterday I thought it
might just have been those two freezes and that's it. But maybe reporting
it early is better than not reporting it at all.

Bug 16376 - random - possibly Radeon DRM KMS related - freezes
https://bugzilla.kernel.org/show_bug.cgi?id=16376

I will look in the logs to see whether I might get lucky and find anything
this afternoon while my students learn vi/vim, but I doubt it.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.
Subject: Re: stable? quality assurance?

On Sun, 11 Jul 2010 16:51:42 +0200,
Martin Steigerwald <[email protected]> wrote:


>
> One reason for a demand for me is best expressed by this question: Does
> the kernel developer community want to encourage that a group of advanced
> Linux users - but mostly non-developers - compile their own vanilla or
> valnilla near kernels, provide wider testing and report a bug now and
> then?
>
> I can live with either answer. If not, I just will be much more reluctant
> to try out new kernels.

I for one stopped booting into -rc kernels.
The fact that I still have to patch my kernels with a *one*-liner
since the 2.6.29 kernel [1] does not give me confidence in the "test,
report/bisect and it will be fixed" promise some have made in this
thread.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=13362


Attachments:
signature.asc (836.00 B)

2010-07-13 12:50:36

by Stefan Richter

[permalink] [raw]
Subject: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

Alejandro Riveira Fernández wrote:
> I for one stopped booting into -rc kernels.
> The fact that still have to patch my kernels with a *one* liner
> since 2.6.29 kernel [1] does not give me confidence on the "test
> report/bisect and it will be fixed" promise some have made in this
> threath
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362

There were promises made in this thread? Then I must have read a
different mailing list or so.

I do not know why your WLAN regression has not been fixed yet, but at
least it seems rather plausible why commit
7e0986c17f695952ce5d61ed793ce048ba90a661 is not going to be reverted (if
such a revert is the one-liner that you are referring to).

Why is one reporter's rt2500 OK now though but not yours? Are there
different card revisions or firmwares out there that require quirk handling?
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

2010-07-13 15:45:15

by John W. Linville

[permalink] [raw]
Subject: Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

On Tue, Jul 13, 2010 at 02:50:14PM +0200, Stefan Richter wrote:
> Alejandro Riveira Fernández wrote:
> > I for one stopped booting into -rc kernels.
> > The fact that still have to patch my kernels with a *one* liner
> > since 2.6.29 kernel [1] does not give me confidence on the "test
> > report/bisect and it will be fixed" promise some have made in this
> > threath
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
>
> There were promises made in this thread? Then I must have read a
> different mailinglist or so.
>
> I do not know why your WLAN regression has not been fixed yet, but at
> least it seems rather plausible why commit
> 7e0986c17f695952ce5d61ed793ce048ba90a661 is not going to be reverted (if
> such a revert is the one-liner that you are referring to).
>
> Why is one reporter's rt2500 OK now though but not yours? Are there
> different card revisions or firmwares out there that require quirk handling?

The patch (7e0986c1) corrects an obvious error. Reverting it might
improve your (i.e. Alejandro) performance, but it seems likely to
cause connectivity problems for others.

The fact that reverting 7e0986c1 helps you suggests that rt2500usb
isn't using the basic_rates map properly. But after reviewing the
code and the data I have, I can't see what would be causing that.
It is at least possible that your AP is sending bad rate information.
Have you tried this device with other APs?

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2010-07-13 16:51:11

by Theodore Ts'o

[permalink] [raw]
Subject: Re: stable? quality assurance?


On Jul 12, 2010, at 11:56 AM, David Newall wrote:

> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I doubt anybody honestly thinks otherwise.

Stable is relative. Some people are willing to consider
Fedora "stable". Other people will only use a RHEL
kernel, and there are those who are using RHEL 4
or even RHEL 3 because they are extremely risk-averse.

So arguments about whether or not a specific kernel
version deserves to be called "stable" are going to be
a waste of time and electrons, because it's all about
expectations.

But the one huge thing that people are forgetting is that
the fundamental premise behind open source is "scratch
your own itch". That means that people who own a
specific piece of hardware have to collectively be responsible
for making sure that it works. It's not possible for me to
ensure that some eSATA PCMCIA card on a T23 laptop
still works, because I don't own the hardware. So the only
way we know whether or not there is a regression is if
there is *someone* who owns that hardware who is
willing to try it out, hopefully during -rc3 or -rc4, and let
us know if there is a problem, and hopefully help us
debug the problem.

If you have people saying, "-rc3 isn't stable, I'll wait until
-rc5 to test things", then it will be that much later before
we discover a potential problem with the T23 laptop, and
before we can fix it. If people say, "2.6.34.0 isn't stable,
I refuse to run a kernel until 2.6.34.4", and they are the
only person with the T23 eSATA device, then we won't hear
about the problem until 2.6.34.4, and it might not get fixed
until 2.6.34.5 or 2.6.34.6!

What this means is yes that stable basically means, "stable
for the core kernel developers". You can say that this isn't
correct, and maybe even dishonest, but if we wait until 2.6.34.N
before we call a release "stable", and this discourages users
from testing 2.6.34.M for M<N, it just delays when bugs will
be found and fixed.

This is why to me, arguing that 2.6.34.0 is not "stable" really
isn't useful. If you really want to frequently update your kernel
and use the latest and greatest, part of the price that you have
to pay is to help us with the testing, bug reporting, and root
cause determination.

If you don't like this, your other choice is to pay $$$ to the
folks who provide support for Solaris and OS X, and accept
the restrictions in hardware implied by Solaris and OS X.
(Hint: neither supports a ThinkPad T23.) But to compare
Linux, especially the non-distribution source code distribution
from kernel.org, with operating systems that have very different
business models is to really and fundamentally misunderstand
how things work in the Linux world.

If you want that kind of stability, then you will need to use an
older kernel. Or use a distribution kernel which has a support
and testing and business model compatible with your desires.
Fedora for example uses kernels which are six months out of
date, because during those six months, the people who use the
testing versions of Fedora are doing testing and helping with
the bug fixing. Red Hat uses this free testing pool to improve
the testing and stability of Red Hat Enterprise Linux, so if you
are willing to live with a 2-3 year release cycle, RHEL will be
more stable than Fedora. And if you need to make sure that
bugs are fixed very quickly, and you can call and demand
a developer's attention, you can pay $$$ for a support contract.

I will say once again. There is no such thing as a free lunch.
Linux is a better deal than most, and you have multiple
choices about how frequently you update, whether you let
someone else decide whether or not a particular kernel
release plus patches is "stable", or more accurately,
"stable enough", and you can choose how much you are willing
to pay, either in personal time and effort, or $$$ to some support
organization.

But demanding that kernel.org become "more stable" when it
is supported by purely volunteers is simply not reasonable.

-- Ted

Subject: Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

On Tue, 13 Jul 2010 14:50:14 +0200,
Stefan Richter <[email protected]> wrote:

> Alejandro Riveira Fernández wrote:
> > I for one stopped booting into -rc kernels.
> > The fact that still have to patch my kernels with a *one* liner
> > since 2.6.29 kernel [1] does not give me confidence on the "test
> > report/bisect and it will be fixed" promise some have made in this
> > threath
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
>
> There were promises made in this thread? Then I must have read a
> different mailinglist or so.

OK, no promises.
Maybe I read too much into Mr Ts'o's previous mail. My apologies.
[quote]
> So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> I find bugs, I report them and I help fix them. If more people did
> that, then the 2.6.X.0 releases would be more stable. But kernel
> development is a volunteer effort, so it's up to the volunteers to
> test and fix bugs during the rc4, -rc5 and -rc6 time frame.

[...]
> [...] Linux may be a very good bargain (look
> at how much Oracle has increased its support contracts for Solaris!),
> but it's still not a free lunch. At the end of the day, you get what
> you put into it.

I tested the kernels, I reported the bugs, and I helped (to the best of my
knowledge; I'm not a programmer).
I got no result.

Subject: Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

On Tue, 13 Jul 2010 11:35:31 -0400,
"John W. Linville" <[email protected]> wrote:


>
> The patch (7e0986c1) corrects an obvious error. Reverting it might
> improve your (i.e. Alejandro) performance, but it seems likely to
> cause connectivity problems for others.
>
> The fact that reverting 7e098c1 helps you suggests that rt2500usb

my card is pci so it would be rt2500pci

> isn't using the basic_rates map properly. But after reviewing the
> code and the data I have, I can't see what would be causing that.
> It is at least possible that your AP is sending bad rate information.
> Have you tried this device with other APs?

No; this is a desktop pc that connects to my home router/AP. A new wifi
card is cheaper than a new AP ...


>
> John

2010-07-13 18:45:27

by John W. Linville

[permalink] [raw]
Subject: Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

On Tue, Jul 13, 2010 at 08:19:27PM +0200, Alejandro Riveira Fernández wrote:
> On Tue, 13 Jul 2010 11:35:31 -0400,
> "John W. Linville" <[email protected]> wrote:
>
>
> >
> > The patch (7e0986c1) corrects an obvious error. Reverting it might
> > improve your (i.e. Alejandro) performance, but it seems likely to
> > cause connectivity problems for others.
> >
> > The fact that reverting 7e098c1 helps you suggests that rt2500usb
>
> my card is pci so it would be rt2500pci

Sorry, typo...

> > isn't using the basic_rates map properly. But after reviewing the
> > code and the data I have, I can't see what would be causing that.
> > It is at least possible that your AP is sending bad rate information.
> > Have you tried this device with other APs?
>
> No; this is a desktop pc that connects to my home router/AP. A new wifi
> card is cheaper than a new AP ...

Perhaps you could capture some beacons from that AP?
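
In case it helps, one rough way to do that is with a mac80211 monitor
interface; the interface names, channel, and capture file below are just
examples:

  # add a monitor-mode interface on the same wireless PHY
  iw dev wlan0 interface add mon0 type monitor
  ip link set mon0 up
  # tune it to the channel your AP uses
  iw dev mon0 set channel 6
  # capture only beacon frames for later inspection (e.g. with wireshark)
  tcpdump -i mon0 -s 0 -w beacons.pcap 'type mgt subtype beacon'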

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

Subject: Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

On Tue, 13 Jul 2010 14:38:52 -0400,
"John W. Linville" <[email protected]> wrote:

>
> > > isn't using the basic_rates map properly. But after reviewing the
> > > code and the data I have, I can't see what would be causing that.
> > > It is at least possible that your AP is sending bad rate information.
> > > Have you tried this device with other APs?

I do not know; I captured some debug data for Ivo back in the day, and from
what he said all the info passed to the card was correct...
See http://lkml.org/lkml/2009/5/25/163 (the link is in bugzilla) in case
you missed it.

> >
> > No; this is a desktop pc that connects to my home router/AP. A new wifi
> > card is cheaper than a new AP ...
>
> Perhaps you could capture some beacons from that AP?

If you explain how, I can try.

>

2010-07-13 19:19:11

by Stefan Richter

[permalink] [raw]
Subject: Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)

Alejandro Riveira Fernández wrote:
> On Tue, 13 Jul 2010 14:50:14 +0200,
> Stefan Richter <[email protected]> wrote:
>> There were promises made in this thread? Then I must have read a
>> different mailinglist or so.
>
> Ok no promises.
> Maybe I read to much in to Mr Tso previous mail. My apologies
> [quote]
> > So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> > I find bugs, I report them and I help fix them. If more people did
> > that, then the 2.6.X.0 releases would be more stable. But kernel
> > development is a volunteer effort, so it's up to the volunteers to
> > test and fix bugs during the rc4, -rc5 and -rc6 time frame.
>
> [...]
> > [...] Linux may be a very good bargain (look
> > at how much Oracle has increased its support contracts for Solaris!),
> > but it's still not a free lunch. At the end of the day, you get what
> > you put into it.
>
> I tested the kernels i reported the bugs and helped (to the best of my
> knowledge; I'm not a programmer)
> I got no result.

"You get what you put into it" probably did not mean "report a bug, get
it fixed, every time". Often enough, kernel bugs or hardware quirks are
very hard to fix without direct access to affected hardware.

Here is how my involvement with Linux started: I reported a bug but
nobody reacted. I collected some more information, reported the bug
again, and it was immediately fixed by the driver authors. From then on
I kept following driver development as a tester and answered user
questions. A few years later, the driver authors all had left for other
projects but there were still bugs to tackle. So I started to write and
submit bug fixes myself. (I'm not a programmer either but by then I
already knew a lot about the subsystem.)
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

2010-07-13 20:45:26

by David Newall

[permalink] [raw]
Subject: Re: stable? quality assurance?

Theodore Tso wrote:
> What this means is yes that stable basically means, "stable
> for the core kernel developers". You can say that this isn't
> correct, and maybe even dishonest, but if we wait until 2.6.34.N
> before we call a release "stable", and this discourages users
> from testing 2.6.34.M for M<N, it just delays when bugs will
> be found and fixed.
>

Calling it stable instils and reinforces a Pavlovian response in typical
users, that recent Linux kernels are dangerous and unreliable; one year
old was suggested as a safe benchmark. Typical users being 99% of the
population, testing hardly begins until a kernel is "sufficiently old."
This Pavlovian response is what really delays finding and fixing bugs.
Being up-front and saying which kernels are likely to fail would help
many users calculate the risk and improve their willingness to try newer
kernels. "Sufficiently old" might well come down to six months, maybe four.

That is to say, instead of taking a year to pass gamma-testing, new
kernels could be passed in six months or less. That would be a big
improvement in stability and quality assurance however you dice it.


> But demanding that kernel.org become "more stable" when it
> is supported by purely volunteers is simply not reasonable.

Let's not be hysterical; nobody made any demands. Semantics aside, the
suggestion is reasonable because it affects developers' workloads not
one whit. The only change is the label that Linus applies to new releases.

2010-07-14 06:33:59

by Theodore Ts'o

[permalink] [raw]
Subject: Re: stable? quality assurance?


On Jul 13, 2010, at 4:45 PM, David Newall wrote:
>
> Calling it stable instils and reinforces a Pavlovian response in typical users, that recent Linux kernels are dangerous and unreliable; one year old was suggested as a safe benchmark. Typical users being 99% of the population, testing hardly begins until a kernel is "sufficiently old." This Pavlovian response is what really delays finding and fixing bugs. Being up-front and saying which kernels are likely to fail would help many users calculate the risk and improve their willingness to try newer kernels. "Sufficiently old" might well come down to six months, maybe four.

Most typical users should be using distribution kernels. Period.

We can't say which kernels are likely to fail, because we don't know. If people don't test newer kernels, the mere passage of time, whether it's four months, or six months, or a year, or two years, is not going to magically make problems go away and get fixed. That only happens if someone steps up and tries it out, and if it breaks submits bug reports or patches. A fairly large number of Linux developers seem to prefer relatively recent vintage Thinkpads, preferably without Nvidia or ATI chipsets. These laptops are generally safe and reliable by -rc3 or so --- because if they aren't the Linux developers step up and complain and do code bisections and they fix the problem.

If someone has a T23 laptop, and they help out by doing the same, then it will also be safe and reliable by the time of 2.6.X.0. If they just kvetch and complain, and stamp their feet, and say "Linux is unsafe and unreliable", and no other T23 owners step up to the challenge, then two years might go by and the same kernel might still be unreliable --- for them.

-- Ted

2010-07-15 07:23:53

by David Lang

[permalink] [raw]
Subject: Re: stable? quality assurance?

On Tue, 13 Jul 2010, David Newall wrote:

> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested in
> such an old kernel; try it with the latest." That's not hugely useful when
> the latest kernels are not suitable for production use. If kernels weren't
> marked stable until they had earned the moniker, for example 2.6.27, then the
> expectation of developers and of users would be consistent: developers could
> expect users to try it again with latest stable kernel, and users could
> reasonably expect that trying it wouldn't break their system.

2.6.27 didn't get declared 'stable' because it had very few bugs, it was
declared 'stable' because someone volunteered to maintain it longer and
back-port patches to it long past the normal process.

2.6.32 got declared 'long-term stable' before 2.6.33 was released, again
not because it was especially good, but because it didn't appear to be
especially bad and several distros were shipping kernels based on it, so
again someone volunteered (or was volunteered by the distro that pays
their paycheck) to back-port patches to it for longer.

I have been running kernel.org kernels on my production systems for >13
years. I am _very_ short of time, so I generally don't get a chance to
test the -rc kernels (once in a while I do get a chance to do so on my
laptop). What I do is, every 2-3 kernel releases, I wait a couple of days
after the kernel release to see if there are show-stopper bugs, and if
nothing shows up (which is the common case for the last several years) I
compile a kernel and load it on machines in my lab. I try to have a
selection of machines that match the systems I have in production in what
I have found are the 'important' ways (a definition that changes once in a
while when I find something that should 'just work' but doesn't ;-). This
primarily includes systems with all the network card types and RAID card
types that I use in production, but now also includes a machine with an
SSD (after I found a bug that only affected that combination).

If my lab machines don't crash immediately, I leave them running (usually
not even stress testing them, again for lack of time) for a week or so,
then I put the new kernel on my development machines, wait a few days,
then put it on QA machines, wait a few days, then put it in production. I
keep the old kernel around so that I can reboot into it if needed.

This tends to work very well for me. It's not perfect and every couple of
cycles I run into grief and have to report a bug to the kernel list.
Usually I find it before I get into production, but I have run into cases
that got all the way into production before I found a problem.

With the 'new' -stable series, I generally wait until at least 2.6.x.1 is
released before I consider it ready to go anywhere outside my lab (I'll
still install the 2.6.x kernel in the lab, but I'll wait for the
additional testing that comes with the .1 stable kernels before moving it
on).

I don't go through this entire process with the later -stable kernels. If
I'm already running 2.6.x and there is a 2.6.x.y released that contains
fixes that look like they are relevant to the configuration that I run
(which rules out the majority of changes; I do fairly minimal kernel
configs), I will just test it in the lab as a smoke test, then schedule
a rollout through the rest of my network. If there are no problems before
I get permission to deploy to production, I put it on half my boxes,
fail over to them, then wait a little bit (a day to a week) before
upgrading the backups.

This writeup actually makes it sound like I spend a lot of time working
with kernels, but I really don't. I'll spend a couple of half days twice a
year on testing, and then additional time rolling the kernel out to the
150+ clusters of servers I have in place. If you can't spend at least this
much time on the kernel you are probably better off just running your
distro kernel, but even there you really should do a very similar set of
tests on its kernel releases.

There's another department in my company that uses distro kernels (big
name distro, but I will avoid flames by not naming names) without the
testing routine that I use, and my track record for stability compares
favorably to theirs over the last 7 years or so (they haven't been
running Linux as long as I have, so we can't go back as far ;-). They also
do more updates than I do, simply because they can't as easily look at a
kernel release and decide it doesn't apply to them.

David Lang

2010-07-15 07:33:37

by David Lang

[permalink] [raw]
Subject: Re: stable? quality assurance?

On Tue, 13 Jul 2010, Stefan Richter wrote:

> Plus, a
> good bug report often requires experience or good intuition, besides
> patience and rigor.

In my experience these are less of a requirement than patience and
persistence. With those attributes you will be able to work your way
through figuring out what data is needed for the bug report by answering
questions (and, if you get no response, trying again).

Nobody starts off knowing how to report a bug, and frequently you don't
start off knowing all the info that will be needed to solve the bug, but
if you report it and keep digging you will almost always get helped.

David Lang

2010-07-15 09:17:16

by Valeo de Vries

[permalink] [raw]
Subject: Re: stable? quality assurance?

On 11 July 2010 08:18, Martin Steigerwald <[email protected]> wrote:
>
> Hi!
>
> 2.6.34 was a desaster for me: bug #15969 - patch was availble before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as most important two complete lockups - well maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something else than hunting patches
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> (accentuation from me). I know of the argument that one should use a
> distro kernel for machines that are for production use. But frankly, does
> that justify to deliver in advance known crap to the distributors? What
> impact do partly grave bugs reported on bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations earlier I just compiled them, partly even rc-ones which I do
> not expact to be table, and they just worked. But in the recent times .0,
> partly even .1 or .2 versions haven't been stable for me quite some times
> already and thus they better not be advertised as such on kernel.org I
> think. I am willing to risk some testing and do bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I for sure will be more reluctant to upgrade in the future.

Ooh, it's been a while since I've partaken in an LKML flamewar. ;-)

On a slightly less childish note, I agree with a few of your points. I have
noticed *stable* releases (I'm talking distro kernels here) being less than
stable on occasion recently (the sporadic hard lock-up, bdi-writeback
taking damn long, the recent 'umount with dirty buffers taking an ice-age
to complete' bug). Additionally there seem to have been some very
chunky point releases in the last 3-6 months, many containing patches
that really should have been kept for the next Linus kernel.org kernel, IMO.
These annoyances drove me away from Linux for a good few months... it's
amazing what working full-time with Windows can do to one's soul, though!

That said, from what I've seen of late, there's only one guy (Greg) handling
most of the stable stuff (there are probably others working behind the
scenes), and he has a hell of a lot on his plate. So if you, like me, want to
see more reliable stable releases, I'd recommend offering to help out in some
way, e.g. reviewing/testing stable patches, as telling volunteers their work
is shit doesn't tend to gain you much at all, generally. :-)

Valeo

2010-07-16 06:59:36

by Greg KH

[permalink] [raw]
Subject: Re: stable? quality assurance?

On Sun, Jul 11, 2010 at 07:58:42PM +0400, William Pitcock wrote:
> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
> as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
> what other regressions there are in .16.

Did you happen to tell the stable maintainer about this and do a simple
'git bisect' to find the offending patch so that it can be resolved?

{sigh}
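
(A minimal sketch of such a bisect between the two stable point releases
mentioned above, assuming a clone of the 2.6.32.y stable tree; the
build-and-boot-as-domU test step is the reporter's to fill in:)

    # mark the known-bad and known-good stable releases
    $ git bisect start
    $ git bisect bad v2.6.32.16      # fails to boot as a Xen domU
    $ git bisect good v2.6.32.12     # last release known to work
    # build the kernel git checks out, boot it as a domU, then mark it:
    $ git bisect good                # ...or 'git bisect bad', as appropriate
    # repeat until git reports the first bad commit, then clean up:
    $ git bisect reset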

2010-07-16 06:59:43

by Greg KH

[permalink] [raw]
Subject: Re: stable? quality assurance?

On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
> That said, from what I've seen of late, there's only one guy (Greg) handling
> most of the stable stuff (there are probably others working behind the
> scenes), and he has a hell of a lot on his plate.

Nope, it's just me :)

thanks,

greg "i need some minions" k-h

2010-07-16 07:19:58

by Justin P. Mattock

[permalink] [raw]
Subject: Re: stable? quality assurance?

On 07/16/2010 12:00 AM, Greg KH wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
>> That said, from what I've seen of late, there's only one guy (Greg) handling
>> most of the stable stuff (there are probably others working behind the
>> scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h


you need some minions...

Justin P. Mattock

2010-07-16 15:25:23

by Randy Dunlap

[permalink] [raw]
Subject: Re: stable? quality assurance?

On Fri, 16 Jul 2010 00:00:10 -0700 Greg KH wrote:

> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
> > That said, from what I've seen of late, there's only one guy (Greg) handling
> > most of the stable stuff (there are probably others working behind the
> > scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h
> --

Chris Wright is still listed in MAINTAINERS...

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2010-07-16 15:34:48

by Valeo de Vries

[permalink] [raw]
Subject: Re: stable? quality assurance?

On 16 July 2010 08:00, Greg KH <[email protected]> wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
>> That said, from what I've seen of late, there's only one guy (Greg) handling
>> most of the stable stuff (there are probably others working behind the
>> scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h

I thought that was the case, alas.

I'm not sure how much time I could commit, but I'd be interested in
helping out, even if it's just reviewing and testing patches heading
for stable. Are there any specific areas you could use a hand with,
though?

Valeo

2010-08-05 03:27:35

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: stable? quality assurance?

On 07/15/2010 11:59 PM, Greg KH wrote:
> On Sun, Jul 11, 2010 at 07:58:42PM +0400, William Pitcock wrote:
>> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
>> as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
>> what other regressions there are in .16.
> Did you happen to tell the stable maintainer about this and do a simple
> 'git bisect' to find the offending patch so that it can be resolved?

If it is compiled on Debian then it's probably that cmpxchg memory
argument bug which hits in pvclock.c.

J