2008-09-03 23:26:10

by David Miller

[permalink] [raw]
Subject: Re: [GIT]: Networking

From: Linus Torvalds <[email protected]>
Date: Wed, 3 Sep 2008 16:24:16 -0700 (PDT)

> On Wed, 3 Sep 2008, Linus Torvalds wrote:
> >
> > What's so hard to understand about this?
>
> Here's a simple rule of thumb:
> - if it's not on the regression list
> - if it's not a reported security hole
> - if it's not on the reported oopses list
> then why are people sending it to me?
>
> IOW, if it's just another random improvement, and you send it to me
> outside of the merge window, then what is the point of the merge window?

This is exactly what I've been trying to tell the Intel wireless folks
but they refuse to listen.

I just won't take their stuff until they get their act in gear.

No problem.


2008-09-04 02:33:14

by Tomas Winkler

[permalink] [raw]
Subject: Re: [GIT]: Networking

On Thu, Sep 4, 2008 at 3:18 AM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Wed, 3 Sep 2008, Luis R. Rodriguez wrote:
>>
>> What I think really happens here IMHO is communication needs to be
>> enhanced. I only recently noticed that "regression only" policy for
>> 2.6.27 for example. I think some of us were rather surprised by it too.
>
> It's really not a "2.6.27" policy, it's always been a "merge window" vs
> "-rc" policy. It's just that I've started making more noise about it,
> since many people don't seem to see what the point of the merge window is.
>
> And no, the problem isn't new either. But it does seem to have been
> getting somewhat worse lately. Possibly simply because we have more code
> (ie releases have grown, which has made the problem more visible, even if
> it hasn't necessarily changed in any other way).
>
>> Before snapping at people for not following "orders" we need a clear
>> guideline for what should or not be sent.
>
> The basic rule really should be:
>
> - *any* new development should target the merge window.
>
> I think some of the driver people in particular have been mislead by
> the fact that we have actually accepted new drivers even after the
> merge window ("they can't regress"), and that in turn has probably
> de-sensitized people a bit to the whole merge window.
>
> Similarly, drivers (and other wholly self-contained modules like
> filesystems etc) that didn't exist before - even if they were then
> merged _during_ the merge window - have also been getting leeway in the
> -rc code, since again they cannot regress.
>
> And it really does appear like a lot of driver people just get very comfy
> and used to that _first_ release, when we allow almost anything. And then
> they just continue with it.
>
>> To avoid it I do think another exception needs to be made for patches:
>>
>> * Small fixes which make something that was not working, which was
>> supposed to work work
>
> Not really.
>
> The thing is, if it's a driver that has been around for more than a single
> release, then those "small fixes" really can't be worth all that much. And
> yes, they may be small, and yes, they may be "obvious", but the fact is,
> they definitely can regress.
>
> Sure, if some driver really haven't been around for very long, I can see
> your point. Not very many people have used it, and I can well imagine that
> there are lots of those "small fixes" that really do matter.
>
> but if the driver has been in active use for a few kernel versions, there
> really is almost no excuse to say "this fix is so important that it needs
> to go in even though it's not a regression/security/oops fix".
>
> So I would agree that the very first release a new driver (or filesystem)
> is in is special. Almost anything goes during the -rc releases leading up
> to that. And yes, I can well imagine that during the -rc for the second
> release it perhaps makes sense to be a _bit_ more relaxed, simply because
> it's young and as more people use it, silly (but serious) issues can just
> become more obvious.
>
> But in the long run? No. Drivers are not different from any other code,
> and the -rc rules should be "regressions only".
>
> In fact, in many ways we should be even _more_ careful with drivers than
> with most other things, because driver changes are seldom "obvious". Even
> really trivial stuff can interact with hardware in odd ways, in ways that
> is seldom true of normal code that you can think about and analyze (since
> we assume that the CPU itself doesn't have subtle timing etc bugs, of
> course).
>

I think we understand this points. I have nothing just agree.
I would just give few facts in that context from iwlwifi perspective
and maybe explain what is happening here.

Iwiwifi has quite a code change even I would say it was rewritten to
support new 5000 series HW, this is new for 2.6.27
The HW is just for few weeks on the shelves and we are receiving the
bug reports from the end users right now. These are mainly bugs that
weren't visible on platforms we used as our development vehicle so
this is the first source of the fixes here, few of them made them even
to the bugzilla so they are reported.

Second source of the bugs is what our validation defined as show
stoppers or critical meaning directly affecting user everyday
experience, they are listed in some database which unfortunately not
called bugzilla.
Such fixes are coming in bigger chunks like this one because we don't
run comprehensive testing cycle on each single fix and second because
they are piled up internally for some miss communication reasons (we
are taking measures)

Dave John, and Marcel gave us hard time to justify these fixes, even
if there was blood it was for good after all and we cut out some
collateral crap, so hope now I wouldn't call these fixes random
improvements.

Best regards
Tomas

2008-09-04 09:35:41

by Jeff Garzik

[permalink] [raw]
Subject: Re: [GIT]: Networking

Rafael J. Wysocki wrote:
> On Thursday, 4 of September 2008, David Miller wrote:
>> From: Linus Torvalds <[email protected]>
>> Date: Wed, 3 Sep 2008 16:24:16 -0700 (PDT)
>>
>>> On Wed, 3 Sep 2008, Linus Torvalds wrote:
>>>> What's so hard to understand about this?
>>> Here's a simple rule of thumb:
>>> - if it's not on the regression list
>>> - if it's not a reported security hole
>>> - if it's not on the reported oopses list
>>> then why are people sending it to me?
>>>
>>> IOW, if it's just another random improvement, and you send it to me
>>> outside of the merge window, then what is the point of the merge window?
>> This is exactly what I've been trying to tell the Intel wireless folks
>> but they refuse to listen.
>>
>> I just won't take their stuff until they get their act in gear.
>>
>> No problem.
>
> BTW, there are two regression fixes for forcedeth:
>
> http://marc.info/?l=linux-kernel&m=121894389018584&w=4
> http://marc.info/?l=linux-kernel&m=121917167232014&w=4
>
> that fix the regressions tracked as:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=11358
> http://bugzilla.kernel.org/show_bug.cgi?id=11361
>
> respectively and both are more than 2 weeks old now.
>
> Is there any chance to push them upstream or is there anything wrong with them?

Your patch is fine. Since DaveM and I are gone this weekend, feel free
to add "acked-by: jgarzik@redhat" and forward upstream to Linus and Andrew.

I thought there was some discussion about Yinghai Lu's MAC addr patch,
though. I'll look into it Monday, but maybe Ayaz (maintainer @ nv)
could ack it sooner than that?

Jeff




2008-09-04 09:53:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [GIT]: Networking

On Thursday, 4 of September 2008, Jeff Garzik wrote:
> Rafael J. Wysocki wrote:
> > On Thursday, 4 of September 2008, David Miller wrote:
> >> From: Linus Torvalds <[email protected]>
> >> Date: Wed, 3 Sep 2008 16:24:16 -0700 (PDT)
> >>
> >>> On Wed, 3 Sep 2008, Linus Torvalds wrote:
> >>>> What's so hard to understand about this?
> >>> Here's a simple rule of thumb:
> >>> - if it's not on the regression list
> >>> - if it's not a reported security hole
> >>> - if it's not on the reported oopses list
> >>> then why are people sending it to me?
> >>>
> >>> IOW, if it's just another random improvement, and you send it to me
> >>> outside of the merge window, then what is the point of the merge window?
> >> This is exactly what I've been trying to tell the Intel wireless folks
> >> but they refuse to listen.
> >>
> >> I just won't take their stuff until they get their act in gear.
> >>
> >> No problem.
> >
> > BTW, there are two regression fixes for forcedeth:
> >
> > http://marc.info/?l=linux-kernel&m=121894389018584&w=4
> > http://marc.info/?l=linux-kernel&m=121917167232014&w=4
> >
> > that fix the regressions tracked as:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=11358
> > http://bugzilla.kernel.org/show_bug.cgi?id=11361
> >
> > respectively and both are more than 2 weeks old now.
> >
> > Is there any chance to push them upstream or is there anything wrong with them?
>
> Your patch is fine. Since DaveM and I are gone this weekend, feel free
> to add "acked-by: jgarzik@redhat" and forward upstream to Linus and Andrew.

OK, thanks.

Andrew, this patch is currently in -mm as forcedeth-fix-kexec-regression.patch
Could you please add the Jeff's ACK to it and push it to Linus?

> I thought there was some discussion about Yinghai Lu's MAC addr patch,
> though.

I see.

> I'll look into it Monday, but maybe Ayaz (maintainer @ nv) could ack it
> sooner than that?

OK, thanks a lot!

Rafael

2008-09-04 00:19:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT]: Networking



On Wed, 3 Sep 2008, Luis R. Rodriguez wrote:
>
> What I think really happens here IMHO is communication needs to be
> enhanced. I only recently noticed that "regression only" policy for
> 2.6.27 for example. I think some of us were rather surprised by it too.

It's really not a "2.6.27" policy, it's always been a "merge window" vs
"-rc" policy. It's just that I've started making more noise about it,
since many people don't seem to see what the point of the merge window is.

And no, the problem isn't new either. But it does seem to have been
getting somewhat worse lately. Possibly simply because we have more code
(ie releases have grown, which has made the problem more visible, even if
it hasn't necessarily changed in any other way).

> Before snapping at people for not following "orders" we need a clear
> guideline for what should or not be sent.

The basic rule really should be:

- *any* new development should target the merge window.

I think some of the driver people in particular have been mislead by
the fact that we have actually accepted new drivers even after the
merge window ("they can't regress"), and that in turn has probably
de-sensitized people a bit to the whole merge window.

Similarly, drivers (and other wholly self-contained modules like
filesystems etc) that didn't exist before - even if they were then
merged _during_ the merge window - have also been getting leeway in the
-rc code, since again they cannot regress.

And it really does appear like a lot of driver people just get very comfy
and used to that _first_ release, when we allow almost anything. And then
they just continue with it.

> To avoid it I do think another exception needs to be made for patches:
>
> * Small fixes which make something that was not working, which was
> supposed to work work

Not really.

The thing is, if it's a driver that has been around for more than a single
release, then those "small fixes" really can't be worth all that much. And
yes, they may be small, and yes, they may be "obvious", but the fact is,
they definitely can regress.

Sure, if some driver really haven't been around for very long, I can see
your point. Not very many people have used it, and I can well imagine that
there are lots of those "small fixes" that really do matter.

but if the driver has been in active use for a few kernel versions, there
really is almost no excuse to say "this fix is so important that it needs
to go in even though it's not a regression/security/oops fix".

So I would agree that the very first release a new driver (or filesystem)
is in is special. Almost anything goes during the -rc releases leading up
to that. And yes, I can well imagine that during the -rc for the second
release it perhaps makes sense to be a _bit_ more relaxed, simply because
it's young and as more people use it, silly (but serious) issues can just
become more obvious.

But in the long run? No. Drivers are not different from any other code,
and the -rc rules should be "regressions only".

In fact, in many ways we should be even _more_ careful with drivers than
with most other things, because driver changes are seldom "obvious". Even
really trivial stuff can interact with hardware in odd ways, in ways that
is seldom true of normal code that you can think about and analyze (since
we assume that the CPU itself doesn't have subtle timing etc bugs, of
course).

Linus

2008-09-04 17:23:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [GIT]: Networking

On Thu, 4 Sep 2008 11:58:18 +0200 "Rafael J. Wysocki" <[email protected]> wrote:

> > > BTW, there are two regression fixes for forcedeth:
> > >
> > > http://marc.info/?l=linux-kernel&m=121894389018584&w=4
> > > http://marc.info/?l=linux-kernel&m=121917167232014&w=4
> > >
> > > that fix the regressions tracked as:
> > >
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11358
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11361
> > >
> > > respectively and both are more than 2 weeks old now.
> > >
> > > Is there any chance to push them upstream or is there anything wrong with them?
> >
> > Your patch is fine. Since DaveM and I are gone this weekend, feel free
> > to add "acked-by: jgarzik@redhat" and forward upstream to Linus and Andrew.
>
> OK, thanks.
>
> Andrew, this patch is currently in -mm as forcedeth-fix-kexec-regression.patch
> Could you please add the Jeff's ACK to it and push it to Linus?

yup, thanks.

2008-09-04 00:10:52

by David Lang

[permalink] [raw]
Subject: Re: [GIT]: Networking

On Wed, 3 Sep 2008, Luis R. Rodriguez wrote:

> On Wed, Sep 03, 2008 at 04:26:03PM -0700, David Miller wrote:
>> From: Linus Torvalds <[email protected]>
>> Date: Wed, 3 Sep 2008 16:24:16 -0700 (PDT)
>>
>>> On Wed, 3 Sep 2008, Linus Torvalds wrote:
>>>>
>>>> What's so hard to understand about this?
>>>
>>> Here's a simple rule of thumb:
>>> - if it's not on the regression list
>>> - if it's not a reported security hole
>>> - if it's not on the reported oopses list
>>> then why are people sending it to me?
>>>
>>> IOW, if it's just another random improvement, and you send it to me
>>> outside of the merge window, then what is the point of the merge window?
>>
>> This is exactly what I've been trying to tell the Intel wireless folks
>> but they refuse to listen.
>
> What I think really happens here IMHO is communication needs to be
> enhanced. I only recently noticed that "regression only" policy for
> 2.6.27 for example. I think some of us were rather surprised by it too.

the policy has been the same for quite a while, it's not new for 2.6.27

>> I just won't take their stuff until they get their act in gear.
>>
>> No problem.
>
> On the other side of the spectrum, and to try to understand this better,
> since ath9k is new we obviously cannot get regressions, this means
> unless we find a security hole in our driver or if we can generate an
> oops with 2.6.27 we cannot send patches for ath9k for 2.6.27?
>
> It seems to make sense to me for purposes of stabalizing the kernel for
> sure, however, on the other hand it does also make 2.6.27 miss out on a
> few possible small fixes which may appear here and there.
>
> What will happen then is distributions will end up using their own
> patches on top of the kernel, or users using something like
> compat-wireless to get more bleeding edge stuff. This is exactly what we
> want to avoid.

no, if it's that bad it can go into the -stable kernel patches. but the
next release kernel will be out in just a couple of months, so it only
matters for that short window.

> To avoid it I do think another exception needs to be made for patches:
>
> * Small fixes which make something that was not working, which was
> supposed to work work

but you can't prove that this patch doesn't break something else (if only
by altering the timing just enough to expose a race condition somewhere)

> If these type of patches cannot be submitted we will have a stable
> kernel for sure but yet buggy leaving people to opt for alternatives
> for small fixes or driver updates.

but if you are changing the continuously, why should people test anything?
you're going to change it in unpredictable ways before the release anyway.


it may be valid to tinker with a new driver that didn't exist before, but
this far into the release cycle even that's highly questionable, and if
you are asking for something to be included becouse of that you need to
spell it when submiting the patch (and possibly in the changelog)

David Lang

2008-09-04 00:01:10

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [GIT]: Networking

On Wed, Sep 03, 2008 at 04:26:03PM -0700, David Miller wrote:
> From: Linus Torvalds <[email protected]>
> Date: Wed, 3 Sep 2008 16:24:16 -0700 (PDT)
>
> > On Wed, 3 Sep 2008, Linus Torvalds wrote:
> > >
> > > What's so hard to understand about this?
> >
> > Here's a simple rule of thumb:
> > - if it's not on the regression list
> > - if it's not a reported security hole
> > - if it's not on the reported oopses list
> > then why are people sending it to me?
> >
> > IOW, if it's just another random improvement, and you send it to me
> > outside of the merge window, then what is the point of the merge window?
>
> This is exactly what I've been trying to tell the Intel wireless folks
> but they refuse to listen.

What I think really happens here IMHO is communication needs to be
enhanced. I only recently noticed that "regression only" policy for
2.6.27 for example. I think some of us were rather surprised by it too.

Before snapping at people for not following "orders" we need a clear
guideline for what should or not be sent. We're all at fault if we are
not getting this through people. Most likely not many of us subscribe to
linux-kernel so I would think the next best thing we can do is to get our
maintainers to communicate to us of new rules and give a few examples of
what can go in or not.

I trust that should this had been done *well* none of these issue would have
come up and we can avoid angry exchanges.

> I just won't take their stuff until they get their act in gear.
>
> No problem.

On the other side of the spectrum, and to try to understand this better,
since ath9k is new we obviously cannot get regressions, this means
unless we find a security hole in our driver or if we can generate an
oops with 2.6.27 we cannot send patches for ath9k for 2.6.27?

It seems to make sense to me for purposes of stabalizing the kernel for
sure, however, on the other hand it does also make 2.6.27 miss out on a
few possible small fixes which may appear here and there.

What will happen then is distributions will end up using their own
patches on top of the kernel, or users using something like
compat-wireless to get more bleeding edge stuff. This is exactly what we
want to avoid.

To avoid it I do think another exception needs to be made for patches:

* Small fixes which make something that was not working, which was
supposed to work work

If these type of patches cannot be submitted we will have a stable
kernel for sure but yet buggy leaving people to opt for alternatives
for small fixes or driver updates.

Luis

2008-09-04 08:21:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [GIT]: Networking

On Thursday, 4 of September 2008, David Miller wrote:
> From: Linus Torvalds <[email protected]>
> Date: Wed, 3 Sep 2008 16:24:16 -0700 (PDT)
>
> > On Wed, 3 Sep 2008, Linus Torvalds wrote:
> > >
> > > What's so hard to understand about this?
> >
> > Here's a simple rule of thumb:
> > - if it's not on the regression list
> > - if it's not a reported security hole
> > - if it's not on the reported oopses list
> > then why are people sending it to me?
> >
> > IOW, if it's just another random improvement, and you send it to me
> > outside of the merge window, then what is the point of the merge window?
>
> This is exactly what I've been trying to tell the Intel wireless folks
> but they refuse to listen.
>
> I just won't take their stuff until they get their act in gear.
>
> No problem.

BTW, there are two regression fixes for forcedeth:

http://marc.info/?l=linux-kernel&m=121894389018584&w=4
http://marc.info/?l=linux-kernel&m=121917167232014&w=4

that fix the regressions tracked as:

http://bugzilla.kernel.org/show_bug.cgi?id=11358
http://bugzilla.kernel.org/show_bug.cgi?id=11361

respectively and both are more than 2 weeks old now.

Is there any chance to push them upstream or is there anything wrong with them?

Thanks,
Rafael