2018-05-01 16:38:47

by Sasha Levin

Subject: bug-introducing patches

Working on AUTOSEL, it became even more obvious to me how difficult it is for a
patch to get a proper review. Maintainers found it difficult to keep up with
the upstream work for their subsystem, and reviewing additional -stable patches
put even more load on them which some suggested would be more than what they
can handle.

While AUTOSEL tries to understand whether a patch fixes a bug, that is a bit late:
the bug has already been introduced, folks already have to deal with it, and the
kernel is broken. I was wondering if I could do a similar process to AUTOSEL, but
teach the AI about bug-introducing patches.

When someone fixes a bug, he would describe the patch differently than he would
if he was writing a new feature. This lets AUTOSEL build on different commit
message constructs, among various inputs, to recognize bug fixes. However,
people are unaware that they introduce a bug, so the commit message for bug
introducing patches is essentially the same as for commits that don't introduce
a bug. This meant that I had to pull data from other sources.

A few of the parameters I ended up using are:
- -next data (days spent in -next, changes in the patch between -next trees)
- Mailing list data (was this patch ever sent to a ML? How long before it was
merged? How many replies did it get? ...)
- Author/committer/maintainer chain data. Just like sports, some folks are more
likely to produce better results than others. This goes beyond just "skill",
but also looks at things such as whether the author patches a subsystem he's
"familiar with" (== subsystem where most of his patches usually go), or is he
modifying a subsystem he never sent a patch for.
- Patch complexity metrics - various code metrics to indicate how "complex" a
patch is. Think 100 lines of whitespace fixes vs 100 lines that
significantly changes a subsystem.
- Kernel process correctness - I tried using "violations" of the kernel
process (patch formatting, correctness of the mailing to lkml, etc) as an
indicator of how familiar the author is with the kernel, with the presumption
that folks who are newer to kernel development are more likely to introduce
bugs
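
As a rough illustration only, the inputs listed above could be gathered into a per-commit record like the one below. Every field name and the familiarity threshold are hypothetical; the actual feature set and its encoding are not described in this message.

```python
from dataclasses import dataclass, asdict


@dataclass
class CommitFeatures:
    """Hypothetical per-commit inputs mirroring the list above."""
    days_in_next: int                 # days spent in -next before merging
    changed_between_next: bool        # did the patch change between -next trees?
    sent_to_list: bool                # was it ever posted to a mailing list?
    list_replies: int                 # replies it received on the list
    author_patches_in_subsystem: int  # author's earlier patches to this subsystem
    lines_changed: int                # crude stand-in for a complexity metric
    process_violations: int           # formatting / mailing mistakes


def author_is_familiar(features: CommitFeatures, threshold: int = 5) -> bool:
    """Assumed heuristic: 'familiar' means at least `threshold` prior patches."""
    return features.author_patches_in_subsystem >= threshold


# Placeholder values, not taken from any real commit.
example = CommitFeatures(
    days_in_next=0,
    changed_between_next=False,
    sent_to_list=False,
    list_replies=0,
    author_patches_in_subsystem=0,
    lines_changed=12,
    process_violations=1,
)
print(asdict(example))
print(author_is_familiar(example))
```

A flat record like this keeps each signal independent, so any single input can be dropped or re-weighted without touching the others.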

Running an initial iteration on a set of commits made two things very obvious
to me:

1. -rc releases suck. Seriously suck. The quality of commits that went in -rc
cycles was much worse than that of merge window commits:
- All commits had the same chance of introducing a bug whether they came in a
merge window or an -rc cycle. This means that -rc commits mostly end up
replacing obvious bugs with less obvious ones.
- While a merge window commit changes, on average, 3x more lines than an -rc
commit, the chance of a bug being introduced per patch is the same, which
means that the bugs-per-line metric is much higher for -rc patches.
- A merge window commit spent 50% more days, on average, in -next than an -rc
commit.
- The number of -rc commits that were never posted to a mailing list, or never
got a reply on one, was **way** higher than for merge window commits.
- For some reason, the odds of an -rc commit being targeted for -stable are
over 20%, while for merge window commits they are about 3%. I can't quite
explain why that happens, but this would suggest that -rc commits end up
hurting -stable pretty badly.

2. Maintainers need to stop writing patches, committing them, and pushing them
in without reviews. In -rc cycles there is quite a large number of commits
that were written by maintainers, committed, and merged upstream the same
day. These patches are very likely to introduce a new bug.
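
The bugs-per-line point in item 1 is simple arithmetic: equal per-commit odds spread over a third as many lines gives three times the per-line rate. A minimal sketch, where the per-commit chance and the line counts are placeholders rather than measured values:

```python
from fractions import Fraction


def bugs_per_line(bug_chance_per_commit, avg_lines_per_commit):
    """Expected bug-introducing commits per changed line."""
    return bug_chance_per_commit / avg_lines_per_commit


# Placeholder inputs: only the 3x ratio between the line counts comes
# from the message above; the absolute numbers are made up.
bug_chance = Fraction(1, 20)        # same for both kinds of commit
rc_lines = 10                       # average lines in an -rc commit
merge_window_lines = 3 * rc_lines   # "3x more lines"

ratio = bugs_per_line(bug_chance, rc_lines) / bugs_per_line(
    bug_chance, merge_window_lines
)
print(ratio)  # prints 3
```

The ratio does not depend on the placeholder values: the per-commit chance cancels out, leaving only the ratio of the line counts.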


I don't really have a proposal beyond "tighten up -rc cycles", but I think it's
a discussion worth having. We have enough data to show what parts of kernel
development work, and what parts are just hurting us.

I'd be happy to gather more data if someone has an idea he wants to look
into. The data used for this work is based on:

- v4.4..v4.16 (just because it's as far as linux-next-history goes).
- "bugs" are commits that were mentioned in a Fixes: tag of a later
commit.
- "stable commits" are commits that made it to a -stable tree.
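
The "bugs" definition above can be sketched as a scan of commit messages for Fixes: tags. This is an illustration, not the script used for this work: the function names and the sample data are hypothetical, and the pattern assumes the usual kernel form `Fixes: <abbreviated sha> ("subject")`.

```python
import re

# Matches a "Fixes:" trailer at the start of a line and captures the hash.
FIXES_RE = re.compile(r"^Fixes:[ \t]*([0-9a-f]{7,40})\b",
                      re.IGNORECASE | re.MULTILINE)


def fixed_commits(message):
    """Return the (abbreviated) commit ids named in Fixes: tags."""
    return [sha.lower() for sha in FIXES_RE.findall(message)]


def label_bugs(messages):
    """messages maps commit id -> commit message.

    Returns the ids that some commit in the set claims to fix, i.e. the
    commits that count as "bugs" under the definition above.
    """
    bugs = set()
    for message in messages.values():
        bugs.update(fixed_commits(message))
    return bugs


# Made-up commits for illustration only.
sample = {
    "aaaaaaaaaaaa": "subsys: add feature\n\nAdds a thing.\n",
    "bbbbbbbbbbbb": (
        "subsys: fix feature\n\n"
        'Fixes: aaaaaaaaaaaa ("subsys: add feature")\n'
        "Signed-off-by: A Developer <dev@example.org>\n"
    ),
}
print(label_bugs(sample))
```

Note that this labelling only sees bugs whose fix carried a Fixes: tag, so it undercounts by construction.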


2018-05-01 19:45:36

by Theodore Ts'o

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
> - A merge window commit spent 50% more days, on average, in -next than a -rc
> commit.

So it *used* to be the case that after the merge window, I would queue
up bug fixes for the next merge window. Greg K-H pushed for me to
send them to Linus sooner, instead of waiting for the next merge
window. TBH, it's actually easier for me to just wait until the next
merge window, but please understand that there are multiple pressures
on maintainers going on here, and the latest efforts to try to use
AUTOSEL is just the most recent pressure placed on maintainers.

The other thing is that when there is a regression users who are
testing linux-next want it fixed *fast*. That's considered more
important to them than waiting for one, perfect patch, just to keep
AUTOSEL happy.

So please understand that when you say that maintainers *need* to do X
or Y, you are not the only one standing in line putting
pressures on maintainers saying they *need* to do something. And
quite frankly, I consider keeping people who are nice enough to test
linux-next happy to be **far** more important than AUTOSEL.

Sorry.

- Ted

2018-05-01 20:01:28

by Sasha Levin

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
>> - A merge window commit spent 50% more days, on average, in -next than a -rc
>> commit.
>
>So it *used* to be the case that after the merge window, I would queue
>up bug fixes for the next merge window. Greg K-H pushed for me to
>send them to Linus sooner, instead of waiting for the next merge
>window. TBH, it's actually easier for me to just wait until the next
>merge window, but please understand that there are multiple pressures
>on maintainers going on here, and the latest efforts to try to use
>AUTOSEL is just the most recent pressure placed on maintainers.
>
>The other thing is that when there is a regression users who are
>testing linux-next want it fixed *fast*. That's considered more
>important to them than waiting for one, perfect patch, just to keep
>AUTOSEL happy.
>
>So please understand that when you say that maintainers *need* to do X
>or Y, that there you are not the only one standing in line putting
>pressures on maintainers saying they *need* to do something. And
>quite frankly, I consider keeping people who are nice enough to test
>linux-next happy to be **far** more important than AUTOSEL.

Ted,

I'm not at all asking to wait before adding the patches to your tree,
or to -next. I'm asking to hold on to them a bit longer before you
push them to Linus because I can show that patches that don't spend
enough time in -next are more likely to introduce bugs.

Yes, linux-next users want it fixed *now* and I completely agree it
should be done that way, but the fix should not be immediately pushed to
Linus as well.

I've just finished reading an interesting article on LWN about the
PostgreSQL fsync issues (https://lwn.net/Articles/752952/). If you
look at Willy's commit, he wrote the final version of it about 5 days
ago, Jeff merged it in 3 days ago, and Linus merged it in the tree
today. Did it spend any time getting -next testing? nope.

What's worse is that that commit is tagged for stable, which means
that (given Greg's schedule) it may find its way to -stable users
even before some -next users/bots had a chance to test it out.

This is less about AUTOSEL, and more about asking maintainers to
improve the testing commits get before they are sent to Linus.
Linus would rant about commits during merge window that didn't go
through -next, but for -rc commits this rule is somehow forgiven,
which is what I'm trying to change.

2018-05-01 20:34:27

by Willy Tarreau

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 08:00:21PM +0000, Sasha Levin wrote:
> What's worse is that that commit is tagged for stable, which means
> that (given Greg's schedule) it may find it's way to -stable users
> even before some -next users/bots had a chance to test it out.

But it's a difficult trade-off. I think that -next is mostly used by
developers and that as such the audience remains limited. On the
opposite, -stable is used by many users. So how many days of -next
does it take to get the equivalent coverage of one day of -stable,
I don't know but it's probably a lot. Also server workloads are
almost exclusively on -stable. So a bug affecting only server users
will not benefit from -next exposition.

In the end it's all about responding to users' expectations to see
the bugs fixed. In -stable we regularly see users asking to backport
certain fixes. Sometimes they have to wait one or two extra versions
before they get their fix, and they are really not happy at all. If
the fix is rushed too fast and doesn't work, they won't be happy
either. Making them happy means backporting the right fix the quickest
possible. Too little test can result in a wrong fix, but too much test
results in a slow backport.

Again, I really don't find the -stable situation bad nowadays, quite
the opposite. I often suggest to people who don't follow too closely
to stick to latest LTS minus 1 or 2 releases. This way they don't get
the very latest fixes and have a chance that if something breaks very
badly, it gets fixed quickly. It works pretty well apparently.

I suspect that some of the issues that really need to be improved are
the fixes to recently merged code. That's never easy by definition
because if the code is young, it's not yet very well known even by
its author.

What *could* possibly be done (though I'm not fond of this) would be
to state a rule that past a certain number of stacked fixes for a
recently merged code, an extra review delay will be enforced on the
subsystem or on patches coming from the submitter. But I really doubt
it would significantly improve the situation.

Willy

2018-05-01 20:42:56

by Sasha Levin

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 10:33:25PM +0200, Willy Tarreau wrote:
>On Tue, May 01, 2018 at 08:00:21PM +0000, Sasha Levin wrote:
>> What's worse is that that commit is tagged for stable, which means
>> that (given Greg's schedule) it may find it's way to -stable users
>> even before some -next users/bots had a chance to test it out.
>
>But it's a difficult trade-off. I think that -next is mostly used by
>developers and that as such the audience remains limited. On the
>opposite, -stable is used by many users. So how many days of -next
>does it take to get the equivalent coverage of one day of -stable,
>I don't know but it's probably a lot. Also server workloads are
>almost exclusively on -stable. So a bug affecting only server users
>will not benefit from -next exposition.
>
>In the end it's all about responding to users' expectations to see
>the bugs fixed. In -stable we regularly see users asking to backport
>certain fixes. Sometimes they have to wait one or two extra versions
>before they get their fix, and they are really not happy at all. If
>the fix is rushed too fast and doesn't work, they won't be happy
>either. Making them happy means backporting the right fix the quickest
>possible. Too little test can result in a wrong fix, but too much test
>results in a slow backport.
>
>Again, I really don't find the -stable situation bad nowadays, quite
>the opposite. I often suggest to people who don't follow too closely
>to stick to latest LTS minus 1 or 2 releases. This way they don't get
>the very latest fixes and have a chance that if something breaks very
>badly, it gets fixed quickly. It works pretty well apparently.
>
>I suspect that some of the issues that really need to be improved are
>the fixes to recently merged code. That's never easy by definition
>because if the code is young, it's not yet very well known even by
>its author.
>
>What *could* possibly be done (though I'm not fond of this) would be
>to state a rule that past a certain number of stacked fixes for a
>recently merged code, an extra review delay will be enforced on the
>subsystem or on patches coming from the submitter. But I really doubt
>it would significantly improve the situation.

I think that this discussion has shifted to -stable issues, which is not
what I was aiming for.

I tried to present a statistic from the recent kernel commits showing
that per changed line of code, an -rc commit has more than 3 times the
likelihood of introducing a bug compared to a merge window one.

Is this something the community sees as an issue, or do we expect
significantly higher odds of introducing bugs in -rc commits?

Feel free to ignore any proposals I've made. If you see this as an
issue, what could we do to address it?

Let's leave -stable out of this for now.

2018-05-01 20:55:35

by Theodore Ts'o

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 08:00:21PM +0000, Sasha Levin wrote:
>
> Yes, linux-next users want it fixed *now* and I completely agree it
> should be done that way, but the fix should not be immediately pushed to
> Linus as well.

I should have said linux-head/linux-rc testers, sorry. The fact that
we have very few live users testing linux-next is a separate problem,
which I accidentally conflated. But if a user who is testing -rc2
finds a problem, it is highly desirable to send a fix for -rc3,
instead of making that user wait to -rc4 or -rc5. And *that* is more
important than AUTOSEL.

> I've just finished reading an interesting article on LWN about the
> PostgreSQL fsync issues (https://lwn.net/Articles/752952/). If you
> look at Willy's commit, he wrote the final version of it about 5 days
> ago, Jeff merged it in 3 days ago, and Linus merged it in the tree
> today. Did it spend any time getting -next testing? nope.

I agree that having the errseq patch go straight into Linus's tree is
certainly unfortunate. The justification was this was a regression
fix, which I don't think it qualifies, since errseq_t went in some 9+
months ago.

It might be a good thing to quantify whether the patches you are
talking about are new features, bug fixes, or fixing a bug that was
introduced during the merge window or subsequently (e.g., a
regression).

> What's worse is that that commit is tagged for stable, which means
> that (given Greg's schedule) it may find it's way to -stable users
> even before some -next users/bots had a chance to test it out.

Well, it used to be that things tagged for stable post-merge window
are *supposed* to marinate for at least a week or so before they would
get pulled into a stable release. Part of the whole problem is that
people are wanting to be a lot more aggressive (both in time and
volume) in shovelling things into stable.

> This is less about AUTOSEL, and more about asking maintainers to
> improve the testing commits get before they are sent to Linus.
> Linus would rant about commits during merge window that didn't go
> through -next, but for -rc commits this rule is somehow forgiven,
> which is what I'm trying to change.

I do think it's about AUTOSEL, because when I'm dealing with a
regression, I want to get it fixed fast. Because the alternative is
the merge-window commit getting reverted. AUTOSEL seems to want perfect
patches that it can cherry pick once, as opposed to a case where if the
user confirms that it fixes the regression, I want to send it to Linus
quickly. I do *not* want it to marinate in linux-next for 1-2 weeks.
I would much rather that *stable* hold off on picking up the patch for
1-2 weeks, but get it fixed in Linux HEAD sooner. If that means that
the regression fix needs a further clean up, so be it.

Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
merge window before they get merged at all. (And certainly features
bugs should be Right Out.) And sure, bug fixes should certainly get
more testing. So I guess my main objection is your making a blanket
statement about all fixes, instead of breaking out regression fixes
versus bug fixes. Since in my opinion they are very different animals...

- Ted


2018-05-01 21:16:42

by Mark Brown

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:

> I do think it's about AUTOSEL, because when I'm dealing with a
> regression, I want to get it fixed fast. Because the alternative is
> the merge-window commit getting reverted. AUTOSEL seems wants perfect
> patches that it can cherry pick once, as opposed to a case where if the
> user confirms that it fixes the regression, I want to send it to Linus
> quickly. I do *not* want it to marinate in linux-next for 1-2 weeks.
> I would much rather that *stable* hold off on picking up the patch for
> 1-2 weeks, but get it fixed in Linux HEAD sooner. If that means that
> the regression fix needs a further clean up, so be it.

We've had issues with the automated testing systems in the past where a
maintainer has had a policy of letting things percolate for a week
before sending to Linus and there's been a bug that caused a substantial
set of tests to fail to run (generally it's something that had unnoticed
dependencies in -next so wasn't caught there) - we essentially end up
not getting the affected bits of test coverage for that period of time
which is not helpful.


2018-05-01 22:03:03

by Sasha Levin

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:
>On Tue, May 01, 2018 at 08:00:21PM +0000, Sasha Levin wrote:
>>
>> Yes, linux-next users want it fixed *now* and I completely agree it
>> should be done that way, but the fix should not be immediately pushed to
>> Linus as well.
>
>I should have linux-head/linux-rc said testers, sorry. The fact that
>we have very few live users testing linux-next is a separate problem,
>which I accidentally conflated. But if a user who is testing -rc2
>finds a problem, it is highly desirable to send a fix for -rc3,
>instead of making that user wait to -rc4 or -rc5. And *that* is more
>important than AUTOSEL.
>
>> I've just finished reading an interesting article on LWN about the
>> PostgreSQL fsync issues (https://lwn.net/Articles/752952/). If you
>> look at Willy's commit, he wrote the final version of it about 5 days
>> ago, Jeff merged it in 3 days ago, and Linus merged it in the tree
>> today. Did it spend any time getting -next testing? nope.
>
>I agree that having the errseq patch go straight into Linus's tree is
>certainly unfortunate. The justification was this was a regression
>fix, which I don't think it qualifies, since errseq_t went in some 9+
>months ago.
>
>It might be a good thing to quantify whether the patches you are
>talking about are new features, bug fixes, or fixing a bug that was
>introduced during the merge window or subsequently (e.g., a
>regression).

I see. So something like the following?

- New feature: 2+ weeks of -next without any code changes/fixes
- Merge window regression fix: immediate if < -rc3, 2+ weeks of next if
< -rc6, otherwise consider reverting new feature.
- bug fix in earlier release: 2+ weeks of -next
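
Read literally, the three rules above could be written down as the following sketch. The names are hypothetical, `rc` is assumed to mean the number of the current -rc, and the "without any code changes/fixes" condition for new features is left out for brevity.

```python
from enum import Enum


class Kind(Enum):
    NEW_FEATURE = "new feature"
    MERGE_WINDOW_REGRESSION_FIX = "merge window regression fix"
    OLDER_BUG_FIX = "bug fix in earlier release"


MIN_NEXT_DAYS = 14  # "2+ weeks of -next"


def suggestion(kind, rc, days_in_next):
    """Return what the rules above would suggest for one patch.

    rc:           number of the current -rc (assumed meaning of "< -rc3")
    days_in_next: days the patch has already spent in -next
    """
    soaked = days_in_next >= MIN_NEXT_DAYS
    if kind is Kind.MERGE_WINDOW_REGRESSION_FIX:
        if rc < 3:
            return "send now"
        if rc < 6:
            return "send now" if soaked else "keep in -next"
        return "consider reverting the feature"
    # New features and fixes for bugs in earlier releases both need 2+ weeks.
    return "send now" if soaked else "keep in -next"


print(suggestion(Kind.MERGE_WINDOW_REGRESSION_FIX, rc=2, days_in_next=0))
print(suggestion(Kind.OLDER_BUG_FIX, rc=2, days_in_next=3))
```

Only regression fixes depend on where the cycle stands; the other two kinds depend solely on time spent in -next.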

>> What's worse is that that commit is tagged for stable, which means
>> that (given Greg's schedule) it may find it's way to -stable users
>> even before some -next users/bots had a chance to test it out.
>
>Well, it used to be that things tagged for stable most-merge window
>are *supposed* to marinate for at least a week or before they would
>get pulled into a stable release. Part of the whole problem is that
>people are wanting to be a lot more aggressive (both in time and
>volume) in shovelling things into stable.
>
>> This is less about AUTOSEL, and more about asking maintainers to
>> improve the testing commits get before they are sent to Linus.
>> Linus would rant about commits during merge window that didn't go
>> through -next, but for -rc commits this rule is somehow forgiven,
>> which is what I'm trying to change.
>
>I do think it's about AUTOSEL, because when I'm dealing with a
>regression, I want to get it fixed fast. Because the alternative is
>the merge-window commit getting reverted. AUTOSEL seems wants perfect
>patches that it can cherry pick once, as opposed to a case where if the
>user confirms that it fixes the regression, I want to send it to Linus
>quickly. I do *not* want it to marinate in linux-next for 1-2 weeks.
>I would much rather that *stable* hold off on picking up the patch for
>1-2 weeks, but get it fixed in Linux HEAD sooner. If that means that
>the regression fix needs a further clean up, so be it.

For AUTOSEL, most of the commits that went in so far were from the
v4.9..v4.14 range. Only last week I've sent greg commits picked from
v4.15..v4.16. AUTOSEL is at least a month behind -stable (on average,
9.7 months).

>Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
>merge window before they get merged at all. (And certainly features
>bugs should be Right Out.) And sure, bug fixes should certainly get
>more testing. So I guess my main objection is your making a blanket
>statement about all fixes, instead of breaking out regression fixes
>versus bug fixes. Since in my opinion they are very different animals...

I understand your point: you want to make fixes available to testers as
soon as possible. This might make sense, as you've mentioned, in < -rc3.

So yes, maybe the solution isn't to force -next, but rather add more
"quiet time" at the end of the cycle? Make special rules for -rc7/8? Or
even add a "test"/"beta" release at the end of the cycle?

From what I see, the same number of bugs-per-line-of-code applies for
commits across all -rc releases, so while it makes sense to get a fix
in quickly at -rc1 to allow testing to continue, the same must not
happen during -rc8, but unfortunately it does now.

2018-05-02 04:32:11

by Willy Tarreau

Subject: Re: bug-introducing patches

On Tue, May 01, 2018 at 10:02:30PM +0000, Sasha Levin wrote:
> On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:
> >Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
> >merge window before they get merged at all. (And certainly features
> >bugs should be Right Out.) And sure, bug fixes should certainly get
> >more testing. So I guess my main objection is your making a blanket
> >statement about all fixes, instead of breaking out regression fixes
> >versus bug fixes. Since in my opinion they are very different animals...
>
> I understant your point, you want to make fixes available to testers as
> soon as possible. This might make sense, as you've mentioned, in < -rc3.
>
> So yes, maybe the solution isn't to force -next, but rather add more
> "quiet time" at the end of the cycle? Make special rules for -rc7/8? Or
> even add a "test"/"beta" release at the end of the cycle?

I disagree with the proposals above, and for multiple reasons:
- leaving a known bug on purpose automatically degrades the quality of
each release. Given that less than 100% of the fixes introduce
regressions, by not merging any of these fixes, we'll end up with
more bugs. That's a very bad idea.

- this will give a worse image of dot-0 releases, and users will be
even less interested in testing them, preferring to wait for the
first stable version. In this case what's the point of dot-0 if it
is known broken and nobody uses it ?

- letting fixes rot longer on the developer side will send a very bad
signal to developers : "we don't care about your bugs". Companies
relying on contractors will have a harder time including fixes in
the contract, as it will only cover what's needed to get the feature
merged. Again this will put the focus on extremely fast and dirty
development, given that fixes will not be considered during the same
window.

I'd rather do the exact opposite except for those who introduce too many
regressions : set up a delay penalty to developers who create too many
regressions and make this penalty easy to check. I think it will generally
not affect subsystem maintainers, unless they pull and push lots of crap
without checking, of course. But it could prove very useful for those
developing under contract, because companies employing them will want to
ensure that their work will not be delayed due to a penalty. Thus it will
be important for these developers to be more careful.

After all, the person proposing a fix always knows better than anyone
else if this fix was done seriously or not. Developers who do lots of
testing before sending should not be penalized, and should get their
fix merged immediately. Those who just send untested patches should be
trusted much less.

> From what I see, the same number of bugs-per-line-of-code applies for
> commits accross all -rc releases, so while it makes sense to get a fix
> in quickly at -rc1 to allow testing to continue, the same must not
> happen during -rc8, but unfourtenately it does now.

That's where I strongly disagree, since it would mean releasing with even
more bugs than today.

Willy


2018-05-02 08:11:56

by Daniel Vetter

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 1, 2018 at 11:15 PM, Mark Brown wrote:
> On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:
>> I do think it's about AUTOSEL, because when I'm dealing with a
>> regression, I want to get it fixed fast. Because the alternative is
>> the merge-window commit getting reverted. AUTOSEL seems wants perfect
>> patches that it can cherry pick once, as opposed to a case where if the
>> user confirms that it fixes the regression, I want to send it to Linus
>> quickly. I do *not* want it to marinate in linux-next for 1-2 weeks.
>> I would much rather that *stable* hold off on picking up the patch for
>> 1-2 weeks, but get it fixed in Linux HEAD sooner. If that means that
>> the regression fix needs a further clean up, so be it.
>
> We've had issues with the automated testing systems in the past where a
> maintainer has had a policy of letting things percoltate for a week
> before sending to Linus and there's been a bug that caused a substantial
> set of tests to fail to run (generally it's something that had unnoticed
> dependencies in -next so wasn't caught there) - we essentially end up
> not getting the affected bits of test coverage for that period of time
> which is not helpful.

So much agreed. For our CI we carry a constantly rolling set of fixup
patches to keep it working, because regression fixes sometimes take
too long. And too long here for our needs is measured in days/hours -
developers start screaming pretty much immediately when our CI is down
:-)

Ofc I prefer if all subsystems ramp up pre-merge testing as much as
possible (and with xfstests and stuff like that I think filesystems
are leading here, if not consistently). But given the huge scope of
the kernel we'll never reach 100%, and oddball regressions will be
inevitable. Once a regression has crept through it imo really should
get fixed asap, with no unnecessary soaking times - get a better
CI/kerneltests in place if you feel like you need to soak stuff.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2018-05-02 15:33:33

by Geert Uytterhoeven

Subject: Re: bug-introducing patches

Hi Sasha,

On Tue, May 1, 2018 at 6:38 PM, Sasha Levin wrote:
> Working on AUTOSEL, it became even more obvious to me how difficult it is for a
> patch to get a proper review. Maintainers found it difficult to keep up with
> the upstream work for their subsystem, and reviewing additional -stable patches
> put even more load on them which some suggested would be more than what they
> can handle.

Thanks for your work!

> - For some reason, the odds of a -rc commit to be targetted for -stable is
> over 20%, while for merge window commits it's about 3%. I can't quite
> explain why that happens, but this would suggest that -rc commits end up
> hurting -stable pretty badly.

Aren't more -rc commits targeted for -stable because they are bugfixes?
Ideally, new features are supposed to be merged during the merge window,
while -rc commits fix bugs.

So they can be categorized like:
1. Plain -rc commits,
2. -rc commits fixing a bug:
a. in the same release cycle,
b. in a previous release.

2a assumes the bug was backported to -stable, too, doesn't it?

Do you have statistics for which categories are most buggy?
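
The categories above could be tallied with something like the sketch below. It assumes each -rc commit has already been annotated with the release that introduced the bug it fixes (for example via the commit named in its Fixes: tag); the function name and the sample list are hypothetical.

```python
from collections import Counter
from typing import Optional


def categorize_rc_commit(fixed_in: Optional[str], current: str) -> str:
    """Map an -rc commit to one of the categories above.

    fixed_in: release cycle that introduced the bug being fixed, or None
              when the commit does not claim to fix anything
    current:  release cycle the -rc commit was merged in
    """
    if fixed_in is None:
        return "1"   # plain -rc commit
    if fixed_in == current:
        return "2a"  # fixes a bug from the same release cycle
    return "2b"      # fixes a bug from a previous release


# Made-up (fixed_in, current) pairs for illustration only.
commits = [
    (None, "v4.16"),
    ("v4.16", "v4.16"),
    ("v4.14", "v4.16"),
    ("v4.16", "v4.16"),
]
tally = Counter(categorize_rc_commit(f, c) for f, c in commits)
print(dict(tally))
```

Per-category bug rates would then come from combining this tally with the "bugs" labelling, restricted to each category.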

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-02 19:43:09

by Sasha Levin

Subject: Re: bug-introducing patches

On Wed, May 02, 2018 at 06:30:17AM +0200, Willy Tarreau wrote:
>On Tue, May 01, 2018 at 10:02:30PM +0000, Sasha Levin wrote:
>> On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:
>> >Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
>> >merge window before they get merged at all. (And certainly features
>> >bugs should be Right Out.) And sure, bug fixes should certainly get
>> >more testing. So I guess my main objection is your making a blanket
>> >statement about all fixes, instead of breaking out regression fixes
>> >versus bug fixes. Since in my opinion they are very different animals...
>>
>> I understant your point, you want to make fixes available to testers as
>> soon as possible. This might make sense, as you've mentioned, in < -rc3.
>>
>> So yes, maybe the solution isn't to force -next, but rather add more
>> "quiet time" at the end of the cycle? Make special rules for -rc7/8? Or
>> even add a "test"/"beta" release at the end of the cycle?
>
>I disagree with the proposals above, and for multiple reasons :
> - leaving a known bug on purpose automatically degrades the quality of
> each release. Given that less than 100% of the fixes introduce
> regressions, by not merging any of these fixes, we'll end up with
> more bugs. That's a very bad idea.
>
> - this will give a worse image of dot-0 releases, and users will be
> even less interested in testing them, preferring to wait for the
> first stable version. In this case what's the point of dot-0 if it
> is known broken and nobody uses it ?
>
> - letting fixes rot longer on the developer side will send a very bad
> signal to developers : "we don't care about your bugs". Companies
> relying on contractors will have a harder time including fixes in
> the contract, as it will only cover what's needed to get the feature
> merged. Again this will put the focus on extremely fast and dirty
> development, given that fixes will not be considered during the same
> window.

I'm not advocating to keep bugs in. If there is a fix, but the developer
can't indicate that proper testing was done on the fix we should revert
the new feature rather than merge the untested fix in.

The way I see it, if a commit can get one or two tested-by, it's a good
alternative to a week in -next.

>I'd rather do the exact opposite except for those who introduce too many
>regressions : set up a delay penalty to developers who create too many
>regressions and make this penalty easy to check. I think it will generally
>not affect subsystem maintainers, unless they pull and push lots of crap
>without checking, of course. But it could prove very useful for those
>developing under contract, because companies employing them will want to
>ensure that their work will not be delayed due to a penalty. Thus it will
>be important for these developers to be more careful.
>
>After all, the person proposing a fix always knows better than anyone
>else if this fix was done seriously or not. Developers who do lots of
>testing before sending should not be penalized, and should get their
>fix merged immediately. Those who just send untested patches should be
>trusted much less.

I'm a bit worried about (social) side effects of a scheme like this.

>> From what I see, the same number of bugs-per-line-of-code applies for
>> commits across all -rc releases, so while it makes sense to get a fix
>> in quickly at -rc1 to allow testing to continue, the same must not
>> happen during -rc8, but unfortunately it does now.
>
>That's where I strongly disagree, since it would mean releasing with even
>more bugs than today.

Just don't release it. If we don't have a tested fix for a reported
regression either extend the release cycle (-rc10+) or just revert the
new feature and get it in the next merge window.

2018-05-02 19:47:04

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 10:11:14AM +0200, Daniel Vetter wrote:
>On Tue, May 1, 2018 at 11:15 PM, Mark Brown <[email protected]> wrote:
>> On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:
>>> I do think it's about AUTOSEL, because when I'm dealing with a
>>> regression, I want to get it fixed fast. Because the alternative is
>>> the merge-window commit getting reverted. AUTOSEL seems to want perfect
>>> patches that it can cherry pick once, as opposed to a case where if the
>>> user confirms that it fixes the regression, I want to send it to Linus
>>> quickly. I do *not* want it to marinate in linux-next for 1-2 weeks.
>>> I would much rather that *stable* hold off on picking up the patch for
>>> 1-2 weeks, but get it fixed in Linux HEAD sooner. If that means that
>>> the regression fix needs a further clean up, so be it.
>>
>> We've had issues with the automated testing systems in the past where a
>> maintainer has had a policy of letting things percolate for a week
>> before sending to Linus and there's been a bug that caused a substantial
>> set of tests to fail to run (generally it's something that had unnoticed
>> dependencies in -next so wasn't caught there) - we essentially end up
>> not getting the affected bits of test coverage for that period of time
>> which is not helpful.
>
>So much agreed. For our CI we carry a constantly rolling set of fixup
>patches to keep it working, because regression fixes sometimes take
>too long. And too long here for our needs is measured in days/hours -
>developers start screaming pretty much immediately when our CI is down
>:-)
>
>Ofc I prefer if all subsystems ramp up pre-merge testing as much as
>possible (and with xfstests and stuff like that I think filesystems
>are leading here, if not consistently). But given the huge scope of
>the kernel we'll never reach 100%, and oddball regressions will be
>inevitable. Once a regression has crept through it imo really should
>get fixed asap, with no unnecessary soaking times - get a better
>CI/kerneltests in place if you feel like you need to soak stuff.

Oh I agree with what you're saying, if you have a good testing setup
this is (usually) much better than just throwing stuff in -next, so I
didn't mean to force soaking every fix in -next for a few weeks.

As you said, the regression should be fixed "asap", not "immediately".
It should go through some sort of review and testing the maintainers are
happy with, but unfortunately it doesn't happen now.

2018-05-02 19:53:21

by Sasha Levin

Subject: Re: bug-introducing patches

On Wed, May 02, 2018 at 05:32:37PM +0200, Geert Uytterhoeven wrote:
>Hi Sasha,
>
>On Tue, May 1, 2018 at 6:38 PM, Sasha Levin
><[email protected]> wrote:
>> Working on AUTOSEL, it became even more obvious to me how difficult it is for a
>> patch to get a proper review. Maintainers found it difficult to keep up with
>> the upstream work for their subsystem, and reviewing additional -stable patches
>> put even more load on them which some suggested would be more than what they
>> can handle.
>
>Thanks for your work!
>
>> - For some reason, the odds of a -rc commit to be targeted for -stable is
>> over 20%, while for merge window commits it's about 3%. I can't quite
>> explain why that happens, but this would suggest that -rc commits end up
>> hurting -stable pretty badly.
>
>Aren't more -rc commits targeted for -stable because they are bugfixes?
>Ideally, new features are supposed to be merged during the merge window,
>while -rc commits fix bugs.

New features can only be merged during a merge window, bug fixes can
be merged at any point.

>So they can be categorized like:
> 1. Plain -rc commits,

What's this exactly? -rc commits are only supposed to fix bugs.

> 2. -rc commits fixing a bug:
> a. in the same release cycle,
> b. in a previous release.
>
>2a assumes the bug was backported to -stable, too, doesn't it?

Bug fixes for features introduced in that release cycle won't be
backported to stable.

>Do you have statistics for which categories are most buggy?

I haven't broken it down to subsystems for a few reasons:

- My dataset is based on the Fixes: tag, some subsystems use it less
than others.
- Maintainers change, so even if one subsystem is being awesome about
it today, it might not be the case in a year.
- I don't really want to point fingers at a particular subsystem, I
think that this is an issue at the kernel level.

2018-05-02 20:03:44

by Willy Tarreau

Subject: Re: bug-introducing patches

On Wed, May 02, 2018 at 07:42:33PM +0000, Sasha Levin wrote:
> I'm not advocating to keep bugs in. If there is a fix, but the developer
> can't indicate that proper testing was done on the fix we should revert
> the new feature rather than merge the untested fix in.

If you're exclusively talking about newly merged features, I agree. But
I think that the initial point was not just about newly merged features.
Sometimes it will not work because other changes rely on this new feature
but the way I see it is that this kind of back-pressure should work well
enough to encourage developers to show they have valid reasons to trust
their fix.

> The way I see it, if a commit can get one or two tested-by, it's a good
> alternative to a week in -next.

Agreed. Even their own actually. And I'm not kidding. Those who run large
amounts of tests on certain patches could really mention it in tested-by,
as opposed to the most common cases where the code was just regularly
tested.

> >After all, the person proposing a fix always knows better than anyone
> >else if this fix was done seriously or not. Developers who do lots of
> >testing before sending should not be penalized, and should get their
> >fix merged immediately. Those who just send untested patches should be
> >trusted much less.
>
> I'm a bit worried about (social) side effects of a scheme like this.

Me as well, because it's still a bit early for this, people might not
be prepared for this yet. But if it were at least discussed and presented
as one of the possibilities for the long term, newcomers would arrive
here with this possibility in mind and would possibly join in better
conditions and possibly that ultimately this solution would only exist
as a threat against bad players but would never be used. Also there are
more and more places where people find it normal to be noted by others,
maybe it will really end up like this over the long term, who knows. At
the very least a first note for a contractor is "I contributed X commits
last year, my work never got reverted for bad quality".

> >> From what I see, the same number of bugs-per-line-of-code applies for
> >> commits across all -rc releases, so while it makes sense to get a fix
> >> in quickly at -rc1 to allow testing to continue, the same must not
> >> happen during -rc8, but unfortunately it does now.
> >
> >That's where I strongly disagree, since it would mean releasing with even
> >more bugs than today.
>
> Just don't release it. If we don't have a tested fix for a reported
> regression either extend the release cycle (-rc10+) or just revert the
> new feature and get it in the next merge window.

I agree in general, but the reality will often be different (think about
contractors for a limited time as I suggested). It should be considered
as a penalty against improper testing so that we don't even have to reach
this situation.

Willy


2018-05-02 20:42:27

by Geert Uytterhoeven

Subject: Re: bug-introducing patches

Hi Sasha,

On Wed, May 2, 2018 at 9:51 PM, Sasha Levin
<[email protected]> wrote:
> On Wed, May 02, 2018 at 05:32:37PM +0200, Geert Uytterhoeven wrote:
>>On Tue, May 1, 2018 at 6:38 PM, Sasha Levin
>><[email protected]> wrote:
>>> Working on AUTOSEL, it became even more obvious to me how difficult it is for a
>>> patch to get a proper review. Maintainers found it difficult to keep up with
>>> the upstream work for their subsystem, and reviewing additional -stable patches
>>> put even more load on them which some suggested would be more than what they
>>> can handle.
>>
>>Thanks for your work!
>>
>>> - For some reason, the odds of a -rc commit to be targeted for -stable is
>>> over 20%, while for merge window commits it's about 3%. I can't quite
>>> explain why that happens, but this would suggest that -rc commits end up
>>> hurting -stable pretty badly.
>>
>>Aren't more -rc commits targeted for -stable because they are bugfixes?
>>Ideally, new features are supposed to be merged during the merge window,
>>while -rc commits fix bugs.
>
> New features can only be merged during a merge window, bug fixes can
> be merged at any point.

I wrote "ideally". There's a big difference between theory and practice...

>>So they can be categorized like:
>> 1. Plain -rc commits,
>
> What's this exactly? -rc commits are only supposed to fix bugs.

... hence not all of them are fixes.

Sometimes fast-tracking a new feature or API reduces dependencies for the
next merge window. This is just one example of IMHO valid non-bugfix
-rc commits.

Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
- 245 carry a Fixes tag,
- 196 carry a CC stable,
- 395 contain the string "fix".
(non-mutually exclusive)

That leaves us with 200 commits not falling in the bugfix category.
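
A minimal sketch of how a tally like this could be reproduced from raw commit messages, assuming the three markers are detected with simple pattern matches. The patterns and the names `bugfix_markers` and `count_non_bugfix` are illustrative assumptions, not the queries actually used for the numbers above. Because the markers overlap, the 200 figure corresponds to commits carrying none of them, i.e. 660 minus the 460 that carry at least one.

```python
import re

# Illustrative patterns (assumptions): a Fixes: trailer with an abbreviated
# or full commit ID, a Cc: trailer addressed to a stable list, and the bare
# string "fix" anywhere in the message.
FIXES_TAG = re.compile(r"^Fixes:\s+[0-9a-f]{7,40}\b", re.MULTILINE | re.IGNORECASE)
CC_STABLE = re.compile(r"^Cc:.*stable@", re.MULTILINE | re.IGNORECASE)
FIX_WORD = re.compile(r"fix", re.IGNORECASE)


def bugfix_markers(message):
    """Return the set of bugfix markers found in one commit message."""
    markers = set()
    if FIXES_TAG.search(message):
        markers.add("fixes-tag")
    if CC_STABLE.search(message):
        markers.add("cc-stable")
    if FIX_WORD.search(message):
        markers.add("fix-string")
    return markers


def count_non_bugfix(messages):
    """Count messages that carry none of the three markers."""
    return sum(1 for message in messages if not bugfix_markers(message))
```

The input could be the commit messages from something like `git log --no-merges v4.17-rc1..v4.17-rc3`. A plain string match on "fix" is crude (it also hits "prefix" or "fixup"), so the result is an upper bound on bug fixes at best.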

>> 2. -rc commits fixing a bug:
>> a. in the same release cycle,
>> b. in a previous release.
>>
>>2a assumes the bug was backported to -stable, too, doesn't it?
>
> Bug fixes for features introduced in that release cycle won't be
> backported to stable.

They do, if the original commit was introduced during the same cycle and
backported to stable.

>>Do you have statistics for which categories are most buggy?
>
> I haven't broken it down to subsystems for a few reasons:

I didn't mean break down by subsystem, but by category from the list above
(1, 2a, 2b).
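
To make the three categories concrete, here is a rough sketch of how a commit could be assigned to one of them, assuming the only signal is the Fixes: tag plus a known set of commit IDs merged in the current cycle. The names `categorize` and `tally` and their inputs are hypothetical. Commits without a Fixes: tag land in category 1 by default, which will misfile untagged bug fixes.

```python
from collections import Counter


def categorize(fixes_target, cycle_commits):
    """Classify one -rc commit as '1', '2a' or '2b'.

    fixes_target:  commit ID (possibly abbreviated) named in the Fixes: tag,
                   or None when the commit has no such tag.
    cycle_commits: full commit IDs merged during the current release cycle.
    """
    if fixes_target is None:
        return "1"   # plain -rc commit (or an untagged fix)
    target = fixes_target.lower()
    if any(sha.lower().startswith(target) for sha in cycle_commits):
        return "2a"  # fixes a bug introduced in the same release cycle
    return "2b"      # fixes a bug from a previous release


def tally(fixes_targets, cycle_commits):
    """Count how many commits fall into each category."""
    return Counter(categorize(t, cycle_commits) for t in fixes_targets)
```

Whether a 2a fix also matters for -stable depends on whether the commit it fixes was itself backported, which this sketch does not try to detect.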

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-03 00:06:59

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>
> Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
> - 245 carry a Fixes tag,
> - 196 carry a CC stable,
> - 395 contain the string "fix".
> (non-mutually exclusive)
>
> That leaves us with 200 commits not falling in the bugfix category.

Some non-bug fixes are allowed in -rc2. So perhaps what might be
interesting is to look at v4.16 (which is completed), and look at the
distribution of commits:

* regression fixes (for bugs introduced during the current
release cycle)
* "normal" bug fixes
* commits which don't touch code (e.g., spelling or
documentation-only fixes)
* other commits (features or cleanup fixes)

at each rcX level. The historic "standard" has been feature commits
in -rc1 and -rc2 (tolerated, but ideally should land before the merge
window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
regression fixes only. It would be interesting to see how well we
have been holding to the historical ideal.

It would then be interesting to use Sasha's analysis to see whether
there are more bug fixes caused by regression fixes versus normal bug
fixes, and whether or not they are common when fixes come "out of
cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.

Because if that last is the case, then the prescription is very simple
and not controversial --- bug fixes found post -rc4 should be held to
the next merge window.

If the concern is regression fixes which require one or two tries
before they are fixed before 4.16-FINAL is released, then that's a
"life is hard for AUTOSEL" issue, and I suspect Sasha will find that
there is rather less sympathy for holding regression fixes for an
extra week or two.

If the concern is bug fixes that show up in -rc3 and -rc4, but which
aren't hitting linux-next first, then holding bug fixes in linux-next
for a week makes sense, and if that means that a bug fix found post
-rc3 needs to marinate in linux-next for a week, and it then
misses the -rc4 "bug fix" deadline, we can have a discussion about
whether bug fixes should be allowed in -rc5 after a week's marination.

My personal opinion is "to hell with it, just wait until the next
merge window" --- but this can cause more work for the stable
maintainers since a lot of bug fixes would then land in -rc1.

Cheers,

- Ted

2018-05-03 00:40:17

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
> On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>>
>> Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
>> - 245 carry a Fixes tag,
>> - 196 carry a CC stable,
>> - 395 contain the string "fix".
>> (non-mutually exclusive)
>>
>> That leaves us with 200 commits not falling in the bugfix category.
>
> Some non-bug fixes are allowed in -rc2. So perhaps what might be
> interesting is to look at v4.16 (which is completed), and look at the
> distribution of commits:
>
> * regression fixes (for bugs introduced during the current
> release cycle)
> * "normal" bug fixes
> * commits which don't touch code (e.g., spelling or
> documentation-only fixes)
> * other commits (features or cleanup fixes)
>
> at each rcX level. The historic "standard" has been feature commits
> in -rc1 and -rc2 (tolerated, but ideally should land before the merge
> window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
> regression fixes only. It would be interesting to see how well we
> have been holding to the historical ideal.
>
> It would then be interesting to use Sasha's analysis to see whether
> there are more bug fixes caused by regression fixes versus normal bug
> fixes, and whether or not they are common when fixes come "out of
> cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>
> Because if that last is the case, then the prescription is very simple
> and not controversial --- bug fixes found post -rc4 should be held to
> the next merge window.
>

Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
unrealistic. Holding up the fix for the next SpeckHammer because it was not
ready before -rc4 ? I don't think so.

Even when not counting severe problems, you are adding lots of additional work
for those who do and want to rely on stable releases to merge in bug fixes.
Sure, I am at times annoyed having to deal with a regression in a stable
release, but it very much beats digging through various mailing lists for
pending patches to fix CVEs, or for crashes seen in the field, just because
they are held hostage by some restrictive process. Even worse, I'd end up
picking the regressions anyway because I can _not_ wait those 4-6 weeks
plus the time it takes for the fixes to show up in a stable release.

Really, that just makes the situation worse for everyone. We would be much
better off by further improving test coverage.

Guenter

2018-05-03 02:06:32

by Mark Brown

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 07:46:34PM +0000, Sasha Levin wrote:

> As you said, the regression should be fixed "asap", not "immediately".
> It should go through some sort of review and testing the maintainers are
> happy with, but unfortunately it doesn't happen now.

Doesn't happen some of the time. It's not like this is a universal
problem.

Especially for driver specific things there's at times no realistic
prospect of getting useful independent review of fixes, the hardware
isn't always widely available and if the fix isn't a pure software thing
at some point you just have to trust the judgement of the vendor.



2018-05-03 02:31:49

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
> > Because if that last is the case, then the prescription is very simple
> > and not controversial --- bug fixes found post -rc4 should be held to
> > the next merge window.
> >
>
> Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
> unrealistic. Holding up the fix for the next SpeckHammer because it was not
> ready before -rc4 ? I don't think so.

That's exactly what I explained earlier in this thread, it will actually
make the resulting kernels even worse as soon as there is less than 100%
regression (which is the case, since some fixes are valid). Postponing
valid fixes because some of them might be wrong is a bad idea. We need
to trust the developer regarding the test coverage and the developer has
to become trusted by openly indicating the type of testing run on the
patch. From there it will become easier to decide whether to revert a
whole patch set after a few failed fixes, or to take a few more fixes
in hopes that ultimately everything will be good.

Willy

2018-05-03 03:11:51

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 11:05:50AM +0900, Mark Brown wrote:
> On Wed, May 02, 2018 at 07:46:34PM +0000, Sasha Levin wrote:
>
> > As you said, the regression should be fixed "asap", not "immediately".
> > It should go through some sort of review and testing the maintainers are
> > happy with, but unfortunately it doesn't happen now.
>
> Doesn't happen some of the time. It's not like this is a universal
> problem.
>
> Especially for driver specific things there's at times no realistic
> prospect of getting useful independent review of fixes, the hardware
> isn't always widely available and if the fix isn't a pure software thing
> at some point you just have to trust the judgement of the vendor.

And sometimes the Demon Murphy will cause a regression fix for user A,
to cause breakage for slightly different hardware belonging to user B. :-(

- Ted



2018-05-03 03:54:13

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On 05/02/2018 08:10 PM, Theodore Y. Ts'o wrote:
> On Thu, May 03, 2018 at 11:05:50AM +0900, Mark Brown wrote:
>> On Wed, May 02, 2018 at 07:46:34PM +0000, Sasha Levin wrote:
>>
>>> As you said, the regression should be fixed "asap", not "immediately".
>>> It should go through some sort of review and testing the maintainers are
>>> happy with, but unfortunately it doesn't happen now.
>>
>> Doesn't happen some of the time. It's not like this is a universal
>> problem.
>>
>> Especially for driver specific things there's at times no realistic
>> prospect of getting useful independent review of fixes, the hardware
>> isn't always widely available and if the fix isn't a pure software thing
>> at some point you just have to trust the judgement of the vendor.
>
> And sometimes the Demon Murphy will cause a regression fix for user A,
> to cause breakage for slightly different hardware belonging to user B. :-(
>

Believe me, I get my share of those. 7dac4a1726a9 ("ext4: add validity checks
for bitmap block numbers") and its fix 22be37acce25 ("ext4: fix bitmap
position validation") are pretty good examples. Yet, at the same time I had
to deal with three additional CVEs in the ext4 code. Even though the initial
fix for one of the four was buggy, I am glad that I got the other three through
stable releases.

As for -next, me and others stopped reporting bugs in it, because when we do
we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
them wet, and bugs in code which no one builds, much less tests or uses, won't
be found.

I can only repeat - what we need is more sophisticated testing, not a more
restrictive process.

Guenter

2018-05-03 11:07:21

by Jani Nikula

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, 01 May 2018, "Theodore Y. Ts'o" <[email protected]> wrote:
> Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
> merge window before they get merged at all.

What are -rc5 and later for then if not bug fixes? Baffled.

BR,
Jani.

--
Jani Nikula, Intel Open Source Technology Center

2018-05-03 11:44:32

by Al Viro

Subject: Re: bug-introducing patches

On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:

> Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
> - 245 carry a Fixes tag,
> - 196 carry a CC stable,
> - 395 contain the string "fix".
> (non-mutually exclusive)

BTW, what about situations when we have a fix naturally split into
a series of 2-3 massaging equivalent transformations + one-liner fix
in the resulting tree?

2018-05-03 11:50:19

by Al Viro

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 08:06:20PM -0400, Theodore Y. Ts'o wrote:

> Because if that last is the case, then the prescription is very simple
> and not controversial --- bug fixes found post -rc4 should be held to
> the next merge window.

Provided it's not a known-to-be-exploited roothole, at least...

2018-05-03 12:09:13

by Greg Kroah-Hartman

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
> On 05/02/2018 08:10 PM, Theodore Y. Ts'o wrote:
> > On Thu, May 03, 2018 at 11:05:50AM +0900, Mark Brown wrote:
> > > On Wed, May 02, 2018 at 07:46:34PM +0000, Sasha Levin wrote:
> > >
> > > > As you said, the regression should be fixed "asap", not "immediately".
> > > > It should go through some sort of review and testing the maintainers are
> > > > happy with, but unfortunately it doesn't happen now.
> > >
> > > Doesn't happen some of the time. It's not like this is a universal
> > > problem.
> > >
> > > Especially for driver specific things there's at times no realistic
> > > prospect of getting useful independent review of fixes, the hardware
> > > isn't always widely available and if the fix isn't a pure software thing
> > > at some point you just have to trust the judgement of the vendor.
> >
> > And sometimes the Demon Murphy will cause a regression fix for user A,
> > to cause breakage for slightly different hardware belonging to user B. :-(
> >
>
> Believe me, I get my share of those. 7dac4a1726a9 ("ext4: add validity checks
> for bitmap block numbers") and its fix 22be37acce25 ("ext4: fix bitmap
> position validation") are pretty good examples. Yet, at the same time I had
> to deal with three additional CVEs in the ext4 code. Even though the initial
> fix for one of the four was buggy, I am glad that I got the other three through
> stable releases.
>
> As for -next, me and others stopped reporting bugs in it, because when we do
> we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
> and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
> them wet, and bugs in code which no one builds, much less tests or uses, won't
> be found.
>
> I can only repeat - what we need is more sophisticated testing, not a more
> restrictive process.

I agree, and people are working on this. But we can always use more!

thanks,

greg k-h

2018-05-03 14:34:02

by James Bottomley

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, 2018-05-03 at 14:08 +0300, Jani Nikula wrote:
> On Tue, 01 May 2018, "Theodore Y. Ts'o" <[email protected]> wrote:
> > Post -rc3 or -rc4, in my opinion bug fixes should wait until the
> > next
> > merge window before they get merged at all.
>
> What are -rc5 and later for then if not bug fixes? Baffled.

They're definitely for bug fixes, but there's a spectrum: obvious bug
fixes with no side effects are easy to justify. More complex bug fixes
run the risk of having side effects which introduce other bugs, so
could potentially destabilize the -rc process. In SCSI we tend to look
at what the user visible effects of the bug are in the post -rc5 region
and if they're slight or wouldn't be visible to most users, we'll hold
them over. If the fix looks complex and we're not sure we caught the
ramifications, we often add it to the merge window tree with a cc to
stable and a note saying to wait X weeks before actually adding to the
stable tree just to make sure no side effects show up with wider
testing. So, as with most things, it's a judgment call for the
maintainer.

James


2018-05-03 14:47:51

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 08:06:20PM -0400, Theodore Y. Ts'o wrote:
>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>>
>> Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
>> - 245 carry a Fixes tag,
>> - 196 carry a CC stable,
>> - 395 contain the string "fix".
>> (non-mutually exclusive)
>>
>> That leaves us with 200 commits not falling in the bugfix category.
>
>Some non-bug fixes are allowed in -rc2. So perhaps what might be
>interesting is to look at v4.16 (which is completed), and look at the
>distribution of commits:
>
> * regression fixes (for bugs introduced during the current
> release cycle)
> * "normal" bug fixes
> * commits which don't touch code (e.g., spelling or
> documentation-only fixes)
> * other commits (features or cleanup fixes)
>
>at each rcX level. The historic "standard" has been feature commits
>in -rc1 and -rc2 (tolerated, but ideally should land before the merge
>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
>regression fixes only. It would be interesting to see how well we
>have been holding to the historical ideal.
>
>It would then be interesting to use Sasha's analysis to see whether
>there are more bug fixes caused by regression fixes versus normal bug
>fixes, and whether or not they are common when fixes come "out of
>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>
>Because if that last is the case, then the prescription is very simple
>and not controversial --- bug fixes found post -rc4 should be held to
>the next merge window.
>
>If the concern is regression fixes which require one or two tries
>before they are fixed before 4.16-FINAL is released, then that's a
>"life is hard for AUTOSEL" issue, and I suspect Sasha will find that
>there is rather less sympathy for holding regression fixes for an
>extra week or two.
>
>If the concern is bug fixes that show up in -rc3 and -rc4, but which
>aren't hitting linux-next first, then holding bug fixes in linux-next
>for a week makes sense, and if that means that a bug fix found post
>-rc3 needs to marinate in linux-next for a week, and it then
>misses the -rc4 "bug fix" deadline, we can have a discussion about
>whether bug fixes should be allowed in -rc5 after a week's marination.
>
>My personal opinion is "to hell with it, just wait until the next
>merge window" --- but this can cause more work for the stable
>maintainers since a lot of bug fixes would then land in -rc1.

I'll work on breaking up the 4.16 commits into categories, but one
interesting statistic I've noticed while starting the work is:

Fixes in -rc cycles:
rc1 68
rc2 147
rc3 88
rc4 121
rc5 40
rc6 193
rc7 98

Average days in -next, for a fix, per -rc cycle:
rc1 27.25
rc2 21.4286
rc3 22.5114
rc4 18.281
rc5 14.65
rc6 12.6166
rc7 8.70408

Fixes for bugs not introduced in current merge window:
rc1 40
rc2 113
rc3 61
rc4 79
rc5 25
rc6 139
rc7 72

So for some reason, there is a rush to push fixes for older bugs (that
were not introduced in the current merge window) to the point that rc7
commits that only existed for a few days are merged in to address older
bugs.
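
As a quick check on the tables above, a small sketch that computes, per -rc, the share of fixes that address bugs not introduced in the current merge window. The numbers are copied from the tables; the variable names are only illustrative.

```python
# Fixes merged in each -rc of 4.16 (first table above).
fixes = {"rc1": 68, "rc2": 147, "rc3": 88, "rc4": 121,
         "rc5": 40, "rc6": 193, "rc7": 98}

# Fixes for bugs not introduced in the current merge window (third table).
older = {"rc1": 40, "rc2": 113, "rc3": 61, "rc4": 79,
         "rc5": 25, "rc6": 139, "rc7": 72}

# Share of each -rc's fixes that target older bugs.
share_older = {rc: older[rc] / fixes[rc] for rc in fixes}

# Same share across the whole cycle.
total_share = sum(older.values()) / sum(fixes.values())

for rc, share in share_older.items():
    print(f"{rc}: {older[rc]}/{fixes[rc]} = {share:.1%}")
print(f"all: {sum(older.values())}/{sum(fixes.values())} = {total_share:.1%}")
```

Over the whole cycle that is 529 of 755 fixes, about 70%, and the share does not drop off in the late -rc releases.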

2018-05-03 14:49:50

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 07:33:04AM -0700, James Bottomley wrote:
> They're definitely for bug fixes, but there's a spectrum: obvious bug
> fixes with no side effects are easy to justify. More complex bug fixes
> run the risk of having side effects which introduce other bugs, so
> could potentially destabilize the -rc process. In SCSI we tend to look
> at what the user visible effects of the bug are in the post -rc5 region
> and if they're slight or wouldn't be visible to most users, we'll hold
> them over. If the fix looks complex and we're not sure we caught the
> ramifications, we often add it to the merge window tree with a cc to
> stable and a note saying to wait X weeks before actually adding to the
> stable tree just to make sure no side effects show up with wider
> testing. So, as with most things, it's a judgment call for the
> maintainer.

For me this is the right, and responsible way to deal with bug fixes.
Self-control is much more efficient than random rejection and favors
a good analysis.

Willy

2018-05-03 14:54:00

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 02:46:14PM +0000, Sasha Levin wrote:
> I'll work on breaking up the 4.16 commits into categories, but one
> interesting statistic I've noticed while starting the work is:
>
> Fixes in -rc cycles:
> rc1 68
> rc2 147
> rc3 88
> rc4 121
> rc5 40
> rc6 193
> rc7 98
>
> Average days in -next, for a fix, per -rc cycle:
> rc1 27.25
> rc2 21.4286
> rc3 22.5114
> rc4 18.281
> rc5 14.65
> rc6 12.6166
> rc7 8.70408
>
> Fixes for bugs not introduced in current merge window:
> rc1 40
> rc2 113
> rc3 61
> rc4 79
> rc5 25
> rc6 139
> rc7 72
>
> So for some reason, there is a rush to push fixes for older bugs (that
> were not introduced in the current merge window) to the point that rc7
> commits that only existed for a few days are merged in to address older
> bugs.

IMHO it's because it's the time it takes for users to start to trust the
3rd or 4th stable release of the previous version, to switch to it, to
face a bug, to report it, and for the maintainer to write a fix.

I wouldn't be much surprised if you'd find that among those not introduced
in the current merge window, many were introduced in the previous release.

Willy

2018-05-03 14:56:08

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
>On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
>>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>>>
>>>Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
>>> - 245 carry a Fixes tag,
>>> - 196 carry a CC stable,
>>> - 395 contain the string "fix".
>>>(non-mutually exclusive)
>>>
>>>That leaves us with 200 commits not falling in the bugfix category.
>>
>>Some non-bug fixes are allowed in -rc2. So perhaps what might be
>>interesting is to look at v4.16 (which is completed), and look at the
>>distribution of commits:
>>
>> * regressions fixes (for bugs introduced during the current
>> release cycle)
>> * "normal" bug fixes
>> * commits which don't touch code (e.g., spelling or
>> documentation-only fixes)
>> * other commits (features or cleanup fixes)
>>
>>at each rcX level. The historic "standard" has been feature commits
>>in -rc1 and -rc2 (tolerated, but ideally should land before the merge
>>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
>>regression fixes only. It would be interesting to see how well we
>>have been holding to the historical ideal.
>>
>>It would then be interesting to use Sasha's analysis to see whether
>>there are more bug fixes caused by regression fixes versus normal bug
>>fixes, and whether or not they are common when fixes come "out of
>>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>>
>>Because if that last is the case, then the prescription is very simple
>>and not controversial --- bug fixes found post -rc4 should be held to
>>the next merge window.
>>
>
>Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
>unrealistic. Holding up the fix for the next SpeckHammer because it was not
>ready before -rc4 ? I don't think so.

For severe problems, the patch usually gets more than enough reviews and
testing, so I don't see a need to soak it in -next more than some
minimal amount of time to get bot coverage.

However, these things show up only a few times per year. Most of the
fixes even in late -rc cycles are for older bugs that aren't too
critical. We can't base our decision on severe bugs that get exceptional
treatment anyways (see PTI getting pushed in -stable).

>Even when not counting severe problems, you are adding lots of additional work
>for those who do and want to rely on stable releases to merge in bug fixes.
>Sure, I am at times annoyed having to deal with a regression in a stable
>release, but it very much beats digging through various mailing lists for
>pending patches to fix CVEs, or for crashes seen in the field, just because
>they are held hostage by some restrictive process. Even worse, I'd end up
>picking the regressions anyway because I can _not_ wait those 4-6 weeks
>plus the time it takes for the fixes to show up in a stable release.

I think that for -stable we don't have a good idea how soon we want to
merge patches in. On one hand enterprise distro folks complain we're
jumping the gun, and on the other hand folks like yourself claim we're
too slow :)

>Really, that just makes the situation worse for everyone. We would be much
>better off by further improving test coverage.

I'm definitely not saying "no" to more test coverage, but these are work
streams that can happen in parallel.

2018-05-03 15:02:38

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 04:52:05PM +0200, Willy Tarreau wrote:
>On Thu, May 03, 2018 at 02:46:14PM +0000, Sasha Levin wrote:
>> I'll work on breaking up the 4.16 commits into categories, but one
>> interesting statistic I've noticed while starting the work is:
>>
>> Fixes in -rc cycles:
>> rc1 68
>> rc2 147
>> rc3 88
>> rc4 121
>> rc5 40
>> rc6 193
>> rc7 98
>>
>> Average days in -next, for a fix, per -rc cycle:
>> rc1 27.25
>> rc2 21.4286
>> rc3 22.5114
>> rc4 18.281
>> rc5 14.65
>> rc6 12.6166
>> rc7 8.70408
>>
>> Fixes for bugs not introduced in current merge window:
>> rc1 40
>> rc2 113
>> rc3 61
>> rc4 79
>> rc5 25
>> rc6 139
>> rc7 72
>>
>> So for some reason, there is a rush to push fixes for older bugs (that
>> were not introduced in the current merge window) to the point that rc7
>> commits that only existed for a few days are merged in to address older
>> bugs.
>
>IMHO it's because it's the time it takes for users to start to trust the
>3rd or 4th stable release of the previous version, to switch to it, to
>face a bug, to report it, and for the maintainer to write a fix.
>
>I wouldn't be much surprised if you'd find that among those not introduced
>in the current merge window, many were introduced in the previous release.

Interesting. Here it is for v4.16-rcX fixes that fix something
introduced before v4.14:

rc1 30
rc2 87
rc3 51
rc4 68
rc5 23
rc6 113
rc7 61

So I'm not sure if what you described is really the case.
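
To make the comparison explicit, here is a rough Python sketch that subtracts the pre-v4.14 counts from the "older bug" counts quoted above, leaving the fixes whose bug was introduced more recently. The dictionaries just restate the two tables; the helper name `recent_count` is made up for this illustration.

```python
# Fixes for bugs not introduced in the current merge window (quoted above).
older = {"rc1": 40, "rc2": 113, "rc3": 61, "rc4": 79,
         "rc5": 25, "rc6": 139, "rc7": 72}

# Subset of those whose bug was introduced before v4.14 (table above).
pre_414 = {"rc1": 30, "rc2": 87, "rc3": 51, "rc4": 68,
           "rc5": 23, "rc6": 113, "rc7": 61}


def recent_count(rc):
    """Older-bug fixes whose bug was NOT introduced before v4.14."""
    return older[rc] - pre_414[rc]


for rc in older:
    share = pre_414[rc] / older[rc]
    print(f"{rc} recent={recent_count(rc)} pre-v4.14 share={share:.0%}")

total_recent = sum(recent_count(rc) for rc in older)
print(f"all recent={total_recent} of {sum(older.values())}")
```

Only 96 of the 529 older-bug fixes fall outside the pre-v4.14 group, so most of them address bugs that are at least a couple of releases old.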

2018-05-03 15:06:43

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 04:48:50PM +0200, Willy Tarreau wrote:
>On Thu, May 03, 2018 at 07:33:04AM -0700, James Bottomley wrote:
>> They're definitely for bug fixes, but there's a spectrum: obvious bug
>> fixes with no side effects are easy to justify. More complex bug fixes
>> run the risk of having side effects which introduce other bugs, so
>> could potentially destabilize the -rc process. In SCSI we tend to look
>> at what the user visible effects of the bug are in the post -rc5 region
>> and if they're slight or wouldn't be visible to most users, we'll hold
>> them over. If the fix looks complex and we're not sure we caught the
>> ramifications, we often add it to the merge window tree with a cc to
>> stable and a note saying to wait X weeks before actually adding to the
>> stable tree just to make sure no side effects show up with wider
>> testing. So, as with most things, it's a judgment call for the
>> maintainer.
>
>For me this is the right, and responsible way to deal with bug fixes.
>Self-control is much more efficient than random rejection and favors
>a good analysis.

I think that the ideal outcome of this discussion, at least for me, is a
tool to go under scripts/ that would allow maintainers to get some sort
of (quantifiable) data that will indicate how well the patch was tested via
the regular channels.

At which point it's the maintainer's judgement call on whether he wants
to grab the patch or wait for more tests or reviews.

This is very similar to what James has described, it just needs to leave
his brain and turn into code :)

2018-05-03 15:28:17

by James Bottomley

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, 2018-05-03 at 15:06 +0000, Sasha Levin via Ksummit-discuss
wrote:
> On Thu, May 03, 2018 at 04:48:50PM +0200, Willy Tarreau wrote:
> > On Thu, May 03, 2018 at 07:33:04AM -0700, James Bottomley wrote:
> > > They're definitely for bug fixes, but there's a spectrum: obvious
> > > bug fixes with no side effects are easy to justify.  More complex
> > > bug fixes run the risk of having side effects which introduce
> > > other bugs, so could potentially destabilize the -rc process.  In
> > > SCSI we tend to look at what the user visible effects of the bug
> > > are in the post -rc5 region and if they're slight or wouldn't be
> > > visible to most users, we'll hold them over.  If the fix looks
> > > complex and we're not sure we caught the ramifications, we often
> > > add it to the merge window tree with a cc to stable and a note
> > > saying to wait X weeks before actually adding to the
> > > stable tree just to make sure no side effects show up with wider
> > > testing.  So, as with most things, it's a judgment call for the
> > > maintainer.
> >
> > For me this is the right, and responsible way to deal with bug
> > fixes. Self-control is much more efficient than random rejection
> > and favors a good analysis.
>
> I think that the ideal outcome of this discussion, at least for me,
> is a tool to go under scripts/ that would allow maintainers to get
> some sort of (quantifiable) data that will indicate how well the
> patch was tested via the regular channels.
>
> At which point it's the maintainer's judgement call on whether he
> wants to grab the patch or wait for more tests or reviews.
>
> This is very similar to what James has described, it just needs to
> leave his brain and turn into code :)

I appreciate the sentiment, but if we could script taste, we'd have
replaced Linus with something far less cantankerous a long time ago ...

It's also a sad fact that a lot of things which look like obvious fixes
actually turn out not to be so with later testing. This is why the
user visibility test is paramount. If a bug fix has no real user
visible effects, it's often better to defer it no matter how obvious it
looks, which is why the static code checkers often get short shrift
before a merge window.

A script measuring user visibility would be nice, but looks a bit
complex ...

James


2018-05-03 15:44:23

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 08:27:48AM -0700, James Bottomley wrote:
>On Thu, 2018-05-03 at 15:06 +0000, Sasha Levin via Ksummit-discuss
>wrote:
>> On Thu, May 03, 2018 at 04:48:50PM +0200, Willy Tarreau wrote:
>> > On Thu, May 03, 2018 at 07:33:04AM -0700, James Bottomley wrote:
>> > > They're definitely for bug fixes, but there's a spectrum: obvious
>> > > bug fixes with no side effects are easy to justify. More complex
>> > > bug fixes run the risk of having side effects which introduce
>> > > other bugs, so could potentially destabilize the -rc process. In
>> > > SCSI we tend to look at what the user visible effects of the bug
>> > > are in the post -rc5 region and if they're slight or wouldn't be
>> > > visible to most users, we'll hold them over. If the fix looks
>> > > complex and we're not sure we caught the ramifications, we often
>> > > add it to the merge window tree with a cc to stable and a note
>> > > saying to wait X weeks before actually adding to the
>> > > stable tree just to make sure no side effects show up with wider
>> > > testing. So, as with most things, it's a judgment call for the
>> > > maintainer.
>> >
>> > For me this is the right, and responsible way to deal with bug
>> > fixes. Self-control is much more efficient than random rejection
>> > and favors a good analysis.
>>
>> I think that the ideal outcome of this discussion, at least for me,
>> is a tool to go under scripts/ that would allow maintainers to get
>> some sort of (quantifiable) data that will indicate how well the
>> patch was tested via the regular channels.
>>
>> At which point it's the maintainer's judgement call on whether he
>> wants to grab the patch or wait for more tests or reviews.
>>
>> This is very similar to what James has described, it just needs to
>> leave his brain and turn into code :)
>
>I appreciate the sentiment, but if we could script taste, we'd have
>replaced Linus with something far less cantankerous a long time ago ...

Linus, IMO, is getting replaced. Look at how many functions he used to
perform 10 years ago that he's no longer responsible for.

One of the most obvious examples is -next, where most integration issues
are resolved before they even reach Linus.

This is good for the community, as it allows us to make the process better
and scale out. It is also good for Linus, as I'm not sure how long he'd
last if he still had to edit patches by hand too often. Instead, he gets
to play with things that interest him more, where he is irreplaceable.

>It's also a sad fact that a lot of things which look like obvious fixes
>actually turn out not to be so with later testing. This is why the
>user visibility test is paramount. If a bug fix has no real user
>visible effects, it's often better to defer it no matter how obvious it
>looks, which is why the static code checkers often get short shrift
>before a merge window.
>
>A script measuring user visibility would be nice, but looks a bit
>complex ...

It is, but I think it's worthwhile. Would something that'll show you
things like:

- How long a patch has been in -next?
- How many replies/reviews/comments it got on a mailing list?
- Did the 0day bot test it?
- Did syzbot fuzz it? For how long?
- If it references a bugzilla of some sort, how many
comments/reviews/etc it got there?
- Is it -stable material, or does it fix a regression in the current
merge window?
- If the subsystem has a custom testing rig, results from those tests

be a step in the right direction? Is it something you'd use to make decisions
on whether you'd take a patch in?
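
To make the idea a bit more concrete, here is a minimal Python sketch of what such a report could carry. Nothing like this exists under scripts/ today; the class name `PatchTestingReport`, every field name and the sample values are assumptions made up for illustration. It deliberately computes no score and only lays out the facts, leaving the call to the maintainer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PatchTestingReport:
    """Hypothetical per-patch summary; all field names are assumptions."""
    days_in_next: int
    ml_replies: int
    tested_by_0day: bool
    syzbot_hours: Optional[float] = None         # None: not fuzzed
    bug_tracker_comments: Optional[int] = None   # None: no tracker reference
    stable_material: bool = False
    fixes_current_window_regression: bool = False
    subsystem_test_results: Dict[str, str] = field(default_factory=dict)

    def summary(self):
        """Render the collected facts as plain text; no score is computed."""
        if self.syzbot_hours is None:
            syzbot = "no"
        else:
            syzbot = f"yes, {self.syzbot_hours:g}h"
        lines = [
            f"days in -next:      {self.days_in_next}",
            f"ML replies:         {self.ml_replies}",
            f"0day tested:        {'yes' if self.tested_by_0day else 'no'}",
            f"syzbot fuzzed:      {syzbot}",
            f"-stable material:   {'yes' if self.stable_material else 'no'}",
            "fixes regression in current window: "
            + ("yes" if self.fixes_current_window_regression else "no"),
        ]
        if self.bug_tracker_comments is not None:
            lines.append(f"bug tracker comments: {self.bug_tracker_comments}")
        for name, result in sorted(self.subsystem_test_results.items()):
            lines.append(f"subsystem test {name}: {result}")
        return "\n".join(lines)


# Made-up sample values, only to show the output shape.
report = PatchTestingReport(days_in_next=5, ml_replies=3, tested_by_0day=True)
print(report.summary())
```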

2018-05-03 15:49:45

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 02:55:36PM +0000, Sasha Levin wrote:
> On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
> >On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
> >>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
> >>>
> >>>Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
> >>> - 245 carry a Fixes tag,
> >>> - 196 carry a CC stable,
> >>> - 395 contain the string "fix".
> >>>(non-mutually exclusive)
> >>>
> >>>That leaves us with 200 commits not falling in the bugfix category.
> >>
> >>Some non-bug fixes are allowed in -rc2. So perhaps what might be
> >>interesting is to look at v4.16 (which is completed), and look at the
> >>distribution of commits:
> >>
> >> * regressions fixes (for bugs introduced during the current
> >> release cycle)
> >> * "normal" bug fixes
> >> * commits which don't touch code (e.g., spelling or
> >> documentation-only fixes)
> >> * other commits (features or cleanup fixes)
> >>
> >>at each rcX level. The historic "standard" has been feature commits
> >>in -rc1 and -rc2 (tolerated, but ideally should land before the merge
> >>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
> >>regression fixes only. It would be interesting to see how well we
> >>have been holding to the historical ideal.
> >>
> >>It would then be interesting to use Sasha's analysis to see whether
> >>there are more bug fixes caused by regression fixes versus normal bug
> >>fixes, and whether or not they are common when fixes come "out of
> >>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
> >>
> >>Because if that last is the case, then the prescription is very simple
> >>and not controversial --- bug fixes found post -rc4 should be held to
> >>the next merge window.
> >>
> >
> >Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
> >unrealistic. Holding up the fix for the next SpeckHammer because it was not
> >ready before -rc4 ? I don't think so.
>
> For severe problems, the patch usually gets more than enough reviews and
> testing, so I don't see a need to soak it in -next more than some
> minimal amount of time to get bot coverage.
>
> However, these things show up only a few times per year. Most of the
> fixes even in late -rc cycles are for older bugs that aren't too
> critical. We can't base our decision on severe bugs that get exceptional
> treatment anyways (see PTI getting pushed in -stable).
>
> >Even when not counting severe problems, you are adding lots of additional work
> >for those who do and want to rely on stable releases to merge in bug fixes.
> >Sure, I am at times annoyed having to deal with a regression in a stable
> >release, but it very much beats digging through various mailing lists for
> >pending patches to fix CVEs, or for crashes seen in the field, just because
> >they are held hostage by some restrictive process. Even worse, I'd end up
> >picking the regressions anyway because I can _not_ wait those 4-6 weeks
> >plus the time it takes for the fixes to show up in a stable release.
>
> I think that for -stable we don't have a good idea how soon we want to
> merge patches in. On one hand enterprise distro folks complain we're
> jumping the gun, and on the other hand folks like yourself claim we're
> too slow :)
>

You are misquoting me. I am saying that it would be a bad idea to hold up
bug fixes after -rc4, which is quite different to saying that patches
don't make it into stable releases fast enough. I am perfectly happy to
wait a week or so for a patch to soak in _mainline_ before being applied
to stable.

I am absolutely _not_ happy with the number of patches making it into
-stable releases recently. I am especially very concerned that the current
flurry of patches queued for -stable will destabilize pretty much all
stable releases, and pretty badly, for that matter. I am seriously
contemplating not to integrate the next few stable releases into ChromeOS
for that very reason. That would be a different discussion, though.

Guenter

2018-05-03 15:57:38

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 08:27:48AM -0700, James Bottomley wrote:
> It's also a sad fact that a lot of things which look like obvious fixes
> actually turn out not to be so with later testing. This is why the
> user visibility test is paramount. If a bug fix has no real user
> visible effects, it's often better to defer it no matter how obvious it
> looks, which is why the static code checkers often get short shrift
> before a merge window.
>
> A script measuring user visibility would be nice, but looks a bit
> complex ...

I totally agree with this and it matches my experience in haproxy. We
have had series of fixes that broke something else in very subtle ways
that made us want to improve non-reg, but many of the times we noted
that reg testing would hardly spot them given that the failures require
so many conditions to happen only once every million that it's hopeless.
It's just that some users are (un)lucky enough to meet all the conditions
at once very often and to be very sensitive to one error per million.

User exposure is needed. Having multiple stable releases ensures everyone
gets their expected level of trust. Those on -rc want to see bugs before
they hit their users. Regressions are bad and require self-moderation and
self-estimation of the amount of trust in one's code, but they're better
in -rc than in -stable. I do happen to write some fixes I'm not totally
sure about and prefer not to backport them immediately. Users value
transparency because that helps them take safe decisions. If I say "this
is my fix, but I'd love more testing as I'm not yet sold on it", I'll get
some testers, but not the ones complaining that I broke their setup. Only
later it makes sense to progressively backport.

I have broken stable releases many times with failed backports. Almost
every time it was my fault due to incomplete testing. I could argue that
once you've built one hundred times in a week-end you're probably a bit
more lenient about next builds, or whatever. But in the end I was the
one breaking a working version. Seeing my branches picked up by Guenter
was a huge relief and it started to spot many build issues that I could
not figure myself. It doesn't make remaining bugs less important but at
least they are easier to swallow, to spot and to address.

What's not acceptable is rushed fixes that have obvious side effects that
could have been caught by closer analysis or better testing. It always
happens but it must not happen too often for the same person/subsystem.
This I think is where the line must be drawn. When Linus shouts once in
a while, it's a reminder for all others. Tune the potentiometer of his
detection threshold a bit lower and we'll get fewer regressions because
it's never pleasant to be called stupid in public the way he does it.

Willy

2018-05-03 16:02:15

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 03:01:08PM +0000, Sasha Levin wrote:
> On Thu, May 03, 2018 at 04:52:05PM +0200, Willy Tarreau wrote:
> >I wouldn't be much surprised if you'd find that among those not introduced
> >in the current merge window, many were introduced in the previous release.
>
> Interesting. Here it is for v4.16-rcX fixes that fix something
> introduced before v4.14:
>
> rc1 30
> rc2 87
> rc3 51
> rc4 68
> rc5 23
> rc6 113
> rc7 61
>
> So I'm not sure if what you described is really the case.

This is rather interesting and probably deserves some analysis or
explanation. I agree that probably a number of the 61 fixes in rc7
could have cooked a little bit more if they fixed 5-month-old bugs.

Willy


2018-05-03 16:03:39

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 08:49:11AM -0700, Guenter Roeck wrote:
>On Thu, May 03, 2018 at 02:55:36PM +0000, Sasha Levin wrote:
>> On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
>> >On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
>> >>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>> >>>
>> >>>Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
>> >>> - 245 carry a Fixes tag,
>> >>> - 196 carry a CC stable,
>> >>> - 395 contain the string "fix".
>> >>>(non-mutually exclusive)
>> >>>
>> >>>That leaves us with 200 commits not falling in the bugfix category.
>> >>
>> >>Some non-bug fixes are allowed in -rc2. So perhaps what might be
>> >>interesting is to look at v4.16 (which is completed), and look at the
>> >>distribution of commits:
>> >>
>> >> * regressions fixes (for bugs introduced during the current
>> >> release cycle)
>> >> * "normal" bug fixes
>> >> * commits which don't touch code (e.g., spelling or
>> >> documentation-only fixes)
>> >> * other commits (features or cleanup fixes)
>> >>
>> >>at each rcX level. The historic "standard" has been feature commits
>> >>in -rc1 and -rc2 (tolerated, but ideally should land before the merge
>> >>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
>> >>regression fixes only. It would be interesting to see how well we
>> >>have been holding to the historical ideal.
>> >>
>> >>It would then be interesting to use Sasha's analysis to see whether
>> >>there are more bug fixes caused by regression fixes versus normal bug
>> >>fixes, and whether or not they are common when fixes come "out of
>> >>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>> >>
>> >>Because if that last is the case, then the prescription is very simple
>> >>and not controversial --- bug fixes found post -rc4 should be held to
>> >>the next merge window.
>> >>
>> >
>> >Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
>> >unrealistic. Holding up the fix for the next SpeckHammer because it was not
>> >ready before -rc4 ? I don't think so.
>>
>> For severe problems, the patch usually gets more than enough reviews and
>> testing, so I don't see a need to soak it in -next more than some
>> minimal amount of time to get bot coverage.
>>
>> However, these things show up only a few times per year. Most of the
>> fixes even in late -rc cycles are for older bugs that aren't too
>> critical. We can't base our decision on severe bugs that get exceptional
>> treatment anyways (see PTI getting pushed in -stable).
>>
>> >Even when not counting severe problems, you are adding lots of additional work
>> >for those who do and want to rely on stable releases to merge in bug fixes.
>> >Sure, I am at times annoyed having to deal with a regression in a stable
>> >release, but it very much beats digging through various mailing lists for
>> >pending patches to fix CVEs, or for crashes seen in the field, just because
>> >they are held hostage by some restrictive process. Even worse, I'd end up
>> >picking the regressions anyway because I can _not_ wait those 4-6 weeks
>> >plus the time it takes for the fixes to show up in a stable release.
>>
>> I think that for -stable we don't have a good idea how soon we want to
>> merge patches in. On one hand enterprise distro folks complain we're
>> jumping the gun, and on the other hand folks like yourself claim we're
>> too slow :)
>>
>
>You are misquoting me. I am saying that it would be a bad idea to hold up
>bug fixes after -rc4, which is quite different to saying that patches
>don't make it into stable releases fast enough. I am perfectly happy to
>wait a week or so for a patch to soak in _mainline_ before being applied
>to stable.

Most bug fixes that go in at that point are fixes for previously released
kernels; what's the harm in keeping them around for longer?

I'm not saying that it should be some arbitrary rule for everyone, but
just suggesting that maintainers should exercise more caution merging
untested commits that don't even fix a current regression.

W.r.t. stable, as you just said, you're fine with a week or two, while the
enterprise folks (as well as Ted, to some extent, in this thread)
suggest that this should be a month+.

>I am absolutely _not_ happy with the number of patches making it into
>-stable releases recently. I am especially very concerned that the current
>flurry of patches queued for -stable will destabilize pretty much all
>stable releases, and pretty badly, for that matter. I am seriously
>contemplating not to integrate the next few stable releases into ChromeOS
>for that very reason. That would be a different discussion, though.

For AUTOSEL, I'd be happy to learn of issues you encounter and address
them in my process.

I've been submitting automatically selected patches for over a year now
and the track record for regressions is on par with patches that are
tagged for stable.

2018-05-03 16:15:47

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 06:00:46PM +0200, Willy Tarreau wrote:
>On Thu, May 03, 2018 at 03:01:08PM +0000, Sasha Levin wrote:
>> On Thu, May 03, 2018 at 04:52:05PM +0200, Willy Tarreau wrote:
>> >I wouldn't be much surprised if you'd find that among those not introduced
>> >in the current merge window, many were introduced in the previous release.
>>
>> Interesting. Here it is for v4.16-rcX fixes that fix something
>> introduced before v4.14:
>>
>> rc1 30
>> rc2 87
>> rc3 51
>> rc4 68
>> rc5 23
>> rc6 113
>> rc7 61
>>
>> So I'm not sure if what you described is really the case.
>
>This is rather interesting and probably deserves some analysis or
>explanation. I agree that probably a number of the 61 fixes in rc7
>could have cooked a little bit more if they fixed 5-month-old bugs.

I tried looking at a few commits that came in on -rc7, and I see quite a
few cases where a commit was merged to Linus' tree in about 24 hours
after it was authored. Or maintainers who just wrote it, pushed it in,
and shipped it to Linus.

I've attached the data I used. The columns are as follows:

1. Commit ID
2. When was it merged
3. How many days it spent in -next
4. What commit did it fix
5. When was that commit merged
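
For anyone who wants to slice the data themselves, here is a small Python sketch that parses one row into the five columns listed above. It assumes the columns are whitespace-separated and that the days-in--next column is a whole number, which matches rows such as "b6cdbc85234b v4.16-rc7 5 ca254490c8df v4.3"; the names `FixRecord` and `parse_line` are made up for this example.

```python
from typing import NamedTuple


class FixRecord(NamedTuple):
    commit: str            # 1. Commit ID
    merged_in: str         # 2. When was it merged
    days_in_next: int      # 3. How many days it spent in -next
    fixed_commit: str      # 4. What commit did it fix
    fixed_merged_in: str   # 5. When was that commit merged


def parse_line(line):
    """Split one whitespace-separated row into a FixRecord.

    Raises ValueError if the row does not have exactly five columns
    or if the days column is not an integer.
    """
    commit, merged_in, days, fixed_commit, fixed_merged_in = line.split()
    return FixRecord(commit, merged_in, int(days), fixed_commit, fixed_merged_in)


record = parse_line("b6cdbc85234b v4.16-rc7 5 ca254490c8df v4.3")
print(record.merged_in, record.days_in_next, record.fixed_merged_in)
```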


Attachments:
416_fixes.txt (69.67 kB)

2018-05-03 16:37:51

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 04:14:57PM +0000, Sasha Levin wrote:
> I tried looking at a few commits that came in on -rc7, and I see quite a
> few cases where a commit was merged to Linus' tree in about 24 hours
> after it was authored. Or maintainers who just wrote it, pushed it in,
> and shipped it to Linus.
>
> I've attached the data I used. The columns are as follows:
>
> 1. Commit ID
> 2. When was it merged
> 3. How many days it spent in -next
> 4. What commit did it fix
> 5. When was that commit merged

> b6cdbc85234b v4.16-rc7 5 ca254490c8df v4.3
> 82dd0d2a9a76 v4.16-rc7 5 8f58336d3f78 v4.2
> 5807b22c9164 v4.16-rc7 5 6c8702c60b88 v4.9
> f97c3dc3c0e8 v4.16-rc7 5 4c4dbb4a7363 v4.15
(...)

I like this (not what was done but the analysis).

I'd argue that for a small part of them there are very likely valid reasons
(really obvious fix, security issue, etc.), but there seem to be quite a
large number of them here.

Now I understand what makes me uneasy with what I'm seeing here. As I
mentioned, -rc is for people who want to see bugs before their users.
-rc7 will ensure almost everyone discovers the fix at the same time,
because the next version will be 4.16, the first of a stable release,
the one that users are expected to trust.

So we probably have to educate/encourage developers *not* to submit
fixes for old bugs that late in the cycle and to rather wait for the next
version so that it cooks in -rc for a while before hitting users, knowing
that these fixes will be backported to stable anyway once considered valid.

Just like Greg has his "WTF" script to remind some developers that their
patch is not suited to -stable, I think you could, based on your work,
try to spot regressions introduced by late patches that fall in the
category you've filtered and emit such WTF messages to the original
patch's authors/committers.

It's important to do it only when these patches cause breakage though,
because we don't want to needlessly delay fixes when they're considered
certain or well tested. Only when they cause trouble.
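
As a rough illustration of that filter, here is a Python sketch. It assumes two inputs that would have to come from Sasha's data: a list of fixes with the -rc they were merged in and whether the bug they fix predates the current merge window, plus a mapping from a commit to the later commit whose Fixes: tag names it. The names `Fix` and `late_fixes_that_broke`, the -rc5 cutoff and the placeholder commit IDs are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Fix(NamedTuple):
    commit: str
    merged_rc: int        # N for a commit merged in -rcN
    fixes_old_bug: bool   # True if the bug predates the current merge window


def late_fixes_that_broke(fixes: List[Fix],
                          later_fixes: Dict[str, str],
                          late_rc: int = 5) -> List[Tuple[Fix, str]]:
    """Return (fix, follow_up) pairs worth a reminder message.

    A fix is flagged only if it went in at or after -rc<late_rc>, it
    addressed an older bug, AND a later commit had to fix it. Late
    fixes that caused no trouble are left alone.
    """
    flagged = []
    for fix in fixes:
        late = fix.merged_rc >= late_rc
        if late and fix.fixes_old_bug and fix.commit in later_fixes:
            flagged.append((fix, later_fixes[fix.commit]))
    return flagged


# Placeholder IDs, not real commits.
sample = [
    Fix("aaaaaaaaaaaa", 7, True),    # late, old bug, later fixed: flagged
    Fix("bbbbbbbbbbbb", 2, True),    # early in the cycle: ignored
    Fix("cccccccccccc", 6, False),   # late, but a current regression: ignored
    Fix("ffffffffffff", 7, True),    # late, old bug, no breakage: ignored
]
follow_ups = {"aaaaaaaaaaaa": "dddddddddddd", "bbbbbbbbbbbb": "eeeeeeeeeeee"}

for fix, follow_up in late_fixes_that_broke(sample, follow_ups):
    print(f"{fix.commit} (-rc{fix.merged_rc}) was later fixed by {follow_up}")
```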

For me the rule seems simple to understand, every submitter should
think like this late in the cycle :

"you're sending a patch that is going to be part of a stable kernel
in no more than 2 weeks, possibly affecting all users upgrading to
that kernel if you did something wrong. Are you really certain you
want this patch merged now, that it got sufficient testing and that
it cannot wait for next -rc1 to get broader exposure first ?"

I'm pretty sure that most of the time it will be "sure I want it now"
and there will be no problem, which is fine as it automatically reduces
the number of bugs in releases. Some may reconsider their submission.
Some may get caught by your automated script if a later commit fixes
an issue introduced by their patch. And there public shaming is the
only option (or maybe only the second time if you really want to be
nice).

Willy

2018-05-03 16:51:40

by Justin Forbes

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 3, 2018 at 11:02 AM, Sasha Levin wrote:
> On Thu, May 03, 2018 at 08:49:11AM -0700, Guenter Roeck wrote:
>>On Thu, May 03, 2018 at 02:55:36PM +0000, Sasha Levin wrote:
>>> On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
>>> >On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
>>> >>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>>> >>>
>>> >>>Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of which
>>> >>> - 245 carry a Fixes tag,
>>> >>> - 196 carry a CC stable,
>>> >>> - 395 contain the string "fix".
>>> >>>(non-mutually exclusive)
>>> >>>
>>> >>>That leaves us with 200 commits not falling in the bugfix category.
>>> >>
>>> >>Some non-bug fixes are allowed in -rc2. So perhaps what might be
>>> >>interesting is to look at v4.16 (which is completed), and look at the
>>> >>distribution of commits:
>>> >>
>>> >> * regressions fixes (for bugs introduced during the current
>>> >> release cycle)
>>> >> * "normal" bug fixes
>>> >> * commits which don't touch code (e.g., spelling or
>>> >> documentation-only fixes)
>>> >> * other commits (features or cleanup fixes)
>>> >>
>>> >>at each rcX level. The historic "standard" has been feature commits
>>> >>in -rc1 and -rc2 (tolerated, but ideally should be before the merge
>>> >>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
>>> >>regression fixes only. It would be interesting to see how well we
>>> >>have been holding to the historical ideal.
>>> >>
>>> >>It would then be interesting to use Sasha's analysis to see whether
>>> >>there are more bug fixes caused by regression fixes versus normal bug
>>> >>fixes, and whether or not they are common when fixes come "out of
>>> >>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>>> >>
>>> >>Because if that last is the case, then the prescription is very simple
>>> >>and not controversial --- bug fixes found post -rc4 should be held to
>>> >>the next merge window.
>>> >>
>>> >
>>> >Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
>>> >unrealistic. Holding up the fix for the next SpeckHammer because it was not
>>> >ready before -rc4 ? I don't think so.
>>>
>>> For severe problems, the patch usually gets more than enough reviews and
>>> testing, so I don't see a need to soak it in -next more than some
>>> minimal amount of time to get bot coverage.
>>>
>>> However, these things show up only a few times per year. Most of the
>>> fixes even in late -rc cycles are for older bugs that aren't too
>>> critical. We can't base our decision on severe bugs that get exceptional
>>> treatment anyways (see PTI getting pushed in -stable).
>>>
>>> >Even when not counting severe problems, you are adding lots of additional work
>>> >for those who do and want to rely on stable releases to merge in bug fixes.
>>> >Sure, I am at times annoyed having to deal with a regression in a stable
>>> >release, but it very much beats digging through various mailing lists for
>>> >pending patches to fix CVEs, or for crashes seen in the field, just because
>>> >they are held hostage by some restrictive process. Even worse, I'd end up
>>> >picking the regressions anyway because I can _not_ wait those 4-6 weeks
>>> >plus the time it takes for the fixes to show up in a stable release.
>>>
>>> I think that for -stable we don't have a good idea how soon we want to
>>> merge patches in. On one hand enterprise distro folks complain we're
>>> jumping the gun, and on the other hand folks like yourself claim we're
>>> too slow :)
>>>
>>
>>You are misquoting me. I am saying that it would be a bad idea to hold up
>>bug fixes after -rc4, which is quite different to saying that patches
>>don't make it into stable releases fast enough. I am perfectly happy to
>>wait a week or so for a patch to soak in _mainline_ before being applied
>>to stable.
>
> Most bug fixes that go in at that point are fixes for previous released
> kernels, what's the harm in keeping them around for longer?
>
> I'm not saying that it should be some arbitrary rule for everyone, but
> just suggesting that maintainers should exercise more caution merging
> untested commits that don't even fix a current regression.
>
There is a balance here. In the past, one of the biggest complaints we
had as distro maintainers was that known regressions get reported, and
fixed, and then the maintainer would sit on the fix until the next
merge window. This happened even for trivial fixes. And not being in
tree does keep it out of stable. This has improved greatly recently.
Perhaps things have overcompensated, but I don't think that putting
a blanket rule out there is the answer. Just perhaps some best
practices for test coverage.

> w.r.t stable, as you just said, you're fine with a week or two, the
> enterprise folks (as well as Ted, to some extent, in this thread)
> suggest that this should be a month+

I don't have an issue with some things percolating in mainline for a
bit before being pulled into stable, it might have saved us a lot of
grief with the random patches last week. But again there isn't a set
rule that seems logical here. Adequate test coverage is the concern,
not some set time, especially for obvious fixes. I know for Fedora,
we do have (some) people testing rawhide daily, so things in Linus'
tree start getting end user testing usually within 24 hours.

I am not saying that things are great now, or cannot be improved. I am
just concerned that we come up with some "rule" that takes us back to
keeping legitimate fixes out of tree for much longer than necessary.

>>I am absolutely _not_ happy with the number of patches making it into
>>-stable releases recently. I am especially very concerned that the current
>>flurry of patches queued for -stable will destabilize pretty much all
>>stable releases, and pretty badly, for that matter. I am seriously
>>contemplating not to integrate the next few stable releases into ChromeOS
>>for that very reason. That would be a different discussion, though.

There is certainly concern here. If end users stop trusting stable
kernel updates, the next time there is a big security issue, they may
just ignore the fix until there is consensus that it is safe to update.
>
> For AUTOSEL, I'd be happy to learn of issues you encounter and address
> them in my process.
>
> I've been submitting automatically selected patches for over a year now
> and the track record for regressions is on par with patches that are
> tagged for stable.

2018-05-03 16:56:39

by Al Viro

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 02:46:14PM +0000, Sasha Levin via Ksummit-discuss wrote:

> Fixes in -rc cycles:
> rc1 68
> rc2 147
> rc3 88
> rc4 121
> rc5 40
> rc6 193
> rc7 98
>
> Average days in -next, for a fix, per -rc cycle:
> rc1 27.25
> rc2 21.4286
> rc3 22.5114
> rc4 18.281
> rc5 14.65
> rc6 12.6166
> rc7 8.70408
>
> Fixes for bugs not introduced in current merge window:
> rc1 40
> rc2 113
> rc3 61
> rc4 79
> rc5 25
> rc6 139
> rc7 72
>
> So for some reason, there is a rush to push fixes for older bugs (that
> were not introduced in the current merge window) to the point that rc7
> commits that only existed for a few days are merged in to address older
> bugs.

I really wonder how accurate your interpretation of Fixes: is.
Consider e.g. the situation when an old bug is found and partial fixes
applied. Then we find that those fixes did not cover everything and,
come next cycle, add more on top of those. Where should Fixes: on
the incrementals point to? Original commit? But they won't apply
without the first batch. The last in the original pile? But it
would imply (by your metrics) that original fixes had *INTRODUCED*
bugs.

Moreover, what the hell do you suggest in situation when
* foofs_barf() is b0rken in quite a few ways. There's an
easily triggerable memory corruptor that can be fixed locally as well
as something else that needs a change of e.g. ->mkdir() calling
conventions to take care of. The change is mechanical and fairly
simple, but it's already -rc4.

Even though the whole thing is well-understood at that point,
we *can't* apply all 3 - it's too late in the cycle for the
calling conventions' change in the middle of the series.

Should we apply the first fix that cycle, or should it slide to the
next one? We can't apply #1 + #3 --- #3 won't even compile without
#2. We can't apply #3 without #1 --- they can be transposed, but
it's nowhere near a mechanical transformation. So WTF should #3
have in "Fixes:"?
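
The two count tables quoted at the top of this message can be combined to see what share of the fixes in each -rc addressed older bugs. This is only arithmetic on the quoted figures; it inherits whatever inaccuracy the Fixes: interpretation has, and the function name `old_bug_share` is an assumption.

```python
# Figures copied from the quoted tables.
FIXES_TOTAL = {"rc1": 68, "rc2": 147, "rc3": 88, "rc4": 121, "rc5": 40, "rc6": 193, "rc7": 98}
FIXES_OLD = {"rc1": 40, "rc2": 113, "rc3": 61, "rc4": 79, "rc5": 25, "rc6": 139, "rc7": 72}


def old_bug_share(total: dict, old: dict) -> dict:
    """Fraction of fixes in each -rc that target bugs from before the current merge window."""
    return {rc: old[rc] / total[rc] for rc in total}


shares = old_bug_share(FIXES_TOTAL, FIXES_OLD)
for rc, share in shares.items():
    print(f"{rc}: {FIXES_OLD[rc]:3d} of {FIXES_TOTAL[rc]:3d} fixes ({share:.1%})")

overall = sum(FIXES_OLD.values()) / sum(FIXES_TOTAL.values())
print(f"all: {sum(FIXES_OLD.values())} of {sum(FIXES_TOTAL.values())} fixes ({overall:.1%})")
```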

2018-05-03 17:12:03

by Guenter Roeck

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 04:02:12PM +0000, Sasha Levin wrote:
> >You are misquoting me. I am saying that it would be a bad idea to hold up
> >bug fixes after -rc4, which is quite different to saying that patches
> >don't make it into stable releases fast enough. I am perfectly happy to
> >wait a week or so for a patch to soak in _mainline_ before being applied
> >to stable.
>
> Most bug fixes that go in at that point are fixes for previous released
> kernels, what's the harm in keeping them around for longer?
>

The ones I am mostly concerned about are fixes for CVEs, crashes, file
system corruptions, and similar. Maybe the enterprise folks don't mind
keeping those around for a month or more even though a fix is available.
I do.

> For AUTOSEL, I'd be happy to learn of issues you encounter and address
> them in my process.
>
> I've been submitting automatically selected patches for over a year now
> and the track record for regressions is on par with patches that are
> tagged for stable.

So far it hasn't been an issue. Or, rather, not much; with more patches
applied, the percentage of regressions may be the same, but the number
of regressions is higher. My "customers" care about the number, not about
the percentage.

However, the set of test results attached below (from last night)
_is_ a problem. I don't know what changed, but something clearly did,
to the point that I am _very_ concerned about the next set of stable
releases.

Guenter

---
For v4.14.39-580-gc8cd674:

Build results:
total: 146 pass: 98 fail: 48
Qemu test results:
total: 100 pass: 21 fail: 79

For v4.4.131-268-ga33ce4a:

Build results:
total: 146 pass: 92 fail: 54
Qemu test results:
total: 127 pass: 91 fail: 36
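
For comparison across the two release candidates, the results above can be turned into pass rates. A minimal sketch using only the figures from this report; the dictionary layout and the name `pass_rates` are assumptions.

```python
# (pass, fail, total) copied from the report above.
RESULTS = {
    "v4.14.39-580-gc8cd674": {"build": (98, 48, 146), "qemu": (21, 79, 100)},
    "v4.4.131-268-ga33ce4a": {"build": (92, 54, 146), "qemu": (91, 36, 127)},
}


def pass_rates(results: dict) -> dict:
    """Return {(release, suite): pass fraction}, checking that pass + fail == total."""
    rates = {}
    for release, suites in results.items():
        for suite, (passed, failed, total) in suites.items():
            if passed + failed != total:
                raise ValueError(f"inconsistent counts for {release} {suite}")
            rates[(release, suite)] = passed / total
    return rates


for (release, suite), rate in pass_rates(RESULTS).items():
    print(f"{release:24} {suite:5} {rate:6.1%}")
```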

2018-05-03 17:18:31

by Randy Dunlap

[permalink] [raw]
Subject: [Ksummit-discuss] bug-introducing patches

On 05/03/2018 08:43 AM, Sasha Levin wrote:
> On Thu, May 03, 2018 at 08:27:48AM -0700, James Bottomley wrote:
>> On Thu, 2018-05-03 at 15:06 +0000, Sasha Levin via Ksummit-discuss
>> wrote:
>>> On Thu, May 03, 2018 at 04:48:50PM +0200, Willy Tarreau wrote:
>>>> On Thu, May 03, 2018 at 07:33:04AM -0700, James Bottomley wrote:
>>>>> They're definitely for bug fixes, but there's a spectrum: obvious
>>>>> bug fixes with no side effects are easy to justify.  More complex
>>>>> bug fixes run the risk of having side effects which introduce
>>>>> other bugs, so could potentially destabilize the -rc process.  In
>>>>> SCSI we tend to look at what the user visible effects of the bug
>>>>> are in the post -rc5 region and if they're slight or wouldn't be
>>>>> visible to most users, we'll hold them over.  If the fix looks
>>>>> complex and we're not sure we caught the ramifications, we often
>>>>> add it to the merge window tree with a cc to stable and a note
>>>>> saying to wait X weeks before actually adding to the
>>>>> stable tree just to make sure no side effects show up with wider
>>>>> testing.  So, as with most things, it's a judgment call for the
>>>>> maintainer.
>>>>
>>>> For me this is the right, and responsible way to deal with bug
>>>> fixes. Self-control is much more efficient than random rejection
>>>> and favors a good analysis.
>>>
>>> I think that the ideal outcome of this discussion, at least for me,
>>> is a tool to go under scripts/ that would allow maintainers to get
>>> some sort of (quantifiable) data that will indicate how well the
>>> patch was tested via the regular channels.
>>>
>>> At which point it's the maintainer's judgement call on whether he
>>> wants to grab the patch or wait for more tests or reviews.
>>>
>>> This is very similar to what James has described, it just needs to
>>> leave his brain and turn into code :)
>>
>> I appreciate the sentiment, but if we could script taste, we'd have
>> replaced Linus with something far less cantankerous a long time ago ...
>
> Linus, IMO, is getting replaced. Look at how many functions he used to
> do 10 years ago he's no longer responsible for.

Agree.

> One of the most obvious examples is -next, where most integration issues
> are resolved before they even reach Linus.
>
> This is good for the community, as it allows us to make the process better
> and scale out. It is also good for Linus, as I'm not sure how long he'd
> last if he still had to edit patches by hand too often. Instead, he gets
> to play with things that interest him more where he is irreplaceable.
>
>> It's also a sad fact that a lot of things which look like obvious fixes
>> actually turn out not to be so with later testing. This is why the
>> user visibility test is paramount. If a bug fix has no real user
>> visible effects, it's often better to defer it no matter how obvious it
>> looks, which is why the static code checkers often get short shrift
>> before a merge window.
>>
>> A script measuring user visibility would be nice, but looks a bit
>> complex ...
>
> It is, but I think it's worthwhile. Would something that'll show you
> things like:
>
> - How long a patch has been in -next?
> - How many replies/reviews/comments it got on a mailing list?
> - Did the 0day bot test it?
> - Did syzbot fuzz it? for how long?
> - If it references a bugzilla of some sort, how many
> comments/reviews/etc it got there?
> - Is it -stable material, or does it fix a regression in the current
> merge window?
> - If subsystem has custom testing rig, results from those tests
>
> be a step in the right way? is it something you'd use to make decisions
> on whether you'd take a patch in?
>

Reminds me (too much) of checkpatch. Sure, checkpatch has its uses,
as long as it's not seen as the only true voice. (Some beginners don't
know about that yet.)

So with this new script, human evaluation would still be needed.
It's just a tool. It could be used or misused or abused.
$maintainer still has a job to do, but having a tool could help.

But be careful what you wish for. Having such a tool could help get
patches merged even quicker.

--
~Randy

2018-05-03 17:31:16

by Sasha Levin

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 06:35:16PM +0200, Willy Tarreau wrote:
>On Thu, May 03, 2018 at 04:14:57PM +0000, Sasha Levin wrote:
>> I tried looking at a few commits that came in on -rc7, and I see quite a
>> few cases where a commit was merged to Linus' tree in about 24 hours
>> after it was authored. Or maintainers who just wrote it, pushed it in,
>> and shipped it to Linus.
>>
>> I've attached the data I used. The columns are as follows:
>>
>> 1. Commit ID
>> 2. When was it merged
>> 3. How many days it spent in -next
>> 4. What commit did it fix
>> 5. When was that commit merged
>
>> b6cdbc85234b v4.16-rc7 5 ca254490c8df v4.3
>> 82dd0d2a9a76 v4.16-rc7 5 8f58336d3f78 v4.2
>> 5807b22c9164 v4.16-rc7 5 6c8702c60b88 v4.9
>> f97c3dc3c0e8 v4.16-rc7 5 4c4dbb4a7363 v4.15
>(...)
>
>I like this (not what was done but the analysis).
>
>I'd argue that a small part of them there are very likely valid reasons
>(really obvious fix, security issue etc) but it seems there are quite a
>large number of them here.
>
>Now I understand what makes me uneasy with what I'm seeing here. As I
>mentioned, -rc is for people who want to see bugs before their users.
>-rc7 will ensure almost everyone discovers the fix at the same time,
>because the next version will be 4.16, the first of a stable release,
>the one that users are expected to trust.
>
>So we probably have to educate/encourage developers *not* to submit
>fixes for old bugs that late in the cycle and to rather wait for the next
>version so that it cooks in -rc for a while before hitting users, knowing
>that these fixes will be backported to stable anyway once considered valid.
>
>Just like Greg has his "WTF" script to remind some developers that their
>patch is not suited to -stable, I think you could, based on your work,
>try to spot regressions introduced by late patches that fall in the
>category you've filtered and emit such WTF messages to the original
>patch's authors/committers.
>
>It's important to do it only when these patches cause breakage though,
>because we don't want to needlessly delay fixes when they're considered
>certain or well tested. Only when they cause trouble.

I tried pulling all the fixes that went in 4.17 (so far) for bugs that
were introduced as fixes in the v4.16 cycle, I got this list:

d65026c6c62e v4.16-rc7 5 6b1e6cc7855b v4.7 d14d2b78090c
63489f8e8211 v4.16-rc6 13 045c7a3f53d9 v4.11-rc6 5df63c2a149a
5dcd8400884c v4.16-rc6 6 0759e552bce7 v4.7 bd28899dd34f
0ef58b0a05c1 v4.16-rc6 6 0cf737808ae7 v4.14 a56d99d71466 7992894c305e 2afc5d61a719
8936ef7604c1 v4.16-rc6 6 6c8702c60b88 v4.9 a957fa190aa9
bbc09e7842a5 v4.16-rc6 6 65a206c01e8e v4.13 3239534a79ee
6a2cf8d3663e v4.16-rc5 12 d64d6c5671db v4.15 6d6340672ba3
859d880cf544 v4.16-rc4 14 b68a68d3dcc1 v4.15 8420f71943ae
e39a97353e53 v4.16-rc4 16 2a842acab109 v4.12 cbe095e2b584
a27fd7a8ed38 v4.16-rc4 19 f214f915e7db v4.13 bffd168c3fc5
0f9da844d877 v4.16-rc2 16 28128c61e08e v4.16-rc2 a95b37e20db9
7324f5399b06 v4.16-rc2 19 186b3c998c50 v4.14 51568d69407d
e78c637127ee v4.16-rc3 25 187d7967a5ee v4.4 e988867fd774
ca9eee95a2de v4.16-rc3 25 d717f7352ec6 v4.12 e988867fd774

So out of 755 commits, 14 have been fixed, that's about 2% and we're not
even done with 4.17.
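
The 2% figure can be recomputed directly from the list. The sketch below assumes the first five columns follow the column description quoted earlier in this message, and that any trailing hashes are the later commits that fixed the fix; that last reading is an assumption, as are the field names.

```python
from collections import Counter

TABLE = """\
d65026c6c62e v4.16-rc7 5 6b1e6cc7855b v4.7 d14d2b78090c
63489f8e8211 v4.16-rc6 13 045c7a3f53d9 v4.11-rc6 5df63c2a149a
5dcd8400884c v4.16-rc6 6 0759e552bce7 v4.7 bd28899dd34f
0ef58b0a05c1 v4.16-rc6 6 0cf737808ae7 v4.14 a56d99d71466 7992894c305e 2afc5d61a719
8936ef7604c1 v4.16-rc6 6 6c8702c60b88 v4.9 a957fa190aa9
bbc09e7842a5 v4.16-rc6 6 65a206c01e8e v4.13 3239534a79ee
6a2cf8d3663e v4.16-rc5 12 d64d6c5671db v4.15 6d6340672ba3
859d880cf544 v4.16-rc4 14 b68a68d3dcc1 v4.15 8420f71943ae
e39a97353e53 v4.16-rc4 16 2a842acab109 v4.12 cbe095e2b584
a27fd7a8ed38 v4.16-rc4 19 f214f915e7db v4.13 bffd168c3fc5
0f9da844d877 v4.16-rc2 16 28128c61e08e v4.16-rc2 a95b37e20db9
7324f5399b06 v4.16-rc2 19 186b3c998c50 v4.14 51568d69407d
e78c637127ee v4.16-rc3 25 187d7967a5ee v4.4 e988867fd774
ca9eee95a2de v4.16-rc3 25 d717f7352ec6 v4.12 e988867fd774
"""
TOTAL_FIXES = 755  # total number of fixes in the v4.16 -rc cycle, as stated in the message


def parse(table: str) -> list:
    """Split each row into named fields; extra trailing hashes become 'followups'."""
    rows = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append({
            "commit": fields[0],
            "merged": fields[1],
            "days_in_next": int(fields[2]),
            "fixed": fields[3],
            "fixed_release": fields[4],
            "followups": fields[5:],
        })
    return rows


rows = parse(TABLE)
per_rc = Counter(row["merged"] for row in rows)
rate = len(rows) / TOTAL_FIXES
print(f"{len(rows)} of {TOTAL_FIXES} fixes needed a follow-up ({rate:.2%})")
for rc in sorted(per_rc):
    print(f"  {rc}: {per_rc[rc]}")
```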

>For me the rule seems simple to understand, every submitter should
>think like this late in the cycle :
>
> "you're sending a patch that is going to be part of a stable kernel
> in no more than 2 weeks, possibly affecting all users upgrading to
> that kernel if you did something wrong. Are you really certain you
> want this patch merged now, that it got sufficient testing and that
> it cannot wait for next -rc1 to get broader exposure first ?"
>
>I'm pretty sure that most of the time it will be "sure I want it now"
>and there will be no problem, which is fine as it automatically reduces
>the number of bugs in releases. Some may reconsider their submission.
>Some may get caught by your automated script if a later commit fixes
>an issue introduced by their patch. And there public shaming is the
>only option (or maybe only the second time if you really want to be
>nice).

I'd much prefer to blame this on maintainers. Authors should be able to
submit a patch whenever they feel like it, maintainers should only merge
a patch in when it's right.

2018-05-03 17:36:38

by Sasha Levin

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 05:54:46PM +0100, Al Viro wrote:
>On Thu, May 03, 2018 at 02:46:14PM +0000, Sasha Levin via Ksummit-discuss wrote:
>
>> Fixes in -rc cycles:
>> rc1 68
>> rc2 147
>> rc3 88
>> rc4 121
>> rc5 40
>> rc6 193
>> rc7 98
>>
>> Average days in -next, for a fix, per -rc cycle:
>> rc1 27.25
>> rc2 21.4286
>> rc3 22.5114
>> rc4 18.281
>> rc5 14.65
>> rc6 12.6166
>> rc7 8.70408
>>
>> Fixes for bugs not introduced in current merge window:
>> rc1 40
>> rc2 113
>> rc3 61
>> rc4 79
>> rc5 25
>> rc6 139
>> rc7 72
>>
>> So for some reason, there is a rush to push fixes for older bugs (that
>> were not introduced in the current merge window) to the point that rc7
>> commits that only existed for a few days are merged in to address older
>> bugs.
>
>I really wonder how accurate your interpretation of Fixes: is.
>Consider e.g. the situation when an old bug is found and partial fixes
>applied. Then we find that those fixes did not cover everything and,
>come next cycle, add more on top of those. Where should Fixes: on
>the incrementals point to? Original commit? But they won't apply
>without the first batch. The last in the original pile? But it
>would imply (by your metrics) that original fixes had *INTRODUCED*
>bugs.

It's vaguely close. Beyond the things you mentioned, some commits don't
have a fixes tag, some mention what commit they fixed in the body rather
than a tag, and so on.

This is just an approximation.
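
To illustrate why it is only an approximation: a script has to pick up both the Fixes: trailer and free-form mentions in the body. A minimal sketch, assuming the common `Fixes: <hash> ("subject")` trailer form; the patterns, the function name and the sample message (with placeholder hashes) are hypothetical, and commits with neither form are simply missed.

```python
import re

# Heuristic patterns; abbreviated hashes of 7 to 40 hex digits are accepted.
FIXES_TRAILER = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.IGNORECASE | re.MULTILINE)
BODY_MENTION = re.compile(r"\bcommit\s+([0-9a-f]{7,40})\b", re.IGNORECASE)


def referenced_commits(message: str) -> dict:
    """Return hashes named in Fixes: trailers and hashes only mentioned in the body."""
    tagged = [h.lower() for h in FIXES_TRAILER.findall(message)]
    mentioned = [
        h.lower() for h in BODY_MENTION.findall(message) if h.lower() not in tagged
    ]
    return {"tagged": tagged, "mentioned": mentioned}


# Made-up message with placeholder hashes.
MESSAGE = (
    "foofs: fix barf\n\n"
    'This was broken by commit abcdef012345 ("foofs: rework barf").\n\n'
    'Fixes: 0123456789ab ("foofs: add barf")\n'
)
print(referenced_commits(MESSAGE))
```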

>Moreover, what the hell do you suggest in situation when
> * foofs_barf() is b0rken in quite a few ways. There's an
>easily triggerable memory corruptor that can be fixed locally as well
>as something else that needs a change of e.g. ->mkdir() calling
>conventions to take care of. The change is mechanical and fairly
>simple, but it's already -rc4.

I'm not advocating to forcefully block people from submitting patches
after -rc4 (that was Ted's suggestion).

I'm just saying that as a maintainer, you should use your brain and
figure out how critical the bug is, how good the fix is and how well it
was tested, and decide if you want to merge it in or not.

If it fixed the bug and didn't introduce a regression, great! If it
messed something else, you'd have some input on how to address it better
in the future.

I'm trying to come up with a tool/system to help maintainers with
this task because right now it's not working too well. I'm not trying to
introduce arbitrary rules to make your life miserable.

>Even though the whole thing is well-understood at that point,
>we *can't* apply all 3 - it's too late in the cycle for the
>calling conventions' change in the middle of the series.
>
>Should we apply the first fix that cycle, or should it slide to the
>next one? We can't apply #1 + #3 --- #3 won't even compile without
>#2. We can't apply #3 without #1 --- they can be transposed, but
>it's nowhere near a mechanical transformation. So WTF should #3
>have in "Fixes:"?

2018-05-03 17:40:03

by Sasha Levin

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 10:17:57AM -0700, Randy Dunlap wrote:
>On 05/03/2018 08:43 AM, Sasha Levin wrote:
>> On Thu, May 03, 2018 at 08:27:48AM -0700, James Bottomley wrote:
>>> On Thu, 2018-05-03 at 15:06 +0000, Sasha Levin via Ksummit-discuss
>>> wrote:
>>>> On Thu, May 03, 2018 at 04:48:50PM +0200, Willy Tarreau wrote:
>>>>> On Thu, May 03, 2018 at 07:33:04AM -0700, James Bottomley wrote:
>>>>>> They're definitely for bug fixes, but there's a spectrum: obvious
>>>>>> bug fixes with no side effects are easy to justify.  More complex
>>>>>> bug fixes run the risk of having side effects which introduce
>>>>>> other bugs, so could potentially destabilize the -rc process.  In
>>>>>> SCSI we tend to look at what the user visible effects of the bug
>>>>>> are in the post -rc5 region and if they're slight or wouldn't be
>>>>>> visible to most users, we'll hold them over.  If the fix looks
>>>>>> complex and we're not sure we caught the ramifications, we often
>>>>>> add it to the merge window tree with a cc to stable and a note
>>>>>> saying to wait X weeks before actually adding to the
>>>>>> stable tree just to make sure no side effects show up with wider
>>>>>> testing.  So, as with most things, it's a judgment call for the
>>>>>> maintainer.
>>>>>
>>>>> For me this is the right, and responsible way to deal with bug
>>>>> fixes. Self-control is much more efficient than random rejection
>>>>> and favors a good analysis.
>>>>
>>>> I think that the ideal outcome of this discussion, at least for me,
>>>> is a tool to go under scripts/ that would allow maintainers to get
>>>> some sort of (quantifiable) data that will indicate how well the
>>>> patch was tested via the regular channels.
>>>>
>>>> At which point it's the maintainer's judgement call on whether he
>>>> wants to grab the patch or wait for more tests or reviews.
>>>>
>>>> This is very similar to what James has described, it just needs to
>>>> leave his brain and turn into code :)
>>>
>>> I appreciate the sentiment, but if we could script taste, we'd have
>>> replaced Linus with something far less cantankerous a long time ago ...
>>
>> Linus, IMO, is getting replaced. Look at how many functions he used to
>> do 10 years ago he's no longer responsible for.
>
>Agree.
>
>> One of the most obvious examples is -next, where most integration issues
>> are resolved before they even reach Linus.
>>
>> This is good for the community, as it allows us to make the process better
>> and scale out. It is also good for Linus, as I'm not sure how long he'd
>> last if he still had to edit patches by hand too often. Instead, he gets
>> to play with things that interest him more where he is irreplaceable.
>>
>>> It's also a sad fact that a lot of things which look like obvious fixes
>>> actually turn out not to be so with later testing. This is why the
>>> user visibility test is paramount. If a bug fix has no real user
>>> visible effects, it's often better to defer it no matter how obvious it
>>> looks, which is why the static code checkers often get short shrift
>>> before a merge window.
>>>
>>> A script measuring user visibility would be nice, but looks a bit
>>> complex ...
>>
>> It is, but I think it's worthwhile. Would something that'll show you
>> things like:
>>
>> - How long a patch has been in -next?
>> - How many replies/reviews/comments it got on a mailing list?
>> - Did the 0day bot test it?
>> - Did syzbot fuzz it? for how long?
>> - If it references a bugzilla of some sort, how many
>> comments/reviews/etc it got there?
>> - Is it -stable material, or does it fix a regression in the current
>> merge window?
>> - If subsystem has custom testing rig, results from those tests
>>
>> be a step in the right way? is it something you'd use to make decisions
>> on whether you'd take a patch in?
>>
>
>Reminds me (too much) of checkpatch. Sure, checkpatch has its uses,
>as long as it's not seen as the only true voice. (Some beginners don't
>know about that yet.)
>
>So with this new script, human evaluation would still be needed.
>It's just a tool. It could be used or misused or abused.
>$maintainer still has a job to do, but having a tool could help.
>
>But be careful what you wish for. Having such a tool could help get
>patches merged even quicker.

While checkpatch is a tool for both authors and maintainers, I'm hoping
that this tool will only be useful for maintainers, who are less likely
to abuse it. I hope.

Maintainers are still needed. I started this discussion because right
now maintainers don't scale enough, and that in turn causes both delays
and mistakes in the process. We have a bunch of tools to help patch
authors, but not as many for maintainers.

To some extent, I do wish that this will help patches get merged
earlier. If a maintainer sees that the patch spent a while in -next,
passed all his subsystem's internal testing, got a few reviews, he could
just go ahead and merge it in faster without starting to dig through his
mail client and git tree.
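
The kind of per-patch report discussed in this message could be as simple as a record of the signals listed earlier, printed for the maintainer without any scoring. The following is a sketch under that assumption; the class name, field names and output format are all hypothetical, and nothing here collects the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatchReport:
    """Signals a maintainer might glance at before merging (hypothetical fields)."""
    commit: str
    days_in_next: int                   # how long the patch has been in -next
    ml_replies: int                     # replies/reviews/comments on a mailing list
    tested_by_0day: Optional[bool]      # None when unknown
    syzbot_fuzz_hours: Optional[float]  # None when not fuzzed or unknown
    bugzilla_comments: Optional[int]    # None when no bug tracker entry is referenced
    fixes_current_window: bool          # True for a regression from the current merge window
    subsystem_tests: Optional[str]      # free-form result from a subsystem test rig, if any

    def summary(self) -> str:
        """Format the signals as plain text; the judgment call stays with the maintainer."""
        def show(value) -> str:
            if value is None:
                return "unknown"
            if isinstance(value, bool):
                return "yes" if value else "no"
            return str(value)

        rows = [
            ("commit", self.commit),
            ("days in -next", self.days_in_next),
            ("mailing list replies", self.ml_replies),
            ("tested by 0day", self.tested_by_0day),
            ("syzbot fuzz hours", self.syzbot_fuzz_hours),
            ("bugzilla comments", self.bugzilla_comments),
            ("fixes current-window regression", self.fixes_current_window),
            ("subsystem tests", self.subsystem_tests),
        ]
        return "\n".join(f"{label}: {show(value)}" for label, value in rows)


# Placeholder values, for illustration only.
report = PatchReport("0123456789ab", 5, 0, True, None, None, False, None)
print(report.summary())
```

Leaving out a combined score is deliberate in this sketch: it keeps the tool a source of data rather than "the only true voice".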

2018-05-03 17:58:13

by Willy Tarreau

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 05:29:29PM +0000, Sasha Levin wrote:
> I tried pulling all the fixes that went in 4.17 (so far) for bugs that
> were introduced as fixes in the v4.16 cycle, I got this list:
>
> d65026c6c62e v4.16-rc7 5 6b1e6cc7855b v4.7 d14d2b78090c
> 63489f8e8211 v4.16-rc6 13 045c7a3f53d9 v4.11-rc6 5df63c2a149a
> 5dcd8400884c v4.16-rc6 6 0759e552bce7 v4.7 bd28899dd34f
> 0ef58b0a05c1 v4.16-rc6 6 0cf737808ae7 v4.14 a56d99d71466 7992894c305e 2afc5d61a719
> 8936ef7604c1 v4.16-rc6 6 6c8702c60b88 v4.9 a957fa190aa9
> bbc09e7842a5 v4.16-rc6 6 65a206c01e8e v4.13 3239534a79ee
> 6a2cf8d3663e v4.16-rc5 12 d64d6c5671db v4.15 6d6340672ba3
> 859d880cf544 v4.16-rc4 14 b68a68d3dcc1 v4.15 8420f71943ae
> e39a97353e53 v4.16-rc4 16 2a842acab109 v4.12 cbe095e2b584
> a27fd7a8ed38 v4.16-rc4 19 f214f915e7db v4.13 bffd168c3fc5
> 0f9da844d877 v4.16-rc2 16 28128c61e08e v4.16-rc2 a95b37e20db9
> 7324f5399b06 v4.16-rc2 19 186b3c998c50 v4.14 51568d69407d
> e78c637127ee v4.16-rc3 25 187d7967a5ee v4.4 e988867fd774
> ca9eee95a2de v4.16-rc3 25 d717f7352ec6 v4.12 e988867fd774
>
> So out of 755 commits, 14 have been fixed, that's about 2% and we're not
> even done with 4.17.

OK but this is low. Quite frankly, at 2% regressions, even if this is
never fun, it means 98% of the fixes were right. Now just delay them
longer and you'll have 500 commits instead of 755, thus 255 more bugs
unfixed in the release just to try to save 14 wrong ones. *this* is
the problem I'm concerned about by enforcing extra delays on everyone.
This is the reason why in my opinion the most important thing is to raise
awareness about this so people are more careful or more verbose (and
more detailed commit messages don't hurt, I think all stable maintainers
have many times thought "WTF is this supposed to fix?"), and then remind
everyone that when some get caught abusing, they'll get a public blame.

> >Some may get caught by your automated script if a later commit fixes
> >an issue introduced by their patch. And there public shaming is the
> >only option (or maybe only the second time if you really want to be
> >nice).
>
> I'd much prefer to blame this on maintainers. Authors should be able to
> submit a patch whenever they feel like it, maintainers should only merge
> a patch in when it's right.

Sorry, wrong word on my side, I also meant maintainers (I very much favor
pushing back to ensure everyone in the chain is responsible for what is
done).

Willy

2018-05-03 18:10:51

by James Bottomley

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, 2018-05-03 at 15:43 +0000, Sasha Levin via Ksummit-discuss
wrote:
> On Thu, May 03, 2018 at 08:27:48AM -0700, James Bottomley wrote:
[...]
> > It's also a sad fact that a lot of things which look like obvious
> > fixes actually turn out not to be so with later testing.  This is
> > why the user visibility test is paramount.  If a bug fix has no
> > real user visible effects, it's often better to defer it no matter
> > how obvious it looks, which is why the static code checkers often
> > get short shrift before a merge window.
> >
> > A script measuring user visibility would be nice, but looks a bit
> > complex ...
>
> It is, but I think it's worthwhile. Would something that'll show you
> things like:
>
>  - How long a patch has been in -next?
>  - How many replies/reviews/comments it got on a mailing list?
>  - Did the 0day bot test it?
>  - Did syzbot fuzz it? for how long?
>  - If it references a bugzilla of some sort, how many
>    comments/reviews/etc it got there?
>  - Is it -stable material, or does it fix a regression in the current
>    merge window?
>  - If subsystem has custom testing rig, results from those tests
>
> be a step in the right way? is it something you'd use to make
> decisions on whether you'd take a patch in?

Actually, no, these are all not what I'm talking about: They're all
measures of whether the commit triggers another bug. Which, I agree,
is the fear, so it would be good to have them of course, but they all
take time the maintainer doesn't have when making a quick decision
about a late -rc bug fix.

At late -rc the decision is the current user visible problem set
against the risk of -rc destabilization. You're measuring the latter
in the above, but in the rule of thumb decision making we just assume
that's constant. What we're looking to measure is the user visible
effect of not fixing the problem.

So, for instance, a boot failure on a widely used SCSI board is a no
brainer for fix now and tackle consequences later. An obvious fix to
an error leg of a little used board is the other way: no-one is really
affected, so we don't take the risk. The judgment call is the spectrum
in between these two extremes.

James


2018-05-03 18:13:22

by Sasha Levin

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 07:57:23PM +0200, Willy Tarreau wrote:
>On Thu, May 03, 2018 at 05:29:29PM +0000, Sasha Levin wrote:
>> I tried pulling all the fixes that went in 4.17 (so far) for bugs that
>> were introduced as fixes in the v4.16 cycle, I got this list:
>>
>> d65026c6c62e v4.16-rc7 5 6b1e6cc7855b v4.7 d14d2b78090c
>> 63489f8e8211 v4.16-rc6 13 045c7a3f53d9 v4.11-rc6 5df63c2a149a
>> 5dcd8400884c v4.16-rc6 6 0759e552bce7 v4.7 bd28899dd34f
>> 0ef58b0a05c1 v4.16-rc6 6 0cf737808ae7 v4.14 a56d99d71466 7992894c305e 2afc5d61a719
>> 8936ef7604c1 v4.16-rc6 6 6c8702c60b88 v4.9 a957fa190aa9
>> bbc09e7842a5 v4.16-rc6 6 65a206c01e8e v4.13 3239534a79ee
>> 6a2cf8d3663e v4.16-rc5 12 d64d6c5671db v4.15 6d6340672ba3
>> 859d880cf544 v4.16-rc4 14 b68a68d3dcc1 v4.15 8420f71943ae
>> e39a97353e53 v4.16-rc4 16 2a842acab109 v4.12 cbe095e2b584
>> a27fd7a8ed38 v4.16-rc4 19 f214f915e7db v4.13 bffd168c3fc5
>> 0f9da844d877 v4.16-rc2 16 28128c61e08e v4.16-rc2 a95b37e20db9
>> 7324f5399b06 v4.16-rc2 19 186b3c998c50 v4.14 51568d69407d
>> e78c637127ee v4.16-rc3 25 187d7967a5ee v4.4 e988867fd774
>> ca9eee95a2de v4.16-rc3 25 d717f7352ec6 v4.12 e988867fd774
>>
>> So out of 755 commits, 14 have been fixed, that's about 2% and we're not
>> even done with 4.17.
>
>OK but this is low. Quite frankly, at 2% regressions, even if this is
>never fun, it means 98% of the fixes were right. Now just delay them
>longer and you'll have 500 commits instead of 755, thus 255 more bugs
>unfixed in the release just to try to save 14 wrong ones. *this* is
>the problem I'm concerned about by enforcing extra delays on everyone.
>This is the reason why in my opinion the most important is to raise
>awareness about this so people are more careful or more verbose (and
>more detailed commit messages don't hurt, I think all stable maintainers
>have many times thought "WTF is this supposed to fix?"), and then remind
>everyone that when some get caught abusing, they'll get a public blame.

This is low because we're only about a month in 4.17. Historical
figures are around 7% for these kinds of commits.

I'm also not trying to argue whether 7% is high or low, only that it's 3
times as many bugs per line of code as what we get from the merge
window.

Isn't the merge window supposed to be the "risky" part?

I understand your concern about delays. I'm not arguing for or against
it, I'm just trying to discuss options.

For example, how about extending the release cycle until the amount of
fixes for regressions introduced in the current merge window drops under
a certain threshold? (so go to -rc20 if we need to).

It has the advantage of fewer bugs when the kernel is released, it
doesn't stop bug fixes from coming in, and it removes the urge some
folks have to push things in at the last minute. OTOH, it makes the
release cycle unpredictable time-wise.
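
The percentages traded back and forth above are easy to re-derive. A minimal sketch in Python, using only the figures quoted in this thread (755 fix commits, 14 of them later fixed, the hypothetical cut to 500 commits, and the roughly 7% historical figure); the function and variable names are illustrative only:

```python
def regression_rate(fixed_again: int, total: int) -> float:
    """Fraction of commits that later needed a fix of their own."""
    return fixed_again / total

# Figures quoted in the thread for fixes merged during the v4.16 -rc cycle.
total_fixes = 755
later_fixed = 14

rate = regression_rate(later_fixed, total_fixes)
print(f"observed so far: {rate:.1%}")

# The hypothetical from the quoted reply: delays mean only 500 fixes get merged.
left_out = total_fixes - 500
print(f"fixes left out of the release: {left_out}")

# What the same 755 commits would yield at the historical ~7% figure.
historical_rate = 0.07
expected_bad = round(historical_rate * total_fixes)
print(f"bad fixes expected at the historical rate: {expected_bad}")
```

The observed rate is a lower bound, since later fixes for these commits can still arrive.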

2018-05-03 18:21:16

by Al Viro

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 05:34:24PM +0000, Sasha Levin wrote:

> >Moreover, what the hell do you suggest in situation when
> > * foofs_barf() is b0rken in quite a few ways. There's an
> >easily triggerable memory corruptor that can be fixed locally as well
> >as something else that needs a change of e.g. ->mkdir() calling
> >conventions to take care of. The change is mechanical and fairly
> >simple, but it's already -rc4.
>
> I'm not advocating to forcefully block people from submitting patches
> after -rc4 (that was Ted's suggestion).

I am, though - change of a method signature when we have several dozens
of instances does *not* belong in -rc5; if nothing else, it guarantees
a nightmare pile of conflicts with individual filesystem trees.

> I'm just saying that as a maintainer, you should use your brain and
> figure out how critical the bug is, how good is the fix and how well was
> it tested, and decide if you want to merge it in or not.
>
> If it fixed the bug and didn't introduce a regression, great! If it
> messed something else, you'd have some input on how to address it better
> in the future.
>
> I'm trying to come up with a tool/system to help maintainers with
> this task because right now it's not working too well. I'm not trying to
> introduce arbitrary rules to make your life miserable.

And I am asking you what kind of rules do you want/expect/would prefer
for Fixes: pseudo-header. *I* do not give a flying fuck for its
contents; I can put it in, if there is a good reason, though. And
the obvious consumers of that thing are -stable maintainers. Including
yourself. Which is why I am asking you what should go in there in
situation described above. And no, that's not a rhetorical question;
I really want to know.

Let me describe it again:
* a bunch of holes is found in a function; all of them go back
several years
* a clean fix for the whole pile is a composition of
1) local fix of trivially triggered memory corruptor
2) tree-wide mechanical change of method signature + matching modifications
of callers of that method (say, all five of them).
3) further changes in the function in question and its caller (which happens
to be an instance of the method modified by (2)).
* dependencies between parts: (1) is standalone, (3) has a hard
dependency on (2), (1) can be reordered past (2)+(modified 3), but modifications
needed in (1) and (3) are not trivial.
* the crap fixed by (1) is much more severe than that fixed by (3)
(and (2) is an equivalent transformation which does not affect behaviour of
anything).
* too late in the cycle for tree-wide patches like (2).

As far as I'm concerned (and if it makes -stable folks' lives unpleasant,
too fucking bad) the merge order is: (1) as soon as it's sufficiently
reviewed and tested, (2) and (3) - next merge window. The only question
is what kind of metadata should go into commit messages to minimize the
PITA for -stable folks, *given* *that* *merge* *timing*.

2018-05-03 18:46:34

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 06:12:36PM +0000, Sasha Levin via Ksummit-discuss wrote:
>
> For example, how about extending the release cycle until the amount of
> fixes for regressions introduced in the current merge window drops under
> a certain threshold? (so go to -rc20 if we need to).
>
Reminds me of what happened at a previous employer. We had a hard rule that
a product shall not be released unless it has no more than 2 severe or
critical bugs. Solution: Stop testing 2-3 weeks ahead of the scheduled
release day.

Here: We don't want to see -rc20. Solution: Stop applying bug fixes
at -rcX.

Guenter

2018-05-03 18:57:19

by Greg Kroah-Hartman

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 07:20:39PM +0100, Al Viro wrote:
> On Thu, May 03, 2018 at 05:34:24PM +0000, Sasha Levin wrote:
>
> > >Moreover, what the hell do you suggest in situation when
> > > * foofs_barf() is b0rken in quite a few ways. There's an
> > >easily triggerable memory corruptor that can be fixed locally as well
> > >as something else that needs a change of e.g. ->mkdir() calling
> > >conventions to take care of. The change is mechanical and fairly
> > >simple, but it's already -rc4.
> >
> > I'm not advocating to forcefully block people from submitting patches
> > after -rc4 (that was Ted's suggestion).
>
> I am, though - change of a method signature when we have several dozens
> of instances does *not* belong in -rc5; if nothing else, it guarantees
> a nightmare pile of conflicts with individual filesystem trees.
>
> > I'm just saying that as a maintainer, you should use your brain and
> > figure out how critical the bug is, how good is the fix and how well was
> > it tested, and decide if you want to merge it in or not.
> >
> > If it fixed the bug and didn't introduce a regression, great! If it
> > messed something else, you'd have some input on how to address it better
> > in the future.
> >
> > I'm trying to come up with a tool/system to help maintainers with
> > this task because right now it's not working too well. I'm not trying to
> > introduce arbitrary rules to make your life miserable.
>
> And I am asking you what kind of rules do you want/expect/would prefer
> for Fixes: pseudo-header. *I* do not give a flying fuck for its
> contents; I can put it in, if there is a good reason, though. And
> the obvious consumers of that thing are -stable maintainers. Including
> yourself. Which is why I am asking you what should go in there in
> situation described above. And no, that's not a rhetorical question;
> I really want to know.
>
> Let me describe it again:
> * a bunch of holes is found in a function; all of them go back
> several years
> * a clean fix for the whole pile is a composition of
> 1) local fix of trivially triggered memory corruptor
> 2) tree-wide mechanical change of method signature + matching modifications
> of callers of that method (say, all five of them).
> 3) further changes in the function in question and its caller (which happens
> to be an instance of the method modified by (2)).
> * dependencies between parts: (1) is standalone, (3) has a hard
> dependency on (2), (1) can be reordered past (2)+(modified 3), but modifications
> needed in (1) and (3) are not trivial.
> * the crap fixed by (1) is much more severe than that fixed by (3)
> (and (2) is an equivalent transformation which does not affect behaviour of
> anything).
> * too late in the cycle for tree-wide patches like (2).
>
> As far as I'm concerned (and if it makes -stable folks' lives unpleasant,
> too fucking bad)

Don't care about me for stuff like this. Fix it correctly and I'll
worry about any dependency issues later :)

greg k-h

2018-05-03 19:00:06

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 02:08:51PM +0300, Jani Nikula wrote:
> On Tue, 01 May 2018, "Theodore Y. Ts'o" <[email protected]> wrote:
> > Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
> > merge window before they get merged at all.
>
> What are -rc5 and later for then if not bug fixes? Baffled.

Regression fixes?

Note that if people stopped introducing regressions to the kernel, we
might actually be able to release the final version after -rc6 or even
earlier.

There's nothing which says that we MUST have -rc7, -rc8, -rc9
releases. If we were actually disciplined in our testing and in what
we push into Linus's tree, we might actually be able to go to a
two-month release cycle, or perhaps even slightly shorter.

But if people insist on trying to fix bugs, and those bug fixes
introduce regressions, then we end up with a longer release cycle,
which causes people to want to stick more bug fixes, or worse, even
some feature commits late into the development cycle, and it becomes a
vicious cycle with releases taking longer and longer.

- Ted

2018-05-03 19:05:42

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 06:12:36PM +0000, Sasha Levin wrote:
> I'm also not trying to argue whether 7% is high or low, only that it's 3
> times as many bugs per line of code as what we get from the merge
> window.

Yes, but seen differently, that's 14 times fewer bugs than the ones properly
fixed by applying the process which produces these 7%. We can discuss
ways to improve it, but please consider that it must not reduce
the number of correct fixes represented by the 93% remaining ones.

> Isn't the merge window supposed to be the "risky" part?

"risky" might not be the correct term. Each single line of code comes
with a risk. I'd say the "most risky" part. As I said, what I agree with
is the fact that early fixes just before a release have more chances of
affecting users, which in my opinion is the real problem. Education can
help here.

> For example, how about extending the release cycle until the amount of
> fixes for regressions introduced in the current merge window drops under
> a certain threshold? (so go to -rc20 if we need to).

Never works. And Linus already explained it : you cannot stop the development
process. While you're waiting, development continues, and the next merge
window gets twice the number of commits, which causes more than twice the
amount of problems. I've also experienced it in haproxy many years ago. I
made the mistake of saying "I'm finishing this, only 6 months, and I release
1.5". Result: bugs coming in parallel to development stalling progress
forever and it took 4.5 years to release it, or 9 times the expected amount
of time. Now we release approximately on time, missing features go in the
next release, easily testable fixes are merged, complex ones are postponed
for the stable releases with a note in the announce saying "don't play with
this yet, it's broken". We do ship with bugs, we're open about it and we
address them later. Overall this transparency is much appreciated. And we
also do regressions by the way.

Maybe in the end the only thing we're missing is a "known bugs" section in
release announcements, so that those with pending fixes are encouraged to
send a line or two to Linus for inclusion there, having more time to work
on their fixes.

Willy

2018-05-03 19:06:43

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 07:20:39PM +0100, Al Viro wrote:
>On Thu, May 03, 2018 at 05:34:24PM +0000, Sasha Levin wrote:
>
>> >Moreover, what the hell do you suggest in situation when
>> > * foofs_barf() is b0rken in quite a few ways. There's an
>> >easily triggerable memory corruptor that can be fixed locally as well
>> >as something else that needs a change of e.g. ->mkdir() calling
>> >conventions to take care of. The change is mechanical and fairly
>> >simple, but it's already -rc4.
>>
>> I'm not advocating to forcefully block people from submitting patches
>> after -rc4 (that was Ted's suggestion).
>
>I am, though - change of a method signature when we have several dozens
>of instances does *not* belong in -rc5; if nothing else, it guarantees
>a nightmare pile of conflicts with individual filesystem trees.
>
>> I'm just saying that as a maintainer, you should use your brain and
>> figure out how critical the bug is, how good is the fix and how well was
>> it tested, and decide if you want to merge it in or not.
>>
>> If it fixed the bug and didn't introduce a regression, great! If it
>> messed something else, you'd have some input on how to address it better
>> in the future.
>>
>> I'm trying to come up with a tool/system to help maintainers with
>> this task because right now it's not working too well. I'm not trying to
>> introduce arbitrary rules to make your life miserable.
>
>And I am asking you what kind of rules do you want/expect/would prefer
>for Fixes: pseudo-header. *I* do not give a flying fuck for its
>contents; I can put it in, if there is a good reason, though. And
>the obvious consumers of that thing are -stable maintainers. Including
>yourself. Which is why I am asking you what should go in there in
>situation described above. And no, that's not a rhetorical question;
>I really want to know.
>
>Let me describe it again:
> * a bunch of holes is found in a function; all of them go back
>several years
> * a clean fix for the whole pile is a composition of
>1) local fix of trivially triggered memory corruptor
>2) tree-wide mechanical change of method signature + matching modifications
>of callers of that method (say, all five of them).
>3) further changes in the function in question and its caller (which happens
>to be an instance of the method modified by (2)).
> * dependencies between parts: (1) is standalone, (3) has a hard
>dependency on (2), (1) can be reordered past (2)+(modified 3), but modifications
>needed in (1) and (3) are not trivial.
> * the crap fixed by (1) is much more severe than that fixed by (3)
>(and (2) is an equivalent transformation which does not affect behaviour of
>anything).
> * too late in the cycle for tree-wide patches like (2).
>
>As far as I'm concerned (and if it makes -stable folks' lives unpleasant,
>too fucking bad) the merge order is: (1) as soon as it's sufficiently
>reviewed and tested, (2) and (3) - next merge window. The only question
>is what kind of metadata should go into commit messages to minimize the
>PITA for -stable folks, *given* *that* *merge* *timing*.

Right, -stable shouldn't affect how/when you merge your patches in.

Also, in scenarios such as this we have enough tools in place to figure
out the dependency chain, and we could address it when backporting to
stable.

I don't want to create additional work that you wouldn't do anyways.
Keep in mind that the Fixes: tag will also be useful for yourself later
on if you have to go back to this patch a few years later.

As to the question you actually asked, assuming patch (3) is important
for stable as well (you didn't state it explicitly), the correct
tagging would be:

For patch (1):
Fixes: AAA ("Commit from several years ago that introduced the hole")
Cc: [email protected]

For patch (2):
No tags

For patch (3):
Fixes: AAA ("Commit from several years ago that introduced the hole")
Cc: [email protected] # commit-id-of-(2)
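
As a rough illustration of how a script might consume the tagging above, here is a minimal Python sketch that pulls `Fixes:` and stable `Cc:` trailers out of a commit message. The regular expressions, the function name `parse_stable_tags`, the sample commit IDs, and the spelled-out stable address (the archive obscures it) are all assumptions made for illustration; this is not a description of how stable-tools or any maintainer's script actually works.

```python
import re

# Hypothetical patterns; real tooling may be stricter or looser.
FIXES_RE = re.compile(r'^Fixes:\s*([0-9a-f]{7,40})\s*\("(.*)"\)\s*$', re.I)
STABLE_CC_RE = re.compile(r'^Cc:\s*<?(stable@[^\s>]+)>?\s*(?:#\s*(.*))?$', re.I)

def parse_stable_tags(message: str) -> dict:
    """Collect Fixes: targets and stable Cc: annotations from a commit message."""
    tags = {"fixes": [], "stable": False, "stable_notes": []}
    for line in message.splitlines():
        line = line.strip()
        m = FIXES_RE.match(line)
        if m:
            # (commit ID, quoted subject of the commit being fixed)
            tags["fixes"].append((m.group(1), m.group(2)))
            continue
        m = STABLE_CC_RE.match(line)
        if m:
            tags["stable"] = True
            if m.group(2):
                # Whatever follows '#', e.g. a prerequisite commit ID.
                tags["stable_notes"].append(m.group(2).strip())
    return tags

# Made-up message shaped like patch (3) in the example above.
patch3 = """foofs: finish fixing foofs_barf()

Fixes: 0123456789ab ("foofs: add foofs_barf()")
Cc: stable@vger.kernel.org # fedcba987654
"""

print(parse_stable_tags(patch3))
```

A message shaped like patch (2), with no tags, simply comes back with an empty `fixes` list and `stable` set to `False`.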



2018-05-03 19:14:39

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 11:55:29AM -0700, Greg KH wrote:
> Don't care about me for stuff like this. Fix it correctly and I'll
> worry about any dependency issues later :)

For me the real value of the Fixes header is to let the person doing
the backport know if they must search when the patch looks irrelevant
at first glance. On old kernels more than half of the patches don't
apply and sometimes you really do not know if the code moved somewhere
else or if it was not there. Fixes gives some clues what to look for
and sometimes about an exact commit to get more details. I've been
quite happy with those mentioning only "3.2+" in it, and those
copy-pasting the commit ID are pretty fine as well. The former makes it
easier to skip a useless patch, the latter provides more information.
But in my opinion these must not add burden on the committer. Even some
vague information like "4.4, maybe 4.2" or "oldest one having subsystem
foo" is extremely helpful there.

The dependency chain however matters less because once you start fighting
with a small patch set for 1 hour you can spend an extra minute testing
several combinations or figuring the dependencies in mainline.

Willy

2018-05-03 19:18:44

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 09:14:05PM +0200, Willy Tarreau wrote:
>The dependency chain however matters less because once you start fighting
>with a small patch set for 1 hour you can spend an extra minute testing
>several combinations or figuring the dependencies in mainline.

https://git.kernel.org/pub/scm/linux/kernel/git/sashal/stable-tools.git/tree/stable-deps

:)

2018-05-03 22:43:15

by Mark Brown

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:

> As for -next, me and others stopped reporting bugs in it, because when we do
> we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
> and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
> them wet, and bugs in code which no one builds, much less tests or uses, won't
> be found.

You've been flamed for testing -next? That's not been my experience and
frankly it's pretty horrifying that it's happening. Testing is pretty
much the whole point of -next existing in the first place so you have to
wonder why people are putting their trees there if they don't want
testing. I have seen a few issues with people reporting bugs on old
versions of -next but otherwise...



2018-05-03 23:09:46

by Tony Lindgren

Subject: Re: [Ksummit-discuss] bug-introducing patches

* Mark Brown <[email protected]> [180503 22:44]:
> On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>
> > As for -next, me and others stopped reporting bugs in it, because when we do
> > we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
> > and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
> > them wet, and bugs in code which no one builds, much less tests or uses, won't
> > be found.
>
> You've been flamed for testing -next? That's not been my experience and
> frankly it's pretty horrifying that it's happening. Testing is pretty
> much the whole point of -next existing in the first place so you have to
> wonder why people are putting their trees there if they don't want
> testing. I have seen a few issues with people reporting bugs on old
> versions of -next but otherwise...

Yes I agree testing Linux next is very important. That's the best way for
maintainers to ensure a usable -rc1 after a merge window. And then for
the -rc cycle, there's not much need for chasing bugs to get things working.

Bugs reported for Linux next often seem to get fixed or reverted faster
compared to the -rc cycle too. I think that's because people realize that
their code will not get merged until it's been fixed.

So some daily testing of Linux next can save a lot of scrambling after the
merge window :)

Users don't usually upgrade kernels until after later -rc releases or only
after major releases so that probably explains some of the -rc cycle fixes.

Regards,

Tony


2018-05-04 09:58:45

by David Howells

Subject: Re: [Ksummit-discuss] bug-introducing patches

Sasha Levin via Ksummit-discuss wrote:

> Cc: [email protected] # commit-id-of-(2)

Can you please not do this? This screws up email address parsing in some
tools.

David

2018-05-04 12:29:16

by Jani Nikula

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, 04 May 2018, David Howells <[email protected]> wrote:
> Sasha Levin via Ksummit-discuss wrote:
>
>> Cc: [email protected] # commit-id-of-(2)
>
> Can you please not do this? This screws up email address parsing in some
> tools.

This has been documented since

commit 8e9b9362266dd16255473c080d846b13e27247bf
Author: Sebastian Andrzej Siewior <[email protected]>
Date: Sun Dec 6 12:24:31 2009 +0100

Doc/stable rules: add new cherry-pick logic

in v2.6.33 so seems like there should have been enough time to fix the
tools.


BR,
Jani.

--
Jani Nikula, Intel Open Source Technology Center

2018-05-04 13:10:18

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 03:31:17PM +0300, Jani Nikula wrote:
> On Fri, 04 May 2018, David Howells <[email protected]> wrote:
> > Sasha Levin via Ksummit-discuss wrote:
> >
> >> Cc: [email protected] # commit-id-of-(2)
>
> This has been documented since
>
> commit 8e9b9362266dd16255473c080d846b13e27247bf
> Author: Sebastian Andrzej Siewior <[email protected]>
> Date: Sun Dec 6 12:24:31 2009 +0100
>
> Doc/stable rules: add new cherry-pick logic
>
> in v2.6.33 so seems like there should have been enough time to fix the
> tools.

The problem is that it's not being *used* that way. In fact, that
documentation is arguably out of date. When it does get used, it's
used to indicate which kernels the stable patch applies to. You have to
go pretty far back before you find that suggested usage. Run:

git log --grep [email protected] | grep -i cc: | grep stable | grep \#

and see for yourself. The first couple of hits:

Cc: [email protected] # 3.11
Cc: [email protected] # 4.8+
Cc: [email protected] # 4.8+
Cc: [email protected] # 4.13+
Cc: [email protected] # 4.8+
Cc: [email protected] # 4.13 - together with 890da9cf0983
Cc: [email protected] # 4.13
Cc: [email protected] # 4.13
Cc: [email protected] # v4.8+
Cc: [email protected] # v4.10+
Cc: [email protected] # v4.10+
Cc: [email protected] # v4.10+
Cc: [email protected] # reverted commits were marked for stable
Cc: [email protected] # for the backport of the original commit
Cc: [email protected] # v4.8+

At this point, my suggestion would be to delete the text added by the
above-mentioned commit, and add a new syntax. We're much more willing
to support multiple headers, so something like this:

Stable-prereq: DEADBEEF1234: subsystem: bork bork bork....

With multiple Stable-prereq: lines allowed, where the order is
significant, might be one way to do things.
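
To make the proposal concrete, here is a minimal Python sketch of how a script could read such lines while preserving their order. `Stable-prereq:` is only the syntax floated above, not an existing kernel trailer, and the parser, its names, and the sample commit IDs and subjects are hypothetical.

```python
import re
from typing import List, NamedTuple

class Prereq(NamedTuple):
    commit: str   # abbreviated or full commit ID, lowercased
    subject: str  # subject line of the prerequisite commit

# Matches the proposed "Stable-prereq: <id>: <subject>" form (hypothetical syntax).
PREREQ_RE = re.compile(r"^Stable-prereq:\s*([0-9a-fA-F]{7,40}):\s*(.+)$")

def parse_prereqs(message: str) -> List[Prereq]:
    """Return prerequisites in the order they appear; order is significant."""
    prereqs = []
    for line in message.splitlines():
        m = PREREQ_RE.match(line.strip())
        if m:
            prereqs.append(Prereq(m.group(1).lower(), m.group(2).strip()))
    return prereqs

# Made-up commit message with two ordered prerequisites.
msg = """foofs: fix the remaining holes in foofs_barf()

Stable-prereq: DEADBEEF1234: vfs: change ->mkdir() calling conventions
Stable-prereq: 0123456789ab: foofs: adapt foofs_barf() callers
"""

for p in parse_prereqs(msg):
    print(p.commit, "-", p.subject)
```

Returning a list rather than a set or dict keeps the order in which the lines were written, which is the property the proposal depends on.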

- Ted

2018-05-04 14:21:50

by Ulf Hansson

Subject: Re: [Ksummit-discuss] bug-introducing patches

Tony, Sasha, Mark

On 4 May 2018 at 01:09, Tony Lindgren <[email protected]> wrote:
> * Mark Brown <[email protected]> [180503 22:44]:
>> On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>>
>> > As for -next, me and others stopped reporting bugs in it, because when we do
>> > we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
>> > and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
>> > them wet, and bugs in code which no one builds, much less tests or uses, won't
>> > be found.
>>
>> You've been flamed for testing -next? That's not been my experience and
>> frankly it's pretty horrifying that it's happening. Testing is pretty
>> much the whole point of -next existing in the first place so you have to
>> wonder why people are putting their trees there if they don't want
>> testing. I have seen a few issues with people reporting bugs on old
>> versions of -next but otherwise...
>
> Yes I agree testing Linux next is very important. That's the best way for
> maintainers to ensure a usable -rc1 after a merge window. And then for
> the -rc cycle, there's not much need for chasing bugs to get things working.
>
> Bugs reported for Linux next often seem to get fixed or reverted faster
> compared to the -rc cycle too. I think that's because people realize that
> their code will not get merged until it's been fixed.
>
> So some daily testing of Linux next can save a lot of scrambling after the
> merge window :)
>
> Users don't usually upgrade kernels until after later -rc releases or only
> after major releases so that probably explains some of the -rc cycle fixes.

I fully agree with above, linux-next works very nicely as a
pre-integration tree, however for *both* fixes and new material.

I guess the concern here is about getting more confidence about pure
fixes, before they hit Linus' tree or any other stable tree. Although
without having to introduce delays or additional work for maintainers
etc.

Then, why don't we have a pre-integration tree for fixes? That would
at least simplify automated testing of fixes separately from new
material.

Perhaps this has already been discussed, and concluded that it's not
worth it; then I apologize for my ignorance.

Kind regards
Uffe

2018-05-04 17:42:08

by Greg Kroah-Hartman

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 09:09:32AM -0400, Theodore Y. Ts'o wrote:
> On Fri, May 04, 2018 at 03:31:17PM +0300, Jani Nikula wrote:
> > On Fri, 04 May 2018, David Howells <[email protected]> wrote:
> > > Sasha Levin via Ksummit-discuss wrote:
> > >
> > >> Cc: [email protected] # commit-id-of-(2)
> >
> > This has been documented since
> >
> > commit 8e9b9362266dd16255473c080d846b13e27247bf
> > Author: Sebastian Andrzej Siewior <[email protected]>
> > Date: Sun Dec 6 12:24:31 2009 +0100
> >
> > Doc/stable rules: add new cherry-pick logic
> >
> > in v2.6.33 so seems like there should have been enough time to fix the
> > tools.
>
> The problem is that it's not being *used* that way. In fact, that
> documentation is arguably out of date. When it does get used, it's
> used to indicate which kernels the stable patch applies. You have to
> go pretty far back before you find that suggested usage. Run:
>
> git log --grep [email protected] | grep -i cc: | grep stable | grep \#
>
> and see for yourself. The first couple of hits:
>
> Cc: [email protected] # 3.11
> Cc: [email protected] # 4.8+
> Cc: [email protected] # 4.8+
> Cc: [email protected] # 4.13+
> Cc: [email protected] # 4.8+
> Cc: [email protected] # 4.13 - together with 890da9cf0983
> Cc: [email protected] # 4.13
> Cc: [email protected] # 4.13
> Cc: [email protected] # v4.8+
> Cc: [email protected] # v4.10+
> Cc: [email protected] # v4.10+
> Cc: [email protected] # v4.10+
> Cc: [email protected] # reverted commits were marked for stable
> Cc: [email protected] # for the backport of the original commit
> Cc: [email protected] # v4.8+
>
> At this point, my suggestion would be to delete the text added by the
> above-mentioned commit, and add a new syntax. We're much more willing
> to support multiple headers, so something like this:
>
> Stable-prereq: DEADBEEF1234: subsystem: bork bork bork....
>
> With multiple Stable-prereq: lines allowed, where the order is
> significant, might be one way to do things.

Ugh, what? I don't understand what you are proposing here, what we have
today is just fine, what is broken with it?

thanks,

greg k-h

2018-05-04 21:14:29

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 10:40:55AM -0700, Greg KH wrote:
> Ugh, what? I don't understand what you are proposing here, what we have
> today is just fine, what is broken with it?

What we have today is this:

Cc: [email protected] # 3.11
Cc: [email protected] # 4.8+
Cc: [email protected] # 4.8+

Jani was suggesting something documented which doesn't match current
practice. See commit 8e9b9362266d, which describes something like this:

Cc: <[email protected]> # .32.x: a1f84a3: sched: Check for idle
Cc: <[email protected]> # .32.x: 1b9508f: sched: Rate-limit newidle
Cc: <[email protected]> # .32.x: fd21073: sched: Fix affinity logic

... to specify prerequisite commits needed to backport the commit in
question. I am proposing that we delete what is in
stable_kernel_rules.rst, because it doesn't match with current
practice.
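
The mismatch between the documented form and current practice can be seen by sorting the text after `#` into categories. A minimal Python sketch, with the sample annotations taken from the hits and the documented example quoted in this thread; the category names and regular expressions are assumptions made for illustration, not an existing tool:

```python
import re

# "3.11", "4.8+", "v4.10+": a kernel version, optionally open-ended.
VERSION_RE = re.compile(r"^v?\d+\.\d+(\.\d+)?\+?$")
# ".32.x: a1f84a3: sched: Check for idle": the form documented in 8e9b9362266d.
PREREQ_RE = re.compile(r"^\.?[\w.]+:\s*[0-9a-f]{7,40}:\s*\S.*$")

def classify(annotation: str) -> str:
    """Label the text following '#' on a stable Cc: line."""
    text = annotation.strip()
    if VERSION_RE.match(text):
        return "version"
    if PREREQ_RE.match(text):
        return "prerequisite"
    return "free-form"

samples = [
    "3.11",
    "4.8+",
    "v4.10+",
    "4.13 - together with 890da9cf0983",
    "reverted commits were marked for stable",
    ".32.x: a1f84a3: sched: Check for idle",
]

for s in samples:
    print(f"{classify(s):13} {s}")
```

Only the last sample, copied from the documentation, falls into the prerequisite category; the annotations seen in practice are version ranges or free-form notes.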

If it is necessary to explicitly specify prerequisites (as opposed to
having scripts or stable maintainers guess or figure things out
manually), then something like this might be better:

Stable-prereq: DEADBEEF1234: subsystem: bork bork bork....

If it's not necessary, fine. But we should still delete what is
currently documented in stable_kernel_rules and was introduced in
8e9b9362266d, because it doesn't describe current practice.

- Ted

2018-05-04 21:38:37

by James Bottomley

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, 2018-05-04 at 17:13 -0400, Theodore Y. Ts'o wrote:
> If it's not necessary, fine.  But we should still delete what is
> currently documented in stable_kernel_rules and was introduced in
> 8e9b9362266d, because it doesn't describe current practice.

It definitely doesn't seem to describe current practice. It looks like
it got applied because the commit description bears a somewhat strange
relation to the actual text that was added: The commit talks about the
original script that used to forward to stable (although it got me and
hpa confused) which seems to refer to a tiny deletion and the rest is
adding an Ingo one off proposal for dependencies.

For the record: Greg runs his own script now and I'm not involved.

Current process (at least from the SCSI centric view) is that if we
screw up and forward a commit with missing dependencies to stable via a
cc: tag, it won't apply and Greg tells us to fix it, which we do. That
seems to be an adequately functional process for the odd times we run
into this.

James


2018-05-04 21:53:06

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 02:38:01PM -0700, James Bottomley wrote:
>On Fri, 2018-05-04 at 17:13 -0400, Theodore Y. Ts'o wrote:
>> If it's not necessary, fine. But we should still delete what is
>> currently documented in stable_kernel_rules and was introduced in
>> 8e9b9362266d, because it doesn't describe current practice.
>
>It definitely doesn't seem to describe current practice. It looks like
>it got applied because the commit description bears a somewhat strange
>relation to the actual text that was added: The commit talks about the
> original script that used to forward to stable (although it got me and
>hpa confused) which seems to refer to a tiny deletion and the rest is
>adding an Ingo one off proposal for dependencies.

The usage for something like this is only if a commit that we didn't
previously think would go to stable now has to because it's a dependency
of a new -stable commit, so the expected usage will be pretty small
anyways.

I don't have an objection to moving this to its own tag. It will make
my scripts somewhat simpler for sure.

>For the record: Greg runs his own script now and I'm not involved.
>
>Current process (at least from the SCSI centric view) is that if we
>screw up and forward a commit with missing dependencies to stable via a
>cc: tag, it won't apply and Greg tells us to fix it, which we do. That
>seems to be an adequately functional process for the odd times we run
>into this.

Assuming a commit won't apply/fail to build because of dependencies is
really not a safe approach, which I keep getting reminded of quite
often.

See for example this patch:

https://patchwork.kernel.org/patch/10243707/

It will apply and build, but will fail to boot on a particular flavor
of ARMv7, and those are just the obvious failure modes of approaches like
this.

So again, I don't have an objection to changing the docs or the way
it's being done now, but the new way should make it very easy for folks
to list dependency chains if they want to.

2018-05-04 23:36:47

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 09:51:14PM +0000, Sasha Levin wrote:
> I don't have an objection to moving this to its own tag. It will make
> my scripts somewhat simpler for sure.

It's not a matter of "moving this to its own tag", but creating a new tag
--- because what is in the docs is a lie. It does not describe what
we do today. And current practice is the reality, not what is in the
docs.

As to whether we should create a new tag to support explicit
dependencies, I'll leave that between you and Greg K-H and the rest of
the stable maintainers. :-)

- Ted

2018-05-05 04:24:37

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 07:35:42PM -0400, Theodore Y. Ts'o wrote:
> On Fri, May 04, 2018 at 09:51:14PM +0000, Sasha Levin wrote:
> > I don't have an objection to moving this to its own tag. It will make
> > my scripts somewhat simpler for sure.
>
> It's not a matter of "moving this to its own tag", but creating a new tag
> --- because what is in the docs is a lie. It does not describe what
> we do today. And current practice is the reality, not what is in the
> docs.
>
> As to whether we should create a new tag to support explicit
> dependencies, I'll leave that between you and Greg K-H and the rest of
> the stable maintainers. :-)

Guys, *personally*, I've sometimes been a bit annoyed by the huge amount
of irregular extra headers trying to compensate for horribly vague commit
messages, and I'm pretty sure it pisses off patch authors who don't know
anymore what to put in their description. We need to keep in mind that
authors are humans and not machines, and that natural language remains
the best way to explain complex dependencies. I'd prefer to see :

This patch needs to be backported to all stable branches that contain
717d3133 and 207f5b3c (that's 3.10+) or their respective backports but
must be adapted (contact me) if only a backport of 717d3133 is present.

Cc: stable # v3.10+

Rather than horrible stuff like this :

Cc: stable # v3.10+ (717d3133 && 207f5b3c) || WARN_ON(back(717d3133))

Of course it's a bit made up, but not too far from what is being discussed
here, probably only the next step. People will often get complex rules
wrong, both on the producer and on the consumer side. The day we need a
compiler to emit commit messages, we'll have to wonder if we didn't go
too far.
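
To see what the consumer side would have to get right, the made-up rule
above can be written out as a decision function. This is only a sketch:
`Action` and `decide` are hypothetical names, and a stable branch is
modelled as a plain set of the (abbreviated) upstream commit IDs it
contains, either natively or as backports.

```python
from enum import Enum
from typing import Set


class Action(Enum):
    BACKPORT = "backport as-is"
    ADAPT = "needs adaptation, contact the author"
    SKIP = "not applicable to this branch"


def decide(branch_commits: Set[str]) -> Action:
    """Apply the prose rule to one stable branch.

    branch_commits is assumed to hold the upstream commit IDs that the
    branch contains, whether merged natively or backported.
    """
    has_first = "717d3133" in branch_commits
    has_second = "207f5b3c" in branch_commits
    if has_first and has_second:
        return Action.BACKPORT
    if has_first:
        # Only 717d3133 is present: the patch must be adapted first.
        return Action.ADAPT
    return Action.SKIP
```

Even this toy version has three outcomes, and the prose form states all
of them in two sentences.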

Also I've found the Fixes header pretty useful. It allows patch authors
to mention what is being fixed without necessarily copying stable,
because sometimes you'd rather not see your patch immediately backported
or you think the risks are higher than the bug. And here as well, it's
only suited for simple situations with a single commit ID, complex
descriptions have to be part of the commit message body.

I think that what we have now works pretty well but that some descriptions
lack a bit of detail, especially on the impact of the bug which would help
decide to backport or drop. This is understandable because often the person
fixing a bug documents it for people knowing the same subsystem well. But
when you backport fixes into other kernel versions, you don't know well
how each subsystem works, and guessing the impact of a bug is not always
obvious. Most of the time, authors who add Fixes: and/or Cc: stable take
care of providing enough information, though I'd suspect that sometimes
they're making efforts trying to figure out how to place the information
there and possibly try to avoid redundancy by writing a shorter body.

At this point, I'm really not seeing what we're trying to improve or
optimize, and to be honest this discussion worries me a bit, by just
thinking that it could result in annoying changes...

Willy

2018-05-05 05:03:34

by Eric W. Biederman

Subject: Re: [Ksummit-discuss] bug-introducing patches

Willy Tarreau <[email protected]> writes:

> On Fri, May 04, 2018 at 07:35:42PM -0400, Theodore Y. Ts'o wrote:
>> On Fri, May 04, 2018 at 09:51:14PM +0000, Sasha Levin wrote:
>> > I don't have an objection to moving this to its own tag. It will make
>> > my scripts somewhat simpler for sure.
>>
>> It's not a matter of "moving this to its own tag", but creating a new tag
>> --- because what is in the docs is a lie. It does not describe what
>> we do today. And current practice is the reality, not what is in the
>> docs.
>>
>> As to whether we should create a new tag to support explicit
>> dependencies, I'll leave that between you and Greg K-H and the rest of
>> the stable maintainers. :-)
>
> Guys, *personally*, I've sometimes been a bit annoyed by the huge amount
> of irregular extra headers trying to compensate for horribly vague commit
> messages, and I'm pretty sure it pisses off patch authors who don't know
> anymore what to put in their description. We need to keep in mind that
> authors are humans and not machines, and that natural language remains
> the best to explain complex dependencies. I'd prefer to see :
>
> This patch needs to be backported to all stable branches that contain
> 717d3133 and 207f5b3c (that's 3.10+) or their respective backports but
> must be adapted (contact me) if only a backport of 717d3133 is present.
>
> Cc: stable # v3.10+
>
> Rather than horrible stuff like this :
>
> Cc: stable # v3.10+ (717d3133 && 207f5b3c) || WARN_ON(back(717d3133))
>
> Of course it's a bit made up, but not too far from what is being discussed
> here, probably only the next step. People will often get complex rules
> wrong, both on the producer and on the consumer side. The day we need a
> compiler to emit commit messages, we'll have to wonder if we didn't go
> too far.
>
> Also I've found the Fixes header pretty useful. It allows patch authors
> to mention what is being fixed without necessarily copying stable,
> because sometimes you'd rather not see your patch immediately backported
> or you think the risks are higher than the bug. And here as well, it's
> only suited for simple situations with a single commit ID, complex
> descriptions have to be part of the commit message body.
>
> I think that what we have now works pretty well but that some descriptions
> lack a bit of detail, especially on the impact of the bug which would help
> decide to backport or drop. This is understandable because often the person
> fixing a bug documents it for people knowing the same subsystem well. But
> when you backport fixes into other kernel versions, you don't know well
> how each subsystem works, and guessing the impact of a bug is not always
> obvious. Most of the time, authors who add Fixes: and/or Cc: stable take
> care of providing enough information, though I'd suspect that sometimes
> they're making efforts trying to figure how to place the information
> there and possibly try to avoid redundancy by writing a shorter body.
>
> At this point, I'm really not seeing what we're trying to improve or
> optimize, and to be honest this discussion worries me a bit, by just
> thinking that it could result in annoying changes...

So the way I use headers today is:
Cc: [email protected]
Fixes: sha1hash "commit subject"

I will use "Fixes: v2.0.1" if something is so old that it isn't in git.
If it was in bitkeeper and now in tglx's tree I will use:
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Just because you won't find the commit in Linus's git tree.

I tend not to find particularly serious bugs, just ancient ones so I
generally figure if it doesn't backport easily it probably is not a
candidate for stable. The bug has existed for ages without anyone
really caring anyway.

Eric

2018-05-05 05:28:57

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 07:35:42PM -0400, Theodore Y. Ts'o wrote:
>On Fri, May 04, 2018 at 09:51:14PM +0000, Sasha Levin wrote:
>> I don't have an objection to moving this to its own tag. It will make
>> my scripts somewhat simpler for sure.
>
>It's not a matter of "moving this to its own tag", but creating a new tag
>--- because what is in the docs is a lie. It does not describe what
>we do today. And current practice is the reality, not what is in the
>docs.

I'm really confused here. What do you mean by "not describe what we do
today"?

The doc allows for three ways to tag a patch:

1. Empty tag: "Cc: [email protected]"
2. With a version, quoting from the doc:

Also, some patches may have kernel version prerequisites. This can be
specified in the following format in the sign-off area:

Cc: <[email protected]> # 3.3.x

The tag has the meaning of:

git cherry-pick <this commit>

For each "-stable" tree starting with the specified version.

3. With a prereq commit, which is in the form of:

Cc: <[email protected]> # 3.3.x: a1f84a3: sched: Check for idle

We expect this to be rarely used, and indeed it's not used much.
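
As a rough illustration, the three forms above could be told apart by a
small parser. This is a sketch, not any script actually used for
-stable: `STABLE_TAG`, `StableTag` and `parse_stable_tag` are
hypothetical names, and it assumes the address hidden by the archive is
stable@vger.kernel.org.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the obfuscated address in the examples above is
# stable@vger.kernel.org, optionally wrapped in angle brackets.
STABLE_TAG = re.compile(
    r"^Cc:\s*<?stable@vger\.kernel\.org>?\s*(?:#\s*(?P<comment>.*))?$",
    re.IGNORECASE,
)


class StableTag(NamedTuple):
    version: Optional[str]  # e.g. "3.3.x"; None for an empty tag
    prereq: Optional[str]   # commit ID of a prerequisite, if listed
    subject: Optional[str]  # subject line of that prerequisite, if given


def parse_stable_tag(line: str) -> Optional[StableTag]:
    """Return the parsed tag, or None if the line is not a stable Cc: tag."""
    match = STABLE_TAG.match(line.strip())
    if match is None:
        return None
    comment = (match.group("comment") or "").strip()
    if not comment:
        return StableTag(None, None, None)  # form 1: empty tag
    # Forms 2 and 3: "<version>" or "<version>: <commit>: <subject>".
    # Split at most twice so colons inside the subject are preserved.
    parts = [part.strip() for part in comment.split(":", 2)]
    version = parts[0] or None
    prereq = parts[1] if len(parts) > 1 and parts[1] else None
    subject = parts[2] if len(parts) > 2 and parts[2] else None
    return StableTag(version, prereq, subject)
```

For the third form, the sketch keeps the prerequisite's subject intact
even though it contains a colon of its own ("sched: Check for idle").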

>As to whether we should create a new tag to support explicit
>dependencies, I'll leave that between you and Greg K-H and the rest of
>the stable maintainers. :-)

2018-05-05 16:40:16

by Greg Kroah-Hartman

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Sat, May 05, 2018 at 12:02:47AM -0500, Eric W. Biederman wrote:
> So the way I use headers today is:
> Cc: [email protected]
> Fixes: sha1hash "commit subject"

And that makes my life _so_ much easier. The Fixes: tag is great
(thanks James!), I have scripts that I use to track if a fix was applied
to a stable tree to know if it needs to go into that branch as well.
Without that, the "# 4.9" marking just doesn't work, as it doesn't tell
me if that commit got backported to 4.4.y as well. I used to do that
type of detection by hand, but automating it is so much better and I
miss fewer patches that way.
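
A minimal sketch of that kind of check, in Python. This is not Greg's
script; `fixed_commit`, `branches_needing_fix` and the `branch_commits`
mapping (stable branch name to the set of upstream commit IDs present in
that branch) are assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Set

# Matches "Fixes: <7 to 40 hex digits> ..." at the start of a line.
FIXES_RE = re.compile(
    r"^Fixes:\s*(?P<sha>[0-9a-f]{7,40})\b",
    re.IGNORECASE | re.MULTILINE,
)


def fixed_commit(message: str) -> Optional[str]:
    """Return the commit ID named in the first Fixes: tag, if any."""
    match = FIXES_RE.search(message)
    return match.group("sha").lower() if match else None


def branches_needing_fix(
    message: str, branch_commits: Dict[str, Set[str]]
) -> List[str]:
    """List the branches that contain the commit this patch fixes.

    branch_commits is a hypothetical mapping from a stable branch name
    to the upstream commit IDs it contains (natively or as backports).
    IDs are compared by prefix because Fixes: tags are abbreviated.
    """
    sha = fixed_commit(message)
    if sha is None:
        return []
    return sorted(
        branch
        for branch, commits in branch_commits.items()
        if any(c.startswith(sha) or sha.startswith(c) for c in commits)
    )
```

A version-only value such as "Fixes: v2.0.1" carries no commit ID, so
the sketch returns an empty list for it and leaves the decision to a
human.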

Anyway, just my two cents, let's try to keep this simple for both
maintainers, and stable developers, if at all possible. I think what we
have now works well, but if people think the documentation should be
cleaned up, great, send patches :)

thanks,

greg k-h

2018-05-08 02:36:54

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 03, 2018 at 04:09:05PM -0700, Tony Lindgren wrote:
>* Mark Brown <[email protected]> [180503 22:44]:
>> On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>>
>> > As for -next, me and others stopped reporting bugs in it, because when we do
>> > we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
>> > and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
>> > them wet, and bugs in code which no one builds, much less tests or uses, won't
>> > be found.
>>
>> You've been flamed for testing -next? That's not been my experience and
>> frankly it's pretty horrifying that it's happening. Testing is pretty
>> much the whole point of -next existing in the first place so you have to
>> wonder why people are putting their trees there if they don't want
>> testing. I have seen a few issues with people reporting bugs on old
>> versions of -next but otherwise...
>
>Yes I agree testing Linux next is very important. That's the best way for
>maintainers to ensure a usable -rc1 after a merge window. And then for
>the -rc cycle, there's not much need for chasing bugs to get things working.
>
>Bugs reported for Linux next often seem to get fixed or reverted faster
>compared to the -rc cycle too. I think that's because people realize that
>their code will not get merged until it's been fixed.
>
>So some daily testing of Linux next can save a lot of scrambling after the
>merge window :)
>
>Users don't usually upgrade kernels until after later -rc releases or only
>after major releases so that probably explains some of the -rc cycle fixes.

Tony, I'm curious, how many users are you aware of who actually run
Linus's tree? All the users I've encountered so far on Azure seem to be
running something based on -stable.

I can't really get any solid statistics about that on my end both
because I don't have visibility inside user VMs (I don't actually have
prod access believe it or not), and even if I had it would probably be
confidential, so I'm just basing this on reports from users I've seen
so far.

I think that a question we should be asking ourselves is whether we
should be basing our decisions here on the assumption that (pretty much)
no one runs Linus's tree anymore?

2018-05-08 02:40:08

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 07:42:17AM +0900, Mark Brown wrote:
>On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>
>> As for -next, me and others stopped reporting bugs in it, because when we do
>> we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
>> and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
>> them wet, and bugs in code which no one builds, much less tests or uses, won't
>> be found.
>
>You've been flamed for testing -next? That's not been my experience and
>frankly it's pretty horrifying that it's happening. Testing is pretty
>much the whole point of -next existing in the first place so you have to
>wonder why people are putting their trees there if they don't want
>testing. I have seen a few issues with people reporting bugs on old
>versions of -next but otherwise...

This is just wrong, what else is -next for?

FWIW, our (MSFT) testing folks should now be reporting issues they see
on our -next testing pipeline directly to LKML. There's not much volume
there given that the 0-day bot catches most of the issues anyways, but
we sometimes see odd regressions given that no one else seems to test
Linux on Hyper-V but us :)

2018-05-08 03:49:18

by Theodore Ts'o

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 02:34:41AM +0000, Sasha Levin via Ksummit-discuss wrote:
>
> Tony, I'm curious, how many users are you aware of who actually run
> Linus's tree? All the users I've encountered so far on Azure seem to be
> running something based on -stable.

The people who run Linus's tree and test -rc kernels tend to be kernel
developers and individual users who want to run bleeding edge kernels
and who generally are technically clueful. If you were talking about
SLR cameras, you'd call them the "prosumers" segment of the market.

It tends to be more on desktops and laptops, so it doesn't surprise me
that you don't often see them in a hosting environment where you have
to pay $$$. (And where you do see them in a hosting environment, it's
probably for things like gce-xfstests.)

> I think that a question we should be asking ourselves is whether we
> should be basing our decisions here on the assumption that (pretty much)
> no one runs Linus's tree anymore?

These people *do* exist, because as a maintainer, I get bug reports
from them. (And sometimes as a user, I send bug reports when running
-rc kernels to other maintainers, such as the i915 drivers and the
Intel Wireless driver folks.)

Such reports are incredibly valuable and precious to me, since it
allows me to find problems that weren't picked up in my own testing.
(In the case of Intel Wireless, a while back the IWL team didn't have
Aruba Enterprise Access Points in their test hardware library, so I
found a regression after the merge window because I was running -rcX
on my laptop, and wireless access to googleguest network broke. If I
hadn't been running -rcX, they probably wouldn't have discovered this
problem until after that particular kernel had been released.)

So keeping those users happy is a good thing; since they tend to be
very technically clueful, they can do bisections for you, and they are
able to give a detailed and useful bug report. If they report that a
regression that was introduced in -rc2 is fixed by a particular patch,
I want to push it into -rc3 immediately, and not let it stall in
linux-next. If the reason why is because you don't trust my patch
because it "only" got tested by the technically advanced user
reporting the regression, then don't take patches from -rc3 into your
stable branch right away! Let it bake in Linus's tree for a week or
two, instead of demanding that patches stick around in Linux-next
before flowing into Linus's tree.

Because I will guarantee you this --- there are more real users
running Linus's tree than linux-next. This is because Linus's tree
tends to be far more stable than linux-next, since after -rc2
linux-next starts getting the first set of experiments for what will
be going into the next merge window. So while I am willing to run
something based on -rc2 or later on my laptop, there is no way in heck
I would be willing to put linux-next on my laptop. That's just way
too exciting for me....

Would I pull down linux-next, and fire up a VM running gce-xfstests?
Sure. But that's not a real-life use case; that's just running canned
test cases. And more often than not, linux-next will be broken while
Linus's -rcX tree is just fine; which is why I do most of my ext4
testing using patches based on top of -rcX, not based on top of
linux-next.

- Ted

2018-05-08 14:00:19

by Justin Forbes

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Mon, May 7, 2018 at 9:34 PM, Sasha Levin
<[email protected]> wrote:
> On Thu, May 03, 2018 at 04:09:05PM -0700, Tony Lindgren wrote:
>>* Mark Brown <[email protected]> [180503 22:44]:
>>> On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>>>
>>> > As for -next, me and others stopped reporting bugs in it, because when we do
>>> > we tend to get flamed for the "noise". Is anyone aware (or cares) that mips
>>> > and nds32 images don't build ? Soaking clothes in an empty bathtub won't make
>>> > them wet, and bugs in code which no one builds, much less tests or uses, won't
>>> > be found.
>>>
>>> You've been flamed for testing -next? That's not been my experience and
>>> frankly it's pretty horrifying that it's happening. Testing is pretty
>>> much the whole point of -next existing in the first place so you have to
>>> wonder why people are putting their trees there if they don't want
>>> testing. I have seen a few issues with people reporting bugs on old
>>> versions of -next but otherwise...
>>
>>Yes I agree testing Linux next is very important. That's the best way for
>>maintainers to ensure a usable -rc1 after a merge window. And then for
>>the -rc cycle, there's not much need for chasing bugs to get things working.
>>
>>Bugs reported for Linux next often seem to get fixed or reverted faster
>>compared to the -rc cycle too. I think that's because people realize that
>>their code will not get merged until it's been fixed.
>>
>>So some daily testing of Linux next can save a lot of scrambling after the
>>merge window :)
>>
>>Users don't usually upgrade kernels until after later -rc releases or only
>>after major releases so that probably explains some of the -rc cycle fixes.
>
> Tony, I'm curious, how many users are you aware of who actually run
> Linus's tree? All the users I've encountered so far on Azure seem to be
> running something based on -stable.

I couldn't tell you the number of users we have running rawhide
kernels (daily builds of Linus's tree), but it is a positive integer.
We do get bug reports on things, sometimes a day after Linus commits
them.

>
> I can't really get any solid statistics about that on my end both
> because I don't have visibility inside user VMs (I don't actually have
> prod access believe it or not), and even if I had it would probably be
> confidential, so I'm just basing this on reports from users I've seen
> so far.
>
> I think that a question we should be asking ourselves is whether we
> should be basing our decisions here on the assumption that (pretty much)
> no one runs Linus's tree anymore?

2018-05-08 14:50:33

by Tony Lindgren

Subject: Re: [Ksummit-discuss] bug-introducing patches

* Theodore Y. Ts'o <[email protected]> [180508 03:50]:
> On Tue, May 08, 2018 at 02:34:41AM +0000, Sasha Levin via Ksummit-discuss wrote:
> >
> > Tony, I'm curious, how many users are you aware of who actually run
> > Linus's tree? All the users I've encountered so far on Azure seem to be
> > running something based on -stable.
>
> The people who run Linus's tree and test -rc kernels tend to be kernel
> developers and individual users who want to run bleeding edge kernels
> and who generally are technically clueful. If you were talking about
> SLR cameras, you'd call them the "prosumers" segment of the market.

Yup that's the category. People tinkering with their devices and
using bleeding edge kernels because of some new device driver only
being in the -rc series for example.

> > I think that a question we should be asking ourselves is whether we
> > should be basing our decisions here on the assumption that (pretty much)
> > no one runs Linus's tree anymore?
>
> These people *do* exist, because as a maintainer, I get bug reports
> from them. (And sometimes as a user, I send bug reports when running
> -rc kernels to other maintainers, such as the i915 drivers and the
> Intel Wireless driver folks.)

Yes.

> Such reports are incredibly valuable and precious to me, since it
> allows me to find problems that weren't picked up in my own testing.
> (In the case of Intel Wireless, a while back the IWL team didn't have
> Aruba Enterprise Access Points in their test hardware library, so I
> found a regression after the merge window because I was running -rcX
> on my laptop, and wireless access to googleguest network broke. If I
> hadn't been running -rcX, they probably wouldn't have discovered this
> problem until after that particular kernel had been released.)

Yes. So as maintainers, we should all test Linux next on a frequent
basis to aim for a usable -rc1 with no regressions based on that
testing. Then the rest of the -rc cycle should be easy with more
testing and reports from the "prosumer" market :)

> So keeping those users happy is a good thing; since they tend to be
> very technically clueful, they can do bisections for you, and they are
> able to give a detailed and useful bug report. If they report that a
> regression that was introduced in -rc2 is fixed by a particular patch,
> I want to push it into -rc3 immediately, and not let it stall in
> linux-next. If the reason why is because you don't trust my patch
> because it "only" got tested by the technically advanced user
> reporting the regression, then don't take patches from -rc3 into your
> stable branch right away! Let it bake in Linus's tree for a week or
> two, instead of demanding that patches stick around in Linux-next
> before flowing into Linus's tree.
>
> Because I will guarantee you this --- there are more real users
> running Linus's tree than linux-next. This is because Linus's tree
> tends to be far more stable than linux-next, since after -rc2
> linux-next starts getting the first set of experiments for what will
> be going into the next merge window. So while I am willing to run
> something based on -rc2 or later on my laptop, there is no way in heck
> I would be willing to put linux-next on my laptop. That's just way
> too exciting for me....

I follow Linux next on a few test systems. Then when I see no regressions,
I might dare try it on my laptop. Something that's usable one week in
next may not be so any longer the next week. So testing at least a few
times a week and carrying occasional reverts are often needed to be
able to test Linux next on a daily basis.

> Would I pull down linux-next, and fire up a VM running gce-xfstests?
> Sure. But that's not a real-life use case; that's just running canned
> test cases. And more often than not, linux-next will be broken while
> Linus's -rcX tree is just fine; which is why I do most of my ext4
> testing using patches based on top of -rcX, not based on top of
> linux-next.

Ideally we would somehow always end up with an -rc1 that people dare
to use though for the "prosumer" testing :)

Regards,

Tony

2018-05-08 20:30:10

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Mon, May 07, 2018 at 11:48:20PM -0400, Theodore Y. Ts'o wrote:
>On Tue, May 08, 2018 at 02:34:41AM +0000, Sasha Levin via Ksummit-discuss wrote:
>>
>> Tony, I'm curious, how many users are you aware of who actually run
>> Linus's tree? All the users I've encountered so far on Azure seem to be
>> running something based on -stable.
>
>The people who run Linus's tree and test -rc kernels tend to be kernel
>developers and individual users who want to run bleeding edge kernels
>and who generally are technically clueful. If you were talking about
>SLR cameras, you'd call them the "prosumers" segment of the market.
>
>It tends to be more on desktops and laptops, so it doesn't surprise me
>that you don't often see them in a hosting environment where you have
>to pay $$$. (And where you do see them in a hosting environment, it's
>probably for things like gce-xfstests.)
>
>> I think that a question we should be asking ourselves is whether we
>> should be basing our decisions here on the assumption that (pretty much)
>> no one runs Linus's tree anymore?
>
>These people *do* exist, because as a maintainer, I get bug reports
>from them. (And sometimes as a user, I send bug reports when running
>-rc kernels to other maintainers, such as the i915 drivers and the
>Intel Wireless driver folks.)
>
>Such reports are incredibly valuable and precious to me, since it
>allows me to find problems that weren't picked up in my own testing.
>(In the case of Intel Wireless, a while back the IWL team didn't have
>Aruba Enterprise Access Points in their test hardware library, so I
>found a regression after the merge window because I was running -rcX
>on my laptop, and wireless access to googleguest network broke. If I
>hadn't been running -rcX, they probably wouldn't have discovered this
>problem until after that particular kernel had been released.)
>
>So keeping those users happy is a good thing; since they tend to be
>very technically clueful, they can do bisections for you, and they are
>able to give a detailed and useful bug report. If they report that a
>regression that was introduced in -rc2 is fixed by a particular patch,
>I want to push it into -rc3 immediately, and not let it stall in
>linux-next. If the reason why is because you don't trust my patch
>because it "only" got tested by the technically advanced user
>reporting the regression, then don't take patches from -rc3 into your
>stable branch right away! Let it bake in Linus's tree for a week or
>two, instead of demanding that patches stick around in Linux-next
>before flowing into Linus's tree.
>
>Because I will guarantee you this --- there are more real users
>running Linus's tree than linux-next. This is because Linus's tree
>tends to be far more stable than linux-next, since after -rc2
>linux-next starts getting the first set of experiments for what will
>be going into the next merge window. So while I am willing to run
>something based on -rc2 or later on my laptop, there is no way in heck
>I would be willing to put linux-next on my laptop. That's just way
>too exciting for me....
>
>Would I pull down linux-next, and fire up a VM running gce-xfstests?
>Sure. But that's not a real-life use case; that's just running canned
>test cases. And more often than not, linux-next will be broken while
>Linus's -rcX tree is just fine; which is why I do most of my ext4
>testing using patches based on top of -rcX, not based on top of
>linux-next.

This is interesting. We have a group of power users who are testing out
-rc releases, who are usually happy to test out a fast moving target and
provide helpful reports back. We also have a group who run a -stable
kernel (-stable build/distro/android/etc) who want to avoid having to
report bugs to us.

What we don't have is a group of people who use Linus's actual releases
(not the -rc stuff, but the actual point releases). Power users will
move on to the next kernel, and -stable folks won't touch that release
until there's a corresponding -stable.

Even rawhide, like Josh mentioned, will just fill back with the merge
window commits after the release of an older kernel.

So the problem I'm seeing is that since a merge window is open only once
every 2-3 months people will sometimes try to push poorly tested code
just to make that merge window. Additionally, as later -rc releases
start showing up people will again merge poorly tested fixes just to
make it in time for that release.

For both cases, people will push poorly tested code in the kernel just
because they want to make it in time for a kernel release that no one
will actually use.

What if, instead, Linus doesn't actually ever release a point release?
We can make the merge window open more often, and since there's no
actual release, people won't rush to push fixes in later -rc cycles.

We take away the incentive to push poorly tested code. Maintainers are
still free to commit anything they'd like, but there's no reason to commit
code they're not confident of just to make it to a random release no one
will use.

Merge windows will happen more often, so there's no real reason to rush
things in a particular window, and since -stable releases every week
there's no rush to push a fix in since the next release is just a week
away.

2018-05-08 20:55:52

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 08:40:02PM +0000, Matthew Wilcox wrote:
>I think your sample size omits some people. I run Debian Testing on my
>laptop. That gets something akin to a Linus release pretty soon after he
>releases it, and while it gets some amount of -stable patches, it
>progresses to the next release fairly rapidly.

Debian testing is pretty much a -stable tree, see the git log history:

https://salsa.debian.org/kernel-team/linux/commits/sid

It follows a current stable tree, and moves on to the next one once it's
available (about a week after Linus releases a new kernel).

>Added Ben to the cc for more updates.
>
>I think Fedora does something similar.

Fedora's rawhide is just (daily?) builds of Linus's tree, they don't
care what stage the tree is in at any point.


My point is that no one picks a release and sticks with it more than a
week. If someone plans to use a release for longer term they use a
-stable tree, and if they are interested in testing, they move on to the
next release once it's available.

There's no one, for example, who picked up vanilla v4.16 and plans to
keep using it for a year.

This leads to my point about rushing fixes: -stable releases for v4.16 are
done weekly, there's no need to rush them in during v4.16-rc8 just to
make some imaginary release no one will pick up.

2018-05-08 21:06:40

by David Lang

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, 8 May 2018, Sasha Levin wrote:

> There's no one, for example, who picked up vanilla v4.16 and plans to
> keep using it for a year.

Actually, at a prior job I would do almost exactly that.

I never intended to go a year without updating, but it would happen if nothing
came up that was related to the hardware/features I was running.

So 'no one uses the Linus kernel' is false.

2018-05-08 21:08:19

by Ken Moffat

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On 8 May 2018 at 21:29, Sasha Levin <[email protected]> wrote:

>
> This is interesting. We have a group of power users who are testing out
> -rc releases, who are usually happy to test out a fast moving target and
> provide helpful reports back. We also have a group who run a -stable
> kernel (-stable build/distro/android/etc) who want to avoid having to
> report bugs to us.
>
> What we don't have is a group of people who use Linus's actual releases
> (not the -rc stuff, but the actual point releases). Power users will
> move on to the next kernel, and -stable folks won't touch that release
> until there's a corresponding -stable.

I resent that assumption :)

As a 'prosumer' in this context, I try to test an early -rc (usually
not until -rc2, sometimes not until later, depending on what I see on
this list), and then intermittently I spread the testing to more of my
desktop machines using later -rc versions. Once Linus releases .0 I
hope to move my current systems to that in the next few days. But as
always, other things (sometimes real life, sometimes just new changes
in userspace) intervene.

After that, I will pick up Greg's latest if I build a new system
before the next kernel release, or if I become aware of something
critical (for my usage) in it. And then probably 4 or 5 weeks after
Linus's release I will start the next cycle of testing -rc versions.

So no, I rarely test Greg's current stable version, but there _is_ a
period of some weeks where I run .0 kernels.

ĸen


>
> Even rawhide, like Josh mentioned, will just fill back with the merge
> window commits after the release of an older kernel.
>
> So the problem I'm seeing is that since a merge window is open only once
> every 2-3 months people will sometimes try to push poorly tested code
> just to make that merge window. Additionally, as later -rc releases
> start showing up people will again merge poorly tested fixes just to
> make it in time for that release.
>
> For both cases, people will push poorly tested code in the kernel just
> because they want to make it in time for a kernel release that no one
> will actually use.
>
> What if, instead, Linus doesn't actually ever release a point release?
> We can make the merge window open more often, and since there's no
> actual release, people won't rush to push fixes in later -rc cycles.
>
> We take away the incentive to push poorly tested code. Maintainers still
> free to commit anything they'd like, but there's no reason to commit
> code they're not confident of just to make it to a random release no one
> will use.
>
> Merge window will happen more often, so there's no real reason to rush
> things in a particular window, and since -stable releases every week
> there's no rush to push a fix in since the next release is just a week
> away.

2018-05-08 21:27:40

by Justin Forbes

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 8, 2018 at 3:55 PM, Sasha Levin
<[email protected]> wrote:
> On Tue, May 08, 2018 at 08:40:02PM +0000, Matthew Wilcox wrote:
>>I think your sample size omits some people. I run Debian Testing on my
>>laptop. That gets something akin to a Linus release pretty soon after he
>>releases it, and while it gets some amount of -stable patches, it
>>progresses to the next release fairly rapidly.
>
> Debian testing is pretty much a -stable tree, see the git log history:
>
> https://salsa.debian.org/kernel-team/linux/commits/sid
>
> It follows a current stable tree, and moves on to the next one once it's
> available (about a week after Linus releases a new kernel).
>
>>Added Ben to the cc for more updates.
>>
>>I think Fedora does something similar.
>
> Fedora's rawhide is just (daily?) builds of Linus's tree, they don't
> care what stage the tree is in at any point.

It is, but there is a branch point when Linus releases. If we are
working on a new Fedora release, such as F28, all testing stayed on
4.16.0 until stable updates were released. If there is no release
deadline nearing, we have a "stabilization" repository where people
are using and testing the .0 release until stable updates happen. In
either case, the Linus release is really only tested until the stable
.1 happens, but there are users and testers of .0.

>
> My point is that no one picks a release and sticks with it more than a
> week. If someone plans to use a release for longer term they use a
> -stable tree, and if they are interested in testing, they move on to the
> next release once it's available.
>
> There's no one, for example, who picked up vanilla v4.16 and plans to
> keep using it for a year.
>
> This leads to my point about rushing fixes: -stable releases for v4.16 are
> done weekly, there's no need to rush them in during v4.16-rc8 just to
> make some imaginary release no one will pick up.

2018-05-08 21:44:25

by Sasha Levin

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 01:59:18PM -0700, David Lang wrote:
>On Tue, 8 May 2018, Sasha Levin wrote:
>
>>There's no one, for example, who picked up vanilla v4.16 and plans to
>>keep using it for a year.
>
>Actually, at a prior job I would do almost exactly that.
>
>I never intended to go a year without updating, but it would happen if
>nothing came up that was related to the hardware/features I was
>running.
>
>so 'no one uses the Linus kernel is false.

My point is not that "no one ever uses Linus kernel" but that no one
takes one of those kernels and plans to stick with it for 3 months until
the next one comes up, even if there are updates relevant to that user.

Yes, some users will use a .0 release until either Greg releases a
-stable, or until the next -rc is out.

What I'm trying to say is that the .0 release makes some people rush
poorly tested commits into it even though the .0 release is not
significant in any way.

2018-05-08 21:53:42

by Dan Williams

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 8, 2018 at 2:43 PM, Sasha Levin via Ksummit-discuss
<[email protected]> wrote:
> On Tue, May 08, 2018 at 01:59:18PM -0700, David Lang wrote:
>>On Tue, 8 May 2018, Sasha Levin wrote:
>>
>>>There's no one, for example, who picked up vanilla v4.16 and plans to
>>>keep using it for a year.
>>
>>Actually, at a prior job I would do almost exactly that.
>>
>>I never intended to go a year without updating, but it would happen if
>>nothing came up that was related to the hardware/features I was
>>running.
>>
>>so 'no one uses the Linus kernel is false.
>
> My point is not that "no one ever uses Linus kernel" but that no one
> takes one of those kernels and plans to stick with it for 3 months until
> the next one comes up, even if there are updates relevant to that user.
>
> Yes, some users will use a .0 release until either Greg releases a
> -stable, or until the next -rc is out.
>
> What I'm trying to say is that there is that the .0 release makes some
> people rush poorly tested commits in it even though the .0 release is
> not significant in any way.

I think we should take pride in our releases, so I disagree that it is
insignificant. If a maintainer is rushing things into late rc's and
breaking things then they need that feedback; the answer is not to
de-emphasize the importance of ".0" releases. Could the bar be raised
higher on late fixes? Perhaps. Otherwise I think the message is already
clear: "changes at -rc6,7,8 had better be worthy of coming in late and
be accompanied by a good explanation".

2018-05-08 22:16:40

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 08:29:14PM +0000, Sasha Levin wrote:
>
> This is interesting. We have a group of power users who are testing out
> -rc releases, who are usually happy to test out a fast moving target and
> provide helpful reports back. We also have a group who run a -stable
> kernel (-stable build/distro/android/etc) who want to avoid having to
> report bugs to us.
>
> What we don't have is a group of people who use Linus's actual releases
> (not the -rc stuff, but the actual point releases). Power users will
> move on to the next kernel, and -stable folks won't touch that release
> until there's a corresponding -stable.

Linus doesn't release the point releases. Those are done by Greg K-H
and use the same process as the stable kernels. The only difference is
that the life of a point release doesn't last very long; just until
the next kernel release from Linus.

There are probably fewer people who use the point releases compared to
the stable kernels. But I'd hesitate to call it zero. We once
assumed that companies were all using Distro kernels, and very few
people used the stable kernels except for distribution channels
(enterprise kernels, BSP kernels, etc.). Then we discovered that
there are people who use the stable kernel and don't go through the
enterprise distro vendors at all. It wouldn't surprise me if there
is also a silent (and perhaps large) set of users who take Linus's
releases, and then follow along on the dot releases until the next
release from Linus.

> What if, instead, Linus doesn't actually ever release a point
> release? We can make the merge window open more often, and since
> there's no actual release, people won't rush to push fixes in later
> -rc cycles.

I don't understand your proposal. Linus doesn't actually release
point releases. Those happen during the -rcX cycle for those people
who are too chicken to follow Linus's tree, and just want the bug
fixes.

Getting rid of the point releases isn't going to change how frequently
the merge window opens. What would do that is being much more strict
about when we allow only regression fixes into the tree, so
hopefully the tree stabilizes itself by -rc5 or -rc6.

> Merge window will happen more often, so there's no real reason to
> rush things in a particular window, and since -stable releases every
> week there's no rush to push a fix in since the next release is just
> a week away.

Huh?

I can see shortening the release cycle to six weeks, instead of our
current 8-9 week cycle. But after each cycle where we introduce
new features, during the merge window / integration phase, we do need
to have a time when we are fixing regression bugs.

- Ted

2018-05-08 22:42:09

by James Bottomley

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, 2018-05-08 at 21:43 +0000, Sasha Levin via Ksummit-discuss
wrote:
> On Tue, May 08, 2018 at 01:59:18PM -0700, David Lang wrote:
> > On Tue, 8 May 2018, Sasha Levin wrote:
> >
> > > There's no one, for example, who picked up vanilla v4.16 and
> > > plans to keep using it for a year.
> >
> > Actually, at a prior job I would do almost exactly that.
> >
> > I never intended to go a year without updating, but it would happen
> > if  nothing came up that was related to the hardware/features I
> > was running.
> >
> > so 'no one uses the Linus kernel is false.
>
> My point is not that "no one ever uses Linus kernel" but that no one
> takes one of those kernels and plans to stick with it for 3 months
> until the next one comes up, even if there are updates relevant to
> that user..

Actually, I have sometimes done that. My current laptop is running the
v4.16 tag now, not because I intended to run it for this long but
because I've run into a Round Tuit shortage as far as the -rc
candidates go.

> Yes, some users will use a .0 release until either Greg releases a
> -stable, or until the next -rc is out.
>
> What I'm trying to say is that there is that the .0 release makes
> some people rush poorly tested commits in it even though the .0
> release is not significant in any way.

As a milestone, it's extremely significant because it's the cadence
from which everything else flows. If we as developers stop taking the
-rc cycle seriously, you'll find immediate negative consequences for
your stable kernels. And I mean way worse consequences than the odd
bad judgment call about a patch that ought not to have gone in right
before a Linus release.

James


2018-05-09 04:49:26

by Willy Tarreau

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 08:29:14PM +0000, Sasha Levin wrote:
> What if, instead, Linus doesn't actually ever release a point release?
> We can make the merge window open more often, and since there's no
> actual release, people won't rush to push fixes in later -rc cycles.

And then what's the purpose of these later -rc cycles if you remove one
release ? You're just removing one step and shifting everything down by
one -rc but the issues are the same.

> We take away the incentive to push poorly tested code. Maintainers still
> free to commit anything they'd like, but there's no reason to commit
> code they're not confident of just to make it to a random release no one
> will use.
>
> Merge window will happen more often, so there's no real reason to rush
> things in a particular window, and since -stable releases every week
> there's no rush to push a fix in since the next release is just a week
> away.

I'm not sure what model you have in mind but the description above
reminds me of 2.5, which constantly had something broken and which
used to be unusable for many developers. Many of us even bought some
SCSI cards and disks back then because for a long time IDE was broken.

The primary purpose of Linus' releases and -rc is to synchronise everyone
on the same goal at the same time. The merge window is "send me your crap,
it must be OK but we know problems happen and you'll be allowed to fix it
later". The -rc ones are there so that everyone fixes their crap in
parallel so that we converge towards something acceptable for everyone.

Your argument that the .0 release is useless is wrong in my opinion. It
is as wrong as saying "statistics show that fewer people use .3 than .7".
And comparing "stable kernels" to ".0" is wrong because there are roughly
10 times more stable kernels than releases so statistically you'll find
10 times more of them in the field. The reality is that deploying .0 always
takes a bit more time for end users so statistically it should be a bit
less common in the field:
  - you're never certain when the new version is going to be released
    (will rc8/rc9 exist?)
  - when it's released, you have to update your config and it takes
    some time.
  - by the time you find a quiet moment to do all this, it's not
    unlikely that the end of the week is reached with .1 appearing.

And so what ? The .0 release is a stable release like any other one.
It doesn't deserve to be deployed more than any other specific stable
release. It serves as a reference. Before .0 the code experiences some
possibly breaking changes, even some reverts. After .0 it experiences
only small fixes according to the stable rules.

Overall I think the current model is not that bad, and that what is the
most needed is some education regarding how -stable works to encourage
developers to rush their fixes less (after more tests), and to ensure
that those who generally push good quality fixes can submit them at any
moment in the cycle so that we get them as fast as possible in -stable.

Willy

2018-05-09 08:14:43

by Mark Brown

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 07:49:59AM -0700, Tony Lindgren wrote:
> * Theodore Y. Ts'o <[email protected]> [180508 03:50]:

> > The people who run Linus's tree and test -rc kernels tend to be kernel
> > developers and individual users who want to run bleeding edge kernels
> > and who generally are technically clueful. If you were talking about
> > SLR cameras, you'd call them the "prosumers" segment of the market.

> Yup that's the category. People tinkering with their devices and
> using bleeding edge kernels because of some new device driver only
> being in the -rc series for example.

You also get some people who are intending to ship on stable kernels but
are tracking upstream during product development so that they can be on
the most current stable release when they go to production.



2018-05-09 08:46:20

by Mark Brown

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Fri, May 04, 2018 at 04:21:26PM +0200, Ulf Hansson wrote:

> Then, why don't we have a pre-integration tree for fixes? That would
> at least simplify automated testing of fixes separately from new
> material.

> Perhaps this has already been discussed, and concluded and it's not
> worth it, then I apologize for my ignorance.

I think this is an excellent idea, copying in Stephen for his input.
I'm currently on holiday but unless someone convinces me it's a terrible
idea I'm willing to at least give it a go on a trial basis once I'm back
home.



2018-05-09 08:49:09

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
> On Fri, May 04, 2018 at 04:21:26PM +0200, Ulf Hansson wrote:
>
>> Then, why don't we have a pre-integration tree for fixes? That would
>> at least simplify automated testing of fixes separately from new
>> material.
>
>> Perhaps this has already been discussed, and concluded and it's not
>> worth it, then I apologize for my ignorance.
>
> I think this is an excellent idea, copying in Stephen for his input.
> I'm currently on holiday but unless someone convinces me it's a terrible
> idea I'm willing to at least give it a go on a trial basis once I'm back
> home.

Since Stephen merges all -fixes branches first, before merging all the
-next branches, he already generates that as part of linux-next. All
he'd need to do is push that intermediate state out to some
linux-fixes branch for consumption by test bots.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
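
A minimal sketch of the ordering Daniel describes above, assuming each
tree is simply tagged as a fixes branch or a development branch. The tree
names and the merge_order helper are hypothetical and only model the
order of merges; they are not how linux-next is actually scripted.

```python
from typing import List, Tuple


def merge_order(trees: List[Tuple[str, bool]]) -> Tuple[List[str], List[str]]:
    """Return (fixes_snapshot, full_next) as ordered lists of tree names.

    trees holds (name, is_fixes) pairs in their configured order.
    """
    # Fixes branches are merged first, keeping their relative order.
    fixes = [name for name, is_fixes in trees if is_fixes]
    # Development (-next) branches follow once all fixes are in.
    development = [name for name, is_fixes in trees if not is_fixes]
    # The intermediate state: what a separate fixes-only branch would contain.
    fixes_snapshot = list(fixes)
    # The final integration result contains fixes followed by development.
    full_next = fixes + development
    return fixes_snapshot, full_next


# Hypothetical tree names, for illustration only.
example_trees = [
    ("subsys-a-fixes", True),
    ("subsys-a-next", False),
    ("subsys-b-fixes", True),
    ("subsys-b-next", False),
]
snapshot, full = merge_order(example_trees)
print("fixes-only snapshot:", snapshot)
print("full integration order:", full)
```

The point of the sketch is that the fixes-only snapshot is a prefix of the
full merge sequence, so publishing it costs no extra merges.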

2018-05-09 08:53:18

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 9, 2018 at 10:47 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>> On Fri, May 04, 2018 at 04:21:26PM +0200, Ulf Hansson wrote:
>>> Then, why don't we have a pre-integration tree for fixes? That would
>>> at least simplify automated testing of fixes separately from new
>>> material.
>>
>>> Perhaps this has already been discussed, and concluded and it's not
>>> worth it, then I apologize for my ignorance.
>>
>> I think this is an excellent idea, copying in Stephen for his input.
>> I'm currently on holiday but unless someone convinces me it's a terrible
>> idea I'm willing to at least give it a go on a trial basis once I'm back
>> home.
>
> Since Stephen merges all -fixes branches first, before merging all the
> -next branches, he already generates that as part of linux-next. All
> he'd need to do is push that intermediate state out to some
> linux-fixes branch for consumption by test bots.

+1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-09 09:05:13

by Mark Brown

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:

> > I think this is an excellent idea, copying in Stephen for his input.
> > I'm currently on holiday but unless someone convinces me it's a terrible
> > idea I'm willing to at least give it a go on a trial basis once I'm back
> > home.

> Since Stephen merges all -fixes branches first, before merging all the
> -next branches, he already generates that as part of linux-next. All
> he'd need to do is push that intermediate state out to some
> linux-fixes branch for consumption by test bots.

True. It's currently only those -fixes branches that people have asked
him to merge separately which isn't as big a proportion of trees as have
them (perhaps fortunately given people's enthusiasm for fixes branches
that don't merge cleanly with their development branches) so we'd also
need to encourage people to add them separately.



2018-05-09 10:48:28

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
>
> On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>
> > > I think this is an excellent idea, copying in Stephen for his input.
> > > I'm currently on holiday but unless someone convinces me it's a terrible
> > > idea I'm willing to at least give it a go on a trial basis once I'm back
> > > home.
>
> > Since Stephen merges all -fixes branches first, before merging all the
> > -next branches, he already generates that as part of linux-next. All
> > he'd need to do is push that intermediate state out to some
> > linux-fixes branch for consumption by test bots.

Good idea ... I will see what I can do.

> True. It's currently only those -fixes branches that people have asked
> him to merge separately which isn't as big a proportion of trees as have
> them (perhaps fortunately given people's enthusiasm for fixes branches
> that don't merge cleanly with their development branches) so we'd also
> need to encourage people to add them separately.

I currently have 44 such fixes branches. More welcome!

--
Cheers,
Stephen Rothwell



2018-05-09 10:59:44

by Vinod Koul

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On 09-05-18, 20:47, Stephen Rothwell wrote:
> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
> >
> > On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> > > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
> >
> > > > I think this is an excellent idea, copying in Stephen for his input.
> > > > I'm currently on holiday but unless someone convinces me it's a terrible
> > > > idea I'm willing to at least give it a go on a trial basis once I'm back
> > > > home.
> >
> > > Since Stephen merges all -fixes branches first, before merging all the
> > > -next branches, he already generates that as part of linux-next. All
> > > he'd need to do is push that intermediate state out to some
> > > linux-fixes branch for consumption by test bots.
>
> Good idea ... I will see what I can do.
>
> > True. It's currently only those -fixes branches that people have asked
> > him to merge separately which isn't as big a proportion of trees as have
> > them (perhaps fortunately given people's enthusiasm for fixes branches
> > that don't merge cleanly with their development branches) so we'd also
> > need to encourage people to add them separately.
>
> I currently have 44 such fixes branches. More welcome!

Great, so do you want us to send fixes branches or scan the existing trees and
add them?

In case of the former, please do add slave-dma/fixes as well.

~Vinod

2018-05-09 12:44:50

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Vinod,

On Wed, 9 May 2018 16:25:34 +0530 Vinod Koul <[email protected]> wrote:
> >
> > I currently have 44 such fixes branches. More welcome!
>
> Great so do you want us to send fixes branch or scan the existing trees and add
> them.

The former.

> In case of former please do add slave-dma/fixes as well

Done. Should I switch your contact address to your kernel.org one
(from your Intel one)?

--
Cheers,
Stephen Rothwell



2018-05-09 12:47:54

by Vinod Koul

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On 09-05-18, 22:43, Stephen Rothwell wrote:
> Hi Vinod,
>
> On Wed, 9 May 2018 16:25:34 +0530 Vinod Koul <[email protected]> wrote:
> > >
> > > I currently have 44 such fixes branches. More welcome!
> >
> > Great so do you want us to send fixes branch or scan the existing trees and add
> > them.
>
> The former.
>
> > In case of former please do add slave-dma/fixes as well
>
> Done. Should I switch your contact address to your kernel.org one
> (from your Intel one)?

Yes please, the Intel one is no longer valid.

--
~Vinod

2018-05-09 14:06:27

by Mark Brown

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 09, 2018 at 08:47:27PM +1000, Stephen Rothwell wrote:
> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
> > On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> > > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:

> > True. It's currently only those -fixes branches that people have asked
> > him to merge separately which isn't as big a proportion of trees as have
> > them (perhaps fortunately given people's enthusiasm for fixes branches
> > that don't merge cleanly with their development branches) so we'd also
> > need to encourage people to add them separately.

> I currently have 44 such fixes branches. More welcome!

Well, all my trees have a for-linus branch to go with the for-next
branch for a start.



2018-05-09 15:58:48

by Guenter Roeck

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 09, 2018 at 08:47:27PM +1000, Stephen Rothwell wrote:
> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
> >
> > On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> > > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
> >
> > > > I think this is an excellent idea, copying in Stephen for his input.
> > > > I'm currently on holiday but unless someone convinces me it's a terrible
> > > > idea I'm willing to at least give it a go on a trial basis once I'm back
> > > > home.
> >
> > > Since Stephen merges all -fixes branches first, before merging all the
> > > -next branches, he already generates that as part of linux-next. All
> > > he'd need to do is push that intermediate state out to some
> > > linux-fixes branch for consumption by test bots.
>
> Good idea ... I will see what I can do.
>
> > True. It's currently only those -fixes branches that people have asked
> > him to merge separately which isn't as big a proportion of trees as have
> > them (perhaps fortunately given people's enthusiasm for fixes branches
> > that don't merge cleanly with their development branches) so we'd also
> > need to encourage people to add them separately.
>
> I currently have 44 such fixes branches. More welcome!
>
Please add

git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git hwmon

as fixes branch.

Thanks,
Guenter

> --
> Cheers,
> Stephen Rothwell





2018-05-09 16:05:26

by Dan Williams

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 9, 2018 at 3:47 AM, Stephen Rothwell <[email protected]> wrote:
> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
>>
>> On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
>> > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>>
>> > > I think this is an excellent idea, copying in Stephen for his input.
>> > > I'm currently on holiday but unless someone convinces me it's a terrible
>> > > idea I'm willing to at least give it a go on a trial basis once I'm back
>> > > home.
>>
>> > Since Stephen merges all -fixes branches first, before merging all the
>> > -next branches, he already generates that as part of linux-next. All
>> > he'd need to do is push that intermediate state out to some
>> > linux-fixes branch for consumption by test bots.
>
> Good idea ... I will see what I can do.
>
>> True. It's currently only those -fixes branches that people have asked
>> him to merge separately which isn't as big a proportion of trees as have
>> them (perhaps fortunately given people's enthusiasm for fixes branches
>> that don't merge cleanly with their development branches) so we'd also
>> need to encourage people to add them separately.
>
> I currently have 44 such fixes branches. More welcome!

Please add:

git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git
libnvdimm-fixes

We currently merge this into libnvdimm-for-next for -next coverage,
and resolve any conflicts vs new development. Do you want to see those
conflicts? Otherwise I would recommend only pulling libnvdimm-for-next
for -next and libnvdimm-fixes for this new -next-fixes effort.

2018-05-09 19:37:29

by Boris Brezillon

[permalink] [raw]
Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Stephen,

On Wed, 9 May 2018 20:47:27 +1000
Stephen Rothwell <[email protected]> wrote:

> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
> >
> > On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> > > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
> >
> > > > I think this is an excellent idea, copying in Stephen for his input.
> > > > I'm currently on holiday but unless someone convinces me it's a terrible
> > > > idea I'm willing to at least give it a go on a trial basis once I'm back
> > > > home.
> >
> > > Since Stephen merges all -fixes branches first, before merging all the
> > > -next branches, he already generates that as part of linux-next. All
> > > he'd need to do is push that intermediate state out to some
> > > linux-fixes branch for consumption by test bots.
>
> Good idea ... I will see what I can do.
>
> > True. It's currently only those -fixes branches that people have asked
> > him to merge separately which isn't as big a proportion of trees as have
> > them (perhaps fortunately given people's enthusiasm for fixes branches
> > that don't merge cleanly with their development branches) so we'd also
> > need to encourage people to add them separately.
>
> I currently have 44 such fixes branches. More welcome!

I see that the nand/fixes and spi-nor/fixes branches are already there [1].
You can add:

mtd-fixes git git://git.infradead.org/linux-mtd.git#master

You can also remove the mtd entry [2], since mtd-2.6.git is just a symlink
to linux-mtd.git, so it will just be a duplicate of the mtd-fixes
entry. You can also rename the l2-mtd entry [3] to mtd.

Regards,

Boris

[1]https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/+/43cd1f4979998ba0ef1c0b8e1c5d23d2de5ab172/Next/Trees#41
[2]https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/+/43cd1f4979998ba0ef1c0b8e1c5d23d2de5ab172/Next/Trees#155
[3]https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/+/43cd1f4979998ba0ef1c0b8e1c5d23d2de5ab172/Next/Trees#156
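
As a rough illustration of the entry format used above (a name, a repository type, and a URL with the branch after '#'), here is a minimal Python sketch that splits such a line into its parts. The TreeEntry type and parse_tree_entry function are hypothetical names invented for this example, and splitting on whitespace is an assumption based only on how the line appears in this message, not on how linux-next itself reads the file.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    name: str    # label used for the tree, e.g. "mtd-fixes"
    vcs: str     # repository type, "git" in the example above
    url: str     # repository URL without the branch suffix
    branch: str  # branch named after the '#'


def parse_tree_entry(line: str) -> TreeEntry:
    """Split one 'name type url#branch' line into its parts."""
    name, vcs, location = line.split()
    # The branch follows the last '#' in the location field.
    url, _, branch = location.rpartition("#")
    if not url:
        raise ValueError(f"no '#branch' suffix in {location!r}")
    return TreeEntry(name, vcs, url, branch)


entry = parse_tree_entry(
    "mtd-fixes git git://git.infradead.org/linux-mtd.git#master"
)
print(entry.name, entry.url, entry.branch)
```

A line without a '#branch' suffix raises ValueError rather than guessing a default branch.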

2018-05-09 21:47:03

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Guenter,

On Wed, 9 May 2018 08:57:33 -0700 Guenter Roeck <[email protected]> wrote:
>
> Please add
>
> git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git hwmon
>
> as fixes branch.

Added from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-09 21:52:19

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Dan,

On Wed, 9 May 2018 09:04:31 -0700 Dan Williams <[email protected]> wrote:
>
> Please add:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git
> libnvdimm-fixes
>
> We currently merge this into libnvdimm-for-next for -next coverage,
> and resolve any conflicts vs new development. Do you want to see those
> conflicts? Otherwise I would recommend only pulling libnvdimm-for-next
> for -next and libnvdimm-fixes for this new -next-fixes effort.

The conflicts are usually fine (but if you do the merges, I won't see
them - which is even better :-)). I like to have the fixes branches in
linux-next so that no one has to worry about problems in Linus' tree
that have pending fixes already.

Added from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-09 21:59:48

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Boris,

On Wed, 9 May 2018 21:35:28 +0200 Boris Brezillon <[email protected]> wrote:
>
> I see that the nand/fixes and spi-nor/fixes branches are already there [1].
> You can add:
>
> mtd-fixes git git://git.infradead.org/linux-mtd.git#master

Added from today.

> You can also remove the mtd entry [2], since mtd-2.6.git is just a symlink
> to linux-mtd.git, so it will just be a duplicate of the mtd-fixes
> entry. You can also rename the l2-mtd entry [3] into mtd.

All done.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-09 22:10:05

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Mark,

On Wed, 9 May 2018 23:05:32 +0900 Mark Brown <[email protected]> wrote:
>
> Well, all my trees have a for-linus branch to go with the for-next
> branch for a start.

The regmap and regulator trees have no for-linus branch (currently).
Added sound-asoc-fixes and spi-fixes from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-10 03:15:35

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 09, 2018 at 08:47:27PM +1000, Stephen Rothwell wrote:
>On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
>>
>> On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
>> > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>>
>> > > I think this is an excellent idea, copying in Stephen for his input.
>> > > I'm currently on holiday but unless someone convinces me it's a terrible
>> > > idea I'm willing to at least give it a go on a trial basis once I'm back
>> > > home.
>>
>> > Since Stephen merges all -fixes branches first, before merging all the
>> > -next branches, he already generates that as part of linux-next. All
>> > he'd need to do is push that intermediate state out to some
>> > linux-fixes branch for consumption by test bots.
>
>Good idea ... I will see what I can do.

This is very interesting! I'm curious how the statistics will look when
we compare patches that didn't go through this tree, patches that
spent minimal time in this tree, and patches that spent some time
in the tree before being merged into Linus's tree.

Would this be something we would want to point actual users to, rather
than just bots? If every commit in next-fixes is essentially queued up
for Linus at some point, users might as well test out next-fixes instead
of -rc.

>> True. It's currently only those -fixes branches that people have asked
>> him to merge separately which isn't as big a proportion of trees as have
>> them (perhaps fortunately given people's enthusiasm for fixes branches
>> that don't merge cleanly with their development branches) so we'd also
>> need to encourage people to add them separately.
>
>I currently have 44 such fixes branches. More welcome!

I've looked through git for the bigger subsystems. A few branches that
fit this description but are not in -next are:

git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi/urgent
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent
git.kernel.dk/linux-block.git for-linus
git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git fixes
git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-linux
git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes.git master
git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git fixes
git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git queue-rc
git.kernel.org/pub/scm/linux/kernel/git/jikos/hid.git for-linus
git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git ftrace/urgent

Would it make sense to ping the respective maintainers to ack the
inclusion of these branches?

There are a few other trees where the name of the fixes branch
depends on the release. Could we ask them to also create a plain fixes
branch so that next-fixes could merge it in?

git.kernel.org/pub/scm/fs/xfs/xfs-linux.git xfs-4.17-fixes
git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git for-linux-4.17-rc1
git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic.git v4.17/fixes
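
To make the distinction between the two lists above concrete, here is a small Python sketch that flags branch names carrying a kernel version. The is_release_specific helper and its regular expression are assumptions made up for this illustration; they only look for a "digits.digits" pattern and are not part of any linux-next tooling.

```python
import re

# A version such as "4.17", optionally preceded by "v" (assumed pattern).
_VERSION = re.compile(r"v?\d+\.\d+")


def is_release_specific(branch: str) -> bool:
    """Return True if the branch name embeds a kernel version number."""
    return _VERSION.search(branch) is not None


branches = [
    "xfs-4.17-fixes",
    "for-linux-4.17-rc1",
    "v4.17/fixes",
    "fixes",
    "for-linus",
    "x86/urgent",
]
for branch in branches:
    kind = "release-specific" if is_release_specific(branch) else "stable name"
    print(f"{branch}: {kind}")
```

Names such as x86/urgent are not flagged, because the pattern requires a dot between two groups of digits.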

2018-05-10 13:37:47

by Mark Brown

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 10, 2018 at 08:09:09AM +1000, Stephen Rothwell wrote:
> On Wed, 9 May 2018 23:05:32 +0900 Mark Brown <[email protected]> wrote:

> > Well, all my trees have a for-linus branch to go with the for-next
> > branch for a start.

> The regmap and regulator trees have no for-linus branch (currently).
> Added sound-asoc-fixes and spi-fixes from today.

That's not what git claims when I try to push... no idea what's going
on there, I just deleted and repushed - hopefully that helps.



2018-05-10 15:39:31

by Tony Lindgren

Subject: Re: [Ksummit-discuss] bug-introducing patches

* Tony Lindgren <[email protected]> [180508 14:52]:
> * Theodore Y. Ts'o <[email protected]> [180508 03:50]:
> > Would I pull down linux-next, and fire up a VM running gce-xfstests?
> > Sure. But that's not a real-life use case; that's just running canned
> > test cases. And more often than not, linux-next will be broken while
> > Linus's -rcX tree is just fine; which is why I do most of my ext4
> > testing using patches based on top of -rcX, not based on top of
> > linux-next.
>
> Ideally we would somehow always end up with an -rc1 that people dare
> to use though for the "prosumer" testing :)

BTW, the reason why I think we all should test linux-next on a regular
basis is that it's often "some other people's branches(tm)" that cause
the regressions :) Maybe because their own test cases did not show
any regressions, or because they were unable to test the patches.
Or it's because of some "clean-up" work that's completely untested
on some systems.

And that's how we end up with regressions getting merged into -rc1.

Regards,

Tony

2018-05-10 15:58:27

by Tony Lindgren

Subject: Re: [Ksummit-discuss] bug-introducing patches

* Stephen Rothwell <[email protected]> [180509 10:49]:
> I currently have 44 such fixes branches. More welcome!

Can you please also add mine:

git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git fixes

Thanks,

Tony




2018-05-10 16:05:28

by Jiri Kosina

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, 9 May 2018, Daniel Vetter wrote:

> >> Then, why don't we have a pre-integration tree for fixes? That would
> >> at least simplify automated testing of fixes separately from new
> >> material.
> >
> >> Perhaps this has already been discussed, and concluded and it's not
> >> worth it, then apologize for my ignorance.
> >
> > I think this is an excellent idea, copying in Stephen for his input.
> > I'm currently on holiday but unless someone convinces me it's a terrible
> > idea I'm willing to at least give it a go on a trial basis once I'm back
> > home.
>
> Since Stephen merges all -fixes branches first, before merging all the
> -next branches, he already generates that as part of linux-next. All
> he'd need to do is push that intermediate state out to some
> linux-fixes branch for consumption by test bots.

What I do for my trees is that I actually merge the '-fixes' branch (that
is scheduled to go to Linus in the 'current' cycle) into my for-next
branch as well.

This has the advantage of (a) getting all the coverage linux-next does (b)
seeing any potential merge conflicts early

Is this not feasible for other trees?

--
Jiri Kosina
SUSE Labs


2018-05-10 16:40:53

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Tue, May 08, 2018 at 06:15:34PM -0400, Theodore Y. Ts'o wrote:
>On Tue, May 08, 2018 at 08:29:14PM +0000, Sasha Levin wrote:
>>
>> This is interesting. We have a group of power users who are testing out
>> -rc releases, who are usually happy to test out a fast moving target and
>> provide helpful reports back. We also have a group who run a -stable
>> kernel (-stable build/distro/android/etc) who want to avoid having to
>> report bugs to us.
>>
>> What we don't have is a group of people who use Linus's actual releases
>> (not the -rc stuff, but the actual point releases). Power users will
>> move on to the next kernel, and -stable folks won't touch that release
>> until there's a corresponding -stable.
>
>Linus doesn't release the point releases. Those are done by Greg
>K-H and use the same process as the stable kernels. The only
>difference is that the life of a point release doesn't last very long; just
>until the next kernel release from Linus.
>
>There are probably fewer people who use the point releases compared to
>the stable kernels. But I'd hesitate to call it zero. We once
>assumed that companies were all using Distro kernels, and very few
>people used the stable kernels except for distribution channels
>(enterprise kernels, BSP kernels, etc.). Then we discovered that
>there are people who use the stable kernel and don't go through the
>enterprise distro vendors at all. It wouldn't surprise me if there
>are also a silent (and perhaps large) set of users who take Linus's
>releases, and then follow along on the dot releases until the next
>release from Linus.

I was referring to Linus's non-rc releases (4.15, 4.16, etc). While many
users start with, for example, 4.16, most users will either switch to
Greg's releases which start about a week after Linus's release, or
they'll move on to test the 4.17 -rc releases.

There are pretty much no users who pick 4.16, stay with it for 3 months,
switch to 4.17, stay with that for 3 months, and so on.

>> What if, instead, Linus doesn't actually ever release a point
>> release? We can make the merge window open more often, and since
>> there's no actual release, people won't rush to push fixes in later
>> -rc cycles.
>
>I don't understand your proposal. Linus doesn't actually release
>point releases. Those happen during the -rcX cycle for those people
>who are too chicken to follow Linus's tree, and just want the bug
>fixes.

What I'm suggesting is that "4.16" never gets released. When 4.16-rc8 is
closing and Linus would have tagged 4.16, he'd just open the merge
window and start merging in new features.

At this point Greg could either release 4.16.0 or wait a week and do
4.16.1. This effectively puts the kernel on a weekly release schedule.

>Getting rid of the point releases isn't going to change how frequently
>the merge window opens. What would do that is being much more strict
>about when we allow only regression fixes into the tree, so
>hopefully the tree stabilizes itself by -rc5 or -rc6.
>
>> Merge window will happen more often, so there's no real reason to
>> rush things in a particular window, and since -stable releases every
>> week there's no rush to push a fix in since the next release is just
>> a week away.
>
>Huh?
>
>I can see shortening the release cycle to six weeks, instead of our
>current 8-9 week cycle. But after each cycle where we introduce
>new features, during the merge window / integration phase, we do need
>to have a time when we are fixing regression bugs.

What I'm suggesting is that most of the commits in -rc6/7/8 actually fix
bugs introduced in older kernels rather than the current merge window.
Thus, they don't have much value in "stabilizing" the release.

On the other hand, for some odd reason, folks will try squeezing poorly
tested commits into these late -rc cycles because "Linus is about to
release and we must make it in time for the release", even though in
practice there's no big rush to make it to a particular release since
most folks will just keep updating via -stable or distro kernels.

So since there's not much value in -rc6/7/8, just cancel them.

2018-05-10 16:48:42

by Sasha Levin

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 10, 2018 at 06:03:22PM +0200, Jiri Kosina wrote:
>On Wed, 9 May 2018, Daniel Vetter wrote:
>
>> >> Then, why don't we have a pre-integration tree for fixes? That would
>> >> at least simplify automated testing of fixes separately from new
>> >> material.
>> >
>> >> Perhaps this has already been discussed, and concluded and it's not
>> >> worth it, then apologize for my ignorance.
>> >
>> > I think this is an excellent idea, copying in Stephen for his input.
>> > I'm currently on holiday but unless someone convinces me it's a terrible
>> > idea I'm willing to at least give it a go on a trial basis once I'm back
>> > home.
>>
>> Since Stephen merges all -fixes branches first, before merging all the
>> -next branches, he already generates that as part of linux-next. All
>> he'd need to do is push that intermediate state out to some
>> linux-fixes branch for consumption by test bots.
>
>What I do for my trees is that I actually merge the '-fixes' branch (that
>is scheduled to go to Linus in the 'current' cycle) into my for-next
>branch as well.
>
>This has the advantage of (a) getting all the coverage linux-next does (b)
>seeing any potential merge conflicts early
>
>Is this not feasible for other trees?

When Linus tags -rc1, -next will start filling up with commits destined
for the next merge window. The resulting -next tree becomes very
unstable, and very difficult to test.

The idea behind next-fixes is to provide a tree that will contain fixes
for the current merge window, which will generate a much more stable
tree that users/bots could actually run and validate the fixes that will
be merged in the upcoming weeks.

Right now, with the method you've described, there is no easy way to
test your '-fixes' branch even though the commits in there will be
pulled in by Linus much sooner than your 'for-next' branch.

You'll still get the same coverage from -next, but if you provide your
-fixes branch separately you'll also get more coverage for the fixes
you're about to send to Linus.
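
The ordering described in this thread (all fixes branches merged first, then the development branches, with the fixes-only state published separately) can be modelled with a toy Python sketch. Everything here is an assumption for illustration: the merge_order and pending_fixes_state functions are hypothetical, the "fixes"/"next" labels are invented, and the tree names are only examples. It does not perform any git operations.

```python
from typing import List, Tuple

Tree = Tuple[str, str]  # (name, kind) where kind is "fixes" or "next"


def merge_order(trees: List[Tree]) -> List[Tree]:
    """Put every fixes tree ahead of every development tree.

    Relative order within each group is preserved.
    """
    fixes = [t for t in trees if t[1] == "fixes"]
    development = [t for t in trees if t[1] != "fixes"]
    return fixes + development


def pending_fixes_state(trees: List[Tree]) -> List[str]:
    """Names merged at the point where only fixes branches are in."""
    return [name for name, kind in merge_order(trees) if kind == "fixes"]


# Illustrative tree list; names and kinds are examples only.
trees = [
    ("mtd", "next"),
    ("mtd-fixes", "fixes"),
    ("spi", "next"),
    ("spi-fixes", "fixes"),
    ("btrfs-fixes", "fixes"),
]
print([name for name, _ in merge_order(trees)])
print(pending_fixes_state(trees))
```

A fix that is only merged into a for-next branch never shows up in the second list, which is the gap a separately listed fixes branch is meant to close.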


2018-05-10 22:01:50

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Mark,

On Thu, 10 May 2018 22:36:28 +0900 Mark Brown <[email protected]> wrote:
>
> On Thu, May 10, 2018 at 08:09:09AM +1000, Stephen Rothwell wrote:
> > On Wed, 9 May 2018 23:05:32 +0900 Mark Brown <[email protected]> wrote:
>
> > > Well, all my trees have a for-linus branch to go with the for-next
> > > branch for a start.
>
> > The regmap and regulator trees have no for-linus branch (currently).
> > Added sound-asoc-fixes and spi-fixes from today.
>
> That's not what git claims when I try to push... no idea what's going
> on there, I just deleted and repushed - hopefully that helps.

I have added regmap-fixes and regulator-fixes now.

--
Cheers,
Stephen Rothwell



2018-05-10 22:05:51

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Tony,

On Thu, 10 May 2018 08:57:55 -0700 Tony Lindgren <[email protected]> wrote:
>
> * Stephen Rothwell <[email protected]> [180509 10:49]:
> > I currently have 44 such fixes branches. More welcome!
>
> Can you please also add mine:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git fixes

Added from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-11 02:10:56

by Mark Brown

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 10, 2018 at 06:03:22PM +0200, Jiri Kosina wrote:
> On Wed, 9 May 2018, Daniel Vetter wrote:

> > Since Stephen merges all -fixes branches first, before merging all the
> > -next branches, he already generates that as part of linux-next. All
> > he'd need to do is push that intermediate state out to some
> > linux-fixes branch for consumption by test bots.

> What I do for my trees is that I actually merge the '-fixes' branch (that
> is scheduled to go to Linus in the 'current' cycle) into my for-next
> branch as well.

> This has the advantage of (a) getting all the coverage linux-next does (b)
> seeing any potential merge conflicts early

> Is this not feasible for other trees?

That's obviously best practice which I hope everyone who doesn't have a
separate fix branch in -next is doing but it means that the fixes branch
is not getting tested without the changes in your -next branch, and also
reduces the coverage separate to other people's -next branches. This
means that there's room for implicit dependencies to slip through.



2018-05-11 08:50:40

by David Sterba

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 09, 2018 at 08:47:27PM +1000, Stephen Rothwell wrote:
> > True. It's currently only those -fixes branches that people have asked
> > him to merge separately which isn't as big a proportion of trees as have
> > them (perhaps fortunately given people's enthusiasm for fixes branches
> > that don't merge cleanly with their development branches) so we'd also
> > need to encourage people to add them separately.
>
> I currently have 44 such fixes branches. More welcome!

Please add

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git next-fixes

Thanks.

2018-05-12 04:04:32

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi David,

On Fri, 11 May 2018 10:47:01 +0200 David Sterba <[email protected]> wrote:
>
> Please add
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git next-fixes

Added from Monday (as btrfs-fixes).

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-12 04:38:43

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi all,

On Wed, 9 May 2018 20:47:27 +1000 Stephen Rothwell <[email protected]> wrote:
>
> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
> >
> > On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
> > > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
> >
> > > > I think this is an excellent idea, copying in Stephen for his input.
> > > > I'm currently on holiday but unless someone convinces me it's a terrible
> > > > idea I'm willing to at least give it a go on a trial basis once I'm back
> > > > home.
> >
> > > Since Stephen merges all -fixes branches first, before merging all the
> > > -next branches, he already generates that as part of linux-next. All
> > > he'd need to do is push that intermediate state out to some
> > > linux-fixes branch for consumption by test bots.
>
> Good idea ... I will see what I can do.

See my announcement of a pending-fixes branch in linux-next (on LKML
and others)

> I currently have 44 such fixes branches. More welcome!

We are up to 55.

--
Cheers,
Stephen Rothwell



2018-05-12 18:35:08

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On 05/11/2018 09:38 PM, Stephen Rothwell wrote:
> Hi all,
>
> On Wed, 9 May 2018 20:47:27 +1000 Stephen Rothwell <[email protected]> wrote:
>>
>> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
>>>
>>> On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
>>>> On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>>>
>>>>> I think this is an excellent idea, copying in Stephen for his input.
>>>>> I'm currently on holiday but unless someone convinces me it's a terrible
>>>>> idea I'm willing to at least give it a go on a trial basis once I'm back
>>>>> home.
>>>
>>>> Since Stephen merges all -fixes branches first, before merging all the
>>>> -next branches, he already generates that as part of linux-next. All
>>>> he'd need to do is push that intermediate state out to some
>>>> linux-fixes branch for consumption by test bots.
>>
>> Good idea ... I will see what I can do.
>
> See my announcement of a pending-fixes branch in linux-next (on LKML
> and others)

Excellent.

Build/test results match mainline.

For v4.17-rc4-241-ga1b6c55:

Build results:
total: 132 pass: 130 fail: 2
Failed builds:
m68k:allmodconfig
xtensa:allmodconfig
Qemu test results:
total: 138 pass: 138 fail: 0

Guenter
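
For anyone who wants to track summaries like the one above over time, here is a minimal Python sketch that parses the "total/pass/fail" lines and checks that the counts add up. The parse_summary helper is a hypothetical name for this example, and the line format is assumed from this one message only.

```python
import re


def parse_summary(line: str) -> dict:
    """Turn 'total: N pass: N fail: N' into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+):\s*(\d+)", line)}


build = parse_summary("total: 132 pass: 130 fail: 2")
qemu = parse_summary("total: 138 pass: 138 fail: 0")

for label, summary in (("build", build), ("qemu", qemu)):
    # Passes plus failures should account for every run.
    consistent = summary["pass"] + summary["fail"] == summary["total"]
    rate = summary["pass"] / summary["total"]
    print(f"{label}: {rate:.1%} passing, consistent={consistent}")
```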

2018-05-13 13:54:51

by Andy Shevchenko

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Sat, May 12, 2018 at 7:38 AM, Stephen Rothwell <[email protected]> wrote:
> Hi all,
>
> On Wed, 9 May 2018 20:47:27 +1000 Stephen Rothwell <[email protected]> wrote:
>>
>> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
>> >
>> > On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
>> > > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>> >
>> > > > I think this is an excellent idea, copying in Stephen for his input.
>> > > > I'm currently on holiday but unless someone convinces me it's a terrible
>> > > > idea I'm willing to at least give it a go on a trial basis once I'm back
>> > > > home.
>> >
>> > > Since Stephen merges all -fixes branches first, before merging all the
>> > > -next branches, he already generates that as part of linux-next. All
>> > > he'd need to do is push that intermediate state out to some
>> > > linux-fixes branch for consumption by test bots.
>>
>> Good idea ... I will see what I can do.
>
> See my announcement of a pending-fixes branch in linux-next (on LKML
> and others)
>
>> I currently have 44 such fixes branches. More welcome!
>
> We are up to 55.

For PDx86 we have

git://git.infradead.org/linux-platform-drivers-x86.git fixes

branch

--
With Best Regards,
Andy Shevchenko

2018-05-14 07:54:26

by Geert Uytterhoeven

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Thu, May 10, 2018 at 6:47 PM, Sasha Levin via Ksummit-discuss
<[email protected]> wrote:
> On Thu, May 10, 2018 at 06:03:22PM +0200, Jiri Kosina wrote:
>>On Wed, 9 May 2018, Daniel Vetter wrote:
>>> >> Then, why don't we have a pre-integration tree for fixes? That would
>>> >> at least simplify automated testing of fixes separately from new
>>> >> material.
>>> >
>>> >> Perhaps this has already been discussed, and concluded and it's not
>>> >> worth it, then apologize for my ignorance.
>>> >
>>> > I think this is an excellent idea, copying in Stephen for his input.
>>> > I'm currently on holiday but unless someone convinces me it's a terrible
>>> > idea I'm willing to at least give it a go on a trial basis once I'm back
>>> > home.
>>>
>>> Since Stephen merges all -fixes branches first, before merging all the
>>> -next branches, he already generates that as part of linux-next. All
>>> he'd need to do is push that intermediate state out to some
>>> linux-fixes branch for consumption by test bots.
>>
>>What I do for my trees is that I actually merge the '-fixes' branch (that
>>is scheduled to go to Linus in the 'current' cycle) into my for-next
>>branch as well.
>>
>>This has the advantage of (a) getting all the coverage linux-next does (b)
>>seeing any potential merge conflicts early
>>
>>Is this not feasible for other trees?
>
> When Linus tags -rc1, -next will start filling up with commits destined
> for the next merge window. The resulting -next tree becomes very
> unstable, and very difficult to test.
>
> The idea behind next-fixes is to provide a tree that will contain fixes
> for the current merge window, which will generate a much more stable
> tree that users/bots could actually run and validate the fixes that will
> be merged in the upcoming weeks.
>
> Right now, with the method you've described, there is no easy way to
> test your '-fixes' branch even though the commits in there will be
> pulled in by Linus much sooner than your 'for-next' branch.
>
> You'll still get the same coverage from -next, but if you provide your
> -fixes branch separately you'll also get more coverage for the fixes
> you're about to send to Linus.

I think you missed the "as well" in Jiri's response.

When I create the bi-weekly renesas-drivers release (see e.g.
https://www.spinics.net/lists/linux-renesas-soc/msg27350.html), there are
some subsystems that manage to have several conflicts between their
for-next branch and their fixes in Linus' tree almost every single release.
Hence I strongly support merging your own fixes branches into your own
for-next branch, and resolve the conflicts yourself, to keep your for-next
branch conflict free.

(Note that the last release linked above was very atypical: it was one of
the very few (first one ever?) that didn't have any conflicts).

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-14 08:01:17

by Geert Uytterhoeven

Subject: Re: bug-introducing patches

On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
<[email protected]> wrote:
> On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
>>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
>>> - A merge window commit spent 50% more days, on average, in -next than a -rc
>>> commit.
>>
>>So it *used* to be the case that after the merge window, I would queue
>>up bug fixes for the next merge window. Greg K-H pushed for me to
>>send them to Linus sooner, instead of waiting for the next merge
>>window. TBH, it's actually easier for me to just wait until the next
>>merge window, but please understand that there are multiple pressures
>>on maintainers going on here, and the latest efforts to try to use
>>AUTOSEL is just the most recent pressure placed on maintainers.
>>
>>The other thing is that when there is a regression users who are
>>testing linux-next want it fixed *fast*. That's considered more
>>important to them than waiting for one, perfect patch, just to keep
>>AUTOSEL happy.
>>
>>So please understand that when you say that maintainers *need* to do X
>>or Y, that you are not the only one standing in line putting
>>pressures on maintainers saying they *need* to do something. And
>>quite frankly, I consider keeping people who are nice enough to test
>>linux-next happy to be **far** more important than AUTOSEL.
>
> Ted,
>
> I'm not at all asking to wait before adding the patches to your tree,
> or to -next. I'm asking to hold on to them a bit longer before you
> push them to Linus because I can show that patches that don't spend
> enough time in -next are more likely to introduce bugs.
>
> Yes, linux-next users want it fixed *now* and I completely agree it
> should be done that way, but the fix should not be immediately pushed to
> Linus as well.
>
> I've just finished reading an interesting article on LWN about the
> PostgreSQL fsync issues (https://lwn.net/Articles/752952/). If you
> look at Willy's commit, he wrote the final version of it about 5 days
> ago, Jeff merged it in 3 days ago, and Linus merged it in the tree
> today. Did it spend any time getting -next testing? nope.
>
> What's worse is that that commit is tagged for stable, which means
> that (given Greg's schedule) it may find its way to -stable users
> even before some -next users/bots had a chance to test it out.

I just noticed a case where a commit was picked up for stable, while a
bot had flagged it as a build regression 18 hours earlier (with a CC to
lkml).

So it looks like the script for backporting commits should be enhanced to
check for this (searching for the commit ID in my email archive found the
bot report).
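
A rough sketch of such a check, assuming the email archive is available as a
local mbox file. The function names, the keyword list and the 12-character
abbreviation length are all assumptions made for illustration; a real script
would have to match the actual report formats of the bots it cares about.

```python
import mailbox

# Hypothetical hints that a message is a failure report rather than a success.
FAILURE_HINTS = ("regression", "build failure", "error:")


def mentions_failure(commit_id, subject, body, min_len=12):
    """True if the message names the commit (full or abbreviated ID)
    and contains one of the failure hints."""
    short = commit_id[:min_len].lower()
    text = (subject + "\n" + body).lower()
    return short in text and any(hint in text for hint in FAILURE_HINTS)


def reports_for_commit(mbox_path, commit_id):
    """Yield the subjects of archived messages that appear to flag the commit."""
    for msg in mailbox.mbox(mbox_path):
        subject = str(msg.get("Subject", ""))
        body = ""
        if not msg.is_multipart():
            # For brevity only single-part bodies are decoded; for multipart
            # messages just the subject is examined.
            payload = msg.get_payload(decode=True)
            if payload:
                body = payload.decode("utf-8", errors="replace")
        if mentions_failure(commit_id, subject, body):
            yield subject
```

A backporting script could call reports_for_commit() for each candidate and
hold back any commit for which it yields something, leaving the final call to
a human.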

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-14 08:17:43

by Boris Brezillon

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Mon, 14 May 2018 10:00:30 +0200
Geert Uytterhoeven <[email protected]> wrote:

> On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
> <[email protected]> wrote:
> > On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
> >>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
> >>> - A merge window commit spent 50% more days, on average, in -next than a -rc
> >>> commit.
> >>
> >>So it *used* to be the case that after the merge window, I would queue
> >>up bug fixes for the next merge window. Greg K-H pushed for me to
> >>send them to Linus sooner, instead of waiting for the next merge
> >>window. TBH, it's actually easier for me to just wait until the next
> >>merge window, but please understand that there are multiple pressures
> >>on maintainers going on here, and the latest efforts to try to use
> >>AUTOSEL is just the most recent pressure placed on maintainers.
> >>
> >>The other thing is that when there is a regression users who are
> >>testing linux-next want it fixed *fast*. That's considered more
> >>important to them than waiting for one, perfect patch, just to keep
> >>AUTOSEL happy.
> >>
> >>So please understand that when you say that maintainers *need* to do X
> >>or Y, that you are not the only one standing in line putting
> >>pressures on maintainers saying they *need* to do something. And
> >>quite frankly, I consider keeping people who are nice enough to test
> >>linux-next happy to be **far** more important than AUTOSEL.
> >
> > Ted,
> >
> > I'm not at all asking to wait before adding the patches to your tree,
> > or to -next. I'm asking to hold on to them a bit longer before you
> > push them to Linus because I can show that patches that don't spend
> > enough time in -next are more likely to introduce bugs.
> >
> > Yes, linux-next users want it fixed *now* and I completely agree it
> > should be done that way, but the fix should not be immediately pushed to
> > Linus as well.
> >
> > I've just finished reading an interesting article on LWN about the
> > PostgreSQL fsync issues (https://lwn.net/Articles/752952/). If you
> > look at Willy's commit, he wrote the final version of it about 5 days
> > ago, Jeff merged it in 3 days ago, and Linus merged it in the tree
> > today. Did it spend any time getting -next testing? nope.
> >
> > What's worse is that that commit is tagged for stable, which means
> > that (given Greg's schedule) it may find its way to -stable users
> > even before some -next users/bots had a chance to test it out.
>
> I just noticed a case where a commit was picked up for stable, while a
> bot had flagged it as a build regression 18 hours earlier (with a CC to
> lkml).

Also, this patch has been on a tree that I know is tested by Fengguang's
robots for more than a week (and in linux-next for 2 days, which, I
agree, is probably not enough), and still, I only received the bug
report when the patch reached mainline. Are there tests that are only
run on Linus' tree?

>
> So it looks like the script for backporting commits should be enhanced to
> check for this (searching for the commit ID in my email archive found the
> bot report).
>
> Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert
>


2018-05-14 08:30:00

by Geert Uytterhoeven

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Boris,

On Mon, May 14, 2018 at 10:12 AM, Boris Brezillon
<[email protected]> wrote:
> On Mon, 14 May 2018 10:00:30 +0200
> Geert Uytterhoeven <[email protected]> wrote:
>> On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
>> <[email protected]> wrote:
>> > On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
>> >>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
>> > What's worse is that that commit is tagged for stable, which means
>> > that (given Greg's schedule) it may find its way to -stable users
>> > even before some -next users/bots had a chance to test it out.
>>
>> I just noticed a case where a commit was picked up for stable, while a
>> bot had flagged it as a build regression 18 hours earlier (with a CC to
>> lkml).
>
> Also, this patch has been on a tree that I know is tested by Fengguang's
> robots for more than a week (and in linux-next for 2 days, which, I
> agree, is probably not enough), and still, I only received the bug
> report when the patch reached mainline. Are there tests that are only
> run on Linus' tree?

Have you received a success report from Fengguang's bot, listing all
configs tested (the broken one should be included; it is included in the
configs tested on my branches)?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-14 08:36:19

by Boris Brezillon

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Mon, 14 May 2018 10:29:04 +0200
Geert Uytterhoeven <[email protected]> wrote:

> Hi Boris,
>
> On Mon, May 14, 2018 at 10:12 AM, Boris Brezillon
> <[email protected]> wrote:
> > On Mon, 14 May 2018 10:00:30 +0200
> > Geert Uytterhoeven <[email protected]> wrote:
> >> On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
> >> <[email protected]> wrote:
> >> > On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
> >> >>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
> >> > What's worse is that that commit is tagged for stable, which means
> >> > that (given Greg's schedule) it may find its way to -stable users
> >> > even before some -next users/bots had a chance to test it out.
> >>
> >> I just noticed a case where a commit was picked up for stable, while a
> >> bot had flagged it as a build regression 18 hours earlier (with a CC to
> >> lkml).
> >
> > Also, this patch has been on a tree that I know is tested by Fengguang's
> > robots for more than a week (and in linux-next for 2 days, which, I
> > agree, is probably not enough), and still, I only received the bug
> > report when the patch reached mainline. Are there tests that are only
> > run on Linus' tree?
>
> Have you received a success report from Fengguang's bot, listing all
> configs tested (the broken one should be included; it is included in the
> configs tested on my branches)?

Yes I did (see below).

-->8--
From: kbuild test robot <[email protected]>
To: Boris Brezillon <[email protected]>
Subject: [bbrezillon-0day:mtd/fixes] BUILD SUCCESS fc3a9e15b492eef707afd56b7478001fdecfe53f
Date: Mon, 07 May 2018 20:05:52 +0800
User-Agent: Heirloom mailx 12.5 6/20/10

tree/branch: https://github.com/bbrezillon/linux-0day mtd/fixes
branch HEAD: fc3a9e15b492eef707afd56b7478001fdecfe53f mtd: rawnand: Make sure we wait tWB before polling the STATUS reg

elapsed time: 49m

configs tested: 142

The following configs have been built successfully.
More configs may be tested in the coming days.

powerpc skiroot_defconfig
sh kfr2r09_defconfig
x86_64 acpi-redef
x86_64 allyesdebian
x86_64 nfsroot
m68k bvme6000_defconfig
powerpc ppa8548_defconfig
sh allnoconfig
sh rsk7269_defconfig
sh sh7785lcr_32bit_defconfig
sh titan_defconfig
i386 randconfig-c0-05071338
i386 tinyconfig
i386 randconfig-n0-201818
x86_64 randconfig-x002-201818
x86_64 randconfig-x006-201818
x86_64 randconfig-x005-201818
x86_64 randconfig-x001-201818
x86_64 randconfig-x009-201818
x86_64 randconfig-x004-201818
x86_64 randconfig-x003-201818
x86_64 randconfig-x007-201818
x86_64 randconfig-x000-201818
x86_64 randconfig-x008-201818
i386 randconfig-i1-201818
i386 randconfig-i0-201818
alpha defconfig
parisc allnoconfig
parisc b180_defconfig
parisc c3000_defconfig
parisc defconfig
ia64 defconfig
mips defconfig
powerpc allnoconfig
powerpc defconfig
powerpc ppc64_defconfig
s390 default_defconfig
x86_64 randconfig-g0-05071702
openrisc or1ksim_defconfig
um i386_defconfig
um x86_64_defconfig
i386 allmodconfig
ia64 alldefconfig
ia64 allnoconfig
i386 randconfig-a0-201818
i386 randconfig-a1-201818
x86_64 randconfig-s0-05071833
x86_64 randconfig-s1-05071833
x86_64 randconfig-s2-05071833
x86_64 randconfig-s0-05071933
x86_64 randconfig-s1-05071933
x86_64 randconfig-s2-05071933
c6x evmc6678_defconfig
h8300 h8300h-sim_defconfig
nios2 10m50_defconfig
xtensa common_defconfig
xtensa iss_defconfig
i386 alldefconfig
i386 allnoconfig
i386 defconfig
x86_64 federa-25
x86_64 rhel
x86_64 rhel-7.2
m68k m5475evb_defconfig
m68k multi_defconfig
m68k sun3_defconfig
i386 randconfig-s0-201818
i386 randconfig-s1-201818
x86_64 randconfig-s3-05071918
x86_64 randconfig-s4-05071918
x86_64 randconfig-s5-05071918
mips 32r2_defconfig
mips 64r6el_defconfig
mips allnoconfig
mips fuloong2e_defconfig
mips jz4740
mips malta_kvm_defconfig
mips txx9
x86_64 randconfig-i0-201818
i386 randconfig-b0-05071353
sparc defconfig
sparc64 allnoconfig
sparc64 defconfig
mips maltaaprp_defconfig
mips mtx1_defconfig
microblaze mmu_defconfig
microblaze nommu_defconfig
i386 randconfig-h0-05070458
i386 randconfig-h1-05070458
x86_64 randconfig-ne0-05071525
x86_64 randconfig-h0-05070841
i386 randconfig-x008-201818
i386 randconfig-x006-201818
i386 randconfig-x000-201818
i386 randconfig-x004-201818
i386 randconfig-x009-201818
i386 randconfig-x003-201818
i386 randconfig-x001-201818
i386 randconfig-x002-201818
i386 randconfig-x005-201818
i386 randconfig-x007-201818
x86_64 randconfig-x010-201818
x86_64 randconfig-x014-201818
x86_64 randconfig-x013-201818
x86_64 randconfig-x011-201818
x86_64 randconfig-x015-201818
x86_64 randconfig-x019-201818
x86_64 randconfig-x016-201818
x86_64 randconfig-x018-201818
x86_64 randconfig-x012-201818
x86_64 randconfig-x017-201818
i386 randconfig-x013-201818
i386 randconfig-x014-201818
i386 randconfig-x019-201818
i386 randconfig-x016-201818
i386 randconfig-x015-201818
i386 randconfig-x012-201818
i386 randconfig-x018-201818
i386 randconfig-x010-201818
i386 randconfig-x011-201818
i386 randconfig-x017-201818
i386 randconfig-x073-201818
i386 randconfig-x075-201818
i386 randconfig-x079-201818
i386 randconfig-x071-201818
i386 randconfig-x076-201818
i386 randconfig-x074-201818
i386 randconfig-x078-201818
i386 randconfig-x077-201818
i386 randconfig-x070-201818
i386 randconfig-x072-201818
arm allnoconfig
arm at91_dt_defconfig
arm efm32_defconfig
arm exynos_defconfig
arm multi_v5_defconfig
arm multi_v7_defconfig
arm shmobile_defconfig
arm sunxi_defconfig
arm64 allnoconfig
arm64 defconfig
i386 randconfig-x0-05071422

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-05-14 08:38:07

by Ulf Hansson

Subject: Re: [Ksummit-discuss] bug-introducing patches

On 9 May 2018 at 12:47, Stephen Rothwell <[email protected]> wrote:
> On Wed, 9 May 2018 18:03:46 +0900 Mark Brown <[email protected]> wrote:
>>
>> On Wed, May 09, 2018 at 10:47:57AM +0200, Daniel Vetter wrote:
>> > On Wed, May 9, 2018 at 10:44 AM, Mark Brown <[email protected]> wrote:
>>
>> > > I think this is an excellent idea, copying in Stephen for his input.
>> > > I'm currently on holiday but unless someone convinces me it's a terrible
>> > > idea I'm willing to at least give it a go on a trial basis once I'm back
>> > > home.
>>
>> > Since Stephen merges all -fixes branches first, before merging all the
>> > -next branches, he already generates that as part of linux-next. All
>> > he'd need to do is push that intermediate state out to some
>> > linux-fixes branch for consumption by test bots.
>
> Good idea ... I will see what I can do.
>
>> True. It's currently only those -fixes branches that people have asked
>> him to merge separately which isn't as big a proportion of trees as have
>> them (perhaps fortunately given people's enthusiasm for fixes branches
>> that don't merge cleanly with their development branches) so we'd also
>> need to encourage people to add them separately.
>
> I currently have 44 such fixes branches. More welcome!

Great!

Stephen, thanks for picking up the idea so quickly!

I will ping the kernelci folkz to request them to include your new
fixes branch for daily builds.

For mmc, please add my fixes branch according to below.

git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git fixes

Kind regards
Uffe

2018-05-14 08:42:58

by Geert Uytterhoeven

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Boris,

On Mon, May 14, 2018 at 10:34 AM, Boris Brezillon
<[email protected]> wrote:
> On Mon, 14 May 2018 10:29:04 +0200
> Geert Uytterhoeven <[email protected]> wrote:
>> On Mon, May 14, 2018 at 10:12 AM, Boris Brezillon
>> <[email protected]> wrote:
>> > On Mon, 14 May 2018 10:00:30 +0200
>> > Geert Uytterhoeven <[email protected]> wrote:
>> >> On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
>> >> <[email protected]> wrote:
>> >> > On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
>> >> >>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
>> >> > What's worse is that that commit is tagged for stable, which means
>> >> > that (given Greg's schedule) it may find its way to -stable users
>> >> > even before some -next users/bots had a chance to test it out.
>> >>
>> >> I just noticed a case where a commit was picked up for stable, while a
>> >> bot had flagged it as a build regression 18 hours earlier (with a CC to
>> >> lkml).
>> >
>> > Also, this patch has been on a tree that I know is tested by Fengguang's
>> > robots for more than a week (and in linux-next for 2 days, which, I
>> > agree, is probably not enough), and still, I only received the bug
>> > report when the patch reached mainline. Are there tests that are only
>> > run on Linus' tree?
>>
>> Have you received a success report from Fengguang's bot, listing all
>> configs tested (the broken one should be included; it is included in the
>> configs tested on my branches)?
>
> Yes I did (see below).
>
> -->8--
> From: kbuild test robot <[email protected]>
> To: Boris Brezillon <[email protected]>
> Subject: [bbrezillon-0day:mtd/fixes] BUILD SUCCESS fc3a9e15b492eef707afd56b7478001fdecfe53f
> Date: Mon, 07 May 2018 20:05:52 +0800
> User-Agent: Heirloom mailx 12.5 6/20/10
>
> tree/branch: https://github.com/bbrezillon/linux-0day mtd/fixes
> branch HEAD: fc3a9e15b492eef707afd56b7478001fdecfe53f mtd: rawnand: Make sure we wait tWB before polling the STATUS reg
>
> elapsed time: 49m
>
> configs tested: 142

But the failed config (m68k/allmodconfig) is not listed?

BTW, my last report had:

configs tested: 178

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-05-14 08:51:26

by Boris Brezillon

Subject: Re: [Ksummit-discuss] bug-introducing patches

+Fengguang

On Mon, 14 May 2018 10:40:10 +0200
Geert Uytterhoeven <[email protected]> wrote:

> Hi Boris,
>
> On Mon, May 14, 2018 at 10:34 AM, Boris Brezillon
> <[email protected]> wrote:
> > On Mon, 14 May 2018 10:29:04 +0200
> > Geert Uytterhoeven <[email protected]> wrote:
> >> On Mon, May 14, 2018 at 10:12 AM, Boris Brezillon
> >> <[email protected]> wrote:
> >> > On Mon, 14 May 2018 10:00:30 +0200
> >> > Geert Uytterhoeven <[email protected]> wrote:
> >> >> On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
> >> >> <[email protected]> wrote:
> >> >> > On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
> >> >> >>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
> >> >> > What's worse is that that commit is tagged for stable, which means
> >> >> > that (given Greg's schedule) it may find its way to -stable users
> >> >> > even before some -next users/bots had a chance to test it out.
> >> >>
> >> >> I just noticed a case where a commit was picked up for stable, while a
> >> >> bot had flagged it as a build regression 18 hours earlier (with a CC to
> >> >> lkml).
> >> >
> >> > Also, this patch has been on a tree that I know is tested by Fengguang's
> >> > robots for more than a week (and in linux-next for 2 days, which, I
> >> > agree, is probably not enough), and still, I only received the bug
> >> > report when the patch reached mainline. Are there tests that are only
> >> > run on Linus' tree?
> >>
> >> Have you received a success report from Fengguang's bot, listing all
> >> configs tested (the broken one should be included; it is included in the
> >> configs tested on my branches)?
> >
> > Yes I did (see below).
> >
> > -->8--
> > From: kbuild test robot <[email protected]>
> > To: Boris Brezillon <[email protected]>
> > Subject: [bbrezillon-0day:mtd/fixes] BUILD SUCCESS fc3a9e15b492eef707afd56b7478001fdecfe53f
> > Date: Mon, 07 May 2018 20:05:52 +0800
> > User-Agent: Heirloom mailx 12.5 6/20/10
> >
> > tree/branch: https://github.com/bbrezillon/linux-0day mtd/fixes
> > branch HEAD: fc3a9e15b492eef707afd56b7478001fdecfe53f mtd: rawnand: Make sure we wait tWB before polling the STATUS reg
> >
> > elapsed time: 49m
> >
> > configs tested: 142
>
> But the failed config (m68k/allmodconfig) is not listed?

Yes, that's my point. It seems that some configs are only rarely
(never?) tested on my linux-0day tree (probably because they take longer
to build), and I should only take kbuild robot results as an indication
not a guarantee.

2018-05-14 09:26:37

by Fengguang Wu

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Mon, May 14, 2018 at 10:48:03AM +0200, Boris Brezillon wrote:
>+Fengguang
>
>On Mon, 14 May 2018 10:40:10 +0200
>Geert Uytterhoeven <[email protected]> wrote:
>
>> Hi Boris,
>>
>> On Mon, May 14, 2018 at 10:34 AM, Boris Brezillon
>> <[email protected]> wrote:
>> > On Mon, 14 May 2018 10:29:04 +0200
>> > Geert Uytterhoeven <[email protected]> wrote:
>> >> On Mon, May 14, 2018 at 10:12 AM, Boris Brezillon
>> >> <[email protected]> wrote:
>> >> > On Mon, 14 May 2018 10:00:30 +0200
>> >> > Geert Uytterhoeven <[email protected]> wrote:
>> >> >> On Tue, May 1, 2018 at 10:00 PM, Sasha Levin
>> >> >> <[email protected]> wrote:
>> >> >> > On Tue, May 01, 2018 at 03:44:50PM -0400, Theodore Y. Ts'o wrote:
>> >> >> >>On Tue, May 01, 2018 at 04:38:21PM +0000, Sasha Levin wrote:
>> >> >> > What's worse is that that commit is tagged for stable, which means
>> >> >> > that (given Greg's schedule) it may find its way to -stable users
>> >> >> > even before some -next users/bots had a chance to test it out.
>> >> >>
>> >> >> I just noticed a case where a commit was picked up for stable, while a
>> >> >> bot had flagged it as a build regression 18 hours earlier (with a CC to
>> >> >> lkml).
>> >> >
>> >> > Also, this patch has been on a tree that I know is tested by Fengguang's
>> >> > robots for more than a week (and in linux-next for 2 days, which, I
>> >> > agree, is probably not enough), and still, I only received the bug
>> >> > report when the patch reached mainline. Are there tests that are only
>> >> > run on Linus' tree?
>> >>
>> >> Have you received a success report from Fengguang's bot, listing all
>> >> configs tested (the broken one should be included; it is included in the
>> >> configs tested on my branches)?
>> >
>> > Yes I did (see below).
>> >
>> > -->8--
>> > From: kbuild test robot <[email protected]>
>> > To: Boris Brezillon <[email protected]>
>> > Subject: [bbrezillon-0day:mtd/fixes] BUILD SUCCESS fc3a9e15b492eef707afd56b7478001fdecfe53f
>> > Date: Mon, 07 May 2018 20:05:52 +0800
>> > User-Agent: Heirloom mailx 12.5 6/20/10
>> >
>> > tree/branch: https://github.com/bbrezillon/linux-0day mtd/fixes
>> > branch HEAD: fc3a9e15b492eef707afd56b7478001fdecfe53f mtd: rawnand: Make sure we wait tWB before polling the STATUS reg
>> >
>> > elapsed time: 49m
>> >
>> > configs tested: 142
>>
>> But the failed config (m68k/allmodconfig) is not listed?
>
>Yes, that's my point. It seems that some configs are only rarely
>(never?) tested on my linux-0day tree (probably because they take longer
>to build), and I should only take kbuild robot results as an indication
>not a guarantee.

Yeah sorry, there is no 100% guarantee. There are 2 main aspects to
this problem.

- Response time vs coverage. Most build errors can be caught within 1
  day. The build success notification email is typically sent within
  half a day (a reasonable feedback time). At that point, it can only be
  a rough indication, not a guarantee. After sending the 0day build
  success notification, the build tests will actually continue for
  about 1 week to increase test coverage.

- Merge-test-bisect based workflow. If a branch is hard to merge
  with others, especially if it's based on an old kernel, it'll receive
  much less test coverage. Branches with known build/boot errors will
  be excluded from further merges, too.

Thanks,
Fengguang

2018-05-14 21:46:06

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Ulf,

On Mon, 14 May 2018 10:36:04 +0200 Ulf Hansson <[email protected]> wrote:
>
> I will ping the kernelci folkz to request them to include your new
> fixes branch for daily builds.

Excellent, thanks.

> For mmc, please add my fixes branch according to below.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git fixes

Added from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-15 10:46:37

by Krzysztof Kozlowski

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Wed, May 9, 2018 at 2:43 PM, Stephen Rothwell <[email protected]> wrote:
> Hi Vinod,
>
> On Wed, 9 May 2018 16:25:34 +0530 Vinod Koul <[email protected]> wrote:
>> >
>> > I currently have 44 such fixes branches. More welcome!
>>
>> Great so do you want us to send fixes branch or scan the existing trees and add
>> them.
>
> The former.

Please merge the following fixes branches from my trees:
Tree: samsung-krzk
git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
branch: fixes

Tree: pinctrl-samsung
git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung.git
branch: pinctrl-fixes

Both are currently empty, but I guess you are collecting
these for the future as well.

Best regards,
Krzysztof

2018-05-15 11:57:11

by Stephen Rothwell

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi Krzysztof,

On Tue, 15 May 2018 12:42:49 +0200 Krzysztof Kozlowski <[email protected]> wrote:
>
> Please merge the following fixes branches from my trees:
> Tree: samsung-krzk
> git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
> branch: fixes
>
> Tree: pinctrl-samsung
> git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung.git
> branch: pinctrl-fixes
>
> Both are currently empty, but I guess you are collecting
> these for the future as well.

Yes, indeed.

Added from tomorrow.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]



2018-05-17 16:46:15

by Mark Brown

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Mon, May 14, 2018 at 10:36:04AM +0200, Ulf Hansson wrote:

> I will ping the kernelci folkz to request them to include your new
> fixes branch for daily builds.

No need, I already added it.



2018-07-14 17:40:07

by Pavel Machek

Subject: Re: bug-introducing patches

Hi!

> > The way I see it, if a commit can get one or two tested-by, it's a good
> > alternative to a week in -next.
>
> Agreed. Even their own actually. And I'm not kidding. Those who run large
> > amounts of tests on certain patches could really mention it in tested-by,
> as opposed to the most common cases where the code was just regularly
> tested.

Actually, it would be cool to get "Tested: no" and "Tested: compile"
tags in the commit messages. Sometimes it is clear from the context
that a patch was not tested (treewide update of time to 64bit), but
sometimes it is not.

This is especially a problem for -stable, as it seems that lately
patches are backported from new version without any testing.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2018-07-14 18:39:49

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On 07/14/2018 10:38 AM, Pavel Machek wrote:
> Hi!
>
>>> The way I see it, if a commit can get one or two tested-by, it's a good
>>> alternative to a week in -next.
>>
>> Agreed. Even their own actually. And I'm not kidding. Those who run large
>> amounts of tests on certain patches could really mention it in tested-by,
>> as opposed to the most common cases where the code was just regularly
>> tested.
>
> Actually, it would be cool to get "Tested: no" and "Tested: compile"
> tags in the commit messages. Sometimes it is clear from the context
> that a patch was not tested (treewide update of time to 64bit), but
> sometimes it is not.
>
> This is especially a problem for -stable, as it seems that lately
> patches are backported from new version without any testing.


When I started my own testing some five years ago, most architectures
did not even build in stable releases. At that time, the only tests being
done on stable release candidates were a number of build tests, and most
of the results were ignored.

Today, we have 0day, kernelci, kerneltests, Linaro's LKFT, and more, plus
several merge and boot tests done by individuals. Stable release candidates
are build tested on all supported architectures, with hundreds of configurations.
Each stable release candidate is boot tested on qemu with more than 150
configurations on each architecture supported by qemu. A substantial amount of
boot tests run on real hardware. On key architectures, more sophisticated tests
such as kerneltests and LTP ensure that no new regressions are introduced.

What is new is that many more patches are being applied and backported
to stable releases, at least to some degree due to Sasha's scripts, but also due
to tools like syzbot running on older kernels and finding problems which
have been fixed upstream, but the fix has not been backported.

At the same time, stable release test coverage has been improved substantially
over the last few years. I am _much_ more confident with stable releases
than I used to be a few years ago.

Sure, there are regressions. However, the regression rate is very low (last
time I checked it was around 0.1% to 0.3% per stable release for us). Sure,
I would like to further reduce the regression rate, to improve stability but also
because each and every regression is used by someone to argue that stable releases
would be unreliable. However, this is more a matter of perception than reality.
Reality is that more than 90% of all CVEs are fixed in stable releases by the time
they are published as affecting a stable release. Reality is that a substantial
percentage of problems reported by syzbot _are_ being fixed in stable releases.
Reality is that, by the time bugs are reported from the field, it is more and
more likely that we find out that the bug has already been fixed in a later
release due to a stable release merge.

Given all that, I think it is quite misleading to claim that the number of
patches applied to stable releases would create additional problems, or that
patches would be applied "without any testing". On the contrary, I would argue
that the additional testing now being performed _enabled_ us to apply more
patches (bug fixes) to stable releases.

Sure, testing is still far from perfect and needs to be improved. However,
requiring that every patch applied to stable releases be tested individually
(where ? on all affected architectures ?) would be the wrong direction.
What we need to do is to further improve test coverage. We should have no
regressions, but we need to get there by improving test coverage, not by
demanding explicit per-patch and per-release testing (which would be all
but impossible anyway - no one has the infrastructure necessary to test
a patch on all affected architectures).

I would encourage everyone interested in kernel testing to attend the
kernel test sessions at Linux Plumbers and ELC later this year to discuss
concerns and possible solutions.

Thanks,
Guenter

2018-07-14 19:48:10

by Pavel Machek

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi!

> >>>The way I see it, if a commit can get one or two tested-by, it's a good
> >>>alternative to a week in -next.
> >>
> >>Agreed. Even their own actually. And I'm not kidding. Those who run large
> >>amounts of tests on certain patches could really mention is in tested-by,
> >>as opposed to the most common cases where the code was just regularly
> >>tested.
> >
> >Actually, it would be cool to get "Tested: no" and "Tested: compile"
> >tags in the commit mesages. Sometimes it is clear from the context
> >that patch was not tested (treewide update of time to 64bit), but
> >sometime it is not.
> >
> >This is especially problem for -stable, as it seems that lately
> >patches are backported from new version without any testing.
>
>
> When I started my own testing some five years ago, most architectures
> did not even build in stable releases. At that time, the only tests being
> done on stable release candidates were a number of build tests, and most
> of the results were ignored.
>
> Today, we have 0day, kernelci, kerneltests, Linaro's LKFT, and more, plus
> several merge and boot tests done by individuals. Stable release candidates
> are build tested on all supported architectures, with hundreds of
...
> Sure, testing is still far from perfect and needs to be improved. However,
> requiring that every patch applied to stable releases be tested individually
> (where ? on all affected architectures ?) would be the wrong
> direction.

Well, 0day, kernelci etc... is nice... until the change is in the
driver. Most of the kernel is drivers, remember?

I don't know. I'd say that if a patch is important enough for -stable,
there should be someone testing it. For core kernel changes, that can
be 0day bot, but for drivers...

And the problem exists on mainline, too.

Hmm. Patch for obscure driver. Wow, nice, is WellKnownName actually
using that driver? Aha, no, he is not; he is doing global
search&replace, and did not test the patch...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2018-07-14 20:41:16

by Guenter Roeck

Subject: Re: [Ksummit-discuss] bug-introducing patches

On 07/14/2018 12:47 PM, Pavel Machek wrote:
> Hi!
>
>>>>> The way I see it, if a commit can get one or two tested-by, it's a good
>>>>> alternative to a week in -next.
>>>>
>>>> Agreed. Even their own actually. And I'm not kidding. Those who run large
>>>> amounts of tests on certain patches could really mention is in tested-by,
>>>> as opposed to the most common cases where the code was just regularly
>>>> tested.
>>>
>>> Actually, it would be cool to get "Tested: no" and "Tested: compile"
>>> tags in the commit mesages. Sometimes it is clear from the context
>>> that patch was not tested (treewide update of time to 64bit), but
>>> sometime it is not.
>>>
>>> This is especially problem for -stable, as it seems that lately
>>> patches are backported from new version without any testing.
>>
>>
>> When I started my own testing some five years ago, most architectures
>> did not even build in stable releases. At that time, the only tests being
>> done on stable release candidates were a number of build tests, and most
>> of the results were ignored.
>>
>> Today, we have 0day, kernelci, kerneltests, Linaro's LKFT, and more, plus
>> several merge and boot tests done by individuals. Stable release candidates
>> are build tested on all supported architectures, with hundreds of
> ...
>> Sure, testing is still far from perfect and needs to be improved. However,
>> requiring that every patch applied to stable releases be tested individually
>> (where ? on all affected architectures ?) would be the wrong
>> direction.
>
> Well, 0day, kernelci etc... is nice... until the change is in the
> driver. Most of the kernel are drivers, remember?
>
> I don't know. I'd say that if patch is important enough for -stable,
> there should be someone testing it. For core kernel changes, that can
> be 0day bot, but for drivers...
>

For my part I am just glad that we were able to pick up a fix in xhci code
just last week, tested or not, from -stable, instead of having to track it
down ourselves. Similar for many other driver patches which _do_ affect us
(like the flurry of ext4 patches this week). Granted, there are lots of
patches which we don't use/need, but even there it is surprising how many
problems are found with existing testing.

For anyone interested in making sure that obscure (whatever that means)
drivers are tested for stable releases, but does not want to spend time on it,
all I can recommend is to implement qemu support for it and let me know,
and I'll be happy to add a respective test to my test farm.

However, ultimately, stable release candidates are public. Everyone is
invited to test them. Anyone interested in a specific release and driver
is invited to take stable release candidates and do the necessary testing,
just like I and several others do.

> And problem exists on mainline, too.
>
> Hmm. Patch for obscure driver. Wow, nice, is WellKnownName actually
> using that driver? Aha, no, he is not; he is doing global
> search&replace, and did not test the patch...
>

Ah, like me with the strncpy(x, y, strlen(y)) -> memcpy() replacements
I did a week or so ago ? You are right, I only compile tested those and
otherwise trusted my ability to understand C code. If that caused any
problems, please let me know, and hopefully I'll be able to learn something
from it.
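
For readers who want to see why that replacement is mechanical, here is a
minimal userspace sketch. The function names are made up for illustration
and this is not the code from the actual patches. When the bound passed to
strncpy() is strlen() of the source, exactly that many bytes are copied and
no terminating NUL is written, which is the same thing memcpy() does with
the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Old pattern (hypothetical helper name): strncpy() bounded by the source
 * length. The bound is reached before the terminator, so exactly
 * strlen(src) bytes are copied and no trailing NUL is written.
 * Some compilers warn about this pattern.
 */
void copy_with_strncpy(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);
	strncpy(dst, src, strlen(src));
}

/*
 * Replacement (hypothetical helper name): the same bytes are copied,
 * and the absence of a terminator is now explicit.
 */
void copy_with_memcpy(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, strlen(src));
}
```

Both helpers leave the destination buffer byte-for-byte identical, and in
both cases the caller remains responsible for NUL termination and for the
destination being large enough.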

Really, there are infrastructure changes all the time. Sometimes I am asked
to run a complete test sequence with those, but most of the time they are
applied to -next and people wait for the fallout. That may not be perfect but,
seriously, the only alternative would be to declare that in-kernel APIs
shall not be changed anymore. I don't think that would be feasible.

Thanks,
Guenter

2018-07-14 21:10:39

by Pavel Machek

Subject: Re: [Ksummit-discuss] bug-introducing patches

Hi!

> >Well, 0day, kernelci etc... is nice... until the change is in the
> >driver. Most of the kernel are drivers, remember?
> >
> >I don't know. I'd say that if patch is important enough for -stable,
> >there should be someone testing it. For core kernel changes, that can
> >be 0day bot, but for drivers...
> >
>
> For my part I am just glad that we were able to pick up a fix in xhci code
> just last week, tested or not, from -stable, instead of having to track it
> down ourselves. Similar for many other driver patches which _do_ affect us
> (like the flurry of ext4 patches this week). Granted, there are lots of
> patches which we don't use/need, but even there it is surprising how many
> problems are found with existing testing.
>
> For anyone interested in making sure that obscure (whatever that means)
> drivers are tested for stable releases, but does not want to spend time on it,
> all I can recommend is to implement qemu support for it and let me know,
> and I'll be happy to add a respective test to my test farm.

Umm. Yes, qemu support for every driver would be nice, but will not happen.

> However, ultimately, stable release candidates are public. Everyone is
> invited to test them. Anyone interested in a specific release and
> driver

Yes, they are public. SubmittingPatches says every patch should be
tested, and that's clearly not happening for -stable. And I'd like
those patches marked as such.

> >And problem exists on mainline, too.
> >
> >Hmm. Patch for obscure driver. Wow, nice, is WellKnownName actually
> >using that driver? Aha, no, he is not; he is doing global
> >search&replace, and did not test the patch...
> >
>
> Ah, like me with the strncpy(x, y, strlen(y)) -> memcpy() replacements
> I did a week or so ago ? You are right, I only compile tested those and
> otherwise trusted my ability to understand C code. If that caused any
> problems, please let me know, and hopefully I'll be able to learn something
> from it.

Yes, such stuff. No, I was not talking about you. I did not want to
give a concrete example, but...

# > get_monotonic_boottime() is deprecated, so let's convert this to
# > the simpler ktime_get_boot_ns().
# >
# > Signed-off-by: <redacted>
#
# Have you tested it?
#
...
# > - curr_boot = timespec_to_ns(&boot_time) * cpus;
#
# Original code is pretty weird (notice the * cpus), so I'm
# double-checking.

Yes, often you can guess that a patch was probably not tested, but it
would be nice to have

Tested: compile

annotation to take away the guesswork. It took me a while and some head
scratching in this concrete example, and it is not the first time this
has happened.
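
To make the idea concrete, here is a rough sketch of how a tool could read
such an annotation from a commit message. The "Tested:" tag is only the
proposal made in this thread, not an existing kernel convention, and the
enum and function names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical levels, matching the two values proposed in the thread. */
enum tested_level { TESTED_UNKNOWN, TESTED_NO, TESTED_COMPILE };

/*
 * Scan a commit message line by line and return the level given by the
 * first line that starts with "Tested: ". Messages without the tag, or
 * with a value other than "no" or "compile", yield TESTED_UNKNOWN.
 */
enum tested_level parse_tested_tag(const char *msg)
{
	static const char tag[] = "Tested: ";
	const size_t tag_len = sizeof(tag) - 1;
	const char *line = msg;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len > tag_len && strncmp(line, tag, tag_len) == 0) {
			const char *val = line + tag_len;
			size_t vlen = len - tag_len;

			if (vlen == 2 && memcmp(val, "no", 2) == 0)
				return TESTED_NO;
			if (vlen == 7 && memcmp(val, "compile", 7) == 0)
				return TESTED_COMPILE;
			return TESTED_UNKNOWN;
		}
		/* Advance to the next line, or stop at end of message. */
		line = eol ? eol + 1 : NULL;
	}
	return TESTED_UNKNOWN;
}
```

A reviewer's script could then flag patches that come back as TESTED_NO or
TESTED_UNKNOWN for a closer look instead of guessing from context.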

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2018-07-15 05:58:44

by Willy Tarreau

Subject: Re: [Ksummit-discuss] bug-introducing patches

On Sat, Jul 14, 2018 at 11:09:13PM +0200, Pavel Machek wrote:
> > For anyone interested in making sure that obscure (whatever that means)
> > drivers are tested for stable releases, but does not want to spend time on it,
> > all I can recommend is to implement qemu support for it and let me know,
> > and I'll be happy to add a respective test to my test farm.
>
> Umm. Yes, qemu support for every driver would be nice, but will not happen.

Well, I would argue that driver code changes much less than core code
between kernel versions, and that most of the changes in drivers are
mostly infrastructure changes. Drivers don't evolve much in general,
they are written, tested, merged, they receive fixes and then they
only receive infrastructure changes that touch all drivers in the same
category.

When you backport fixes to drivers, it is very common that the code
looks almost the same between even a very old kernel and mainline, and
when not, the adaptations generally look quite straightforward, and if
not it means the driver changed significantly and in this case we don't
backport the fix as we don't even know if it is relevant.

I've always had much more difficulty backporting fixes under the arch/
subdir where stuff changes all the time. Sometimes a patch applies but
doesn't even compile. I learned not to play black magic in this area
because some patches are subtle and if the code changed you often need
the author and/or maintainers to double-check. Some subsystems like KVM
improve a lot over time and are difficult to backport to as well, and
even if you manage to properly backport a fix you're uncertain how to
verify you backported it well. Similarly you don't want to appoint
yourself the backporter of the year in this area.

Drivers are often OK and are the ones harder to test, so in the end
you don't miss much by your limited ability to test a backport there.

What I can certainly say as a stable kernel user is that the regression
rate is so low compared to the fix rate that I never have any problem
upgrading to a more recent version in the same branch, because the
number of problems that will be fixed is much higher than the risk of
a single regression.

As Guenter says, we can always improve, but the most important thing is to
deliver fixes in a timely manner. When you see that any LTS branch
accumulates around 5000 fixes over time, you understand that any
single new kernel being released contains around 5000 bugs left to be
found. Fixing them quickly is much more important to me (as a user)
than ensuring that I will not reach 5001 by inheriting from a poorly
tested backport.

My hope is that thanks to all the automated testing in place we can
further accelerate the backport rate so that a stable kernel reaches
in 2 months the level of quality that we previously used to reach only
after one year. And I think we're already about there, as both 4.4.x
and 4.9.x in their early versions (x < 10) were already very good for
various use cases. 4.17.5 I'm using on this PC looks pretty slick as
well. Overall it means that we can provide a clean upgrade path for
users so that they don't stick to bogus or insecure kernels by fear
of upgrading. We can always argue that a bug may appear once in a
while but for me, while technically this is true, statistically this
is just FUD and is not relevant to end users' real usage.

Willy

2018-07-15 08:55:43

by Greg Kroah-Hartman

Subject: Re: bug-introducing patches

On Sat, Jul 14, 2018 at 07:38:12PM +0200, Pavel Machek wrote:
> Hi!
>
> > > The way I see it, if a commit can get one or two tested-by, it's a good
> > > alternative to a week in -next.
> >

Pavel, I "love" how you fail to point out that you are responding to a 2
month old thread :(

And that thread was beaten to death, and still you want to revise it,
which is odd to me, perhaps you just don't like stable releases? Given
that you never mark any of the patches for your subsystem for stable
releases, why do you care about how they are maintained?

> > Agreed. Even their own actually. And I'm not kidding. Those who run large
> > amounts of tests on certain patches could really mention is in tested-by,
> > as opposed to the most common cases where the code was just regularly
> > tested.
>
> Actually, it would be cool to get "Tested: no" and "Tested: compile"
> tags in the commit mesages. Sometimes it is clear from the context
> that patch was not tested (treewide update of time to 64bit), but
> sometime it is not.
>
> This is especially problem for -stable, as it seems that lately
> patches are backported from new version without any testing.

As everyone has pointed out numerous times in this thread, there is
more testing of stable patches and releases than _EVER_ before in the
history of stable kernels. And if you feel there are ways to do more
testing that we somehow are missing, wonderful, please provide
constructive criticism. If not, and you just want to complain, well, my
killfile can always use a new member...

And as always, you have a choice:
- if you don't like stable kernels, don't run them.

greg k-h

2018-07-15 14:51:55

by Theodore Ts'o

Subject: Re: bug-introducing patches

On Sun, Jul 15, 2018 at 10:54:03AM +0200, Greg KH wrote:
> Pavel, I "love" how you fail to point out that you are responding to a 2
> month old thread :(

And apologies for releasing some ancient messages that were caught in
the ksummit-discuss's moderation queue. I hadn't been paying
attention to the fact that there had been some messages caught there,
and I figured there might be some people on the ksummit-discuss list who
might have missed the context of that old thread.

That being said, it *was* beaten to death two months ago, so people
who are replying might want to keep that in mind. :-)

> And as always, you have a choice:
> - if you don't like stable kernels, don't run them.

And if you don't like stable kernels, you can pay $$$ for an enterprise
distro kernel. Although you might find they aren't that much better
at the whole fixing bugs versus introducing regressions tradeoff.
Indeed, because there are crazy people who insist on using 3.18
kernels *and* getting support for the latest hardware, you may find
that it's somewhat worse; not to mention not necessarily being able to
get all of the fixes for the cache-related security problems getting
found...

- Ted

2018-07-15 20:16:42

by Pavel Machek

Subject: Re: bug-introducing patches

On Sun 2018-07-15 10:54:03, Greg KH wrote:
> On Sat, Jul 14, 2018 at 07:38:12PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > > > The way I see it, if a commit can get one or two tested-by, it's a good
> > > > alternative to a week in -next.
> > >
>
> Pavel, I "love" how you fail to point out that you are responding to a 2
> month old thread :(
>
> And that thread was beaten to death, and still you want to revise it,
> which is odd to me, perhaps you just don't like stable releases? Given
> that you never mark any of the patches for your subsystem for stable
> releases, why do you care about how they are maintained?

I do mark patches -- according to stable kernel rules. But you are
apparently using different rules, not written anywhere, and I get
complaints when I don't mark patches according to _those_.

But this was supposed to be about testing.

And I'd like to see Tested: no/compile headers, instead.

[And yes, motivation for this was that broken LED patches were merged
to stable without any testing, and when that was questioned, I was
told that testing was not performed because it would require unusual
hardware called "USB keyboard".]

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

