Hello,
Several people proposed that linux-next should not be tested on
syzbot, while others suggested that it needs to test as many
trees as possible. I initially included linux-next because it is a
staging area before the upstream tree, with the intention that patches are
_tested_ there; if they are not tested there, bugs enter the upstream
tree, and then it takes much longer to get a fix into other trees.
So the question is: what trees/branches should be tested? Preferably
in priority order, as syzbot can't test all of them.
Thanks
On Mon, Jan 15, 2018 at 11:51 PM, Dmitry Vyukov wrote:
> Hello,
>
> Several people proposed that linux-next should not be tested on
> syzbot, while others suggested that it needs to test as many
> trees as possible. I initially included linux-next because it is a
> staging area before the upstream tree, with the intention that patches are
> _tested_ there; if they are not tested there, bugs enter the upstream
> tree, and then it takes much longer to get a fix into other trees.
>
> So the question is: what trees/branches should be tested? Preferably
> in priority order, as syzbot can't test all of them.
>
I always thought that -next existed specifically to give people a
chance to test the code in it. Maybe the question is where to report
the test results?
Guenter
On Tue, Jan 16, 2018 at 10:45 AM, Guenter Roeck wrote:
> I always thought that -next existed specifically to give people a
> chance to test the code in it. Maybe the question is where to report
> the test results?
FTR, from Guenter on another thread:
> Interesting. Assuming that refers to linux-next, not linux-net, that
> may explain why linux-next tends to deteriorate. I wonder if I should
> drop it from my testing as well. I'll be happy to follow whatever the
> result of this exchange is and do the same.
If we agree on some list of important branches, and what branches
specifically should not be tested with automatic reporting, I think it
will benefit everybody.
+Fengguang, can you please share your list and rationale behind it?
On Tue, Jan 16, 2018 at 1:58 AM, Dmitry Vyukov wrote:
> FTR, from Guenter on another thread:
>
>> Interesting. Assuming that refers to linux-next, not linux-net, that
>> may explain why linux-next tends to deteriorate. I wonder if I should
>> drop it from my testing as well. I'll be happy to follow whatever the
>> result of this exchange is and do the same.
>
> If we agree on some list of important branches, and what branches
> specifically should not be tested with automatic reporting, I think it
> will benefit everybody.
> +Fengguang, can you please share your list and rationale behind it?
https://github.com/fengguang/lkp-tests, more specifically
https://github.com/fengguang/lkp-tests/tree/master/repo/linux
Guenter
Dmitry Vyukov writes:
> FTR, from Guenter on another thread:
>
>> Interesting. Assuming that refers to linux-next, not linux-net, that
>> may explain why linux-next tends to deteriorate. I wonder if I should
>> drop it from my testing as well. I'll be happy to follow whatever the
>> result of this exchange is and do the same.
>
> If we agree on some list of important branches, and what branches
> specifically should not be tested with automatic reporting, I think it
> will benefit everybody.
> +Fengguang, can you please share your list and rationale behind it?
The problem is testing linux-next and then using get-maintainer.pl to
report the problem.
If you are resource limited I would start by testing Linus's tree to
find the existing bugs, and to get a baseline. Using get-maintainer.pl
is fine for sending emails to developers there.
After that I would test the individual trees that are pulled into
linux-next, so that any issue not found in Linus's tree can be
attributed to the tree you are testing and sent to the appropriate
maintainer.
After that I would consider testing linux-next itself and see if any
issues are caused by the merger of all of those trees.
Eric
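Eric's testing order amounts to a simple attribution rule: a crash already seen on Linus's tree is an existing bug, a crash first seen on a subsystem tree belongs to that tree, and a crash seen only on linux-next points at the merge itself. A minimal sketch, assuming crashes are identified by title strings and each tree's results are already collected as sets (the function name and the crash titles are hypothetical, not part of syzbot):

```python
from typing import Dict, Iterable, Set


def attribute_crashes(
    mainline: Iterable[str],
    subsystem_trees: Dict[str, Set[str]],
    linux_next: Iterable[str],
) -> Dict[str, str]:
    """Map each crash title to where it should be reported.

    Hypothetical helper following the order suggested above:
    1. Linus's tree is the baseline (existing bugs).
    2. A crash absent from the baseline but present in a subsystem tree
       is attributed to that tree.
    3. A crash seen only on linux-next is attributed to the merge.
    """
    owners: Dict[str, str] = {}
    for crash in mainline:
        owners[crash] = "mainline"
    for tree, crashes in subsystem_trees.items():
        for crash in sorted(crashes):
            # setdefault keeps the baseline (or an earlier tree) as owner.
            owners.setdefault(crash, tree)
    for crash in linux_next:
        owners.setdefault(crash, "linux-next merge")
    return owners


if __name__ == "__main__":
    print(attribute_crashes(
        mainline={"KASAN: use-after-free in foo"},
        subsystem_trees={"net-next": {"WARNING in bar"}},
        linux_next={"WARNING in bar", "general protection fault in baz"},
    ))
```

If the same crash shows up in two subsystem trees, this sketch credits whichever tree is visited first; a real system would need a tiebreak such as bisection.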
On Tue, Jan 16, 2018 at 11:02:17AM -0600, Eric W. Biederman wrote:
> The problem is testing linux-next and then using get-maintainer.pl to
> report the problem.
>
> If you are resource limited I would start by testing Linus's tree to
> find the existing bugs, and to get a baseline. Using get-maintainer.pl
> is fine for sending emails to developers there.
I second this, almost all of the issues you are hitting are usually in
Linus's tree. Let's make that "clean" first, before messing around and
adding 100+ other random developers' trees into the mix :)
thanks,
greg k-h
Hi Dmitry,
On Tue, Jan 16, 2018 at 10:58:51AM +0100, Dmitry Vyukov wrote:
>FTR, from Guenter on another thread:
>
>> Interesting. Assuming that refers to linux-next, not linux-net, that
>> may explain why linux-next tends to deteriorate. I wonder if I should
>> drop it from my testing as well. I'll be happy to follow whatever the
>> result of this exchange is and do the same.
>
>If we agree on some list of important branches, and what branches
>specifically should not be tested with automatic reporting, I think it
>will benefit everybody.
>+Fengguang, can you please share your list and rationale behind it?
0-day aims to aggressively test as many trees and branches as possible,
including various developer trees, maintainer trees, linux-next, mainline and
stable trees. Here is the complete list of the 800+ trees we monitor:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git/tree/repo/linux
The rationale is obvious. IMHO what really matters here is
capability rather than rationale: that policy heavily relies on the
fundamental capability of auto-bisection. Once a regression is
bisected, we know whom to auto-send the report to, i.e.
the first bad commit's author and committer.
Bugs that cannot be bisected tend to be old ones, and
we report those more often on the mainline tree than on linux-next.
Thanks,
Fengguang
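The auto-bisection Fengguang describes reduces to a binary search for the first bad commit, whose author and committer become the report recipients. A minimal sketch, assuming a linear history ordered oldest to newest, a last commit known to be bad, and a deterministic `is_bad` test (real `git bisect` also handles merge DAGs, and racy bugs make `is_bad` unreliable). `Commit`, `first_bad_commit` and `report_recipients` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    committer: str


def first_bad_commit(commits: Sequence[Commit],
                     is_bad: Callable[[Commit], bool]) -> Commit:
    """Binary search for the earliest bad commit.

    Assumes commits are ordered oldest -> newest, commits[-1] is bad, and
    badness is monotonic (good ... good, bad ... bad).
    """
    if not commits:
        raise ValueError("empty history")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


def report_recipients(commit: Commit) -> Set[str]:
    """The first bad commit's author and committer get the report."""
    return {commit.author, commit.committer}


if __name__ == "__main__":
    history = [Commit(f"c{i}", f"author{i}", "maintainer") for i in range(10)]
    culprit = first_bad_commit(history, lambda c: int(c.sha[1:]) >= 6)
    print(culprit.sha, sorted(report_recipients(culprit)))
```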
On Tue, Jan 16, 2018 at 6:34 PM, Greg Kroah-Hartman wrote:
> On Tue, Jan 16, 2018 at 11:02:17AM -0600, Eric W. Biederman wrote:
>> The problem is testing linux-next and then using get-maintainer.pl to
>> report the problem.
>>
>> If you are resource limited I would start by testing Linus's tree to
>> find the existing bugs, and to get a baseline. Using get-maintainer.pl
>> is fine for sending emails to developers there.
>
> I second this, almost all of the issues you are hitting are usually in
> Linus's tree. Let's make that "clean" first, before messing around and
> adding 100+ other random developers' trees into the mix :)
FTR I've just dropped linux-next and mmots from syzbot.
On Fri, Jan 19, 2018 at 2:48 AM, Fengguang Wu wrote:
> Hi Dmitry,
>
> 0-day aims to aggressively test as many trees and branches as possible,
> including various developer trees, maintainer trees, linux-next, mainline and
> stable trees. Here is the complete list of the 800+ trees we monitor:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git/tree/repo/linux
>
> The rationale is obvious. IMHO what really matters here is
> capability rather than rationale: that policy heavily relies on the
> fundamental capability of auto-bisection. Once a regression is
> bisected, we know whom to auto-send the report to, i.e.
> the first bad commit's author and committer.
>
> Bugs that cannot be bisected tend to be old ones, and
> we report those more often on the mainline tree than on linux-next.
Thanks for the info, Fengguang.
Bisection is something we need to add to syzbot in the future. However, about 50%
of syzbot bugs are due to races and are somewhat difficult to bisect
reliably.
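To see why racy bugs are hard to bisect: if a reproducer crashes a bad kernel with some probability p per run, a single clean run does not prove a commit is good. Assuming independent runs, n runs all miss the crash with probability (1 - p)^n, so reaching confidence c that a commit is really good takes n = ceil(log(1 - c) / log(1 - p)) runs per bisection step. A small sketch of that arithmetic (the probabilities are illustrative, not measured syzbot data, and `runs_per_step` is a hypothetical helper):

```python
import math


def runs_per_step(p_crash: float, confidence: float) -> int:
    """Reproducer runs needed so that, if a commit is bad, at least one run
    crashes with probability >= confidence. Assumes independent runs, each
    crashing with probability p_crash."""
    if not (0.0 < p_crash <= 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("need 0 < p_crash <= 1 and 0 < confidence < 1")
    if p_crash == 1.0:
        return 1  # deterministic reproducer: one run is enough
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))
```

With p = 0.1 and 99% confidence this gives 44 runs per step, so a 10-step bisection could need up to 440 runs instead of 10.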
On 2018/01/22 22:32, Dmitry Vyukov wrote:
> On Tue, Jan 16, 2018 at 6:34 PM, Greg Kroah-Hartman wrote:
>>> The problem is testing linux-next and then using get-maintainer.pl to
>>> report the problem.
>>>
>>> If you are resource limited I would start by testing Linus's tree to
>>> find the existing bugs, and to get a baseline. Using get-maintainer.pl
>>> is fine for sending emails to developers there.
>>
>> I second this, almost all of the issues you are hitting are usually in
>> Linus's tree. Let's make that "clean" first, before messing around and
>> adding 100+ other random developers' trees into the mix :)
>
> FTR I've just dropped linux-next and mmots from syzbot.
>
I hope that we can test linux-next on syzbot, as a tree for testing debug
printk() patches. People do not like sending debug printk() patches to
Linus's tree, while the majority of bugs are found in Linus's tree.
We could automatically expire (and delete) reports found in linux-next from
the table at https://syzkaller.appspot.com/ if the bug has not been reproduced
for some time (e.g. one week or one month).
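Tetsuo's expiry rule could look like the following minimal sketch, assuming each dashboard entry records the trees it was seen on and when it last crashed. The `Report` type, the "upstream" tree label and the one-week default are illustrative assumptions (he suggests one week or one month), not the dashboard's actual data model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class Report:
    title: str
    trees: FrozenSet[str]   # trees the crash has been seen on
    last_seen: datetime     # time of the most recent crash


def expire_stale_next_reports(reports: List[Report], now: datetime,
                              ttl: timedelta = timedelta(weeks=1)) -> List[Report]:
    """Drop reports seen only on linux-next that have not reproduced within ttl.
    Reports seen on any other tree are always kept."""
    kept = []
    for r in reports:
        next_only = r.trees == frozenset({"linux-next"})
        if next_only and now - r.last_seen > ttl:
            continue  # stale linux-next-only report: expire it
        kept.append(r)
    return kept
```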
On Fri, Jun 8, 2018 at 11:36 PM Tetsuo Handa wrote:
> On 2018/01/22 22:32, Dmitry Vyukov wrote:
> >
> > FTR I've just dropped linux-next and mmots from syzbot.
>
> I hope that we can test linux-next on syzbot, as a tree for testing debug
> printk() patches.
I think it would be lovely to get linux-next back eventually, but it
sounds like it's just too noisy right now, and yes, we should have a
baseline for the standard tree first.
But once there's a "this is known for the baseline", I think adding
linux-next back in and then maybe even have linux-next simply just
kick out trees that cause problems would be a good idea.
Right now linux-next only kicks things out based on build issues (or
extreme merge issues), afaik. But it *would* be good to also have
things like syzbot do quality control on linux-next.
Because the more things get found and fixed before they even hit my
tree, the better.
Linus
On Sat, Jun 09, 2018 at 03:17:21PM -0700, Linus Torvalds wrote:
> I think it would be lovely to get linux-next back eventually, but it
> sounds like it's just too noisy right now, and yes, we should have a
> baseline for the standard tree first.
>
> But once there's a "this is known for the baseline", I think adding
> linux-next back in and then maybe even have linux-next simply just
> kick out trees that cause problems would be a good idea.
>
> Right now linux-next only kicks things out based on build issues (or
> extreme merge issues), afaik. But it *would* be good to also have
> things like syzbot do quality control on linux-next.
Syzbot is always getting improved to find new classes of problems. So
the only way to get a baseline would be to use an older version of
syzbot for linux-next, and to have it suppress sending e-mails about
failures that are duplicates that were already found via the mainline
tree.
Then periodically, once version N has run for M weeks, and has spewed
some large number of new failures to LKML, then you could promote
version N to be run against linux-next, and so hopefully the only
thing it would report against linux-next are regressions, and not
duplicates of new bugs also being found via the latest and greatest
version of syzbot being run against the mainline kernel.
- Ted
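Ted's scheme has two parts: run a syzkaller version against linux-next only after it has run against mainline for M weeks, and suppress linux-next reports that duplicate crashes already found on mainline. A minimal sketch, assuming crashes are deduplicated by title and versions are listed oldest to newest; `SyzkallerVersion` and both helpers are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class SyzkallerVersion:
    name: str
    weeks_on_mainline: int  # how long this version has fuzzed mainline


def version_for_linux_next(versions: Sequence[SyzkallerVersion],
                           min_weeks: int) -> Optional[SyzkallerVersion]:
    """Newest version that has already run on mainline for >= min_weeks (M)."""
    for v in reversed(versions):  # versions ordered oldest -> newest
        if v.weeks_on_mainline >= min_weeks:
            return v
    return None


def linux_next_reports(next_crashes: Iterable[str],
                       mainline_crashes: Iterable[str]) -> List[str]:
    """Only report linux-next crashes that were not already found on mainline."""
    return sorted(set(next_crashes) - set(mainline_crashes))
```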
On Sun, Jun 10, 2018 at 3:51 AM, Theodore Y. Ts'o wrote:
> Syzbot is always getting improved to find new classes of problems. So
> the only way to get a baseline would be to use an older version of
> syzbot for linux-next, and to have it suppress sending e-mails about
> failures that are duplicates that were already found via the mainline
> tree.
>
> Then periodically, once version N has run for M weeks, and has spewed
> some large number of new failures to LKML, then you could promote
> version N to be run against linux-next, and so hopefully the only
> thing it would report against linux-next are regressions, and not
> duplicates of new bugs also being found via the latest and greatest
> version of syzbot being run against the mainline kernel.
The set of trees where a crash happened is visible on the dashboard, so
one can see if it's only linux-next or the whole set of trees. Potentially
syzbot can act differently depending on this predicate, but I don't
see what the difference should be. However, this does not fully protect
against falsely assessing bugs as linux-next-only just because they have
happened only a few times, and only on linux-next, so far.
But using an older syzkaller revision won't fully protect against this
either, because (1) some bugs take a long time to find, and (2) a bug can
be hidden by another known bug, so when the second bug is fixed the first
one suddenly pops up, but it's not a new bug (and chances are that the
second one will be fixed on linux-next first, so the first bug will look
linux-next-only).
Regarding removing commits from linux-next, I think one of the main signals
can be whether there were recent changes related to the bug. Looking at new
bugs being reported, it's frequently quite obvious (e.g.
"use-after-free in foo" and a recent "make foo faster").
But in general, if we go with linux-next, maintainers and developers
need to agree to deal with this additional aspect during bug triage.
There is also a problem with rebasing of linux-next: reported commit
hashes do not make sense and we can forget about bisection.
On a related note, Greg recently suggested onboarding more subsystem
-next trees (currently we test only net-next and bpf-next), so I tried
to formulate requirements for these trees:
https://github.com/google/syzkaller/issues/592
- not rebased (commit hashes work, bisection works)
- maintained in a reasonably good shape (no tons of assorted crashes)
- reasonably active (makes sense to test)
- merge upstream periodically (bugs are getting fixed)
- with maintainers who are willing to cooperate and fix bugs
Any volunteers?
Thanks
On Sun, Jun 10, 2018 at 08:11:05AM +0200, Dmitry Vyukov wrote:
>
> The set of trees where a crash happened is visible on the dashboard, so
> one can see if it's only linux-next or the whole set of trees. Potentially
> syzbot can act differently depending on this predicate, but I don't
> see what the difference should be. However, this does not fully protect
> against falsely assessing bugs as linux-next-only just because they have
> happened only a few times, and only on linux-next, so far.
So how about this, only report something as being a linux-next
regression if (a) there is a reproducer, and (b) the reproducer does
not trigger any kind of crash on mainline?
> There is also a problem with rebasing of linux-next: reported commit
> hashes do not make sense and we can forget about bisection.
If there is a valid reproducer, bisection should simply be a matter of
running it, and if we know the reproducer doesn't trigger on mainline,
then the bisection should require no more than 8-10 VM runs. For
linux-next, this would be *super* valuable. Reporting the commit ID
and the one-line commit summary will be enough for most maintainers,
since even if they are using a rewinding head, so long as the
bisection can be done quickly enough (e.g., within a few days), it
will still be in their git repository.
And if you have a reproducer, then once it's identified as a
linux-next reproducer with a guilty commit, that can be confirmed by
either (a) seeing if you can revert the commit and if it makes the
problem go away, or (b) figure out which subsystem git tree the commit
was introduced via, and then verify that the reproducer triggers on
the tip of the subsystem git tree.
All of this will require development effort, so I suspect it's not
something we'll see from syzbot tomorrow --- but it's not
*impossible*.
I think though that sending e-mail about a linux-next syzbot crash if
there is a reproducer and the reproducer doesn't trigger a crash on
mainline should be really simple to implement, and it would add huge
value without spamming the subsystem maintainers.
- Ted
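The "8-10 VM runs" estimate follows from binary search: finding the first bad commit among N candidates takes at most ceil(log2 N) test runs when each run gives a reliable answer, so 8 runs cover up to 256 commits and 10 cover up to 1024. A quick check of that arithmetic (`bisect_runs` is a hypothetical helper):

```python
def bisect_runs(num_commits: int) -> int:
    """Worst-case number of test runs to find the first bad commit among
    num_commits candidates, i.e. ceil(log2(num_commits)), assuming one
    reliable run per step. Uses integer arithmetic to avoid float rounding."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    for n in (256, 500, 1024):
        print(n, bisect_runs(n))
```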
On Mon, Jun 11, 2018 at 3:22 AM, Theodore Y. Ts'o wrote:
> On Sun, Jun 10, 2018 at 08:11:05AM +0200, Dmitry Vyukov wrote:
>>
>> The set of trees where a crash happened is visible on the dashboard, so
>> one can see if it's only linux-next or the whole set of trees. Potentially
>> syzbot can act differently depending on this predicate, but I don't
>> see what the difference should be. However, this does not fully protect
>> against falsely assessing bugs as linux-next-only just because they have
>> happened only a few times, and only on linux-next, so far.
>
> So how about this, only report something as being a linux-next
> regression if (a) there is a reproducer, and (b) the reproducer does
> not trigger any kind of crash on mainline?
>
>> There is also a problem with rebasing of linux-next: reported commit
>> hashes do not make sense and we can forget about bisection.
>
> If there is a valid reproducer, bisection should simply be a matter of
> running it, and if we know the reproducer doesn't trigger on mainline,
> then the bisection should require no more than 8-10 VM runs. For
> linux-next, this would be *super* valuable. Reporting the commit ID
> and the one-line commit summary will be enough for most maintainers,
> since even if they are using a rewinding head, so long as the
> bisection can be done quickly enough (e.g., within a few days), it
> will still be in their git repository.
>
> And if you have a reproducer, then once it's identified as a
> linux-next reproducer with a guilty commit, that can be confirmed by
> either (a) seeing if you can revert the commit and if it makes the
> problem go away, or (b) figure out which subsystem git tree the commit
> was introduced via, and then verify that the reproducer triggers on
> the tip of the subsystem git tree.
>
> All of this will require development effort, so I suspect it's not
> something we'll see from syzbot tomorrow --- but it's not
> *impossible*.
>
> I think though that sending e-mail about a linux-next syzbot crash if
> there is a reproducer and the reproducer doesn't trigger a crash on
> mainline should be really simple to implement, and it would add huge
> value without spamming the subsystem maintainers.
But if it also happens upstream, then we want to report it
anyway. So this predicate can be reduced to "report crashes that
happen only on linux-next iff they have reproducers", right?
We will probably also need something that will auto-invalidate old
bugs that were never reported.
Re backwards bisection (when bug is introduced), we can actually test
linux-next-history instead of linux-next, right?
But forward bisection (when bug is fixed) unfortunately won't work
because these commits are not connected to HEAD. And forward bisection
is very important, otherwise who will bring order to all these
hundreds of open bugs?
https://syzkaller.appspot.com/
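The reduced predicate can be written as a small decision function. A minimal sketch, assuming a crash is summarized by which trees it has been observed on and whether a reproducer exists. Ted's original condition runs the reproducer on mainline explicitly, whereas this version relies on observed trees, which, as noted earlier in the thread, can misclassify rarely seen bugs. The function name and parameters are hypothetical:

```python
def should_report(seen_on_upstream: bool, seen_on_linux_next: bool,
                  has_reproducer: bool) -> bool:
    """Reporting rule sketched from the thread:
    - crashes seen on upstream are reported regardless of reproducers;
    - crashes seen only on linux-next are reported iff a reproducer exists.
    """
    if seen_on_upstream:
        return True
    if seen_on_linux_next:
        return has_reproducer
    return False  # not observed on either tree: nothing to report


if __name__ == "__main__":
    for upstream in (False, True):
        for repro in (False, True):
            print(upstream, repro, should_report(upstream, True, repro))
```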
Hi Dmitry,
On Fri, 15 Jun 2018 11:54:16 +0200 Dmitry Vyukov wrote:
>
> Re backwards bisection (when bug is introduced), we can actually test
> linux-next-history instead of linux-next, right?
I don't see why using linux-next-history would be any better: it just
contains all the linux-next releases, while the linux-next tree contains
the last 3 months' worth.
--
Cheers,
Stephen Rothwell
Dmitry Vyukov writes:
> But if it also happens upstream, then we want to report it
> anyway. So this predicate can be reduced to "report crashes that
> happen only on linux-next iff they have reproducers", right?
> We will probably also need something that will auto-invalidate old
> bugs that were never reported.
>
> Re backwards bisection (when bug is introduced), we can actually test
> linux-next-history instead of linux-next, right?
> But forward bisection (when bug is fixed) unfortunately won't work
> because these commits are not connected to HEAD. And forward bisection
> is very important, otherwise who will bring order to all these
> hundreds of open bugs?
> https://syzkaller.appspot.com/
Maybe you want to monitor linux-next and see if the problem commits
disappear. That can let you stop worrying about the issue.
I don't see the point of worrying about which linux-next build a problem
appeared in. It is the first commit that reproduces the problem that is
interesting.
That commit tells you who did something that was problematic. If you
notify the committer with the reproducer they should be able to
reproduce the problem and fix it.
Very rarely I suspect it will be the merge commit into linux-next that
is the problem, but most of the time these commits are going to be in
the subsystem trees.
Eric
> But forward bisection (when bug is fixed) unfortunately won't work
> because these commits are not connected to HEAD. And forward bisection
> is very important, otherwise who will bring order to all these
> hundreds of open bugs?
> https://syzkaller.appspot.com/
Bisection isn't so important when you are trying to close bugs that
got fixed, with a note that it's no longer reproducible. It might mean the
reproducer broke, but it also stops you drowning, and it tells a user that
they might as well try the new one and see if it still breaks, thus
collecting the information needed.
True, it's nice to know what commit may have magically fixed it, but it's
not essential. Furthermore, once you see a bug is fixed even in -next, you
can later run the reproducer against an actual release to make sure it's
still fixed there, and bisect between the previous release and that release to
find a mainline commit ID if it's a single fix point.
Alan
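Alan's suggestion amounts to a forward bisection over mainline: the reproducer crashes on the previous release and no longer crashes on the new one, so you search for the first commit where it stops crashing. A minimal sketch, assuming history can be flattened into an ordered list of commits (real mainline history has merges, which git bisect handles for you), that the bug stays fixed once fixed, and that `crashes` is a hypothetical callback that builds a commit and runs the reproducer against it:

```python
from typing import Callable, Sequence


def find_fix_commit(commits: Sequence[str],
                    crashes: Callable[[str], bool]) -> str:
    """Return the first commit on which the reproducer no longer crashes.

    `commits` is ordered oldest to newest: commits[0] is the previous
    release (still crashes), commits[-1] is the new release (fixed).
    """
    if not crashes(commits[0]):
        raise ValueError("reproducer does not crash on the old release")
    if crashes(commits[-1]):
        raise ValueError("bug is not fixed in the new release")
    # Invariant: commits[lo] crashes, commits[hi] does not.
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

Each probe costs one build and reproducer run, so the search needs roughly log2(N) runs for N commits. On a real tree, git bisect's custom terms (`git bisect start --term-old=broken --term-new=fixed`) express the same inverted search without flattening history.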
On 2018/06/10 7:17, Linus Torvalds wrote:
> On Fri, Jun 8, 2018 at 11:36 PM Tetsuo Handa
> <[email protected]> wrote:
>> On 2018/01/22 22:32, Dmitry Vyukov wrote:
>>>
>>> FTR I've just dropped linux-next and mmots from syzbot.
>>
>> I hope that we can test linux-next on syzbot, as a tree for testing debug
>> printk() patches.
>
> I think it would be lovely to get linux-next back eventually, but it
> sounds like it's just too noisy right now, and yes, we should have a
> baseline for the standard tree first.
>
> But once there's a "this is known for the baseline", I think adding
> linux-next back in and then maybe even have linux-next simply just
> kick out trees that cause problems would be a good idea.
>
> Right now linux-next only kicks things out based on build issues (or
> extreme merge issues), afaik. But it *would* be good to also have
> things like syzbot do quality control on linux-next.
>
> Because the more things get found and fixed before they even hit my
> tree, the better.
>
> Linus
>
I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
crashes in 81 days, but syzbot still has not found a reproducer. Dmitry tried to reproduce
it locally with debug printk() patches but has not succeeded yet. I think that testing with
http://lkml.kernel.org/r/[email protected]
on linux.git or linux-next.git is the only realistic way to debug this bug.
The longer we postpone reviving linux-next, the more syzbot reports we will get...
On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
> crashes in 81 days, but syzbot still has not found a reproducer. Dmitry tried to reproduce
> it locally with debug printk() patches but has not succeeded yet. I think that testing with
> http://lkml.kernel.org/r/[email protected]
> on linux.git or linux-next.git is the only realistic way to debug this bug.
> The longer we postpone reviving linux-next, the more syzbot reports we will get...
Here's a proposal for adding linux-next back:
*) Subsystems or maintainers need to have a way to opt out of getting
spammed with Syzkaller reports that have no reproducer. More often
than not, they are not actionable, and just annoy the maintainers,
with the net result that they tune out all Syzkaller reports as
noise.
*) Email reports for failures on linux-next that correspond to known
failures on mainline should be suppressed. Another way of doing
this would be to only report a problem found by a specific
reproducer to the mailing list unless the recipient has agreed to
be spammed by Syzkaller noise.
And please please please, Syzkaller needs to figure out how to do
bisection runs once you have a reproducer.
- Ted
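Ted's two rules can be read as a per-report filter. The sketch below is hypothetical (`Crash`, `should_email` and their fields are illustrative names, not syzbot code) and models the opt-in variant from his second point; an opt-out list would simply invert the membership test.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Crash:
    title: str            # e.g. "INFO: task hung in __sb_start_write"
    tree: str             # e.g. "linux-next" or "mainline"
    subsystem: str        # subsystem the report would be mailed to
    has_reproducer: bool


def should_email(crash: Crash,
                 known_mainline_titles: set[str],
                 wants_unreproduced: set[str]) -> bool:
    """Hypothetical reporting filter following the two proposed rules."""
    # A linux-next crash already known on mainline adds no new information.
    if crash.tree == "linux-next" and crash.title in known_mainline_titles:
        return False
    # Reports without a reproducer only go to subsystems that opted in.
    if not crash.has_reproducer:
        return crash.subsystem in wants_unreproduced
    return True
```

Matching on the crash title is itself an assumption; the harder practical question is deciding when a linux-next crash "corresponds to" a known mainline one.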
On Tue, Jun 26, 2018 at 7:38 AM Dmitry Vyukov <[email protected]> wrote:
>
> On Tue, Jun 26, 2018 at 4:16 PM, Theodore Y. Ts'o <[email protected]> wrote:
> > On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
> >> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
> >> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
> >> crashes in 81 days, but syzbot still has not found a reproducer. Dmitry tried to reproduce
> >> it locally with debug printk() patches but has not succeeded yet. I think that testing with
> >> http://lkml.kernel.org/r/[email protected]
> >> on linux.git or linux-next.git is the only realistic way to debug this bug.
> >> The longer we postpone reviving linux-next, the more syzbot reports we will get...
> >
> > Here's a proposal for adding linux-next back:
> >
> > *) Subsystems or maintainers need to have a way to opt out of getting
> > spammed with Syzkaller reports that have no reproducer. More often
> > than not, they are not actionable, and just annoy the maintainers,
> > with the net result that they tune out all Syzkaller reports as
> > noise.
>
> False. You can count for yourself: 2/3 are actionable and fixed.
>
The problem is that some, if not many, of the other 1/3 will be considered
noise, and even some of the 2/3 will be considered noise because they
have already been fixed by the time they are reported. It is the same problem as
with, say, stable tree merges: People don't see the thousands of bug
fixes inherited with such merges, but they do see the two or three
regressions. Plus, of course, one cannot prove that the thousands of
bug fixes did any good because the fixed bugs are not observable
anymore. The only remedy is to try to reduce regressions down to zero
(or, of course, stop using/merging stable releases).
The same applies here: People won't see the good, they only see the
noise. This is pretty much the reason why I all but stopped reporting
build/boot failures on -next. You would have to reduce the noise
almost down to zero for people to stop complaining, and you would have
to be _really_ sure that the problem was not already fixed or reported
elsewhere.
Guenter
> This also makes the following point ungrounded.
>
> > *) Email reports for failures on linux-next that correspond to known
> > failures on mainline should be suppressed. Another way of doing
> > this would be to only report a problem found by a specific
> > reproducer to the mailing list unless the recipient has agreed to
> > be spammed by Syzkaller noise.
> >
> > And please please please, Syzkaller needs to figure out how to do
> > bisection runs once you have a reproducer.
> >
> > - Ted
On Tue, Jun 26, 2018 at 4:16 PM, Theodore Y. Ts'o <[email protected]> wrote:
> On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
>> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
>> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
>> crashes in 81 days, but syzbot still has not found a reproducer. Dmitry tried to reproduce
>> it locally with debug printk() patches but has not succeeded yet. I think that testing with
>> http://lkml.kernel.org/r/[email protected]
>> on linux.git or linux-next.git is the only realistic way to debug this bug.
>> The longer we postpone reviving linux-next, the more syzbot reports we will get...
>
> Here's a proposal for adding linux-next back:
>
> *) Subsystems or maintainers need to have a way to opt out of getting
> spammed with Syzkaller reports that have no reproducer. More often
> than not, they are not actionable, and just annoy the maintainers,
> with the net result that they tune out all Syzkaller reports as
> noise.
False. You can count for yourself: 2/3 are actionable and fixed.
This also makes the following point ungrounded.
> *) Email reports for failures on linux-next that correspond to known
> failures on mainline should be suppressed. Another way of doing
> this would be to only report a problem found by a specific
> reproducer to the mailing list unless the recipient has agreed to
> be spammed by Syzkaller noise.
>
> And please please please, Syzkaller needs to figure out how to do
> bisection runs once you have a reproducer.
>
> - Ted
On 2018/06/26 23:54, Guenter Roeck wrote:
> On Tue, Jun 26, 2018 at 7:38 AM Dmitry Vyukov <[email protected]> wrote:
>>
>> On Tue, Jun 26, 2018 at 4:16 PM, Theodore Y. Ts'o <[email protected]> wrote:
>>> On Tue, Jun 26, 2018 at 07:54:53PM +0900, Tetsuo Handa wrote:
>>>> I hope we can accept NOW either "reviving linux-next.git" or "allowing debug printk()
>>>> patches for linux.git". For example, "INFO: task hung in __sb_start_write" got 900
>>>> crashes in 81 days, but syzbot still has not found a reproducer. Dmitry tried to reproduce
>>>> it locally with debug printk() patches but has not succeeded yet. I think that testing with
>>>> http://lkml.kernel.org/r/[email protected]
>>>> on linux.git or linux-next.git is the only realistic way to debug this bug.
>>>> The longer we postpone reviving linux-next, the more syzbot reports we will get...
>>>
>>> Here's a proposal for adding linux-next back:
>>>
>>> *) Subsystems or maintainers need to have a way to opt out of getting
>>> spammed with Syzkaller reports that have no reproducer. More often
>>> than not, they are not actionable, and just annoy the maintainers,
>>> with the net result that they tune out all Syzkaller reports as
>>> noise.
>>
>> False. You can count for yourself: 2/3 are actionable and fixed.
>>
>
> The problem is that some, if not many, of the other 1/3 will be considered
> noise, and even some of the 2/3 will be considered noise because they
> have already been fixed by the time they are reported. It is the same problem as
> with, say, stable tree merges: People don't see the thousands of bug
> fixes inherited with such merges, but they do see the two or three
> regressions. Plus, of course, one cannot prove that the thousands of
> bug fixes did any good because the fixed bugs are not observable
> anymore. The only remedy is to try to reduce regressions down to zero
> (or, of course, stop using/merging stable releases).
>
> The same applies here: People won't see the good, they only see the
> noise. This is pretty much the reason why I all but stopped reporting
> build/boot failures on -next. You would have to reduce the noise
> almost down to zero for people to stop complaining, and you would have
> to be _really_ sure that the problem was not already fixed or reported
> elsewhere.
>
> Guenter
>
I think syzbot could stop choosing email recipients and leave that to those who
diagnose bugs, since the rate of reports sent to the wrong subsystem maintainers is not low.
For example, syzbot assumed that "INFO: task hung in __get_super" is a fs layer bug,
but I think the problem is in lower layers (the block, mm, or locking layer).
The root cause could even be just the system being overstressed by the extra
instructions enabled by CONFIG_KCOV_ENABLE_COMPARISONS=y.
On 2018/06/27 5:37, Tetsuo Handa wrote:
> I think syzbot could stop choosing email recipients and leave that to those who
> diagnose bugs, since the rate of reports sent to the wrong subsystem maintainers is not low.
> For example, syzbot assumed that "INFO: task hung in __get_super" is a fs layer bug,
> but I think the problem is in lower layers (the block, mm, or locking layer).
> The root cause could even be just the system being overstressed by the extra
> instructions enabled by CONFIG_KCOV_ENABLE_COMPARISONS=y.
>
Judging from today's bpf-related reports, I think subversion/quilt-based
custom patches would be useful as well.
Since quilt applies the changes in a patch atomically (using the "quilt push" command),
we can maintain one custom patch per git tree. Then the kernel source syzbot
tests is either "no custom patch applied" or "exactly one custom patch applied".
That is, if "quilt push" fails, syzbot continues testing without the custom patch.
Since subversion identifies each revision by an integer, adding a column to the
table that indicates "which custom patch was applied for this report" will not
take much space. Reports where that column is empty will tell us that the
custom patch needs to be updated.
The custom patch can contain whatever changes might be useful for debugging:
for example, the debug printk() for the "INFO: task hung in __sb_start_write" case,
or a context identifier for printk().
Updating custom patches in the subversion repository would be done manually, but
the cost is negligible.
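The fallback described above can be sketched as follows, assuming a hypothetical `quilt_push(tree, rev)` callback that runs "quilt push" for the custom patch stored at subversion revision `rev` and reports whether it applied cleanly. `BuildRecord`, `prepare_tree` and `stale_patches` are illustrative names, not part of syzbot:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BuildRecord:
    tree: str
    custom_patch_rev: Optional[int]  # subversion revision applied, or None


def prepare_tree(tree: str,
                 patch_rev: Optional[int],
                 quilt_push: Callable[[str, int], bool]) -> BuildRecord:
    """Test `tree` with at most one custom patch, falling back to none."""
    if patch_rev is not None and quilt_push(tree, patch_rev):
        return BuildRecord(tree, patch_rev)
    # No patch is kept for this tree, or it no longer applies. The push is
    # assumed to be all-or-nothing, so the tree is tested unmodified.
    return BuildRecord(tree, None)


def stale_patches(records: list[BuildRecord],
                  maintained: dict[str, int]) -> set[str]:
    """Trees that have a custom patch, but whose builds ran without it."""
    return {r.tree for r in records
            if r.custom_patch_rev is None and r.tree in maintained}
```

An empty column alone is ambiguous, since trees with no custom patch also record None; restricting the check to trees that actually have a maintained patch is what turns it into a "patch needs updating" signal.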
Hello Andrew,
It seems that syzbot (experimentally?) has restarted testing linux-next.
May I ask you to temporarily carry the debug printk() patch at
https://groups.google.com/d/msg/syzkaller-bugs/E8M8WTqt034/OpadOICfCAAJ
for the "INFO: task hung in __sb_start_write" case?
The bug should reproduce within a day when run in the syzbot environment,
so I'm sure we won't need to carry this patch for long.
On Sat, 7 Jul 2018 08:26:32 +0900 Tetsuo Handa <[email protected]> wrote:
> Hello Andrew,
>
> It seems that syzbot (experimentally?) has restarted testing linux-next.
>
> May I ask you to temporarily carry the debug printk() patch at
> https://groups.google.com/d/msg/syzkaller-bugs/E8M8WTqt034/OpadOICfCAAJ
> for the "INFO: task hung in __sb_start_write" case?
>
> The bug should reproduce within a day when run in the syzbot environment,
> so I'm sure we won't need to carry this patch for long.
Sure, I can add that. Let's get the build warning sorted out first,
please. Any old silly workaround will suffice in a developer-only
debug patch.
Andrew Morton wrote:
> On Sat, 7 Jul 2018 08:26:32 +0900 Tetsuo Handa <[email protected]> wrote:
>
> > Hello Andrew,
> >
> > It seems that syzbot (experimentally?) has restarted testing linux-next.
> >
> > May I ask you to temporarily carry the debug printk() patch at
> > https://groups.google.com/d/msg/syzkaller-bugs/E8M8WTqt034/OpadOICfCAAJ
> > for the "INFO: task hung in __sb_start_write" case?
> >
> > The bug should reproduce within a day when run in the syzbot environment,
> > so I'm sure we won't need to carry this patch for long.
>
> Sure, I can add that.
Thank you.
> Let's get the build warning sorted out first,
> please. Any old silly workaround will suffice in a developer-only
> debug patch.
The build warning is a mips architecture issue rather than a problem with this
patch itself, since x86_64 builds fine. You can add this patch despite the build warning.