MIME-Version: 1.0
In-Reply-To: <87inchsl4h.fsf@xmission.com>
References: <CACT4Y+ZNJRzPxxPQNR88vcPxD54mHY-evhMwR39iMsJh_NwaEA@mail.gmail.com>
 <20180104092552.GA991@amd> <1515058705.7875.25.camel@gmx.de>
 <20180104095628.GA4407@atrey.karlin.mff.cuni.cz> <CACT4Y+YB_p_p4ORbke1oZfkLSC8rnAD81Ra_fprsPSugGAQy8g@mail.gmail.com>
 <87inchsl4h.fsf@xmission.com>
From: Dmitry Vyukov <dvyukov@google.com>
Date: Mon, 15 Jan 2018 11:54:13 +0100
Message-ID: <CACT4Y+ayTp5Ocz4Z8hN_gx7i-7gOeHvmLP93ekrE9f=R9gqaiA@mail.gmail.com>
Subject: Re: LKML admins (syzbot emails are not delivered)
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>, Mike Galbraith <efault@gmx.de>,
        LKML <linux-kernel@vger.kernel.org>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        syzkaller <syzkaller@googlegroups.com>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org

On Thu, Jan 4, 2018 at 4:23 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Dmitry Vyukov <dvyukov@google.com> writes:
>
>> Hi Pavel,
>>
>> I've answered this question here in full detail. In short, this is
>> useful and actionable.
>> https://groups.google.com/d/msg/syzkaller/2nVn_XkVhEE/GjjfISejCgAJ
>
> *Snort*
>
> If the information to solve an issue is not in the Oops syzbot is
> useless.

Hi Eric

That's true. But the maintainers of the subsystem are in the best
position to judge that. For that they need to see the report.


> The Oops isn't even mailed in plain text so I have to save the stupid
> thing in a file to see the full text of the problem.

Please elaborate.
Take any syzbot email: the oops is right there in the email, in plain text:
https://groups.google.com/forum/#!topic/syzkaller-bugs/F6ImuLmyue8


> Further there is no place in the syzbot process to test fixes.

Please elaborate.
The kernel developer who fixes a bug tests the fix the same way as
he/she does for any other bug. There is really nothing in syzbot that
prevents you from testing.


> Then there is the issue of testing linux-next and reporting errors on
> who knows what code configuration against code that hasn't changed in
> linux-next.   Which presumably any sane person would assume the errors
> are introduced by some other piece of new code.  But syzbot goes and
> spams the people who wrote the function where the code is failing.

syzbot uses get_maintainer.pl. If you have better suggestions, I am listening.
And note: syzbot _always_ provides the exact code configuration.


> Bots can work.  We have all of the automatic testing infrastructure
> against everyone's branches on kernel.org to prove it.

If you mean build/boot testing, then that's an order of magnitude
simpler problem. You can build on every commit, you can precisely
pinpoint the guilty commit, etc. Please keep this in mind.


> syzbot finds weird errors, so that makes the problem space more
> difficult to deal with.

The kernel contains weird errors; that is what makes the problem space
more difficult.


> Still I compleltely don't see the people behind syzbot presumably you
> Dmitry taking responsibility for syzbot failings.  Instead I see excuses
> like you don't completely control some part of the code that syzbot is
> built on so can't fix practical real world issues.  Like Content-type.

As far as I understand, you mean this one:
https://groups.google.com/d/msg/syzkaller/2nVn_XkVhEE/VSZaokajCgAJ

I probably should have described the rationale in more detail.
It's not only about technical limitations. It's also about the
importance of a feature, the time required to implement it, and in the
end whether it's the right thing to do at all. If it were a major issue
that significantly affected the experience, it would happen one way or
another regardless of technical limitations. Simple one-line changes
also generally happen even if the benefit is low. But in this case, I
think it's just the wrong thing to do. .txt is a good, standard
extension for text files. On the other hand, .syz is a completely
non-standard extension that no programs know how to deal with. That's
why it did not happen.
By contrast, the support for Reported-by tags, as discussed in the
"syzbot process" thread, happened within a week.

Hope this resolves your concerns.


> Bots can be the most horrible thing for a code base.  If there is not
> someone or something going through an filtering out the false positives.
> If there is not a process to ensure that issues are brought to the
> proper peoples attention so things get fixed.  Bots can be completely
> demoralizing or possibily desensitizing because you keep seeing issues,
> and nothing you do ever makes the issues go away.
>
> Given that no one seems to take any responsibility for syzbots failures
> of any kind.  Not content-type in the emails.  Not the body of the
> message (which has a massive disclaimer).  I don't find syzbot at all
> useful.
>
> Tools are for people, in this case kernel programmers.  syzbot has
> serious usability issues.  That makes syzbot a bad tool.

First of all, none of the syzbot reports are false positives in the
main sense of the term. Everything it reports happened on an unmodified
kernel running a user-space workload, without loading custom modules,
writing to /dev/mem, etc. These are _all_ kernel bugs. In some cases
the kernel is bad at explaining precisely what went wrong. But that's
expected from a complex, concurrent, non-deterministic system written
in an unsafe language. We need to deal with it.
You get the same reports from humans as well. Say there is an invalid
free in pcrypt which corrupts memory, but the kernel crashes in selinux
later. You will get a report about selinux from a human.
syzbot actually makes the situation a bit better, to the degree
possible, as it enables almost all debugging configs. So instead of a
random corruption report, it provides a KASAN report about the exact
location. Instead of a dead kernel, you get a LOCKDEP report about the
exact lock inversion, etc.

Now, there are duplicates, induced bugs, unexplainable crashes, reports
mailed to the wrong people, etc.
There are hundreds of subsystems in the kernel, and sorting out any of
these cases requires expertise in a particular subsystem. Say, this
crash is also a possible way that bug could manifest. Or, the crash
happened in this subsystem, but the root cause is actually in the
upper-level subsystem that misuses it.
The right people to deal with this are the maintainers of the
particular subsystems, not a single person who does not work on any of
these hundreds of subsystems.