In-Reply-To: <87inchsl4h.fsf@xmission.com>
References: <20180104092552.GA991@amd> <1515058705.7875.25.camel@gmx.de> <20180104095628.GA4407@atrey.karlin.mff.cuni.cz> <87inchsl4h.fsf@xmission.com>
From: Dmitry Vyukov
Date: Mon, 15 Jan 2018 11:54:13 +0100
Subject: Re: LKML admins (syzbot emails are not delivered)
To: "Eric W. Biederman"
Cc: Pavel Machek, Mike Galbraith, LKML, Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, syzkaller
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 4, 2018 at 4:23 PM, Eric W. Biederman wrote:
> Dmitry Vyukov writes:
>
>> Hi Pavel,
>>
>> I've answered this question here in full detail. In short, this is
>> useful and actionable.
>> https://groups.google.com/d/msg/syzkaller/2nVn_XkVhEE/GjjfISejCgAJ
>
> *Snort*
>
> If the information to solve an issue is not in the Oops syzbot is
> useless.

Hi Eric

That's true. But the maintainers of the subsystem are in the best
position to judge that. For that they need to see the report.

> The Oops isn't even mailed in plain text so I have to save the stupid
> thing in a file to see the full text of the problem.

Please elaborate. Take any syzbot email; the oops is right there in the
email, in plain text:
https://groups.google.com/forum/#!topic/syzkaller-bugs/F6ImuLmyue8

> Further there is no place in the syzbot process to test fixes.

Please elaborate.
A kernel developer who fixes a bug tests the fix the same way as he/she
does for any other bug. There is really nothing in syzbot that prevents
you from testing.

> Then there is the issue of testing linux-next and reporting errors on
> who knows what code configuration against code that hasn't changed in
> linux-next. Which presumably any sane person would assume the errors
> are introduced by some other piece of new code. But syzbot goes and
> spams the people who wrote the function where the code is failing.

syzbot uses get_maintainer.pl. If you have better suggestions, I am
listening. And note: syzbot _always_ provides the exact code
configuration.

> Bots can work. We have all of the automatic testing infrastructure
> against everyone's branches on kernel.org to prove it.

If you mean build/boot testing, then that's an order of magnitude
simpler problem. You can build on every commit, you can precisely
pinpoint the guilty commit, etc. Please keep this in mind.

> syzbot finds weird errors, so that makes the problem space more
> difficult to deal with.

The kernel contains weird errors; that makes the problem space more
difficult.

> Still I compleltely don't see the people behind syzbot presumably you
> Dmitry taking responsibility for syzbot failings. Instead I see excuses
> like you don't completely control some part of the code that syzbot is
> built on so can't fix practical real world issues. Like Content-type.

As far as I understand, you mean this one:
https://groups.google.com/d/msg/syzkaller/2nVn_XkVhEE/VSZaokajCgAJ

I probably should have described the rationale in more detail. It's not
only about technical limitations. It's also about the importance of a
feature, the time required to implement it, and in the end whether it's
the right thing to do at all. If that were a major issue that
significantly affected the experience, it would happen one way or
another regardless of technical limitations. Also, simple one-line
changes generally happen even if the benefit is low.
But in that case, I think it's just the wrong thing to do. .txt is a
good, standard extension for text files. On the other hand, .syz is
completely non-standard and no programs know how to deal with it.
That's why it did not happen.

Support for Reported-by tags, as discussed in the "syzbot process"
thread, happened within a week.

Hope this resolves your concerns.

> Bots can be the most horrible thing for a code base. If there is not
> someone or something going through an filtering out the false positives.
> If there is not a process to ensure that issues are brought to the
> proper peoples attention so things get fixed. Bots can be completely
> demoralizing or possibily desensitizing because you keep seeing issues,
> and nothing you do ever makes the issues go away.
>
> Given that no one seems to take any responsibility for syzbots failures
> of any kind. Not content-type in the emails. Not the body of the
> message (which has a massive disclaimer). I don't find syzbot at all
> useful.
>
> Tools are for people, in this case kernel programmers. syzbot has
> serious usability issues. That makes syzbot a bad tool.

First of all, none of the syzbot reports are false positives in the
main sense of the term. Everything it reports happened on an unmodified
kernel, running a user-space workload, without loading custom modules,
writing to /dev/mem, etc. These are _all_ kernel bugs. In some cases the
kernel is bad at explaining precisely what went wrong. But that's
expected from a complex, concurrent, non-deterministic system written in
an unsafe language. We need to deal with it. You get the same reports
from humans as well. Say there is an invalid free in pcrypt which
corrupts memory, but the kernel crashes in selinux later. You will get a
report about selinux from a human. syzbot actually makes the situation a
bit better, to the degree possible, as it enables almost all debugging
configs. So instead of a random corruption report, it provides a KASAN
report about the exact location.
Instead of a dead kernel, you get a LOCKDEP report about the exact lock
inversion, etc.

Now, there are duplicates, induced bugs, unexplainable crashes, reports
mailed to the wrong people, etc. There are hundreds of subsystems in the
kernel. And answering any of these questions requires expertise in a
particular subsystem. Say, this crash is also a possible way that bug
could manifest. Or, the crash happened in this subsystem, but the root
cause is actually in the upper-level subsystem that misuses this
subsystem. The right people to deal with this are the maintainers of the
particular subsystems. Not a single person who does not work on any of
these hundreds of subsystems.