From: Dmitry Vyukov
Date: Thu, 28 Dec 2017 13:27:52 +0100
Subject: Re: [RFC] syzbot process
To: Ozgur
Cc: Eric Biggers, LKML, syzkaller, Eric Dumazet, Kostya Serebryany, Alexander Potapenko, andreyknvl, Linus Torvalds, Greg Kroah-Hartman, Andrew Morton, Tetsuo Handa, David Miller, Willem de Bruijn, Guenter Roeck, Stephan Mueller, "Eric W. Biederman", Jiri Slaby, Peter Hurley, Alan Cox

On Thu, Dec 21, 2017 at 6:09 PM, Eric W. Biederman wrote:
> The thing is syzbot sucks. It tells us things are wrong but not how to
> reproduce the problem. Apparently syzbot will test fixes, but that
> doesn't help when more information is needed to track down the problem.
>
> The long of the short of it is that I don't care about about bug reports
> that no one can reproduce and no human cares about. syzbot doesn't care
> in the sense of helping to fix things, that things are broken. syzbot
> just cries like a baby "It's broken! It's broken!"
>
> Further syzbot is written in a language (go) that switches which kernel
> thread things run in at arbitrary times. That is absolutely not
> productive to understanding what is happening when things break.
> I have heard too many complaints from container run-times that they can't make
> what should be a couple of line change but is completely non-trivial
> because someone choose go for their implementation language. Whatever
> benefits go has it is not a programming lanauge I would choose for fine
> and reproducible control of kernel interfaces.
>
> This in addition to syzbot needing the latest and greatest version of go
> which is not packaged in a handy form by my distro.
>
> Which in my experenience makes syzbot a whining crybaby that won't do
> anything to help and fights you when you try and get close.

Hi Eric,

Re reproducers: that's not completely true. syzbot aims to provide reproducers for reported bugs, and you can see 140 bug reports with reproducers here:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/%22reproducer$20is$20attached%22%7Csort:date

Unfortunately, localizing kernel bugs is hard and is not possible in all cases. The root cause of this lies in the kernel itself, not in syzbot. Things would be much simpler if we were working on a single-threaded, deterministic user-space library; then we would get precise reproducers in 100% of cases. But the kernel is a concurrent, parallel, non-deterministic system that constantly accumulates state. We do try to incrementally improve the percentage of cases where syzbot manages to create reproducers in general, and C reproducers in particular. But that will never reach 100%, due to the nature of the tested system.

Also, you seem to have dealt with a single hard case. From what I see over many hundreds of reported bugs, in ~2/3 of cases it's actually possible to localize the bug by looking at the crash report alone (I see that developers frequently don't even run the reproducer when one is present). For example, LOCKDEP/KASAN reports frequently contain enough context to root-cause the bug, and lots of WARNING/BUG/GPF reports are due to simple, shallow bugs like a missing input check or an off-by-one error.
So I think it would be a mistake not to report bugs that have no reproducer. Even if there is no reproducer and it's a hard bug with no obvious cause, the crash did happen, and it would be wrong to hide this information from the world and pretend that nothing happened. But I understand that the bar for fixing bugs without reproducers is generally higher.

I've looked at the case you dealt with ("proc_flush_task oops"). syzbot provided a syzkaller reproducer for it, and I was in fact able to reproduce the crash by running that reproducer. What happened there is that reproducing the crash took ~15-20 minutes: syzbot got lucky once when running the syzkaller program, but when it then tested the corresponding C program, the crash did not trigger within the allotted time. In such cases, syzbot chooses not to misinform you by claiming that the C program triggers the crash. Every report it provides is actual kernel output, obtained on a freshly booted machine running the exact reproducer it provides, on the exact kernel commit and config.

Re Go (implementation language): this is not true. The part of syzkaller that actually executes syscalls has been written in C++ from day one. You can see the code here:
https://github.com/google/syzkaller/tree/master/executor
It does explicit, manual thread scheduling; it is compiled as a static binary to avoid any variance due to dynamic loading; and it uses neither C++ runtime support nor malloc, to avoid unexpected mmap calls. This is mostly for the reasons you outlined.

Re the version of Go: yes, that's unfortunate. But there is no way we can change this with our limited human resources without signing up for a constant flow of maintenance work. Distros should provide more up-to-date packages. For example, the version of gcc that my distro provides (4.8.4, about 5 years old) can't compile for arm at all: it simply can't produce binaries, because the compiler and assembler don't agree on the units of alignment directives.
I don't see how we can realistically work around the cumulative amount of bugs in software over the last 5 years. Fortunately, obtaining a fresh Go toolchain boils down to unpacking an archive from https://golang.org/dl/.

Please don't draw overly broad conclusions from the one or few negative cases that you hit. syzkaller has found 1000+ real bugs in kernels. We are doing our best, but the problem domain is hard.

Thank you