Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp1238071ybp; Fri, 4 Oct 2019 11:31:00 -0700 (PDT) X-Google-Smtp-Source: APXvYqysyEtHdSDjmaLWnLcEr3z41/TII79PJREcirj3ACLfoFW5UHdMMhKIMJDXx6hWLxlY/dmV X-Received: by 2002:a17:906:3110:: with SMTP id 16mr13603888ejx.306.1570213859879; Fri, 04 Oct 2019 11:30:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1570213859; cv=none; d=google.com; s=arc-20160816; b=to8K5h9Rsu7+K+ThEbVt4CavCl6yNiQmdOfb/cq8QzR896zjS8SxOA14QwjlHpE1JO EWkADSLMtPjezj4ABZQ4qOTB9LNI4hpQE+c40lRTYZjw1jFtOT1v1J99SL452cYdG4A3 KRu8QPfQmdSrGLJhj0d6X5+1yIc7TOEmDblEcHuRkDmioCRCSQDb/EGLe+I93BjJZ7Kg vKS9qzbPa8q6/XbCvsrdLZF9LgdGpxm3XlKKjW2QpOnITlqmbv60rRBYbBWVLiX3qJ/B VBMPzIoKEPdYCbO5SgyOiFTtwcNf4CN4lD+T389ZaVJwPTa46aVOKX6ttNmR5nL6UxKp Tk7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=XKTE+CPRe5jd94ICvF2hQ8zNqLxfi/uYBmbl92O68Kw=; b=JFBtax6+OLbWeTgtI7KZLxZz9iDx3LUYVO+K099yv/DTjmoSRnybnL4SeCws7bXHdc KZ5DKTdzHabiY1MY8Hsb1Uu1i0eETPstRquOo+LR491XwB9qqNIWn+oX6rM54ljbdQZ2 tN35Y/YC3TIdojOkqhBPkNKQpESmd/UJacrNLAqz7FnKm5H05GN2kDLJE2DgKhHtA5bO uUeAI4oAa3AQub+1hW7yPtM+SBv5vFLlm39SuDdqov3yFBwiAZleAv2q9K/4u3IPfC+v ldWxCjFkP0Q6+Im8Ag7FNpdkOZInRj75ks9S7Fq5U0QKw5SO0gjTeplYvZNERBd+SLpb azMw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Zm45PcY2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e1si3267997ejf.28.2019.10.04.11.30.35; Fri, 04 Oct 2019 11:30:59 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Zm45PcY2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729529AbfJDS24 (ORCPT + 99 others); Fri, 4 Oct 2019 14:28:56 -0400 Received: from mail-qt1-f196.google.com ([209.85.160.196]:35635 "EHLO mail-qt1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725932AbfJDS2z (ORCPT ); Fri, 4 Oct 2019 14:28:55 -0400 Received: by mail-qt1-f196.google.com with SMTP id m15so9880862qtq.2 for ; Fri, 04 Oct 2019 11:28:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=XKTE+CPRe5jd94ICvF2hQ8zNqLxfi/uYBmbl92O68Kw=; b=Zm45PcY24Po/wDqJiyZ/zV6koWoQVRkSgx8dnFmgg4JA+uOkMBYeUs0EJx0FDf28e2 u14hF0BHoNAks3ZLSOTcvcXjZ3r27f9JEAdTaCOKel3zN14wLq3s1qaTMfZth+nqUWgI hZVpvx0FNr0MB0vsMVRhE+rchV+1+6DK5f59QV4BemcuWY7Hf2wG7VPdgSziRJ1LgQh7 oZh5OcogxICwTsO9icLOPg2S7mN6HWumL2tFg8bEznh5VR6gj2Gjkc77fdxddDqPydrM u5TvvrNKhNMKNagGG8bSaKPu/3lSLsrUTIkamCqxE5dlxznX98HHcSNAN+rMi7JRwep9 UYkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=XKTE+CPRe5jd94ICvF2hQ8zNqLxfi/uYBmbl92O68Kw=; b=CFlhr8vehd2OVqRieGNlTvlB81oTuZNymSYWp58/uH5Y4rKCqN+yvXz8GFLxHTos+Q cefIE5AaDw3LCIh1WNeXXIlGXWacI7YKRsI/n2BwwzRSAbO9SxMAP0z77Ae+5pYVX9yx CQSb5V58rIP8Zy+xlbiC/QjbzRZGqej7bNybuSImClNqMxo+VPiAS2VdzsAcDsUKNJJt s+zV5gJbma+wuiXXmuhX89H1VhwAD1JQArNfXPBNVBwrlQY7rFkb4hOewUfrpSi39/54 UF1BZARTng948XMq8YwMju05MrT2ZQgy4e8Qg4HJMPKCbuVducEaeQe4AhjlLPIy4XQU R3UA== X-Gm-Message-State: APjAAAX153jSCDOcOtYrzpJoZ71ApAEcAHgvqJQVW/UtXJzV28RYMiqX z2iNmZ1+LQfkVeve5wiV4wB5LfM2YfOQDm8LxzupBQ== X-Received: by 2002:a0c:facc:: with SMTP id p12mr15620694qvo.80.1570213733780; Fri, 04 Oct 2019 11:28:53 -0700 (PDT) MIME-Version: 1.0 References: <20191001211948.GA42035@google.com> <20191004164859.GD253167@google.com> <20191004165736.GF253167@google.com> <20191004180848.GH253167@google.com> In-Reply-To: <20191004180848.GH253167@google.com> From: Dmitry Vyukov Date: Fri, 4 Oct 2019 20:28:42 +0200 Message-ID: Subject: Re: Kernel Concurrency Sanitizer (KCSAN) To: Joel Fernandes Cc: Marco Elver , kasan-dev , LKML , Andrey Konovalov , Alexander Potapenko , "Paul E. McKenney" , Paul Turner , Daniel Axtens , Anatol Pomazau , Will Deacon , Andrea Parri , Alan Stern , LKMM Maintainers -- Akira Yokosawa , Nicholas Piggin , Boqun Feng , Daniel Lustig , Jade Alglave , Luc Maranget Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org " On Fri, Oct 4, 2019 at 8:08 PM Joel Fernandes wrote: > > > > > > > > Hi all, > > > > > > > > > > > > > > > > We would like to share a new data-race detector for the Linux kernel: > > > > > > > > Kernel Concurrency Sanitizer (KCSAN) -- > > > > > > > > https://github.com/google/ktsan/wiki/KCSAN (Details: > > > > > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst) > > > > > > > > > > > > > > > > To those of you who we mentioned at LPC that we're working on a > > > > > > > > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we > > > > > > > > renamed it to KCSAN to avoid confusion with KTSAN). > > > > > > > > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf > > > > > > > > > > > > > > > > In the coming weeks we're planning to: > > > > > > > > * Set up a syzkaller instance. > > > > > > > > * Share the dashboard so that you can see the races that are found. > > > > > > > > * Attempt to send fixes for some races upstream (if you find that the > > > > > > > > kcsan-with-fixes branch contains an important fix, please feel free to > > > > > > > > point it out and we'll prioritize that). > > > > > > > > > > > > > > > > There are a few open questions: > > > > > > > > * The big one: most of the reported races are due to unmarked > > > > > > > > accesses; prioritization or pruning of races to focus initial efforts > > > > > > > > to fix races might be required. Comments on how best to proceed are > > > > > > > > welcome. We're aware that these are issues that have recently received > > > > > > > > attention in the context of the LKMM > > > > > > > > (https://lwn.net/Articles/793253/). > > > > > > > > * How/when to upstream KCSAN? > > > > > > > > > > > > > > Looks exciting. I think based on our discussion at LPC, you mentioned > > > > > > > one way of pruning is if the compiler generated different code with _ONCE > > > > > > > annotations than what would have otherwise been generated. Is that still on > > > > > > > the table, for the purposing of pruning the reports? > > > > > > > > > > > > This might be interesting at first, but it's not entirely clear how > > > > > > feasible it is. It's also dangerous, because the real issue would be > > > > > > ignored. It may be that one compiler version on a particular > > > > > > architecture generates the same code, but any change in compiler or > > > > > > architecture and this would no longer be true. Let me know if you have > > > > > > any more ideas. > > > > > > > > > > My thought was this technique of looking at compiler generated code can be > > > > > used for prioritization of the reports. Have you tested it though? I think > > > > > without testing such technique, we could not know how much of benefit (or > > > > > lack thereof) there is to the issue. > > > > > > > > > > In fact, IIRC, the compiler generating different code with _ONCE annotation > > > > > can be given as justification for patches doing such conversions. > > > > > > > > > > > > We also should not forget about "missed mutex" races (e.g. unprotected > > > > radix tree), which are much worse and higher priority than a missed > > > > atomic annotation. If we look at codegen we may discard most of them > > > > as non important. > > > > > > Sure. I was not asking to look at codegen as the only signal. But to use the > > > signal for whatever it is worth. > > > > But then we need other, stronger signals. We don't have any. > > So if the codegen is the only one and it says "this is not important", > > then we conclude "this is not important". > > I didn't mean for codegen to say "this is not important", but rather "this IS > important". And for the other ones, "this may not be important, or it may > be very important, I don't know". > > Why do you say a missed atomic anotation is lower priority? A bug is a bug, You started talking about prioritization ;) > and ought to be fixed IMHO. Arguably missing lock acquisition can be detected > more easily due to lockdep assertions and using lockdep, than missing _ONCE > annotations. The latter has no way of being detected at runtime easily and > can be causing failures in mysterious ways. > > I think you can divide the problem up.. One set of bugs that are because of > codegen changes and data races and are "important" for that reason. Another > one that is less clear whether they are important or not -- until you have a > better way of providing a signal for categorizing those. > > Did I miss something? We have: 1. missed annotation with changing codegen. 2. missed annotation with non-changing codegen. 3. missed mutex with changing codegen. 4. missed mutex with non-changing codegen. One can arguably say that 2 is less important than 1. But then both 3 and 4 are not low priority under any circumstances. And we don't have any means to distinguish 1/2 from 3/4. In this situation I don't see how "changing codegen" vs "non-changing codegen" gives us any useful signal. Assuming we have some signal for lower priority, the only useful way of using this signal that I see is throwing lower priority bugs away automatically for now (not reporting on syzbot). Because if we do report all bugs and humans need to look at all of them anyway, this signal is not too useful. If am already spending time on a report, I can as well quickly prioritize it much more precisely than any automatic scheme. If we are not reporting lower priority bugs, we cannot offer to classify "missed mutexes" as lower priority.