Subject: Re: [PATCH RFC 00/10] KFENCE: A low-overhead sampling-based memory safety error detector
From: Dmitry Vyukov
Date: Fri, 11 Sep 2020 09:35:58 +0200
To: Marco Elver
Cc: Vlastimil Babka, Dave Hansen, Alexander Potapenko, Andrew Morton,
    Catalin Marinas, Christoph Lameter, David Rientjes, Joonsoo Kim,
    Mark Rutland, Pekka Enberg, "H. Peter Anvin", "Paul E. McKenney",
    Andrey Konovalov, Andrey Ryabinin, Andy Lutomirski, Borislav Petkov,
    Dave Hansen, Eric Dumazet, Greg Kroah-Hartman, Ingo Molnar, Jann Horn,
    Jonathan Corbet, Kees Cook, Peter Zijlstra, Qian Cai, Thomas Gleixner,
    Will Deacon, the arch/x86 maintainers, open list:DOCUMENTATION, LKML,
    kasan-dev, Linux ARM, Linux-MM

On Tue, Sep 8, 2020 at 5:56 PM Marco Elver wrote:
>
> On Tue, Sep 08, 2020 at 05:36PM +0200, Vlastimil Babka wrote:
> > On 9/8/20 5:31 PM, Marco Elver wrote:
> > >>
> > >> How much memory overhead does this end up having? I know it depends on
> > >> the object size and so forth. But, could you give some real-world
> > >> examples of memory consumption? Also, what's the worst case? Say I
> > >> have a ton of worst-case-sized (32b) slab objects. Will I notice?
> > >
> > > KFENCE objects are limited (default 255). If we exhaust KFENCE's memory
> > > pool, no more KFENCE allocations will occur.
> > > Documentation/dev-tools/kfence.rst gives a formula to calculate the
> > > KFENCE pool size:
> > >
> > >   The total memory dedicated to the KFENCE memory pool can be computed as::
> > >
> > >       ( #objects + 1 ) * 2 * PAGE_SIZE
> > >
> > >   Using the default config, and assuming a page size of 4 KiB, results in
> > >   dedicating 2 MiB to the KFENCE memory pool.
> > >
> > > Does that clarify this point? Or anything else that could help clarify
> > > this?
> >
> > Hmm did you observe that with this limit, a long-running system would
> > eventually converge to KFENCE memory pool being filled with long-aged
> > objects, so there would be no space to sample new ones?
>
> Sure, that's a possibility. But remember that we're not trying to
> deterministically detect bugs on 1 system (if you wanted that, you
> should use KASAN), but a fleet of machines! The non-determinism of which
> allocations will end up in KFENCE, will ensure we won't end up with a
> fleet of machines of identical allocations. That's exactly what we're
> after. Even if we eventually exhaust the pool, you'll still detect bugs
> if there are any.
>
> If you are overly worried, either the sample interval or number of
> available objects needs to be tweaked to be larger. The default of 255
> is quite conservative, and even using something larger on a modern
> system is hardly noticeable. Choosing a sample interval & number of
> objects should also factor in how many machines you plan to deploy this
> on. Monitoring /sys/kernel/debug/kfence/stats can help you here.
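(For reference, the pool-size formula quoted above works out as in the
following minimal userspace sketch. The 4 KiB page size and the
non-default object counts are assumptions for illustration only, not
values taken from the patches.)

/* Sketch of the quoted formula: pool size = (#objects + 1) * 2 * PAGE_SIZE */
#include <stdio.h>

#define PAGE_SIZE 4096UL  /* assumed 4 KiB pages */

static unsigned long kfence_pool_bytes(unsigned long num_objects)
{
	return (num_objects + 1) * 2 * PAGE_SIZE;
}

int main(void)
{
	/* 255 is the stated default; the larger counts are hypothetical. */
	unsigned long counts[] = { 255, 1023, 4095 };

	for (size_t i = 0; i < sizeof(counts) / sizeof(counts[0]); i++)
		printf("%4lu objects -> %6.2f MiB\n", counts[i],
		       kfence_pool_bytes(counts[i]) / (1024.0 * 1024.0));
	return 0;
}

With the default of 255 objects this gives the 2 MiB mentioned above;
even a 16x larger pool stays in the tens of MiB.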
Hi Marco,

I reviewed the patches and they look good to me (minus some local
comments that I've left).

The main question/concern I have is the one Vlastimil raised about
long-aged objects. Is the default sample interval reasonable for typical
workloads? Do we have any guidelines on choosing the sample interval?
Should it depend on the workload/use pattern? By "reasonable" I mean:
will the pool last long enough to still sample something after
hours/days?

Have you tried any experiments with some workloads (both short-lived
processes and long-lived processes/namespaces), capturing the state of
the pool? It could make sense to do so to better understand the
dynamics. I suspect the rate may need to be orders of magnitude lower.

I am also wondering about the boot process (both kernel and init). It is
inherently almost the same across the whole population of machines, and
it inherently produces persistent objects. Should we lower the rate for
the first minute of uptime? Or maybe make it proportional to uptime?

I feel this is quite an important aspect. We can have this awesome idea
and implementation, yet radically lower its utility by using a bad
sampling value (which has a silent "failure mode": no bugs detected).

But to make it clear: none of this conflicts with merging the first
version. Just having a tunable sampling interval is good enough. We will
get the ultimate understanding only when we start using it widely
anyway.
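(To make the boot-time suggestion above a bit more concrete, here is a
purely hypothetical sketch of an uptime-proportional sampling policy.
None of the names, values, or the policy itself come from the KFENCE
patches; it only illustrates the kind of scaling being proposed.)

#include <stdio.h>

/* Illustrative values only, not KFENCE defaults. */
#define BASE_INTERVAL_MS   100UL   /* steady-state sample interval */
#define RAMP_WINDOW_SEC    600UL   /* ramp down over the first 10 minutes */

/*
 * Early in boot, sample far less often, since boot-time allocations are
 * nearly identical across the whole fleet; decay towards the base
 * interval as uptime grows.
 */
static unsigned long effective_interval_ms(unsigned long uptime_sec)
{
	if (uptime_sec >= RAMP_WINDOW_SEC)
		return BASE_INTERVAL_MS;
	return BASE_INTERVAL_MS * (RAMP_WINDOW_SEC / (uptime_sec + 1));
}

int main(void)
{
	unsigned long t[] = { 0, 10, 60, 300, 600, 3600 };

	for (size_t i = 0; i < sizeof(t) / sizeof(t[0]); i++)
		printf("uptime %5lu s -> sample interval %6lu ms\n",
		       t[i], effective_interval_ms(t[i]));
	return 0;
}

Whether a simple ramp like this is enough, or whether early boot should
be excluded entirely, is exactly the kind of question that data from
/sys/kernel/debug/kfence/stats could help answer.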