From: Linus Torvalds
Date: Wed, 10 Mar 2021 13:14:24 -0800
Subject: Re: [PATCH v8 3/8] Use atomic_t for ucounts reference counting
To: Alexey Gladkov
Cc: LKML, io-uring, Kernel Hardening, Linux Containers, Linux-MM, Alexey Gladkov, Andrew Morton, Christian Brauner, "Eric W . Biederman", Jann Horn, Jens Axboe, Kees Cook, Oleg Nesterov
In-Reply-To: <59ee3289194cd97d70085cce701bc494bfcb4fd2.1615372955.git.gladkov.alexey@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 10, 2021 at 4:01 AM Alexey Gladkov wrote:
>
>
> +/* 127: arbitrary random number, small enough to assemble well */
> +#define refcount_zero_or_close_to_overflow(ucounts) \
> +	((unsigned int) atomic_read(&ucounts->count) + 127u <= 127u)
> +
> +struct ucounts *get_ucounts(struct ucounts *ucounts)
> +{
> +	if (ucounts) {
> +		if (refcount_zero_or_close_to_overflow(ucounts)) {
> +			WARN_ONCE(1, "ucounts: counter has reached its maximum value");
> +			return NULL;
> +		}
> +		atomic_inc(&ucounts->count);
> +	}
> +	return ucounts;

Side note: you probably should just make the limit be the "oh, the
count overflows into the sign bit".

The reason the page cache did that tighter thing is that it actually
has _two_ limits:

 - the "try_get_page()" thing uses the sign bit as a "uhhuh, I've now
   used up half of the available reference counting bits, and I will
   refuse to use any more".

   This is basically your "get_ucounts()" function. It's a "I want a
   refcount, but I'm willing to deal with failures".

 - the page cache has a _different_ set of "I need to unconditionally
   get a refcount, and I can *not* deal with failures".

   This is basically the traditional "get_page()", which is only used
   in fairly controlled places, and should never be something that can
   overflow.

   And *that* special code then uses that "zero_or_close_to_overflow()"
   case as a "doing a get_page() in this situation is very very wrong".
   This is purely a debugging feature used for a VM_BUG_ON() (that has
   never triggered, as far as I know).

For your ucounts situation, you don't have that second case at all, so
you have no reason to ever allow the count to even get remotely close
to overflowing.

A reference count being within 128 counts of overflow (when we're
talking a 32-bit count) is basically never a good idea.
It means that you are way too close to the limit, and there's a risk
that lots of concurrent people all first see an ok value, and then
*all* decide to do the increment, and then you're toast.

In contrast, if you use the sign bit as a "ok, let's stop
incrementing", the fact that your "overflow" test and the increment
aren't atomic really isn't a big deal.

(And yes, you could use a cmpxchg to *make* the overflow test atomic,
but it's often much much more expensive, so..)

            Linus
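
For illustration, the two options described above might look roughly like the following. This is only a userspace sketch under stated assumptions, not the kernel code: C11 <stdatomic.h> stands in for the kernel's atomic_t, and struct ucounts_model, get_ucounts_signbit() and get_ucounts_cmpxchg() are hypothetical names invented for this example. The first function uses the sign bit as the limit with a non-atomic check and increment; the second makes the check atomic with a compare-and-exchange loop.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct ucounts. */
struct ucounts_model {
	atomic_int count;
};

/*
 * Sign-bit limit: refuse a new reference once the count is zero or has
 * crossed into the sign bit. The test and the increment are two separate
 * steps, so racing callers can all pass the test, but there are about
 * 2^31 counts of headroom before the value could wrap back around.
 */
static inline struct ucounts_model *
get_ucounts_signbit(struct ucounts_model *uc)
{
	if (uc) {
		if (atomic_load(&uc->count) <= 0)
			return NULL;
		/* Atomic add on a signed type wraps; it is not undefined. */
		atomic_fetch_add(&uc->count, 1);
	}
	return uc;
}

/*
 * Atomic variant: the limit test and the increment happen as one
 * compare-and-exchange, retried if another thread changed the count.
 * The explicit INT_MAX check is an assumption of this sketch; it keeps
 * the plain "old + 1" below from overflowing a signed int.
 */
static inline struct ucounts_model *
get_ucounts_cmpxchg(struct ucounts_model *uc)
{
	int old;

	if (!uc)
		return NULL;

	old = atomic_load(&uc->count);
	do {
		if (old <= 0 || old == INT_MAX)
			return NULL;
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&uc->count, &old, old + 1));

	return uc;
}
```

A caller would treat a NULL return from either function as "could not take a reference" and back out. The first form costs one load plus one atomic increment; the second may loop under contention, which is the extra expense mentioned above.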