Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1167917AbdDXJVa (ORCPT ); Mon, 24 Apr 2017 05:21:30 -0400 Received: from merlin.infradead.org ([205.233.59.134]:48560 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1167791AbdDXJVU (ORCPT ); Mon, 24 Apr 2017 05:21:20 -0400 Date: Mon, 24 Apr 2017 11:20:51 +0200 From: Peter Zijlstra To: Jann Horn Cc: Kees Cook , linux-kernel@vger.kernel.org, Eric Biggers , Christoph Hellwig , "axboe@kernel.dk" , James Bottomley , Elena Reshetova , Hans Liljestrand , David Windsor , x86@kernel.org, Ingo Molnar , Arnd Bergmann , Greg Kroah-Hartman , Jann Horn , "David S. Miller" , linux-arch@vger.kernel.org, kernel-hardening@lists.openwall.com, PaX Team Subject: Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling Message-ID: <20170424092051.ooww4tngcu5274qz@hirez.programming.kicks-ass.net> References: <20170421220939.GA65363@beast> <20170424083250.h2wv2exbi4ytigac@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1935 Lines: 43 On Mon, Apr 24, 2017 at 10:53:30AM +0200, Jann Horn wrote: > On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra wrote: > > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: > >> This patch ports the x86-specific atomic overflow handling from PaX's > >> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version > >> from PaX that eliminates the saturation race condition by resetting the > >> atomic counter back to the INT_MAX saturation value on both overflow and > >> underflow. To win a race, a system would have to have INT_MAX threads > >> simultaneously overflow before the saturation handler runs. > > > > And is this impossible? Highly unlikely I'll grant you, but absolutely > > impossible? > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > O(nr_tasks + 3*nr_cpus). > > > > Because PaX does it, is not a correctness argument. And this really > > wants one. > > From include/linux/threads.h: > > /* > * A maximum of 4 million PIDs should be enough for a while. > * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.] > */ > #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \ > (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT)) > > AFAICS that means you can only have up to 2^22 running tasks at > a time, since every running task requires a PID in the init pid namespace. That's but an artificial limit and bound to be increased sooner rather than later, given the trend in building ever larger machines. If we go with the futex limit on the full tid space, we end up with 2^30 (the comment here about 29 is wrong, see FUTEX_TID_MASK). We then still have to add a multiple of nr_cpus. Granted, that will still be slightly short of 3^31, but not to any amount I'm really comfortable with, we're _really_ close. Point remains though; Changelog (and arguably the code itself) should very much include such bits.