From: Kees Cook
Date: Thu, 20 Jul 2017 08:12:42 -0700
Subject: Re: [PATCH 1/3] ipc: convert ipc_namespace.count from atomic_t to refcount_t
To: "Eric W. Biederman"
Cc: Ingo Molnar, Andrew Morton, Davidlohr Bueso, Elena Reshetova, LKML, Peter Zijlstra, Greg KH, Ingo Molnar, Alexey Dobriyan, "Serge E. Hallyn", arozansk@redhat.com, Hans Liljestrand, David Windsor

On Thu, Jul 20, 2017 at 5:34 AM, Eric W. Biederman wrote:
> Ingo Molnar writes:
>
>> * Andrew Morton wrote:
>>
>>> On Wed, 19 Jul 2017 15:54:27 -0700 Davidlohr Bueso wrote:
>>>
>>> > On Wed, 19 Jul 2017, Andrew Morton wrote:
>>> >
>>> > >I do rather dislike these conversions from the point of view of
>>> > >performance overhead and general code bloat. But I seem to have lost
>>> > >that struggle and I don't think any of these are fastpath(?).
>>> >
>>> > Well, since we now have fd25d19 (locking/refcount: Create unchecked atomic_t
>>> > implementation), performance is supposed to be ok.
>>>
>>> Sure, things are OK for people who disable the feature.
>>
>> So with the WIP fast-refcount series from Kees:
>>
>>   [PATCH v6 0/2] x86: Implement fast refcount overflow protection
>>
>> I believe the robustness difference between optimized-refcount_t and
>> full-refcount_t will be marginal.
>>
>> I.e. we'll be able to have both higher API safety _and_ performance.
>>
>>> But for people who want to enable the feature we really should minimize the cost
>>> by avoiding blindly converting sites which simply don't need it: simple, safe,
>>> old, well-tested code. Why go and slow down such code? Need to apply some
>>> common sense here...
>>
>> It's old, well-tested code _for existing, sane parameters_, until someone finds a
>> decade old bug in one of these with an insane parameters no-one stumbled upon so
>> far, and builds an exploit on top of it.
>>
>> Only by touching all these places do we have a chance to improve things measurably
>> in terms of reducing the probability of bugs.
>
> The more I hear people pushing the upsides of refcount_t without
> considering the downsides the more I dislike it.
>
> - refcount_t is really the wrong thing because it uses saturation
>   semantics. So by definition it includes a bug.

This is a feature, not a bug. :)

If the kernel has a refcount overflow flaw (which, in the pantheon of
exploitable kernel bugs, is _common_[1], as I've referenced earlier),
then we're downgrading an exploitable use-after-free to a harmless
memory allocation leak. Even if you don't include malicious attackers
in the consideration, this changes a memory corruption of unknown
results into a memory leak. That's actually an _improvement_ to
availability and integrity.

> - refcount_t will only really prevent something if there is an extra
>   increment. That is not the kind of bug people are likely to make.

Like I've said, this is common. This is usually a mistake in error
handling which forgets (or misplaces) a "put".

> - refcount_t won't help if you have an extra decrement. The bad
>   use-after-free will still happen.

Yes, and not having a protected refcount_t will also allow a
use-after-free. There is no change here, so it's not a "downside" of
refcount_t. In fact, having gained the implicit annotation of
refcount_t being a refcounter (rather than a simple atomic_t) means
that auditing users is easier and more focused. This could reduce the
chance people make mistakes in the first place, especially since the
API is more constrained than atomic_t.

> - refcount_t won't help if there is a memory stomp. As with an extra
>   decrement the bad use-after-free will still happen.

A stomp of the refcount_t value itself? Sure, and this remains as
vulnerable as atomic_t. This isn't a downside to refcount_t. And
again, since there _is_ checking of the value in places, it's possible
an actionable warning will be produced (though, yes, after the
use-after-free has been exposed), which is a benefit over simple
atomic_t. I mention this in the commit log ("better to maybe produce
the warning than be universally silent").

> So all I see is a huge amount of code churn to implement a buggy (by
> definition) refcounting API, that risks adding new bugs and only truly
> helps with bugs that are unlikely in the first place.

Given that the conversions alone have been uncovering refcount bugs
and that the implementation isn't "buggy" (it provides a specific set
of protections), I strongly disagree with your assessment.

> I really don't think this is an obvious slam dunk.

It entirely blocks a commonly exploitable flaw in the kernel. This
isn't a probabilistic mitigation, either. While I'm not sure I'd ever
describe a security protection as a slam dunk, I think this is up
there. :)

-Kees

[1] When I say "common", I'm speaking from the perspective of security
flaw frequency.
The kernel sees about 1-2 high severity security flaws a
year (with an average lifetime of 5 years), and the refcount-overflow
use-after-free class of flaw is normally reliable for attackers (and
I'd classify as high severity). With 2016 seeing two known separate
refcount-overflow use-after-free flaws, this could be better described
as an epidemic, but I'll try to be less inflammatory and just say
"common".

--
Kees Cook
Pixel Security
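
The saturation argument in the reply above can be made concrete with a
small userspace model. This is only a sketch under stated assumptions,
not the kernel's refcount_t code: struct sketch_ref and the plain_* and
sat_* helpers are hypothetical names invented for illustration, the
counter is a bare unsigned int with no atomics, and the saturation
ceiling is assumed to be UINT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's refcount_t implementation. */
#define SKETCH_SATURATED UINT_MAX

struct sketch_ref {
	unsigned int count;
};

/* Unchecked counter: an increment at the top wraps around to 0. */
static void plain_get(struct sketch_ref *r)
{
	r->count++;
}

/* Returns true when the count hits zero, i.e. the caller would free. */
static bool plain_put(struct sketch_ref *r)
{
	return --r->count == 0;
}

/* Saturating counter: once at the ceiling, the value stays pinned. */
static void sat_get(struct sketch_ref *r)
{
	if (r->count != SKETCH_SATURATED)
		r->count++;
}

/* A pinned counter never reports zero, so the object is leaked, not freed. */
static bool sat_put(struct sketch_ref *r)
{
	if (r->count == SKETCH_SATURATED)
		return false;
	return --r->count == 0;
}

/* Walks both cases, starting from a count already driven to the top
 * (for example by an error path that keeps forgetting its "put"). */
static void sketch_demo(void)
{
	struct sketch_ref plain = { UINT_MAX };
	struct sketch_ref sat = { UINT_MAX };

	plain_get(&plain);          /* wraps to 0 */
	plain_get(&plain);          /* one more reference: count is 1 */
	assert(plain_put(&plain));  /* reports "free" while other holders exist */

	sat_get(&sat);              /* stays at the ceiling */
	assert(!sat_put(&sat));     /* never reports "free": a leak, not a UAF */
}
```

Calling sketch_demo() exercises both paths: the wrapping counter
reports a free while earlier holders still have the object, whereas the
saturating one stays pinned and the object is leaked instead.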