From: "Reshetova, Elena" <elena.reshetova@intel.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "peterz@infradead.org" <peterz@infradead.org>,
        "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
        "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "adobriyan@gmail.com" <adobriyan@gmail.com>,
        "serge@hallyn.com" <serge@hallyn.com>,
        "arozansk@redhat.com" <arozansk@redhat.com>,
        "dave@stgolabs.net" <dave@stgolabs.net>,
        "keescook@chromium.org" <keescook@chromium.org>,
        Hans Liljestrand <ishkamiel@gmail.com>,
        "David Windsor" <dwindsor@gmail.com>
Subject: RE: [PATCH 1/3] ipc: convert ipc_namespace.count from atomic_t to
 refcount_t
Thread-Topic: [PATCH 1/3] ipc: convert ipc_namespace.count from atomic_t to
 refcount_t
Thread-Index: AQHS+buqSQPfUhM6vkqQ9ug58kizeqJP5dfQ
Date: Wed, 12 Jul 2017 09:21:21 +0000
Message-ID: <2236FBA76BA1254E88B949DDB74E612B6FF2366A@IRSMSX102.ger.corp.intel.com>
References: <1499417992-3238-1-git-send-email-elena.reshetova@intel.com>
        <1499417992-3238-2-git-send-email-elena.reshetova@intel.com>
        <87bmottgo4.fsf@xmission.com>
        <2236FBA76BA1254E88B949DDB74E612B6FF2269B@IRSMSX102.ger.corp.intel.com>
        <87k23gsn4u.fsf@xmission.com>
        <2236FBA76BA1254E88B949DDB74E612B6FF22730@IRSMSX102.ger.corp.intel.com>
        <878tjwpm7l.fsf@xmission.com>
        <2236FBA76BA1254E88B949DDB74E612B6FF227D2@IRSMSX102.ger.corp.intel.com>
 <874lukkp7f.fsf@xmission.com>
In-Reply-To: <874lukkp7f.fsf@xmission.com>
Accept-Language: en-US
Content-Language: en-US
dlp-product: dlpe-windows
dlp-version: 10.0.102.7
dlp-reaction: no-action
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 10495
Lines: 234


> "Reshetova, Elena" <elena.reshetova@intel.com> writes:
> 
> >> "Reshetova, Elena" <elena.reshetova@intel.com> writes:
> >>
> >> >> "Reshetova, Elena" <elena.reshetova@intel.com> writes:
> >> >>
> >> >> >> Elena Reshetova <elena.reshetova@intel.com> writes:
> >> >> >>
> >> >> >> > refcount_t type and corresponding API should be
> >> >> >> > used instead of atomic_t when the variable is used as
> >> >> >> > a reference counter. This allows to avoid accidental
> >> >> >> > refcounter overflows that might lead to use-after-free
> >> >> >> > situations.
> >> >> >>
> >> >> >> In this patch you can see all of the uses of the count.
> >> >> >> What accidental refcount overflows are possible?
> >> >> >
> >> >> > Even if one can guarantee and prove that in the current implementation
> >> >> > there are no overflows possible, we can't say that for
> >> >> > sure for any future implementation. Bugs might always happen
> >> >> > unfortunately, but if we convert the refcounter to a safer
> >> >> > type we can be sure that overflows are not possible.
> >> >> >
> >> >> > Does it make sense to you?
> >> >>
> >> >> Not for code that is likely to remain unchanged for a decade no.
> >> >
> >> > Can we really be sure for any kernel code about this? And does it make
> >> > sense to trust our security on a fact like this?
> >>
> >> But refcount_t doesn't fix anything.  At best it changes a bad bug to a
> >> less bad bug.  So now my machine OOMS instead of allows a memory
> >> overwrite.   It still doesn't work.
> >
> > Well, it is a step forward from security standpoint. OOMS is really hard
> > to exploit vs. memory overwrites. Pretty much all exploits need either
> > memory write or memory read, out of memory is really much harder to
> > exploit.
> 
> OOM in production is a denial of service attack.  From a severity point
> of view an OOM can be considered equivalent to a kernel panic and
> something that requires a box to reboot.

Denial of service is usually (unless the system is designed poorly) not the
highest security concern. It is one thing to have your system down for a
certain period of time, and quite another to tell your customers that your
whole database with their data and credit card info is now public on the
Internet. So, I would still stand by the statement that it makes the
situation better.

> 
> From a long term perspective I expect we will need to change all
> reference counters to a type where that is not saturating but instead
> fails to increment and returns an error.  If we want to keep a system
> functioning in the face of maxing out a reference count that is the only
> way it can truly be done.

I think the work on refcount_t internals will continue. We have a number
of things that need to be improved further. Kees has a first set of patches
in the works now, and no one expects it to be the last. However, we also
need to make conversions in the kernel, otherwise we are not moving
anywhere. It is also easier to test the implications of changes inside
refcount_t on as many converted cases as possible once they are merged.

> 
> >> Plus refcount_t does not provide any safety on the architectures where
> >> it is a noop.
> >
> > Not sure I understood this. What do you mean by "noop"?
> > refcount_t is currently architecture independent.
> 
> noop being short for no operation.
> 
> I believe there were some/most architecture implementations that define
> refcount_t to atomic_t out of performance concerns.  I know I saw
> patches fly by to that effect.  On an architecture where refcount_t is
> equivalent to atomic_t the change in these patches is a noop.

This choice isn't architecture-dependent now. It defaults to atomic_t for all
architectures unless CONFIG_REFCOUNT_FULL is enabled
(then it is the arch. independent refcount_t implementation). The next set of
patches (still under review) provides an x86 arch. dependent fast
refcount_t implementation.
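
To make the difference concrete, here is a minimal user-space sketch in C11
of a wrapping increment (what a plain atomic counter does) next to a
saturating one (the idea behind the full refcount_t implementation). It is
an illustration only and not the kernel code; sat_count_t, wrap_inc(),
sat_inc() and sat_demo() are names made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a reference counter. */
typedef struct { atomic_uint refs; } sat_count_t;

/* Plain atomic increment: wraps from UINT_MAX back to 0. */
static void wrap_inc(sat_count_t *c)
{
	atomic_fetch_add(&c->refs, 1u);
}

/* Saturating increment: once UINT_MAX is reached the counter stays there. */
static void sat_inc(sat_count_t *c)
{
	unsigned int old = atomic_load(&c->refs);

	do {
		if (old == UINT_MAX)
			return;	/* saturated: the object leaks, but never wraps */
	} while (!atomic_compare_exchange_weak(&c->refs, &old, old + 1u));
}

/* Small self-check showing the two behaviours side by side. */
static void sat_demo(void)
{
	sat_count_t c;

	atomic_init(&c.refs, UINT_MAX);
	wrap_inc(&c);
	assert(atomic_load(&c.refs) == 0);		/* wrapped to zero */

	atomic_store(&c.refs, UINT_MAX);
	sat_inc(&c);
	assert(atomic_load(&c.refs) == UINT_MAX);	/* stuck at the maximum */
}
```

In the wrapped case the counter reads zero while references still exist,
which is the path to a use-after-free. In the saturated case the worst
outcome is an object that is never freed.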
> 
> >> >> This looks like a large set of unautomated changes without any real
> >> >> thought put into it.
> >> >
> >> > We are soon into the end of the first year that we started to look into
> >> > refcounter overflow/underflow problem and coming up this far was
> >> > not easy enough (just check all the millions of emails on kernel-hardening
> >> > mailing list). Each refcount_t conversion candidate was first found by Coccinelle
> >> > analysis and then manually checked and converted. The story of
> >> > refcount_t API and all discussions go even further.
> >> > So you can't really claim that there is no " thought put into it " :)
> >>
> >> But the conversion of the instance happens without thought and manually.
> >> Which is a good recipe for typos.  Which is what I am saying.
> >>
> >> There have been lots of conversions like that in the kernel and
> >> practically every one has introduced at least one typo.
> >
> > What do you exactly mean by "typo"? Typos should be detected at these
> > stages:
> 
> An unintentional mistake.  The term "thinko" might be more what I am
> thinking of.   Many human mistakes are not slips of the fingers but
> accidentally giving the wrong command at some level.
> 
> > 1) typos like wrong function name etc. can be found at compile time
> >     (trust me I have found a number of these on the very first iteration
> >     with patches)
> > 2) "typos" (not sure if it is correct to call them typos) like usage of
> >     wrong refcount_t API vs. original atomic_t API can be found during
> >     internal reviews or reviews by maintainers
> 
> This I worry about most as the mental distance from xxx_inc to xxx_dec
> can be very short.

I would say the distance is huge :) I can't think of a possible way one would
make such a mistake.

> 
> > 3) much bigger problem is actually not any typos, but hidden issues that
> >     show up only in run-time that detect underflows/overflows or inability
> >     to increment from zero.
> >  These only are nasty, but given that refcount_t WARNs left and right
> >  about them, we can detect them fast.
> 
> Which means I worry about those less.
> 
> > I don't know what is a better recipe for doing API changes like this?
> > Do you have any suggestions?
> 
> I would think a semantic patch targeting a specific lock would be less
> error prone.  I would think that the same semantic patch could be used
> from lock to lock with just a change of the lock that is being targeted.

In the beginning I was considering writing a full cocci semantic patch,
but given how spread out the code for refcounters might be (take something
like the socket's refcounter) and how many different edge cases we had,
the semantic patch would be very complex (I would not even really trust
its output personally) and would need manual verification anyway.
So, in the end we ended up just using a .cocci file to find
occurrences and doing the conversions manually. And when I say manually, it
doesn't mean that I manually typed each function name. The process was
roughly like this:

1. Take a look at the reported occurrence. Determine if it should be
converted in the first place.
2. If it needs conversion, substitute atomic_* with refcount_* (note this
only changes prefixes).
3. Resolve cases that need manual addressing (i.e. a refcount_* analog
function simply doesn't exist).
4. Check the conversion overall to see if everything makes sense together,
and fix other issues (remove unneeded WARNs, change inc to set for
initializing refcounting etc.)

Note that the 4th step has many such angles.
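
As an illustration of step 2, here is roughly what a straightforward
conversion looks like, written as a self-contained user-space sketch. The
refcount_* functions below are simplified stand-ins with the same names as
the kernel API (they do not saturate or warn), and struct obj with its
helpers is a hypothetical example, not code from any of the patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified user-space stand-ins for the kernel refcount_t API. */
typedef struct { atomic_uint refs; } refcount_t;

static void refcount_set(refcount_t *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

static void refcount_inc(refcount_t *r)
{
	atomic_fetch_add(&r->refs, 1u);
}

/* Returns true when the caller dropped the last reference. */
static bool refcount_dec_and_test(refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, 1u) == 1u;
}

/* Hypothetical object with an embedded counter. */
struct obj {
	refcount_t count;			/* was: atomic_t count; */
};

static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		refcount_set(&o->count, 1);	/* was: atomic_set(&o->count, 1); */
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	refcount_inc(&o->count);		/* was: atomic_inc(&o->count); */
	return o;
}

/* Returns true if the object was freed. */
static bool obj_put(struct obj *o)
{
	if (!refcount_dec_and_test(&o->count))	/* was: atomic_dec_and_test() */
		return false;
	free(o);
	return true;
}

/* Small self-check of the get/put pairing. */
static void obj_demo(void)
{
	struct obj *o = obj_alloc();

	assert(o);
	obj_get(o);
	assert(!obj_put(o));	/* one reference still held */
	assert(obj_put(o));	/* last reference dropped, object freed */
}
```

Only the prefixes change in a case like this. The cases from steps 3 and 4
are the ones where the operation itself has to be reconsidered.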
> 
> I strongly suspect that would reduce the chance of accident when dealing
> with a particular API and being scripted would increase the confidence
> in the changes.

Even if I made the wrong decision with it, I guess it doesn't matter so much
now, because we already have our conversions and I am very confident that if
there are mistakes in them, they could not have been caught by a semantic
patch. Unless we now write a script to check the validity of each patch,
which is probably doable, but I don't think I have the energy left for this.
Any volunteers? :)
> 
> >> So from an engineering standpoint it is a very valid question to ask
> >> about.  And I find the apparent insistence that you don't make typos
> >> very disturbing.
> >>
> >> >  That almost always results in a typo somewhere
> >> >> that breaks things.
> >> >>
> >> >> So there is no benefit to the code, and a non-zero chance that there
> >> >> will be a typo breaking the code.
> >> >
> >> > The code is very active on issuing WARNs when anything goes wrong.
> >> > Using this feature we have not only found errors in conversions, but
> >> > sometimes errors in code itself. So, any bug would be actually much
> >> > faster visible than using old atomic_t interface.
> >> >
> >> > In addition by default refcount_t equals to atomic, which also gives a
> >> > possibility to make a softer transition and catch all related bugs in couple
> >> > of cycles when enabling CONFIG_REFCOUNT_FULL.
> >>
> >> But if you make a typo and change one operation for another I don't see
> >> how any of that applies.
> >
> > It is hard to make a "typo" to change one operation to another. It is not a
> > one-two char mismatch error.
> 
> Those might be more properly "thinkos" but they do happen.
> 
> > When doing these patches we followed the
> > logic of "less code changes - better" (since less chances of making mistake),
> > so if in some cases functions are changed (like from atomic_sub to
> > refcount_sub_and_test(), or from atomic_inc_not_zero() to refcount_inc() etc.)
> > there was a reason for making it and the change wasn't automatic and without
> > thinking at all. Again, we do have our maintainers also to catch if a change that
> > we did doesn't actually work for them right?
> 
> In general a maintainer's job is to make certain the appropriate code review
> and due diligence has happened, not necessarily to perform that code
> review themselves.
> 
> Changing from atomic_inc_not_zero to a simple refcount_inc really
> troubles me because it makes it harder to switch to something fully
> correct.

I actually meant a case like this (it just came up recently, so it stayed in my head):

static void get_pi_state(struct futex_pi_state *pi_state)
 {
-	WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+	refcount_inc(&pi_state->refcount);

This is just to avoid an extra layer of unneeded warning. But even with cases
like this, as I said, we were super conservative and changed as little as
possible unless it is a 100% clear case or people suggest the change during
patch review.
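
The outer WARN_ON_ONCE() is redundant because, as far as I understand the
CONFIG_REFCOUNT_FULL code, refcount_inc() already warns by itself when it is
asked to increment from zero. Below is a rough user-space sketch of that
behaviour. It is not the kernel implementation; pi_ref_t, pi_ref_inc(),
pi_ref_inc_not_zero() and the pi_ref_warnings counter are names invented
for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for refcount_t. */
typedef struct { atomic_uint refs; } pi_ref_t;

/* Counts emitted warnings; exists only so the sketch can be checked. */
static unsigned int pi_ref_warnings;

/* Increment unless the counter is zero; saturate instead of wrapping. */
static bool pi_ref_inc_not_zero(pi_ref_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* object already released */
		if (old == UINT_MAX)
			return true;	/* saturated, leave it alone */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1u));
	return true;
}

/* Unconditional increment that complains on its own on increment from 0. */
static void pi_ref_inc(pi_ref_t *r)
{
	if (!pi_ref_inc_not_zero(r)) {
		pi_ref_warnings++;
		fprintf(stderr, "increment on 0; possible use-after-free\n");
	}
}

/* Small self-check of both paths. */
static void pi_ref_demo(void)
{
	pi_ref_t live, dead;
	unsigned int before = pi_ref_warnings;

	atomic_init(&live.refs, 1u);
	pi_ref_inc(&live);
	assert(atomic_load(&live.refs) == 2u);
	assert(pi_ref_warnings == before);	/* normal case: silent */

	atomic_init(&dead.refs, 0u);
	pi_ref_inc(&dead);
	assert(atomic_load(&dead.refs) == 0u);	/* counter not resurrected */
	assert(pi_ref_warnings == before + 1u);	/* and a warning was issued */
}
```

So wrapping the call in another warning would only report the same
condition twice.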

Best Regards,
Elena.


> 
> Eric