Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp104454img; Wed, 27 Mar 2019 18:02:34 -0700 (PDT) X-Google-Smtp-Source: APXvYqxJlmk/aYgX3T/JqGZq7b6qu5xt3wO6z3Jy2q/0ZrFUYCIr7HhFkke+fzYFSavafmF655x/ X-Received: by 2002:aa7:8518:: with SMTP id v24mr38114911pfn.219.1553734954567; Wed, 27 Mar 2019 18:02:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553734954; cv=none; d=google.com; s=arc-20160816; b=uHm84NLsJdDvv5E0xZniD4C+LKzG+tc8Q3z+xMw8bU9ranfO95TSn9+iN2ZbXK4NNx OUB6xcdwd9Ms7qeSOxOTaYa6tD4QGqSsp9IywHVXlGyFEIwal55nl+f05kujFVTn0Add TOKEYbDvsm/3sRUZQkhzq+i8/dygfdXpLyouQQ9IjaNWwtl8tP5Rpl/ukp8guaD+/alR E1xaOhjzioT7XUOFAtZ7hzPHAvSxzR/lx1cFpew5AVF64FBfOMA7TRyvCmYV8NHrG+xI fZ+NBydirLWgdUkI77Ig7NeiTBggiPkTtAtV7GN09fs8nAviG0mge/SmZlH0LTu+uGMK 0+1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=cZ8GO8aoB4z8b/syN7KNKrJi26ULf6l5YY/SscEWg9A=; b=n97SdnSEoOIkEEAqzxi4B7T4f7bl+ArsfjR+AQ5pJzIh6MJgvU5kiQ1pwxaxE6bY09 irZCadZC6clI82w0xqmdsgOAcIL+ZjAwcWRU+fwVlWJ2vwlrsFvmxijKKXsjinnau/7x nVQSQtwauGe445wrNz2ZxoYO3EPdPsXEK7Ls5r0994zXin737nVHEn4Abkz43ckizl8O mMXXH8XnkTbsr1/UU81Sx+edb/NWH+MhccagXNWb/AS9Cam1jlQN8ZTz+AZt9po+uWfB UfUje7MEWe1iGUspj0aiwo9Ik7jpeZRWHggD4iUsCgedP7VG0mILW6XwaBj7X0ML0efb +vlA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Un5VE50G; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f18si6945417pgj.188.2019.03.27.18.02.18; Wed, 27 Mar 2019 18:02:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Un5VE50G; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727944AbfC1BAN (ORCPT + 99 others); Wed, 27 Mar 2019 21:00:13 -0400 Received: from mail-ot1-f65.google.com ([209.85.210.65]:40740 "EHLO mail-ot1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726268AbfC1BAM (ORCPT ); Wed, 27 Mar 2019 21:00:12 -0400 Received: by mail-ot1-f65.google.com with SMTP id t8so11472883otp.7 for ; Wed, 27 Mar 2019 18:00:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=cZ8GO8aoB4z8b/syN7KNKrJi26ULf6l5YY/SscEWg9A=; b=Un5VE50G5EIFKZB4NatrzEVNo5ffg5PZhdjpX1+jM0VJSYafrN8ph02IaJZsTQpQAW HHc93Yy6jLGRgOUbU3oJ36xZczdc9jlwki+8ku0cq1CIHcQwU3s1wNlN1uIBHmk4Scip FeUQ4W0pu7Y0hc9LZM4jVMDTkyM0ojVIfG/fOCyl5N9GJPmV+OM3W4rlqd/Gv/dpeb0g D3aMMPhve2om9/s+65dwgKgJSv5lFIsug/4J0DLwGyuxGrXlHeaJTbh4rvta2yVKO27M IeUQyg1hKWLpSjqGeVaTYwKa8bP08TEeEE83x5jLGrGuxdWczPzTnsdBHgViqiWQRlDQ YlPw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=cZ8GO8aoB4z8b/syN7KNKrJi26ULf6l5YY/SscEWg9A=; b=CepqFObH4DuIAxMvqFumWzp2TQhb3Dvcvyqt8CZzplD61Lhf2v5MS1vtPz12OxAzDw fFEFZ5gHI9hA6paxifOcSBDpRWevlSs7vPKlVHVrH8QKYGUQf2NXIFGP7ruB7WtQs2Yj 4gnJhooYtvkQXuR4zroM6Tnyha+jS9eLLdmfxwvqGTl2VhpLbSc7jMQ/JW5cycJ714Uu 5ahrrEuw2PtkBGqSSdwvLdCkKl5elEFRLDVhBA2hHCmBhDCYN11RMGmpJFqmqAR7Yrjd ceJ7dWSGZE7vc93c1PIbQhKGju5Jtv9wXfOCyjZOA0pK9P7HGA9SolnDXaV5jnxyMgo/ COqA== X-Gm-Message-State: APjAAAUIT2GxVNHdabbiisd1lhjmPIxwVN+GujyAzt8T1eb2z03wYAN1 edBMmHMfpWnJFVtrFJVnYARFIjxySKfj6UvEzr7VIw== X-Received: by 2002:a9d:309:: with SMTP id 9mr27707445otv.230.1553734811474; Wed, 27 Mar 2019 18:00:11 -0700 (PDT) MIME-Version: 1.0 References: <20190327145331.215360-1-joel@joelfernandes.org> In-Reply-To: From: Jann Horn Date: Thu, 28 Mar 2019 01:59:45 +0100 Message-ID: Subject: Re: [PATCH] Convert struct pid count to refcount_t To: Kees Cook , "Eric W. Biederman" Cc: "Joel Fernandes (Google)" , LKML , Android Kernel Team , Kernel Hardening , Andrew Morton , Matthew Wilcox , Michal Hocko , Oleg Nesterov , "Reshetova, Elena" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 28, 2019 at 1:06 AM Kees Cook wrote: > On Wed, Mar 27, 2019 at 7:53 AM Joel Fernandes (Google) > wrote: > > > > struct pid's count is an atomic_t field used as a refcount. Use > > refcount_t for it which is basically atomic_t but does additional > > checking to prevent use-after-free bugs. No change in behavior if > > CONFIG_REFCOUNT_FULL=n. > > > > Cc: keescook@chromium.org > > Cc: kernel-team@android.com > > Cc: kernel-hardening@lists.openwall.com > > Signed-off-by: Joel Fernandes (Google) > > [...] > > diff --git a/kernel/pid.c b/kernel/pid.c > > index 20881598bdfa..2095c7da644d 100644 > > --- a/kernel/pid.c > > +++ b/kernel/pid.c > > @@ -37,7 +37,7 @@ > > #include > > #include > > #include > > -#include > > +#include > > #include > > #include > > > > @@ -106,8 +106,8 @@ void put_pid(struct pid *pid) > > return; > > > > ns = pid->numbers[pid->level].ns; > > - if ((atomic_read(&pid->count) == 1) || > > - atomic_dec_and_test(&pid->count)) { > > + if ((refcount_read(&pid->count) == 1) || > > + refcount_dec_and_test(&pid->count)) { > > Why is this (and the original code) safe in the face of a race against > get_pid()? i.e. shouldn't this only use refcount_dec_and_test()? I > don't see this code pattern anywhere else in the kernel. Semantically, it doesn't make a difference whether you do this or leave out the "refcount_read(&pid->count) == 1". If you read a 1 from refcount_read(), then you have the only reference to "struct pid", and therefore you want to free it. If you don't get a 1, you have to atomically drop a reference, which, if someone else is concurrently also dropping a reference, may leave you with the last reference (in the case where refcount_dec_and_test() returns true), in which case you still have to take care of freeing it. My guess is that the goal of this is to make the "drop last reference" case a little bit faster by avoiding the cacheline dirtying and the atomic op, at the expense of an extra memory op and branch every time we drop a non-final reference. But that's a pretty low-level optimization, and forking by itself isn't exactly fast... I think the clean thing to do would be to either move this detail into the refcount implementation (if it turns out to actually be valuable in at least a microbenchmark), or just get rid of it. Given the overhead of fork()/clone(), I would be surprised if you could actually measure this effect here. Eric, can you remember the rationale for doing it that way in commit 92476d7fc0326a409ab1d3864a04093a6be9aca7? Am I guessing correctly?