Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp175797img; Wed, 27 Mar 2019 20:00:07 -0700 (PDT) X-Google-Smtp-Source: APXvYqzfso3jsnZV0KLhIxq1MCcqLzN6UlR2xzetFICghczEPsdcmXcFXYxPxfR/tNSCo1K0H9S9 X-Received: by 2002:a63:7e0a:: with SMTP id z10mr37055845pgc.144.1553742007103; Wed, 27 Mar 2019 20:00:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553742007; cv=none; d=google.com; s=arc-20160816; b=VzVVES8RdDbptgVIWVo9yO3o0wH+/YNRYdTbVJpoVs7ySRC39OJlKqnXVUqrc1yaXq BJiPl0awk1THRtuvBeoAXWAU2CQh/HovUESQBWUhO2jhPBzozS//DnuGNbNAexm0wjLR k/juGSTqHHz1/CMjfSSzsVKoxQKCa40Yt7mjH+0C97BOoO7Sa+NjOCD+NiwBx01twYjh lW/E6T3n+56fy2npGaLlU/yxflqcqH1jGoc36dxKOA45Q/8VJexYKhZ8gus+ESHri1bt xLMOpsNxklkt7DL++x9giDiNg1oSADgX0qGMf9nOS4JvRPNV1g6qT1+yMn/ZVMJQhtWQ gRyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=wpSI6OnXZCLLIwuy8Gv3SdtzWB6luYdaoCxKHcyBJ+4=; b=o2q0ooObFi8LpTRIpq0NXTl5ratvHz+mFwhDWLuqtzqbk9EFzZrXjkL7V1FXtoxLJL Sq58lMD4e6mcxJFqCJ/YpuxKG015Lio2fC0yH9hY10lv6Q8LBVLAEkJvKAr4f2+I3ihT H7/loUusrw770dK0q2H0QmAz4uUkQmqkcHZOyKKbZX5T2tOBRhhKPlnXh7j/EOsZolJk P2H/wUulZClE1KT5wA7hzCtozBHhlvFZXD8YvHEEjdL2bhPQBAqzOdGOaH+hsDkBdWnT +hJDFV4xdMUuBt8c+f8Qqmqo6+mH0HHmaBN1sz4hS/AJ58i+sU5K6+eh0/N1toQ4sd1N 32aQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=L0bG+1pa; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s14si19618093pgs.98.2019.03.27.19.59.51; Wed, 27 Mar 2019 20:00:07 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=L0bG+1pa; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728367AbfC1C6M (ORCPT + 99 others); Wed, 27 Mar 2019 22:58:12 -0400 Received: from mail-ot1-f67.google.com ([209.85.210.67]:39173 "EHLO mail-ot1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725601AbfC1C6L (ORCPT ); Wed, 27 Mar 2019 22:58:11 -0400 Received: by mail-ot1-f67.google.com with SMTP id f10so16963838otb.6 for ; Wed, 27 Mar 2019 19:58:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=wpSI6OnXZCLLIwuy8Gv3SdtzWB6luYdaoCxKHcyBJ+4=; b=L0bG+1paRaGP/BVlNEM3nuwmkU8vhoyjfFGsdW+6q5DWbQdWZF6Z4IzvasAhhD+S6A sJNb47UNmPnkWuuDieiPrJzcZW0WMlzEbC2pvysVKU2kcKTd52ZveBUm2JHrqofkSs/c 5+b95P6VKsvjJyNslovfZoXCLeNrtfdJK1+FUbTtBpxGlYG03TsPGGtH4A9Q5ij3mObK sD/FAt6lEzNG9y2thzucYdKytMdQSBQLHhKr9uVwlCCGeNY7rCxbVMQLdbvhXQtDwgfw sbxNO374v7Vdf9ALFusAWU03YyZjO5INYWS3owUzihpZz829gpyefIMIHdXMmOAbph8g PFDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=wpSI6OnXZCLLIwuy8Gv3SdtzWB6luYdaoCxKHcyBJ+4=; b=mMpM8Cq8l9xtHTth8ll39FOPb9GeiYpSaRzwsxzDLbk4z+gsiqT9DJGmqHH7yBujLR 4/O3ZnprjqjVC+890ePyUoqHyW3ZIZ6CqRzKD4eb2p2QHdNtWu3LjOlFOjPClKou+bis F0ZXfy4S8K9rr2f2b0HZMNp1mj8g2BWkkdskfc2XxfurlUjFizNTGDCzSny38ERN9nVV ZhnUHULkNItqi7DFiKQyL6aCCnUamIUE03tFFLbL1LlHb1HTHEf7ehE8k4fcwvOxZYqO L1d3y7OnMoA4aOHVK/s989mNLovAMZxj+JFtW1DM4CD46BvZqaGWnr0pY7/KAkTc7cCM KAKg== X-Gm-Message-State: APjAAAWBvmdeU9IJZFdXIq3DDAUnl61gGPnJ4Qk4L+jC7h5VMrd8S51y 0lD18AdHt9AH89hPp/ue+iOL3UkfmpIXdo5YVq89BQ== X-Received: by 2002:a05:6830:1216:: with SMTP id r22mr3125227otp.198.1553741890809; Wed, 27 Mar 2019 19:58:10 -0700 (PDT) MIME-Version: 1.0 References: <20190327145331.215360-1-joel@joelfernandes.org> <20190328023432.GA93275@google.com> In-Reply-To: <20190328023432.GA93275@google.com> From: Jann Horn Date: Thu, 28 Mar 2019 03:57:44 +0100 Message-ID: Subject: Re: [PATCH] Convert struct pid count to refcount_t To: Joel Fernandes Cc: Kees Cook , "Eric W. Biederman" , LKML , Android Kernel Team , Kernel Hardening , Andrew Morton , Matthew Wilcox , Michal Hocko , Oleg Nesterov , "Reshetova, Elena" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 28, 2019 at 3:34 AM Joel Fernandes wrote: > On Thu, Mar 28, 2019 at 01:59:45AM +0100, Jann Horn wrote: > > On Thu, Mar 28, 2019 at 1:06 AM Kees Cook wrote: > > > On Wed, Mar 27, 2019 at 7:53 AM Joel Fernandes (Google) > > > wrote: > > > > > > > > struct pid's count is an atomic_t field used as a refcount. Use > > > > refcount_t for it which is basically atomic_t but does additional > > > > checking to prevent use-after-free bugs. No change in behavior if > > > > CONFIG_REFCOUNT_FULL=n. > > > > > > > > Cc: keescook@chromium.org > > > > Cc: kernel-team@android.com > > > > Cc: kernel-hardening@lists.openwall.com > > > > Signed-off-by: Joel Fernandes (Google) > > > > [...] > > > > diff --git a/kernel/pid.c b/kernel/pid.c > > > > index 20881598bdfa..2095c7da644d 100644 > > > > --- a/kernel/pid.c > > > > +++ b/kernel/pid.c > > > > @@ -37,7 +37,7 @@ > > > > #include > > > > #include > > > > #include > > > > -#include > > > > +#include > > > > #include > > > > #include > > > > > > > > @@ -106,8 +106,8 @@ void put_pid(struct pid *pid) > > > > return; > > > > > > > > ns = pid->numbers[pid->level].ns; > > > > - if ((atomic_read(&pid->count) == 1) || > > > > - atomic_dec_and_test(&pid->count)) { > > > > + if ((refcount_read(&pid->count) == 1) || > > > > + refcount_dec_and_test(&pid->count)) { > > > > > > Why is this (and the original code) safe in the face of a race against > > > get_pid()? i.e. shouldn't this only use refcount_dec_and_test()? I > > > don't see this code pattern anywhere else in the kernel. > > > > Semantically, it doesn't make a difference whether you do this or > > leave out the "refcount_read(&pid->count) == 1". If you read a 1 from > > refcount_read(), then you have the only reference to "struct pid", and > > therefore you want to free it. If you don't get a 1, you have to > > atomically drop a reference, which, if someone else is concurrently > > also dropping a reference, may leave you with the last reference (in > > the case where refcount_dec_and_test() returns true), in which case > > you still have to take care of freeing it. > > Also, based on Kees comment, I think it appears to me that get_pid and > put_pid can race in this way in the original code right? > > get_pid put_pid > > atomic_dec_and_test returns 1 This can't happen. get_pid() can only be called on an existing reference. If you are calling get_pid() on an existing reference, and someone else is dropping another reference with put_pid(), then when both functions start running, the refcount must be at least 2. > atomic_inc > kfree > > deref pid /* boom */ > ------------------------------------------------- > > I think get_pid needs to call atomic_inc_not_zero() and put_pid should > not test for pid->count == 1 as condition for freeing, but rather just do > atomic_dec_and_test. So something like the following diff. (And I see a > similar pattern used in drivers/net/mac.c) get_pid() can only be called when you already have a refcounted reference; in other words, when the reference count is at least one. The lifetime management of struct pid differs from the lifetime management of most other objects in the kernel; the usual patterns don't quite apply here. Look at put_pid(): When the refcount has reached zero, there is no RCU grace period (unlike most other objects with RCU-managed lifetimes). Instead, free_pid() has an RCU grace period *before* it invokes delayed_put_pid() to drop a reference; and free_pid() is also the function that removes a PID from the namespace's IDR, and it is used by __change_pid() when a task loses its reference on a PID. In other words: Most refcounted objects with RCU guarantee that the object waits for a grace period after its refcount has reached zero; and during the grace period, the refcount is zero and you're not allowed to increment it again. But for struct pid, the guarantee is instead that there is an RCU grace period after it has been removed from the IDRs and the task, and during the grace period, refcounting is guaranteed to still work normally. > Is the above scenario valid? I didn't see any locking around get_pid or > pud_pid to avoid such a race. > > ---8<----------------------- > > diff --git a/include/linux/pid.h b/include/linux/pid.h > index 8cb86d377ff5..3d79834e3180 100644 > --- a/include/linux/pid.h > +++ b/include/linux/pid.h > @@ -69,8 +69,8 @@ extern struct pid init_struct_pid; > > static inline struct pid *get_pid(struct pid *pid) > { > - if (pid) > - refcount_inc(&pid->count); > + if (!pid || !refcount_inc_not_zero(&pid->count)) > + return NULL; > return pid; > } Nope, this is wrong. Once the refcount is zero, the object goes away, refcount_inc_not_zero() makes no sense here. > > diff --git a/kernel/pid.c b/kernel/pid.c > index 2095c7da644d..89c4849fab5d 100644 > --- a/kernel/pid.c > +++ b/kernel/pid.c > @@ -106,8 +106,7 @@ void put_pid(struct pid *pid) > return; > > ns = pid->numbers[pid->level].ns; > - if ((refcount_read(&pid->count) == 1) || > - refcount_dec_and_test(&pid->count)) { > + if (refcount_dec_and_test(&pid->count)) { > kmem_cache_free(ns->pid_cachep, pid); > put_pid_ns(ns); > } > >