From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Alexey Gladkov, Rune Kleveland, Yu Zhao, Jordan Glover, "Eric W.
 Biederman"
Subject: [PATCH 5.14 096/169] ucounts: Fix signal ucount refcounting
Date: Mon, 25 Oct 2021 21:14:37 +0200
Message-Id: <20211025191029.841268166@linuxfoundation.org>
X-Mailer: git-send-email 2.33.1
In-Reply-To: <20211025191017.756020307@linuxfoundation.org>
References: <20211025191017.756020307@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Eric W. Biederman

commit 15bc01effefe97757ef02ca09e9d1b927ab22725 upstream.

In commit fda31c50292a ("signal: avoid double atomic counter
increments for user accounting") Linus made a clever optimization to
how the rlimit counter and the struct user_struct reference count are
updated. Unfortunately that optimization does not work in the obvious
way when moved to nested rlimits. The problem is that the last
decrement of the per-user, per-user-namespace sigpending counter might
also be the last decrement of the sigpending counter in the parent
user namespace, which means that simply freeing the leaf ucount in
__sigqueue_free is not enough.

Maintain the optimization and handle the tricky cases by introducing
inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.

By moving the entire optimization into functions that perform all of
the work it becomes possible to ensure that every level is handled
properly.

The new function inc_rlimit_get_ucounts returns 0 on failure to
increment the ucount. This differs from inc_rlimit_ucounts, which
increments the ucounts and returns LONG_MAX if the ucount counter has
exceeded its maximum or has wrapped (to indicate that the counter
needs to be decremented).

I wish we had a single user to account all pending signals to across
all of the threads of a process so this complexity was not necessary.

Cc: stable@vger.kernel.org
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133
Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133
Reviewed-by: Alexey Gladkov
Tested-by: Rune Kleveland
Tested-by: Yu Zhao
Tested-by: Jordan Glover
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/user_namespace.h |    2 +
 kernel/signal.c                |   25 +++++---------------
 kernel/ucount.c                |   49 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 19 deletions(-)

--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -127,6 +127,8 @@ static inline long get_ucounts_value(str
 
 long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
 bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
+void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
 bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
 
 static inline void set_rlimit_ucount_max(struct user_namespace *ns,
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_st
 	 */
 	rcu_read_lock();
 	ucounts = task_ucounts(t);
-	sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
-	switch (sigpending) {
-	case 1:
-		if (likely(get_ucounts(ucounts)))
-			break;
-		fallthrough;
-	case LONG_MAX:
-		/*
-		 * we need to decrease the ucount in the userns tree on any
-		 * failure to avoid counts leaking.
-		 */
-		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
-		rcu_read_unlock();
-		return NULL;
-	}
+	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
 	rcu_read_unlock();
+	if (!sigpending)
+		return NULL;
 
 	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
 		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
@@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_st
 	}
 
 	if (unlikely(q == NULL)) {
-		if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
-			put_ucounts(ucounts);
+		dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
 	} else {
 		INIT_LIST_HEAD(&q->list);
 		q->flags = sigqueue_flags;
@@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqu
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;
-	if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
-		put_ucounts(q->ucounts);
+	if (q->ucounts) {
+		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
 		q->ucounts = NULL;
 	}
 	kmem_cache_free(sigqueue_cachep, q);
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -284,6 +284,55 @@ bool dec_rlimit_ucounts(struct ucounts *
 	return (new == 0);
 }
 
+static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
+				struct ucounts *last, enum ucount_type type)
+{
+	struct ucounts *iter, *next;
+	for (iter = ucounts; iter != last; iter = next) {
+		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
+		WARN_ON_ONCE(dec < 0);
+		next = iter->ns->ucounts;
+		if (dec == 0)
+			put_ucounts(iter);
+	}
+}
+
+void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
+{
+	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
+}
+
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
+{
+	/* Caller must hold a reference to ucounts */
+	struct ucounts *iter;
+	long dec, ret = 0;
+
+	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
+		long max = READ_ONCE(iter->ns->ucount_max[type]);
+		long new = atomic_long_add_return(1, &iter->ucount[type]);
+		if (new < 0 || new > max)
+			goto unwind;
+		if (iter == ucounts)
+			ret = new;
+		/*
+		 * Grab an extra ucount reference for the caller when
+		 * the rlimit count was previously 0.
+		 */
+		if (new != 1)
+			continue;
+		if (!get_ucounts(iter))
+			goto dec_unwind;
+	}
+	return ret;
+dec_unwind:
+	dec = atomic_long_add_return(-1, &iter->ucount[type]);
+	WARN_ON_ONCE(dec < 0);
+unwind:
+	do_dec_rlimit_put_ucounts(ucounts, iter, type);
+	return 0;
+}
+
 bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
 {
 	struct ucounts *iter;
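
As a rough illustration of the per-level walk that inc_rlimit_get_ucounts and dec_rlimit_put_ucounts perform, here is a minimal userspace sketch in plain C. Everything in it is hypothetical and not kernel API: struct model_ucounts, its parent/count/max/refs fields, and the model_inc_get and model_dec_put helpers merely stand in for struct ucounts, the iter->ns->ucounts link, the atomic counters and get_ucounts/put_ucounts. It uses plain longs instead of atomics, cannot fail to take a reference, and, unlike the unwind label in the patch, also undoes the increment on the level that failed its limit check; that last point is a choice of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ucounts; not the kernel type. */
struct model_ucounts {
	struct model_ucounts *parent;	/* level in the parent namespace, or NULL */
	long count;			/* pending-signal count at this level */
	long max;			/* limit enforced at this level */
	long refs;			/* references pinned by pending signals */
};

/*
 * Decrement every level in [uc, last). A level whose count reaches 0
 * drops the reference that its first increment took.
 */
static void model_dec_put(struct model_ucounts *uc, struct model_ucounts *last)
{
	struct model_ucounts *iter, *next;

	for (iter = uc; iter != last; iter = next) {
		next = iter->parent;	/* fetch before the level may be released */
		iter->count--;
		assert(iter->count >= 0);
		if (iter->count == 0)
			iter->refs--;
	}
}

/*
 * Increment every level from the leaf up to the root. Returns the new
 * leaf count, or 0 if some level wrapped or went over its limit, in
 * which case all increments made so far are undone.
 */
static long model_inc_get(struct model_ucounts *uc)
{
	struct model_ucounts *iter;
	long ret = 0;

	for (iter = uc; iter; iter = iter->parent) {
		long new = ++iter->count;

		if (new < 0 || new > iter->max) {
			iter->count--;			/* undo the failing level (model choice) */
			model_dec_put(uc, iter);	/* undo every level below it */
			return 0;
		}
		if (iter == uc)
			ret = new;
		if (new == 1)
			iter->refs++;	/* count was previously 0: pin this level */
	}
	return ret;
}
```

With a leaf limited to 5 under a parent limited to 2, two calls to model_inc_get(&leaf) return 1 and 2, and a third returns 0 and leaves both counts at 2. Two calls to model_dec_put(&leaf, NULL) then bring both counts, and both reference counts, back to 0, which is the case the commit message describes: the last decrement at the leaf is also the last decrement at the parent.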