From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Hou Tao, Alexei Starovoitov, Sasha Levin, Robert Kolchmeyer
Subject: [PATCH 5.15 004/317] bpf: Defer the free of inner map when necessary
Date: Sun, 24 Mar 2024 19:29:44 -0400
Message-ID: <20240324233458.1352854-5-sashal@kernel.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240324233458.1352854-1-sashal@kernel.org>
References: <20240324233458.1352854-1-sashal@kernel.org>
X-stable: review

From: Hou Tao

[ Upstream commit 876673364161da50eed6b472d746ef88242b2368 ]

When updating or deleting an inner map in a map array or map htab, the
inner map may still be accessed by a non-sleepable or a sleepable program.
However, bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
directly through bpf_map_put(). If that reference is the last one (which
is true in most cases), the inner map is freed by ops->map_free() in a
kworker. For now, most .map_free() callbacks don't use synchronize_rcu()
or its variants to wait for an RCU grace period to elapse, so once
ops->map_free() completes, a bpf program that is still accessing the
inner map may hit a use-after-free.

Fix the free of the inner map by invoking bpf_map_free_deferred() after
both one RCU grace period and one tasks trace RCU grace period if the
inner map has been removed from the outer map before. The deferment is
accomplished by using call_rcu() or call_rcu_tasks_trace() when releasing
the last ref-counter of the bpf map. The newly added rcu_head field in
bpf_map shares the same storage space with the work field to reduce the
size of bpf_map.

Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.")
Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
Signed-off-by: Hou Tao
Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov
Signed-off-by: Sasha Levin
(cherry picked from commit 62fca83303d608ad4fec3f7428c8685680bb01b0)
Signed-off-by: Robert Kolchmeyer
Signed-off-by: Sasha Levin
---
 include/linux/bpf.h     |  7 ++++++-
 kernel/bpf/map_in_map.c | 11 ++++++++---
 kernel/bpf/syscall.c    | 26 ++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 97d94bcba1314..df15d4d445ddc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -192,9 +192,14 @@ struct bpf_map {
 	 */
 	atomic64_t refcnt ____cacheline_aligned;
 	atomic64_t usercnt;
-	struct work_struct work;
+	/* rcu is used before freeing and work is only used during freeing */
+	union {
+		struct work_struct work;
+		struct rcu_head rcu;
+	};
 	struct mutex freeze_mutex;
 	atomic64_t writecnt;
+	bool free_after_mult_rcu_gp;
 };
 
 static inline bool map_value_has_spin_lock(const struct bpf_map *map)
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index af0f15db1bf9a..4cf79f86bf458 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -110,10 +110,15 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
 
 void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
 {
-	/* ptr->ops->map_free() has to go through one
-	 * rcu grace period by itself.
+	struct bpf_map *inner_map = ptr;
+
+	/* The inner map may still be used by both non-sleepable and sleepable
+	 * bpf program, so free it after one RCU grace period and one tasks
+	 * trace RCU grace period.
 	 */
-	bpf_map_put(ptr);
+	if (need_defer)
+		WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
+	bpf_map_put(inner_map);
 }
 
 u32 bpf_map_fd_sys_lookup_elem(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64206856a05c4..d4b4a47081b51 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -487,6 +487,25 @@ static void bpf_map_put_uref(struct bpf_map *map)
 	}
 }
 
+static void bpf_map_free_in_work(struct bpf_map *map)
+{
+	INIT_WORK(&map->work, bpf_map_free_deferred);
+	schedule_work(&map->work);
+}
+
+static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
+{
+	bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu));
+}
+
+static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu)
+{
+	if (rcu_trace_implies_rcu_gp())
+		bpf_map_free_rcu_gp(rcu);
+	else
+		call_rcu(rcu, bpf_map_free_rcu_gp);
+}
+
 /* decrement map refcnt and schedule it for freeing via workqueue
  * (unrelying map implementation ops->map_free() might sleep)
  */
@@ -496,8 +515,11 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 		/* bpf_map_free_id() must be called first */
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
-		INIT_WORK(&map->work, bpf_map_free_deferred);
-		schedule_work(&map->work);
+
+		if (READ_ONCE(map->free_after_mult_rcu_gp))
+			call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp);
+		else
+			bpf_map_free_in_work(map);
 	}
 }
 
-- 
2.43.0
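
Note for readers following the control flow: the order of stages that the
last reference drop goes through after this patch can be modelled in plain
user-space C. This is only a rough sketch under stated assumptions, not
kernel code. The enum, model_free_path() and its parameters are
hypothetical names invented for illustration; trace_implies_rcu_gp stands
in for the return value of rcu_trace_implies_rcu_gp(), and no real grace
period is waited for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum free_step {
	STEP_TASKS_TRACE_GP,	/* models call_rcu_tasks_trace() */
	STEP_RCU_GP,		/* models call_rcu() */
	STEP_WORKQUEUE,		/* models schedule_work(): map_free() may sleep */
};

/*
 * Record, in order, the stages a map goes through once its last
 * reference is dropped. Returns the number of stages written to
 * steps[] (at most 3).
 */
static int model_free_path(bool free_after_mult_rcu_gp,
			   bool trace_implies_rcu_gp,
			   enum free_step steps[3])
{
	int n = 0;

	assert(steps != NULL);

	if (free_after_mult_rcu_gp) {
		/* Inner map was removed from an outer map: defer the free. */
		steps[n++] = STEP_TASKS_TRACE_GP;
		/* Chain a regular RCU grace period only if not already implied. */
		if (!trace_implies_rcu_gp)
			steps[n++] = STEP_RCU_GP;
	}
	/* In every case the actual free runs from a workqueue. */
	steps[n++] = STEP_WORKQUEUE;
	return n;
}
```

Calling model_free_path(true, false, steps) yields all three stages in
order, while model_free_path(false, false, steps) yields only the workqueue
stage, which matches the pre-patch behaviour for maps that were never
marked for deferral.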