Received: by 2002:ab2:2441:0:b0:1f3:1f8c:d0c6 with SMTP id k1csp12321lqe; Wed, 3 Apr 2024 19:44:11 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCX+knLuMfcxYWySFCYBLgjhDPfwzfJFDtDRoFudC9xT/+hCju/uSAkBL/M6yle5Gok2Pw5u3npjpiRmXEQOJbG+ULDTm4kaM3XUqfbprw== X-Google-Smtp-Source: AGHT+IE4pcbWoFCx4sReuArInfabjG0kjS7XqLWkO7+gdRqCVMSaOxll5HOSF+2lKX+wqn+qllVc X-Received: by 2002:a05:6358:e483:b0:183:f2c9:c1cb with SMTP id by3-20020a056358e48300b00183f2c9c1cbmr1140544rwb.30.1712198650829; Wed, 03 Apr 2024 19:44:10 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712198650; cv=pass; d=google.com; s=arc-20160816; b=M9rYT4ojF6jeIU23jrdtrAd9XxuEimh9ctGov1fb6s22B5NPhDVdTczi85nTEQDKg/ 5XV+1sL3PaltGbJlMECg6yTZUY/sH0Xq///e4iyIUIJ5epYVrGuq53yMd7SrdOGij0SM Aa1FhDoc1uxU8JLjMFU4OE20OTARKzCC8oTf9/CeIk45WCWsloABS5U5fycJTB0rq7Ws T+oKSLO+GCjkkZFV3XdiV8VP7oUhkR+zYQ3nLa+wELyIKS9gkl48IBEBlpxewxXK4Lx+ j3ZaBLtMvCDiOAuO8vmqEYWIVX6kLJQizWYnS4F4zA3NVyZiG3XvdYX+/yNlfh82EZHd P5hw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=7Y7dHKRjB0zo0YRtbmypS2ui7XH0taREWt+E0r4Nb5M=; fh=H/iNIOzfDA/mn5gYYAbOlEed5VFZ9jO0cLETBBz7dNE=; b=zNXqJwekUP1Qa8uhK9wXmD3eZdII7Vrc5qt+QvtgsIzUtTCH5C/+P14TSneOfu/pHn 5taWtqa8zn5L60a0FJgP788aP1AyHGG/k/vU3Hu1fWo2HALCw91DTbLAiOZ4ayci+HIx fTl7pMTXSX9wC0KZBsmeia9mznVvFzyInljUctGg50g7JbCXZZo0IX3Gej40GLALphkY IpFpwzePt36Nb8tQnyULnyb9nOaRqaVc2S092ymxoNXHcY/KF0WoQ0HbMHMXzfOVVdcB dXg0f8Zb3IshnEFL2R4N7Py/DBxoHdmBesMht2sv7NAX+5nLBeUmBxQyf73HDyy7vwSt dt2g==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Ml1OKUNZ; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-130843-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-130843-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id l11-20020a65448b000000b005dc8ccd36d9si14480412pgq.250.2024.04.03.19.44.10 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Apr 2024 19:44:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-130843-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Ml1OKUNZ; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-130843-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-130843-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 863CEB21501 for ; Thu, 4 Apr 2024 02:44:09 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 47A801CD1B; Thu, 4 Apr 2024 02:44:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Ml1OKUNZ" Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BB751C6B8; Thu, 4 Apr 2024 02:43:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.45 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712198639; cv=none; b=JawYT78KR+OlJow0YFR4z5UdKIESLQc2W29vhslma4kpYafAH3NnGXYtEGxfEALaH7pyF+ECmvDKQ+O137oPxvRvirLplcW4bGHZ9l+11/6M1OCmmTeQXvVdus7weqojZTFJ7sEiX/Ez+meZAuekWDRB+dRx0lwl87Pn+7MmSNA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712198639; c=relaxed/simple; bh=psKIdU+U6l+b2x6/tbZHXd8caIyW0jeCMOT7oTXRN7w=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=hUE5Q05yI8FkB8wQQa3QWPUJ8YuDmQWN4fo/3TwGuB+tS5Vhpgz8Iso+SRMyAtz0yBm94+m7LY8VpvYzShi+ijEnHWrb9be3l20m90UL5G2F6zNsCv7ZuT32iHoEJ/oBIhMN5tARsBlnyi3HVi9oEfORrTwIWibzC0uMKVZ/2nE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Ml1OKUNZ; arc=none smtp.client-ip=209.85.221.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wr1-f45.google.com with SMTP id ffacd0b85a97d-3437f70078bso307900f8f.3; Wed, 03 Apr 2024 19:43:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712198636; x=1712803436; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=7Y7dHKRjB0zo0YRtbmypS2ui7XH0taREWt+E0r4Nb5M=; b=Ml1OKUNZpMk7pgxzkX2agyQzjWVlo5n6oPO3wZozcya6dKGyNXJUSDHKfWBzVA/ndF XdH++BmjrMlX8Xk60EH9KPKUp6vaBzdji0vGdBVb/EY2l1Mlzyep78B7tytyB8QZeuc4 tdwv+dnEmZuAoJCec6zTxXJ2/LwvVJ4a2rAxtc8JHx+Xrmr3orCt4D74RDJdDcvwhmTs EO6RgAmzD//3VuLm5A8zLE1bTlsLYLrNGDoWTALtsMcYdj+WFAyGSZ4Uv/BpGBH0KLXX 3A/z/XHWm35di6r1xArcHsBiw/On8o6GOGKVzViYA7t2wRgjQbSGVM+KpwWTDaHdqFr7 5H7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712198636; x=1712803436; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7Y7dHKRjB0zo0YRtbmypS2ui7XH0taREWt+E0r4Nb5M=; b=RyNAWiSXR+C0UAb58pAv65S0/M+kLkWdXEniHEyVHl+p7GzlbIM8WVrDdBdDsd5e/N kDTHu2kVVtaAh7Ce3li+68YKay/5BP2yiqRYIeo/TmkE12g9Vnw9HiVSFPObUIlPiaBA IHtkyVjMsHqoON8ZLR5N1oXpxZ47dG3dvvMY/PbCEqh/3IH3f5nNEPAq6UjCpUFPCVB6 PVdBPXZnoVBU9bLyuADQM9l42hyLt/+HZP8/diTFAxni4ENGbDBp/yTtxJTkeHZWeZF4 iIk46ahi/hKEbeeCVyEY7Ria2kSnRUsna0OhCHqYFNIFQK0HqNZJJlRC2tt6LciYoL+q VQWw== X-Forwarded-Encrypted: i=1; AJvYcCVr8s1YNDzw1gLnm8yUUgG/P5QJtmKdWgZMksvkbAQ+MIy+vUOxKEAudYTY+vegKa4c2QYv9cJ9v6vlOOI6eesWfPgA0md8F8TSzTlRWRzaR0ZCE2Yop85B+kQ8AMsrVUoLQO8yWdR6LEJWhXZtCyzpFL1DnwfzLpQ17jFKq3RirgdY X-Gm-Message-State: AOJu0YxxcdRAYcIgkSbO8ziAgFrwXPG5Z7K/VwBwOgU96hZX4peU8Gmc Y0Di1ErfUnc03xCCGs8q4IbpQkWsrXS/2V9G0BelwAQ901peXKmqoojAFHhEqA4fRFWJBSsF/RX tjaLipXNrZdA0D/ko5jOJHcNqyKA= X-Received: by 2002:adf:a21a:0:b0:33d:64c7:5619 with SMTP id p26-20020adfa21a000000b0033d64c75619mr639707wra.70.1712198635659; Wed, 03 Apr 2024 19:43:55 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240322-hid-bpf-sleepable-v5-0-179c7b59eaaa@kernel.org> <20240322-hid-bpf-sleepable-v5-1-179c7b59eaaa@kernel.org> In-Reply-To: From: Alexei Starovoitov Date: Wed, 3 Apr 2024 19:43:44 -0700 Message-ID: Subject: Re: [PATCH bpf-next v5 1/6] bpf/helpers: introduce sleepable bpf_timers To: Benjamin Tissoires Cc: Benjamin Tissoires , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , bpf , LKML , "open list:KERNEL SELFTEST FRAMEWORK" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Apr 3, 2024 at 6:01=E2=80=AFPM Alexei Starovoitov wrote: > > On Wed, Apr 3, 2024 at 11:50=E2=80=AFAM Alexei Starovoitov > wrote: > > > > On Wed, Mar 27, 2024 at 10:02=E2=80=AFAM Benjamin Tissoires > > wrote: > > > > > goto out; > > > > > } > > > > > + spin_lock(&t->sleepable_lock); > > > > > drop_prog_refcnt(t); > > > > > + spin_unlock(&t->sleepable_lock); > > > > > > > > this also looks odd. > > > > > > I basically need to protect "t->prog =3D NULL;" from happening while > > > bpf_timer_work_cb is setting up the bpf program to be run. > > > > Ok. I think I understand the race you're trying to fix. > > The bpf_timer_cancel_and_free() is doing > > cancel_work() > > and proceeds with > > kfree_rcu(t, rcu); > > > > That's the only race and these extra locks don't help. > > > > The t->prog =3D NULL is nothing to worry about. > > The bpf_timer_work_cb() might still see callback_fn =3D=3D NULL > > "when it's being setup" and it's ok. > > These locks don't help that. > > > > I suggest to drop sleepable_lock everywhere. > > READ_ONCE of callback_fn in bpf_timer_work_cb() is enough. > > Add rcu_read_lock_trace() before calling bpf prog. > > > > The race to fix is above 'cancel_work + kfree_rcu' > > since kfree_rcu might free 'struct bpf_hrtimer *t' > > while the work is pending and work_queue internal > > logic might UAF struct work_struct work. > > By the time it may luckily enter bpf_timer_work_cb() it's too late. > > The argument 'struct work_struct *work' might already be freed. > > > > To fix this problem, how about the following: > > don't call kfree_rcu and instead queue the work to free it. > > After cancel_work(&t->work); the work_struct can be reused. > > So set it up to call "freeing callback" and do > > schedule_work(&t->work); > > > > There is a big assumption here that new work won't be > > executed before cancelled work completes. > > Need to check with wq experts. > > > > Another approach is to do something smart with > > cancel_work() return code. > > If it returns true set a flag inside bpf_hrtimer and > > make bpf_timer_work_cb() free(t) after bpf prog finishes. > > Looking through wq code... I think I have to correct myself. > cancel_work and immediate free is probably fine from wq pov. > It has this comment: > worker->current_func(work); > /* > * While we must be careful to not use "work" after this, the tra= ce > * point will only record its address. > */ > trace_workqueue_execute_end(work, worker->current_func); > > the bpf_timer_work_cb() might still be running bpf prog. > So it shouldn't touch 'struct bpf_hrtimer *t' after bpf prog returns, > since kfree_rcu(t, rcu); could have freed it by then. > There is also this code in net/rxrpc/rxperf.c > cancel_work(&call->work); > kfree(call); Correction to correction. Above piece in rxrpc is buggy. The following race is possible: cpu 0 process_one_work() set_work_pool_and_clear_pending(work, pool->id, 0); cpu 1 cancel_work() kfree_rcu(work) worker->current_func(work); Here 'work' is a pointer to freed memory. Though wq code will not be touching it, callback will UAF. Also what I proposed earlier as: INIT_WORK(A); schedule_work(); cancel_work(); INIT_WORK(B); schedule_work()= ; won't guarantee the ordering. Since the callback function is different, find_worker_executing_work() will consider it a separate work item. Another option is to to keep bpf_timer_work_cb callback and add a 'bool free_me;' to struct bpf_hrtimer and let the callback free it. But it's also racy. cancel_work() may return false, though worker->current_func(work) wasn't called yet. So we cannot set 'free_me' in bpf_timer_cancel_and_free() in race free maner. After brainstorming with Tejun it seems the best is to use another work_struct to call a different callback and do cancel_work_sync() there. So we need something like: struct bpf_hrtimer { union { struct hrtimer timer; + struct work_struct work; }; struct bpf_map *map; struct bpf_prog *prog; void __rcu *callback_fn; void *value; union { struct rcu_head rcu; + struct work_struct sync_work; }; + u64 flags; // bpf_timer_init() will require BPF_F_TIMER_SLEEPABLE }; 'work' will be used to call bpf_timer_work_cb. 'sync_work' will be used to call cancel_work_sync() + kfree_rcu(). And, of course, schedule_work(&t->sync_work); from bpf_timer_cancel_and_free() instead of kfree_rcu.