From: Benjamin Tissoires
Date: Thu, 4 Apr 2024 17:26:49 +0200
Subject: Re: [PATCH bpf-next v5 1/6] bpf/helpers: introduce sleepable bpf_timers
To: Alexei Starovoitov
Cc: Benjamin Tissoires, Alexei Starovoitov, Daniel Borkmann,
 Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
 Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, Mykola Lysenko, Shuah Khan, bpf, LKML,
 "open list:KERNEL SELFTEST FRAMEWORK"
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240322-hid-bpf-sleepable-v5-0-179c7b59eaaa@kernel.org>
 <20240322-hid-bpf-sleepable-v5-1-179c7b59eaaa@kernel.org>

On Thu, Apr 4, 2024 at 4:44 AM Alexei Starovoitov wrote:
>
> On Wed, Apr 3, 2024 at 6:01 PM Alexei Starovoitov
> wrote:
> >
> > On Wed, Apr 3, 2024 at 11:50 AM Alexei Starovoitov
> > wrote:
> > >
> > > On Wed, Mar 27, 2024 at 10:02 AM Benjamin Tissoires
> > > wrote:
> > > > > >                 goto out;
> > > > > >         }
> > > > > > +       spin_lock(&t->sleepable_lock);
> > > > > >         drop_prog_refcnt(t);
> > > > > > +       spin_unlock(&t->sleepable_lock);
> > > > >
> > > > > this also looks odd.
> > > >
> > > > I basically need to protect "t->prog = NULL;" from happening while
> > > > bpf_timer_work_cb is setting up the bpf program to be run.
> > >
> > > Ok. I think I understand the race you're trying to fix.
> > > The bpf_timer_cancel_and_free() is doing
> > >     cancel_work()
> > > and proceeds with
> > >     kfree_rcu(t, rcu);
> > >
> > > That's the only race and these extra locks don't help.

Thanks a lot for pinpointing the location of the race. Indeed, when I
read your email this morning I said "of course, this was obvious" :(

> > >
> > > The t->prog = NULL is nothing to worry about.
> > > The bpf_timer_work_cb() might still see callback_fn == NULL
> > > "when it's being setup" and it's ok.
> > > These locks don't help that.
> > >
> > > I suggest to drop sleepable_lock everywhere.
> > > READ_ONCE of callback_fn in bpf_timer_work_cb() is enough.
> > > Add rcu_read_lock_trace() before calling bpf prog.
> > >
> > > The race to fix is above 'cancel_work + kfree_rcu'
> > > since kfree_rcu might free 'struct bpf_hrtimer *t'
> > > while the work is pending and work_queue internal
> > > logic might UAF struct work_struct work.
> > > By the time it may luckily enter bpf_timer_work_cb() it's too late.
> > > The argument 'struct work_struct *work' might already be freed.
> > >
> > > To fix this problem, how about the following:
> > > don't call kfree_rcu and instead queue the work to free it.
> > > After cancel_work(&t->work); the work_struct can be reused.
> > > So set it up to call "freeing callback" and do
> > > schedule_work(&t->work);
> > >
> > > There is a big assumption here that new work won't be
> > > executed before cancelled work completes.
> > > Need to check with wq experts.
> > >
> > > Another approach is to do something smart with
> > > cancel_work() return code.
> > > If it returns true set a flag inside bpf_hrtimer and
> > > make bpf_timer_work_cb() free(t) after bpf prog finishes.
> >
> > Looking through wq code... I think I have to correct myself.
> > cancel_work and immediate free is probably fine from wq pov.
> > It has this comment:
> >         worker->current_func(work);
> >         /*
> >          * While we must be careful to not use "work" after this, the trace
> >          * point will only record its address.
> >          */
> >         trace_workqueue_execute_end(work, worker->current_func);
> >
> > the bpf_timer_work_cb() might still be running bpf prog.
> > So it shouldn't touch 'struct bpf_hrtimer *t' after bpf prog returns,
> > since kfree_rcu(t, rcu); could have freed it by then.
> > There is also this code in net/rxrpc/rxperf.c
> >         cancel_work(&call->work);
> >         kfree(call);
>
> Correction to correction.
> Above piece in rxrpc is buggy.
> The following race is possible:
> cpu 0
> process_one_work()
> set_work_pool_and_clear_pending(work, pool->id, 0);
>
>     cpu 1
>     cancel_work()
>     kfree_rcu(work)
>
> worker->current_func(work);
>
> Here 'work' is a pointer to freed memory.
> Though wq code will not be touching it, callback will UAF.
>
> Also what I proposed earlier as:
> INIT_WORK(A); schedule_work(); cancel_work(); INIT_WORK(B); schedule_work();
> won't guarantee the ordering.
> Since the callback function is different,
> find_worker_executing_work() will consider it a separate work item.
>
> Another option is to keep bpf_timer_work_cb callback
> and add a 'bool free_me;' to struct bpf_hrtimer
> and let the callback free it.
> But it's also racy.
> cancel_work() may return false, though worker->current_func(work)
> wasn't called yet.
> So we cannot set 'free_me' in bpf_timer_cancel_and_free()
> in race free manner.
>
> After brainstorming with Tejun it seems the best is to use
> another work_struct to call a different callback and do
> cancel_work_sync() there.

Works for me. I should be able to spin a v6 soon enough, but I have a
couple of remaining questions below:

>
> So we need something like:
>
> struct bpf_hrtimer {
>   union {
>     struct hrtimer timer;
> +   struct work_struct work;
>   };
>   struct bpf_map *map;
>   struct bpf_prog *prog;
>   void __rcu *callback_fn;
>   void *value;
>   union {

Are you sure we need a union here? If we get to call kfree_rcu() we
need to have both struct rcu_head and sync_work, not one or the other.

>     struct rcu_head rcu;
> +   struct work_struct sync_work;
>   };
> + u64 flags; // bpf_timer_init() will require BPF_F_TIMER_SLEEPABLE

If I understand, you want BPF_F_TIMER_SLEEPABLE in bpf_timer_init()
(like in my v2 or v3 IIRC).
But that means that once a timer is initialized it needs to be of one
or the other type (especially true with the first union in this
struct).

So should we reject bpf_timer_set_callback() at run time for sleepable
timers and only allow bpf_timer_set_sleepable_cb() for those? (And the
inverse in the other case.)

This version of the patch allows one timer to be used as softIRQ or
WQ, depending on which timer_set_callback is used. But it might be
simpler for the kfree_rcu race to define the bpf_timer to be of one
kind, so we are sure to call the correct kfree method.

> };
>
> 'work' will be used to call bpf_timer_work_cb.
> 'sync_work' will be used to call cancel_work_sync() + kfree_rcu().
>
> And, of course,
> schedule_work(&t->sync_work); from bpf_timer_cancel_and_free()
> instead of kfree_rcu.
>

Cheers,
Benjamin
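
To make the "one kind per timer" question above concrete, here is a
minimal user-space sketch. It is not kernel code and not the v6 patch.
struct timer_model, the timer_model_*() helpers, enum free_path and the
bit value chosen for BPF_F_TIMER_SLEEPABLE are all hypothetical names
and assumptions made for illustration. It only shows the run-time check
being discussed: the kind is fixed at init time, the "wrong" setter is
rejected with -EINVAL, and the release path follows from the kind.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value, for illustration only; the thread does not state it. */
#define BPF_F_TIMER_SLEEPABLE (1ULL << 0)

typedef int (*timer_cb_t)(void *value);

/* Hypothetical user-space stand-in for struct bpf_hrtimer. */
struct timer_model {
	uint64_t flags;		/* kind of timer, fixed once at init time */
	timer_cb_t callback_fn;
};

/* Which release path a cancel-and-free step would have to take. */
enum free_path {
	FREE_VIA_KFREE_RCU,	/* softIRQ timer: kfree_rcu() directly */
	FREE_VIA_SYNC_WORK,	/* sleepable timer: schedule sync_work first */
};

static int timer_model_init(struct timer_model *t, uint64_t flags)
{
	if (flags & ~BPF_F_TIMER_SLEEPABLE)
		return -EINVAL;	/* unknown flag bits */
	t->flags = flags;
	t->callback_fn = NULL;
	return 0;
}

static int timer_model_is_sleepable(const struct timer_model *t)
{
	return (t->flags & BPF_F_TIMER_SLEEPABLE) != 0;
}

/* Models bpf_timer_set_callback(): refused on a sleepable timer. */
static int timer_model_set_callback(struct timer_model *t, timer_cb_t fn)
{
	if (timer_model_is_sleepable(t))
		return -EINVAL;
	t->callback_fn = fn;
	return 0;
}

/* Models bpf_timer_set_sleepable_cb(): refused on a softIRQ timer. */
static int timer_model_set_sleepable_cb(struct timer_model *t, timer_cb_t fn)
{
	if (!timer_model_is_sleepable(t))
		return -EINVAL;
	t->callback_fn = fn;
	return 0;
}

/* With one kind per timer, the release path depends only on the flags. */
static enum free_path timer_model_free_path(const struct timer_model *t)
{
	return timer_model_is_sleepable(t) ? FREE_VIA_SYNC_WORK
					   : FREE_VIA_KFREE_RCU;
}

static int noop_cb(void *value)
{
	(void)value;
	return 0;
}

/* Returns 0 when every check behaves as described, else a step number. */
int timer_model_demo(void)
{
	struct timer_model soft, sleepable;

	if (timer_model_init(&soft, 0) ||
	    timer_model_init(&sleepable, BPF_F_TIMER_SLEEPABLE))
		return 1;
	if (timer_model_set_callback(&soft, noop_cb) != 0)
		return 2;
	if (timer_model_set_sleepable_cb(&soft, noop_cb) != -EINVAL)
		return 3;
	if (timer_model_set_sleepable_cb(&sleepable, noop_cb) != 0)
		return 4;
	if (timer_model_set_callback(&sleepable, noop_cb) != -EINVAL)
		return 5;
	if (timer_model_free_path(&soft) != FREE_VIA_KFREE_RCU)
		return 6;
	if (timer_model_free_path(&sleepable) != FREE_VIA_SYNC_WORK)
		return 7;
	return 0;
}
```

Calling timer_model_demo() from a small main() and checking for a zero
return exercises both kinds. The model is single-threaded, so it says
nothing about the cancel_work()/kfree_rcu() race itself; it only
illustrates why fixing the kind at init time makes the choice of free
path unambiguous.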