Received: by 2002:a05:7412:cfc7:b0:fc:a2b0:25d7 with SMTP id by7csp216086rdb; Sat, 17 Feb 2024 05:43:00 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCX3cB/JLQuG42XFFiNv/Jw0k8l3+rkVGNoFalB+CqZkd6bNT861hYqq4j3idDej2EzkxLnNXMN8fY+AW6frZRoYg6YXHcg1ZLp76PZGFA== X-Google-Smtp-Source: AGHT+IEWbXple/cXzo0SYzEpd/eWMfUBS/NVGILCeQ6+l2103JO/c4LZirI/TwoB6AUg2bfKD/QC X-Received: by 2002:a17:903:41c5:b0:1d9:d444:313 with SMTP id u5-20020a17090341c500b001d9d4440313mr9279783ple.43.1708177380258; Sat, 17 Feb 2024 05:43:00 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708177380; cv=pass; d=google.com; s=arc-20160816; b=tcQoD0V26PP+034cZh5jP+4pGCbO2A+voimll9sUazJVUo6cViCUjnt31Pt3JIRKxy 3/6MMCptaJiWjMhHBM8KT8NW8Out3U1YktEOKiInibccOmYLN/wi+CnLOZIAio3UwFBw CCkGgnmF1N5pzvH0PRVqgJX3d5YjcsPIPP5UmoTOUaiy6po56zbOitkh8zu5D0sQdNut CpX/8/VAeMqGMK4s/rbdcp05hFBqZM0m3QI3T6eHQyxyw+73sFs/A8X0Ulx7jRp0uKSy yGGGrDZ6rxg+fMFxLbTkWLoZS292uIzRNVf6iRc94TNVdzbPpUJJOc1O5gBRWk5Jxrm+ fUsA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:references :in-reply-to:subject:cc:to:from:dkim-signature; bh=o1qCU6uoJSv5O0JLAhuPC8zEwWfa2duu6xW/6J/EB3s=; fh=KNKr3IZdxfPJ8gEJC8tffK7G+OvfEK3rCJzB5xE2NPc=; b=nPAy2ZyCiz44fmXYluJ5rVv3ElsrciyiEwPvZ9PuhdZfp5PhJ0DrC3AsP9kJDTKmvt gz2NiQao0bXJ97P0XHd+/K2RCHsg3FMrpmuc6HYIPb9ymasNgq9ltSNl5EAhT2PqCOzs frZ2qtyRRHaFhlroIcajBSSIi0unb7MJ/a8OBchpkM8Ywht+CJ6yMbM3Nx8BkM8XCXxa D9jfh1s+7VWMbHUWsgKE9Yc2q6PKYUfB0SmFHHI3mcZhXKrzAM1s6O66xc2c1++ZtyeH sxnXlJVURCD572Z8RTtuKXbhaD2cHXzM3b4J3KEDyDeFqeGy8/2ffbZztZwol04Diu5w zIXQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=KpK0gP6r; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-69877-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-69877-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id f10-20020a170902684a00b001d9e1c4ad1asi1494266pln.100.2024.02.17.05.42.59 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 17 Feb 2024 05:43:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-69877-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=KpK0gP6r; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-69877-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-69877-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 07124B21508 for ; Sat, 17 Feb 2024 13:42:59 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 644917AE56; Sat, 17 Feb 2024 13:42:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="KpK0gP6r" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DAA77B3C7 for ; Sat, 17 Feb 2024 13:42:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708177357; cv=none; b=ex5Z7nIVEIEBpl8JrsMQbCs5gPcpFpHX7xth5qaKxefod+tirM3C119vIzIPT/Je4q1Luj+7s9sgW3lPZ0a7KKsrkoXY5Yf7HQvHusCBgOLZnr8wKvdq3SUEuQpphSoWF4S4m67VsOcYyBEUV1ibd/ut6nmN2r9IFyORZb5JsQU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708177357; c=relaxed/simple; bh=OtK3IzgJ3fGYA4kSv5N/upSwJ7FGiRagAHvl4Kht5OI=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=Skhphj01uA3pOL1QSoSDbYySX24uUGWV5zUq9gWpBtVwZWcK3Rg+Ccx84yN+tWsp+VgWvXBU18RNA3rTZ0FpisbGLIyTayNnR5jZl7m/Vr2zcls5lzFamDD+VUQk69bUVV5HZ00teddcAS3D47yNefzoaypiDFH8kLLqafJxink= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=KpK0gP6r; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1708177354; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=o1qCU6uoJSv5O0JLAhuPC8zEwWfa2duu6xW/6J/EB3s=; b=KpK0gP6r/WqR5Olg2TYJ4JSxLpEpvmpIhgQWmfrw9EpNU4h3rJWw606mZZZyLTbaMenByK q/seQxuq1Wja1tUxTOl89nKgupT/oaoEz3YDZb8lRmllOU9hMwGxVHsiAqdG1hbbcjqxfu fSaTtSU4e4W9LjcLgOmeci2o3ZjIWJo= Received: from mail-lf1-f71.google.com (mail-lf1-f71.google.com [209.85.167.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-586-jsVpwXH8MWyuAe_EKh8jvQ-1; Sat, 17 Feb 2024 08:42:32 -0500 X-MC-Unique: jsVpwXH8MWyuAe_EKh8jvQ-1 Received: by mail-lf1-f71.google.com with SMTP id 2adb3069b0e04-5129e5b5556so856745e87.2 for ; Sat, 17 Feb 2024 05:42:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708177351; x=1708782151; h=content-transfer-encoding:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=o1qCU6uoJSv5O0JLAhuPC8zEwWfa2duu6xW/6J/EB3s=; b=eYwRHCECxjpP0PlyvTACnANZ3DBYk8kUIMi0uFxi0GndMEF63dEKpYEnBrzPlZyjA3 8NqwV79O/CQSmk7aVqK6EDEsf67wGp5fAMeDQEaMlwlFllqek8t0Nn0hrGKxSZyyD6lc li3vl+GUkJKIyTU+fLLy4bbWqt+8upoodZ2McZuKxn4D3u3AR8PZmRR52lgkt3DAUz/v r9Z2nmb4U6r27XMLoK5lLDPF2lPLpgKdG0+BCOt4oziHQgFdZVVPcDStaiAsfzppcJdY xZDrvR3cJubkMkr1+l5pRdPybA7TDFCJz/pfvbzjfUTAJy0VAwVBaphLgBZ/vGBqAM5H GX2A== X-Forwarded-Encrypted: i=1; AJvYcCW0oy9eGo3eNiK8heJzaUjcSoqxKcs04TIqNIvwMW0SH5duRXJ6krdZjYQ3dskLFRGi5x7GaeStqk116IL1AouXuYYt1+ZLJ8yAEl7V X-Gm-Message-State: AOJu0YyG2JhBm2g1Gx6nGJJ6wXPJS6apNQ83pCwZwzVu7t7mYaE6j4pb ziCwKCSoXAhz/EYXcutaEgppg9lzmPhj8v26y630pzHQE8wzexVRkFJ1YosC68IbEnDevI7wSka vqcxphVEYxksV1HL7IgOTBFDvn7Gez8h70hrWcjPENnTlNa+dvvnhS2+ZEffFGQ== X-Received: by 2002:a19:690e:0:b0:511:87b5:7ddb with SMTP id e14-20020a19690e000000b0051187b57ddbmr4737935lfc.37.1708177351164; Sat, 17 Feb 2024 05:42:31 -0800 (PST) X-Received: by 2002:a19:690e:0:b0:511:87b5:7ddb with SMTP id e14-20020a19690e000000b0051187b57ddbmr4737925lfc.37.1708177350717; Sat, 17 Feb 2024 05:42:30 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id n18-20020a1709062bd200b00a3d3bc0d689sm992614ejg.72.2024.02.17.05.42.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 17 Feb 2024 05:42:29 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 6D46710F5DDE; Sat, 17 Feb 2024 14:42:29 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Benjamin Tissoires Cc: Martin KaFai Lau , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, linux-input@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Eduard Zingerman , Song Liu , Yonghong Song , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Jiri Kosina , Benjamin Tissoires , Jonathan Corbet , Shuah Khan Subject: Re: [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers In-Reply-To: References: <20240214-hid-bpf-sleepable-v2-0-5756b054724d@kernel.org> <20240214-hid-bpf-sleepable-v2-2-5756b054724d@kernel.org> <87eddccx1q.fsf@toke.dk> X-Clacks-Overhead: GNU Terry Pratchett Date: Sat, 17 Feb 2024 14:42:29 +0100 Message-ID: <878r3jcim2.fsf@toke.dk> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Benjamin Tissoires writes: > On Feb 16 2024, Toke H=C3=B8iland-J=C3=B8rgensen wrote: >> Benjamin Tissoires writes: >>=20 >> > On Feb 15 2024, Martin KaFai Lau wrote: >> >> On 2/14/24 9:18 AM, Benjamin Tissoires wrote: >> >> > +static void bpf_timer_work_cb(struct work_struct *work) >> >> > +{ >> >> > + struct bpf_hrtimer *t =3D container_of(work, struct bpf_hrtimer, = work); >> >> > + struct bpf_map *map =3D t->map; >> >> > + void *value =3D t->value; >> >> > + bpf_callback_t callback_fn; >> >> > + void *key; >> >> > + u32 idx; >> >> > + >> >> > + BTF_TYPE_EMIT(struct bpf_timer); >> >> > + >> >> > + rcu_read_lock(); >> >> > + callback_fn =3D rcu_dereference(t->sleepable_cb_fn); >> >> > + rcu_read_unlock(); >> >>=20 >> >> I took a very brief look at patch 2. One thing that may worth to ask = here, >> >> the rcu_read_unlock() seems to be done too early. It is protecting the >> >> t->sleepable_cb_fn (?), so should it be done after finished using the >> >> callback_fn? >> > >> > Probably :) >> > >> > TBH, everytime I work with RCUs I spent countless hours trying to >> > re-understand everything, and in this case I'm currently in the "let's >> > make it work" process than fixing concurrency issues. >> > I still gave it a shot in case it solves my issue, but no, I still have >> > the crash. >> > >> > But given that callback_fn might sleep, isn't it an issue to keep the >> > RCU_reader lock so long? (we don't seem to call synchronize_rcu() so it >> > might be fine, but I'd like the confirmation from someone else). >>=20 >> You're right, it isn't. From the RCU/checklist.rst doc: >>=20 >> 13. Unlike most flavors of RCU, it *is* permissible to block in an >> SRCU read-side critical section (demarked by srcu_read_lock() >> and srcu_read_unlock()), hence the "SRCU": "sleepable RCU". >> Please note that if you don't need to sleep in read-side critical >> sections, you should be using RCU rather than SRCU, because RCU >> is almost always faster and easier to use than is SRCU. >>=20 >> So we can't use the regular RCU protection for the callback in this >> usage. We'll need to either convert it to SRCU, or add another >> protection mechanism to make sure the callback function is not freed >> from under us (like a refcnt). I suspect the latter may be simpler (from >> reading the rest of that documentation around SRCU. > > Currently I'm thinking at also incrementing the ->prog held in the > bpf_hrtimer which should prevent the callback to be freed, if I'm not wro= ng. > Then I should be able to just release the rcu_read_unlock before calling > the actual callback. And then put the ref on ->prog once done. > > But to be able to do that I might need to protect ->prog with an RCU > too. Hmm, bpf_timer_set_callback() already increments the bpf refcnt; so it's a matter of ensuring that bpf_timer_cancel() and bpf_timer_cancel_and_free() wait for the callback to complete even in the workqueue case. The current 'hrtimer_running' percpu global var is not going to cut it for that, so I guess some other kind of locking will be needed? Not really sure what would be appropriate here, a refcnt, or maybe a full mutex? I am not actually sure the RCU protection of the callback field itself is that important given all the other protections that make sure the callback has exited before cancelling? As long as we add another such protection I think it can just be a READ_ONCE() for getting the cb pointer? >> >> A high level design question. The intention of the new >> >> bpf_timer_set_sleepable_cb() kfunc is actually to delay work to a wor= kqueue. >> >> It is useful to delay work from the bpf_timer_cb and it may also usef= ul to >> >> delay work from other bpf running context (e.g. the networking hooks = like >> >> "tc"). The bpf_timer_set_sleepable_cb() seems to be unnecessary forci= ng >> >> delay-work must be done in a bpf_timer_cb. >> > >> > Basically I'm just a monkey here. I've been told that I should use >> > bpf_timer[0]. But my implementation is not finished, as Alexei mention= ed >> > that we should bypass hrtimer if I'm not wrong [1]. >>=20 >> I don't think getting rid of the hrtimer in favour of >> schedule_delayed_work() makes any sense. schedule_delayed_work() does >> exactly the same as you're doing in this version of the patch: it >> schedules a timer callback, and calls queue_work() from inside that >> timer callback. It just uses "regular" timers instead of hrtimers. So I >> don't think there's any performance benefit from using that facility; on >> the contrary, it would require extra logic to handle cancellation etc; >> might as well just re-use the existing hrtimer-based callback logic we >> already have, and do a schedule_work() from the hrtimer callback like >> you're doing now. > > I agree that we can nicely emulate delayed_timer with the current patch > series. However, if I understand Alexei's idea (and Martin's) there are > cases where we just want schedule_work(), without any timer involved. > That makes a weird timer (with a delay always equal to 0), but it would > allow to satisfy those latency issues. > > So (and this also answers your second email today) I'm thinking at: > - have multiple flags to control the timer (with dedicated timer_cb > kernel functions): > - BPF_F_TIMER_HRTIMER (default) > - BPF_F_TIMER_WORKER (no timer, just workqueue) > - BPF_F_TIMER_DELAYED_WORKER (hrtimer + workqueue, or actual > delayed_work, but that's re-implementing stuffs) I don't think the "delayed" bit needs to be a property of the timer; the context in which the timer is executed (softirq vs workqueue) is, because that has consequences for how the callback is verified (it would be neat if we could know the flag at verification time, but since we can't we need the pairing with the _set_sleepable_cb()). But the same timer could be used both as an immediate and a delayed callback during its lifetime; so I think this should rather be governed by a flag to bpf_timer_start(). In fact, the patch I linked earlier[0] does just that, adding a BPF_TIMER_IMMEDIATE flag to bpf_timer_start(). I.e., keep the hrtimer allocated at all times, but skip going through it if that flag is set. An alternative could also be to just special-case a zero timeout in bpf_timer_start(); I don't actually recall why I went with the flag instead when I wrote that patch... -Toke [0] https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/commit/?= h=3Dxdp-queueing-08&id=3D54bc201a358d1ac6ebfe900099315bbd0a76e862