Date: Sun, 8 Jul 2018 16:54:38 -0400 (EDT)
From: Mathieu Desnoyers
To: Joel Fernandes
Cc: Alexei Starovoitov, Daniel Colascione, Alexei Starovoitov, linux-kernel, Tim Murray, Daniel Borkmann, netdev, fengc@google.com
Message-ID: <951478560.1636.1531083278064.JavaMail.zimbra@efficios.com>
In-Reply-To: <20180707203340.GA74719@joelaf.mtv.corp.google.com>
References: <20180707015616.25988-1-dancol@google.com> <20180707025426.ssxipi7hsehoiuyo@ast-mbp.dhcp.thefacebook.com> <20180707203340.GA74719@joelaf.mtv.corp.google.com>
Subject: Re: [RFC] Add BPF_SYNCHRONIZE bpf(2) command
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Jul 7, 2018, at 4:33 PM, Joel Fernandes joelaf@google.com wrote:

> On Fri, Jul 06, 2018 at 07:54:28PM -0700, Alexei Starovoitov wrote:
>> On Fri, Jul 06, 2018 at 06:56:16PM -0700, Daniel Colascione wrote:
>> > BPF_SYNCHRONIZE waits for any BPF programs active at the time of
>> > BPF_SYNCHRONIZE to complete, allowing userspace to ensure atomicity of
>> > RCU data structure operations with respect to active programs.
>> > For example, userspace can update a map->map entry to point to a new map,
>> > use BPF_SYNCHRONIZE to wait for any BPF programs using the old map to
>> > complete, and then drain the old map without fear that BPF programs
>> > may still be updating it.
>> >
>> > Signed-off-by: Daniel Colascione
>> > ---
>> >  include/uapi/linux/bpf.h |  1 +
>> >  kernel/bpf/syscall.c     | 14 ++++++++++++++
>> >  2 files changed, 15 insertions(+)
>> >
>> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> > index b7db3261c62d..4365c50e8055 100644
>> > --- a/include/uapi/linux/bpf.h
>> > +++ b/include/uapi/linux/bpf.h
>> > @@ -98,6 +98,7 @@ enum bpf_cmd {
>> >          BPF_BTF_LOAD,
>> >          BPF_BTF_GET_FD_BY_ID,
>> >          BPF_TASK_FD_QUERY,
>> > +        BPF_SYNCHRONIZE,
>> >  };
>> >
>> >  enum bpf_map_type {
>> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> > index d10ecd78105f..60ec7811846e 100644
>> > --- a/kernel/bpf/syscall.c
>> > +++ b/kernel/bpf/syscall.c
>> > @@ -2272,6 +2272,20 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *,
>> > uattr, unsigned int, siz
>> >          if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
>> >                  return -EPERM;
>> >
>> > +        if (cmd == BPF_SYNCHRONIZE) {
>> > +                if (uattr != NULL || size != 0)
>> > +                        return -EINVAL;
>> > +                err = security_bpf(cmd, NULL, 0);
>> > +                if (err < 0)
>> > +                        return err;
>> > +                /* BPF programs are run with preempt disabled, so
>> > +                 * synchronize_sched is sufficient even with
>> > +                 * RCU_PREEMPT.
>> > +                 */
>> > +                synchronize_sched();
>> > +                return 0;
>>
>> I don't think it's necessary. sys_membarrier() can do this already
>> and some folks use it exactly for this use case.
>
> Alexei, the use of sys_membarrier for this purpose seems kind of weird to me
> though. No where does the manpage say membarrier should be implemented this
> way so what happens if the implementation changes?
>
> Further, membarrier manpage says that a memory barrier should be matched with
> a matching barrier.
> In this use case there is no matching barrier, so it
> makes it weirder.
>
> Lastly, sys_membarrier seems will not work on nohz-full systems, so its a bit
> fragile to depend on it for this?
>
>         case MEMBARRIER_CMD_GLOBAL:
>                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
>                 if (tick_nohz_full_enabled())
>                         return -EINVAL;
>                 if (num_online_cpus() > 1)
>                         synchronize_sched();
>                 return 0;
>
>
> Adding Mathieu as well who I believe is author/maintainer of membarrier.

See commit 907565337 "Fix: Disable sys_membarrier when nohz_full is enabled"

"Userspace applications should be allowed to expect the membarrier system
call with MEMBARRIER_CMD_SHARED command to issue memory barriers on
nohz_full CPUs, but synchronize_sched() does not take those into
account."

So AFAIU you'd want to re-use membarrier to issue synchronize_sched, and you
only care about kernel preempt-off critical sections.

Clearly bpf code does not run in user-space, so it would "work".

But the guarantees provided by membarrier are not to synchronize against
preempt-off sections per se. It's just that the current implementation
happens to do that. The point of membarrier is to turn user-space memory
barriers into compiler barriers.

If what you need is to wait for an RCU grace period for whatever RCU flavor
ebpf is using, I would advise against using membarrier for this.

I would rather recommend adding a dedicated BPF_SYNCHRONIZE so you won't
leak implementation details to user-space, *and* you can eventually change
your RCU implementation, e.g. to SRCU, in the future if needed.

Thanks,

Mathieu

>
> thanks!
>
> - Joel

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
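
For reference, the userspace sequence described in the RFC (point the map->map entry at a new map, wait for running programs, then drain the old map) can be sketched in C as below. This is only an illustration of the ordering, not code from the patch: struct map_swap_ops, swap_and_drain() and the demo_* stubs are hypothetical names, and the wait step is left as a hook precisely because the thread has not settled on whether it would be a dedicated bpf(2) command or something else.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical hooks, not a real libbpf or kernel API. Each returns 0 on
 * success or a negative errno-style value.
 */
struct map_swap_ops {
	int (*publish_new_map)(void *ctx);   /* update the map->map entry */
	int (*wait_for_programs)(void *ctx); /* e.g. the proposed BPF_SYNCHRONIZE */
	int (*drain_old_map)(void *ctx);     /* read out the retired map */
};

/*
 * Run the three steps in order and stop at the first failure, so the old
 * map is never drained unless the wait step succeeded.
 */
int swap_and_drain(const struct map_swap_ops *ops, void *ctx)
{
	int err;

	assert(ops && ops->publish_new_map && ops->wait_for_programs &&
	       ops->drain_old_map);

	err = ops->publish_new_map(ctx);
	if (err < 0)
		return err;
	err = ops->wait_for_programs(ctx);
	if (err < 0)
		return err;
	return ops->drain_old_map(ctx);
}

/* Demo stubs (assumptions for illustration): they only record call order. */
struct step_log {
	char steps[4];   /* one tag per step that ran */
	int n;           /* number of steps recorded */
	int wait_result; /* value the wait stub returns */
};

static int demo_step(struct step_log *log, char tag)
{
	if (log->n < 3)
		log->steps[log->n++] = tag;
	return 0;
}

static int demo_publish(void *ctx)
{
	return demo_step(ctx, 'P');
}

static int demo_wait(void *ctx)
{
	struct step_log *log = ctx;

	demo_step(log, 'W');
	return log->wait_result;
}

static int demo_drain(void *ctx)
{
	return demo_step(ctx, 'D');
}

static const struct map_swap_ops demo_ops = {
	.publish_new_map = demo_publish,
	.wait_for_programs = demo_wait,
	.drain_old_map = demo_drain,
};
```

With the RFC as posted, a real wait_for_programs hook would issue the bpf(2) syscall with cmd set to BPF_SYNCHRONIZE, a NULL attr pointer and size 0, since the patch rejects anything else with -EINVAL. Keeping the wait behind a hook mirrors the point made above: callers depend on the "wait for active programs" guarantee, not on which RCU flavor or syscall provides it.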