Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp116106imm; Tue, 31 Jul 2018 14:58:07 -0700 (PDT) X-Google-Smtp-Source: AAOMgpffKEk12/V//1+sy59NdCweyla9wv2r1VvJuofwihFmz+jw5pslbSO30CxW8Sfe91cnozrJ X-Received: by 2002:a17:902:20e3:: with SMTP id v32-v6mr22374340plg.232.1533074287234; Tue, 31 Jul 2018 14:58:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1533074287; cv=none; d=google.com; s=arc-20160816; b=isZu81L2u21Ip/G5IL5UN2qEqczKkn9RnOBCe/lATLKC7Eetz3nxcUMGzXR7zGixAS ekl/Rh7UoiXFStfO+7pr/QnUDMT9DU+mGDI7UopvV5DriI2OwNhHJ/kozmNjXeSQwCYI ZKE+xMxsoICtZvze//8M+ZD346rWoik5UdNoVpfaUB9safnIJ/OVV6UaBIWqnpUTN1tU wVPTkeTe7wzqifbCM1wVMM8yEtkSeEUAGJD+MiT9tn07a5zCtCdYgktskHnSK5kfWwrh 9wBkXUePbQxU5RvxoeDO9WVw3LExtdJNK6YQRdo6q1eASWMUME4D7MG1BP21L5tItmww Lhmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=XOgYa2JMBfUd+BYFAACjpWha32kLil2yShoK65f+9kM=; b=P/C3h5lCD6AWkOAAlGQruWMMYZ6X47+xdhlGCdVMdn8m3LvFSkAk5y4W0nDfUQnh0o ieLaZ8ocu/cqXvoyo2+Qyeo8qCa54UNDLkHjQcAlwF9dkuUAmjqqaYmpvBfup/fnGVdH 6bY9446/DlI0euywVz8dBa3lP1/sONn3ECToNeAIVRJeKiZAftJuS9pbJqzvc02eY+Oe TplLHEATRJ5BoUPu0zlk8LdwoF9iXmUiGEctO3SHD1uc0wp34KgBBoYU6/YgbuGPBe4F gecfYpTuA3fJ5MMkBQejUMvoADPxYYpVefronGL3L5YG1AWHvcj15JL/5MqIilqeOAXm S04Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=veDgC1zS; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x17-v6si12669449pln.465.2018.07.31.14.57.50; Tue, 31 Jul 2018 14:58:07 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=veDgC1zS; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732588AbeGaXjR (ORCPT + 99 others); Tue, 31 Jul 2018 19:39:17 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:41465 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729685AbeGaXjR (ORCPT ); Tue, 31 Jul 2018 19:39:17 -0400 Received: by mail-pg1-f195.google.com with SMTP id z8-v6so9725337pgu.8 for ; Tue, 31 Jul 2018 14:56:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=XOgYa2JMBfUd+BYFAACjpWha32kLil2yShoK65f+9kM=; b=veDgC1zSMFcy1Ppeuw3MSJJyXe8lFO58aQ8IaTu5vUu1NqgKtn4fmBPnJsSILcYOTz fIgatLuEx58ZQfN6Y3vt4JUpQOWynJktTdy1gzsg/SMsYHLxf+JqQgiBxKChKOHZod3r NGaa03j2YRYOa2txlnnpCAoY380v6oHeYQKWFOn2qgLpI41RjVQZbkMyyFdw2VB8mZAK QcRCNFqOGS4wmKY/bSyOMGo5yTJrrWUmBMWSi8GTUtGVn6R94/HPiXM9yhNBut62t5DV vwfWgGTbVwLJY1fFZqWQ0CCoGyBnt74JD5cU30eOjWAZ0T/25My7NImObFfFjwnt71Pu zCCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=XOgYa2JMBfUd+BYFAACjpWha32kLil2yShoK65f+9kM=; b=MP2PYtg+7xSrwVPanBeGlTqDncS5/Amzh6XwhLDP1PLbrDpoPZLR6pMuufAtd89k/4 UiyTuO/Ko3052IPsnP4NRdcAXZdHo3pMH7NlrJDBx7ZjkvlsppYUFIJ5r2z1ERPcCKcp iHm7tf0VJxuDUF/OcWaeB5RE5mZxN7t3BP8ExlI+SyJyn7C3T1KLAUEpxtnNfYkatJQr Sgu7Q/ntxe4nMCiMk+rfb+crwWpV/WhQXQuraQQA0PYjZutj2r+fqSVvHdbFEQOwm4z0 jscmm9FqJuzFA1DNS8RTvshK7OUfJ8UHaOWSxzyF666Q+cKxnVUn/UnMZpipHVUW4fK9 NyjQ== X-Gm-Message-State: AOUpUlGa1ee90S+iZ+nH42y9fLwoHA5JDiu7mgaqCq7SGfBzAp8BQs6q yiYzpTTTezU4vD19L58pouXOeA== X-Received: by 2002:a65:6109:: with SMTP id z9-v6mr21925146pgu.243.1533074212983; Tue, 31 Jul 2018 14:56:52 -0700 (PDT) Received: from localhost ([2620:0:1000:1600:3122:ea9c:d178:eb]) by smtp.gmail.com with ESMTPSA id f18-v6sm38443951pff.29.2018.07.31.14.56.52 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 31 Jul 2018 14:56:52 -0700 (PDT) Date: Tue, 31 Jul 2018 14:56:51 -0700 From: Joel Fernandes To: Y Song Cc: Joel Fernandes , Alexei Starovoitov , Daniel Colascione , LKML , Tim Murray , Network Development , Lorenzo Colitti , Chenbo Feng , Mathieu Desnoyers , Alexei Starovoitov , Daniel Borkmann Subject: Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Message-ID: <20180731215651.GA141321@joelaf.mtv.corp.google.com> References: <20180731020122.GA22311@joelaf.mtv.corp.google.com> <20180731020629.GB22311@joelaf.mtv.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 30, 2018 at 09:03:18PM -0700, Y Song wrote: > On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes wrote: > > On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote: > >> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote: > >> > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione wrote: > >> > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF > >> > > map made by a BPF program that is running at the time the > >> > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is > >> > > to provide a means for userspace to replace a BPF map with another, > >> > > newer version, then ensure that no component is still using the "old" > >> > > map before manipulating the "old" map in some way. > >> > > > >> > > Signed-off-by: Daniel Colascione > >> > > --- > >> > > include/uapi/linux/bpf.h | 9 +++++++++ > >> > > kernel/bpf/syscall.c | 13 +++++++++++++ > >> > > 2 files changed, 22 insertions(+) > >> > > > >> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> > > index b7db3261c62d..5b27e9117d3e 100644 > >> > > --- a/include/uapi/linux/bpf.h > >> > > +++ b/include/uapi/linux/bpf.h > >> > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { > >> > > __u8 data[0]; /* Arbitrary size */ > >> > > }; > >> > > > >> > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a > >> > > + * BPF map made by a BPF program that is running at the time the > >> > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command > >> > > >> > that doesn't sound right to me. > >> > such command won't wait for the release of the references. > >> > in case of map-in-map the program does not hold > >> > the references to inner map (only to outer map). > >> > >> I didn't follow this completely. > >> > >> The userspace program is using the inner map per your description of the > >> algorithm for using map-in-map to solve the race conditions that this patch > >> is trying to address: > >> > >> If you don't mind, I copy-pasted it below from your netdev post: > >> > >> if you use map-in-map you don't need extra boolean map. > >> 0. bpf prog can do > >> inner_map = lookup(map_in_map, key=0); > >> lookup(inner_map, your_real_key); > >> 1. user space writes into map_in_map[0] <- FD of new map > >> 2. some cpus are using old inner map and some a new > >> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched() > >> which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu() > >> which will guarantee that progs finished. > >> 4. scan old inner map > >> > >> In step 2, as you mentioned there are CPUs using different inner maps. So > >> could you clarify how the synchronize_rcu mechanism will even work if you're > >> now saying "program does not hold references to the inner maps"? > > The program only held references to the outer maps, and the outer map > held references to the inner maps. The user space program can add/remove > the inner map for a particular outer map while the prog <-> outer-map > relationship is not changed. My definition of "reference" in this context is protection by rcu_read_lock. So I was concerned the above map-in-map access isn't protected as such when Alexei said "program doesn't have reference on inner map" in the above steps. Maybe I misunderstood what is the meaning of reference here. To make the map-in-map thing to work for Chenbo/Lorenzo's usecase, both the access of outer map at key=0 and the inner map have to protect by rcu_read_lock so that the membarrier call will work. So basically step 0 in the steps above should be rcu_read_lock protected to satisfy Chenbo/Lorenzo's usecase. I know today the entire program is run as preempt disabled (unless something changed) so this shouldn't be a problem, but in the future if the verifier is doing similar things at a finer grainer level, then the above has to be taken into consideration. Does that make sense or am I missing something? thanks, - Joel