Subject: Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
To: Daniel Colascione, joelaf@google.com
Cc: linux-kernel@vger.kernel.org, timmurray@google.com,
    netdev@vger.kernel.org, Alexei Starovoitov, Lorenzo Colitti,
    Chenbo Feng, Mathieu Desnoyers
References: <20180729205835.34850-1-dancol@google.com>
In-Reply-To: <20180729205835.34850-1-dancol@google.com>
From: Daniel Borkmann
Date: Mon, 30 Jul 2018 12:04:01 +0200

On 07/29/2018 10:58 PM, Daniel Colascione wrote:
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> references to maps active at the instant the
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> expiration of map references obtained by BPF programs from other maps.
>
> The purpose of this command is to provide a means for userspace to
> replace a BPF map with another, newer version, then ensure that no
> component is still using the "old" map before manipulating the "old"
> map in some way.
>
> Signed-off-by: Daniel Colascione
> ---
>  include/uapi/linux/bpf.h | 14 ++++++++++++++
>  kernel/bpf/syscall.c     | 13 +++++++++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b7db3261c62d..ca3cfca76edc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key {
>  	__u8	data[0];	/* Arbitrary size */
>  };
>
> +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> + * references to maps active at the instant the
> + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> + * expiration of map references obtained by BPF programs from
> + * other maps.
> + *
> + * The purpose of this command is to provide a means for userspace to
> + * replace a BPF map with another, newer version, then ensure that no
> + * component is still using the "old" map before manipulating the
> + * "old" map in some way.
> + */
> +
>  /* BPF syscall commands, see bpf(2) man-page for details. */
>  enum bpf_cmd {
>  	BPF_MAP_CREATE,
> @@ -98,6 +111,7 @@ enum bpf_cmd {
>  	BPF_BTF_LOAD,
>  	BPF_BTF_GET_FD_BY_ID,
>  	BPF_TASK_FD_QUERY,
> +	BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES,
>  };
>
>  enum bpf_map_type {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a31a1ba0f8ea..bc9a0713f47d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>  	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>
> +	if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) {
> +		if (uattr != NULL || size != 0)
> +			return -EINVAL;
> +		err = security_bpf(cmd, NULL, 0);
> +		if (err < 0)
> +			return err;
> +		/* BPF programs always enter a critical section while
> +		 * they have a map reference outstanding.
> +		 */
> +		synchronize_rcu();
> +		return 0;
> +	}
> +
>  	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
>  	if (err)
>  		return err;
>

Hmm, I don't think such UAPI as above is future-proof. In case we would
want a similar mechanism in future for other maps, we would need a whole
new bpf command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a
workaround though the underlying map may not even be a map-to-map.
Additionally, we don't have any map object at hand in the above, so we
couldn't make any finer grained decisions either. Something like below
would be more suitable and leaves room for extending this further in
future.

Thanks,
Daniel

From 8dfea71b73fa0d402633b76f78c106e82a7a5007 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann
Date: Mon, 30 Jul 2018 11:47:37 +0200
Subject: [PATCH] sync map refs

Signed-off-by: Daniel Borkmann
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c    |  1 +
 kernel/bpf/hashtab.c     |  1 +
 kernel/bpf/map_in_map.c  |  6 ++++++
 kernel/bpf/map_in_map.h  |  1 +
 kernel/bpf/syscall.c     | 24 ++++++++++++++++++++++++
 7 files changed, 35 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b5ad95..7b51f86 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -34,6 +34,7 @@ struct bpf_map_ops {
 	void (*map_free)(struct bpf_map *map);
 	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
 	void (*map_release_uref)(struct bpf_map *map);
+	int (*map_sync_refs)(struct bpf_map *map);
 
 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8701139..e6ec1de 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -98,6 +98,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_SYNC_REFS,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 544e58f..ddaf42a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -748,5 +748,6 @@ const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_sync_refs = bpf_map_sync_refs,
 	.map_gen_lookup = array_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 513d9df..05380ea 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1407,5 +1407,6 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_sync_refs = bpf_map_sync_refs,
 	.map_gen_lookup = htab_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 1da5746..698a50f 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -96,6 +96,12 @@ void bpf_map_fd_put_ptr(void *ptr)
 	bpf_map_put(ptr);
 }
 
+int bpf_map_sync_refs(struct bpf_map *map)
+{
+	synchronize_rcu();
+	return 0;
+}
+
 u32 bpf_map_fd_sys_lookup_elem(void *ptr)
 {
 	return ((struct bpf_map *)ptr)->id;
diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h
index 6183db9..ac02456 100644
--- a/kernel/bpf/map_in_map.h
+++ b/kernel/bpf/map_in_map.h
@@ -19,6 +19,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file,
 			 int ufd);
 void bpf_map_fd_put_ptr(void *ptr);
+int bpf_map_sync_refs(struct bpf_map *map);
 u32 bpf_map_fd_sys_lookup_elem(void *ptr);
 
 #endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a31a1ba..b1286cc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -896,6 +896,27 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }
 
+#define BPF_MAP_SYNC_REFS_LAST_FIELD map_fd
+
+static int map_sync_refs(union bpf_attr *attr)
+{
+	int err = -ENOTSUPP, ufd = attr->map_fd;
+	struct bpf_map *map;
+	struct fd f;
+
+	if (CHECK_ATTR(BPF_MAP_SYNC_REFS))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	if (map->ops->map_sync_refs)
+		err = map->ops->map_sync_refs(map);
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2303,6 +2324,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_GET_NEXT_KEY:
 		err = map_get_next_key(&attr);
 		break;
+	case BPF_MAP_SYNC_REFS:
+		err = map_sync_refs(&attr);
+		break;
 	case BPF_PROG_LOAD:
 		err = bpf_prog_load(&attr);
 		break;
-- 
2.9.5
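
The per-map dispatch that map_sync_refs() performs in the patch above can
be modelled in plain userspace C. This is only a sketch under stated
assumptions, not kernel code: every toy_* name is a hypothetical stand-in
for struct bpf_map and struct bpf_map_ops, the callback bumps a counter
where bpf_map_sync_refs() would call synchronize_rcu(), and ENOTSUP from
<errno.h> stands in for the kernel-internal ENOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_map;

/* Hypothetical stand-in for struct bpf_map_ops: the callback is optional. */
struct toy_map_ops {
	int (*map_sync_refs)(struct toy_map *map);
};

/* Hypothetical stand-in for struct bpf_map. */
struct toy_map {
	const struct toy_map_ops *ops;
	int sync_calls;	/* how many times this map "waited" for readers */
};

/* Stand-in for bpf_map_sync_refs(); the kernel version would block in
 * synchronize_rcu() instead of counting.
 */
static int toy_sync_refs(struct toy_map *map)
{
	map->sync_calls++;
	return 0;
}

/* A map-in-map type opts in by filling the slot ... */
static const struct toy_map_ops toy_map_in_map_ops = {
	.map_sync_refs = toy_sync_refs,
};

/* ... while any other map type simply leaves it NULL. */
static const struct toy_map_ops toy_plain_ops = {
	.map_sync_refs = NULL,
};

/* Same shape as the dispatch in map_sync_refs(): start from "not
 * supported" and only overwrite it when the map type has a callback.
 */
static int toy_map_sync(struct toy_map *map)
{
	int err = -ENOTSUP;

	assert(map != NULL && map->ops != NULL);
	if (map->ops->map_sync_refs)
		err = map->ops->map_sync_refs(map);
	return err;
}
```

Calling toy_map_sync() on a map that uses toy_map_in_map_ops returns 0
and runs the callback once; a map that uses toy_plain_ops gets -ENOTSUP
and its callback slot is never touched. In this model, a further map
type would opt in by filling the slot in its own ops table, with no new
command needed.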