Date: Tue, 19 Nov 2019 11:30:32 -0800
In-Reply-To: <20191119193036.92831-1-brianvv@google.com>
Message-Id: <20191119193036.92831-6-brianvv@google.com>
References: <20191119193036.92831-1-brianvv@google.com>
X-Mailer: git-send-email 2.24.0.432.g9d3f5f5b63-goog
Subject: [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map
From: Brian Vazquez
To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, "David S. Miller"
Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov, Willem de Bruijn,
    linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org,
    Brian Vazquez

From: Yonghong Song

htab can't use the generic batch support due to some problematic behaviours
inherent to the data structure: while iterating the bpf map, a concurrent
program might delete the next entry that the batch was about to use; in that
case there is no easy way to retrieve the next entry. The issue has been
discussed multiple times (see [1] and [2]). The only way hmap can be
traversed without this problem is by making sure that the map traverses
entire buckets.
This commit implements those strict requirements for hmap; the implementation
follows the same interface as the generic support, with some exceptions:

- If the keys/values buffers are not big enough to traverse a bucket,
  ENOSPC will be returned.
- out_batch contains the index of the next bucket in the iteration, not
  the next key, but this is transparent to the user since the user should
  never use out_batch for anything other than bpf batch syscalls.

Note that only the lookup and lookup_and_delete batch ops require the
hmap-specific implementation; the update/delete batch ops can use the
generic ones.

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Signed-off-by: Yonghong Song
Signed-off-by: Brian Vazquez
---
 kernel/bpf/hashtab.c | 244 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 22066a62c8c97..3402174b292ea 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,6 +17,17 @@
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
 	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
 
+#define BATCH_OPS(_name)			\
+	.map_lookup_batch =			\
+	_name##_map_lookup_batch,		\
+	.map_lookup_and_delete_batch =		\
+	_name##_map_lookup_and_delete_batch,	\
+	.map_update_batch =			\
+	generic_map_update_batch,		\
+	.map_delete_batch =			\
+	generic_map_delete_batch
+
+
 struct bucket {
 	struct hlist_nulls_head head;
 	raw_spinlock_t lock;
@@ -1232,6 +1243,235 @@ static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
 	rcu_read_unlock();
 }
 
+static int
+__htab_map_lookup_and_delete_batch(struct bpf_map *map,
+				   const union bpf_attr *attr,
+				   union bpf_attr __user *uattr,
+				   bool do_delete, bool is_lru_map,
+				   bool is_percpu)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	u32 bucket_cnt, total, key_size, value_size, roundup_key_size;
+	void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val;
+	void __user *ukeys, *uvalues, *ubatch;
+	u64 elem_map_flags, map_flags;
+	struct hlist_nulls_head *head;
+	struct hlist_nulls_node *n;
+	u32 batch, max_count, size;
+	unsigned long flags;
+	struct htab_elem *l;
+	struct bucket *b;
+	int ret = 0;
+
+	max_count = attr->batch.count;
+	if (!max_count)
+		return 0;
+
+	elem_map_flags = attr->batch.elem_flags;
+	if ((elem_map_flags & ~BPF_F_LOCK) ||
+	    ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+		return -EINVAL;
+
+	map_flags = attr->batch.flags;
+	if (map_flags)
+		return -EINVAL;
+
+	batch = 0;
+	ubatch = u64_to_user_ptr(attr->batch.in_batch);
+	if (ubatch && copy_from_user(&batch, ubatch, sizeof(batch)))
+		return -EFAULT;
+
+	if (batch >= htab->n_buckets)
+		return -ENOENT;
+
+	/* We cannot do copy_from_user or copy_to_user inside
+	 * the rcu_read_lock. Allocate enough space here.
+	 */
+	key_size = htab->map.key_size;
+	roundup_key_size = round_up(htab->map.key_size, 8);
+	value_size = htab->map.value_size;
+	size = round_up(value_size, 8);
+	if (is_percpu)
+		value_size = size * num_possible_cpus();
+	keys = kvmalloc(key_size * max_count, GFP_USER | __GFP_NOWARN);
+	values = kvmalloc(value_size * max_count, GFP_USER | __GFP_NOWARN);
+	if (!keys || !values) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	dst_key = keys;
+	dst_val = values;
+	total = 0;
+
+	preempt_disable();
+	this_cpu_inc(bpf_prog_active);
+	rcu_read_lock();
+
+again:
+	b = &htab->buckets[batch];
+	head = &b->head;
+	raw_spin_lock_irqsave(&b->lock, flags);
+
+	bucket_cnt = 0;
+	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
+		bucket_cnt++;
+
+	if (bucket_cnt > (max_count - total)) {
+		if (total == 0)
+			ret = -ENOSPC;
+		goto after_loop;
+	}
+
+	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
+		memcpy(dst_key, l->key, key_size);
+
+		if (is_percpu) {
+			int off = 0, cpu;
+			void __percpu *pptr;
+
+			pptr = htab_elem_get_ptr(l, map->key_size);
+			for_each_possible_cpu(cpu) {
+				bpf_long_memcpy(dst_val + off,
+						per_cpu_ptr(pptr, cpu),
+						size);
+				off += size;
+			}
+		} else {
+			value = l->key + roundup_key_size;
+			if (elem_map_flags & BPF_F_LOCK)
+				copy_map_value_locked(map, dst_val, value,
+						      true);
+			else
+				copy_map_value(map, dst_val, value);
+			check_and_init_map_lock(map, dst_val);
+		}
+		dst_key += key_size;
+		dst_val += value_size;
+		total++;
+	}
+
+	if (do_delete) {
+		hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
+			hlist_nulls_del_rcu(&l->hash_node);
+			if (is_lru_map)
+				bpf_lru_push_free(&htab->lru, &l->lru_node);
+			else
+				free_htab_elem(htab, l);
+		}
+	}
+
+	batch++;
+	if (batch >= htab->n_buckets) {
+		ret = -ENOENT;
+		goto after_loop;
+	}
+
+	raw_spin_unlock_irqrestore(&b->lock, flags);
+	goto again;
+
+after_loop:
+	raw_spin_unlock_irqrestore(&b->lock, flags);
+
+	rcu_read_unlock();
+	this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+
+	if (ret && ret != -ENOENT)
+		goto out;
+
+	/* copy data back to user */
+	ukeys = u64_to_user_ptr(attr->batch.keys);
+	uvalues = u64_to_user_ptr(attr->batch.values);
+	ubatch = u64_to_user_ptr(attr->batch.out_batch);
+	if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
+	    copy_to_user(ukeys, keys, total * key_size) ||
+	    copy_to_user(uvalues, values, total * value_size) ||
+	    put_user(total, &uattr->batch.count))
+		ret = -EFAULT;
+
+out:
+	kvfree(keys);
+	kvfree(values);
+	return ret;
+}
+
+static int
+htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+			     union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  false, true);
+}
+
+static int
+htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
+					const union bpf_attr *attr,
+					union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  false, true);
+}
+
+static int
+htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+		      union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  false, false);
+}
+
+static int
+htab_map_lookup_and_delete_batch(struct bpf_map *map,
+				 const union bpf_attr *attr,
+				 union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  false, false);
+}
+
+static int
+htab_map_delete_batch(struct bpf_map *map,
+		      const union bpf_attr *attr,
+		      union bpf_attr __user *uattr)
+{
+	return generic_map_delete_batch(map, attr, uattr);
+}
+
+static int
+htab_lru_percpu_map_lookup_batch(struct bpf_map *map,
+				 const union bpf_attr *attr,
+				 union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  true, true);
+}
+
+static int
+htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
+					    const union bpf_attr *attr,
+					    union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  true, true);
+}
+
+static int
+htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+			  union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
						  true, false);
+}
+
+static int
+htab_lru_map_lookup_and_delete_batch(struct bpf_map *map,
+				     const union bpf_attr *attr,
+				     union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  true, false);
+}
+
 const struct bpf_map_ops htab_map_ops = {
 	.map_alloc_check = htab_map_alloc_check,
 	.map_alloc = htab_map_alloc,
@@ -1242,6 +1482,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_delete_elem = htab_map_delete_elem,
 	.map_gen_lookup = htab_map_gen_lookup,
 	.map_seq_show_elem = htab_map_seq_show_elem,
+	BATCH_OPS(htab),
 };
 
 const struct bpf_map_ops htab_lru_map_ops = {
@@ -1255,6 +1496,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
 	.map_delete_elem = htab_lru_map_delete_elem,
 	.map_gen_lookup = htab_lru_map_gen_lookup,
 	.map_seq_show_elem = htab_map_seq_show_elem,
+	BATCH_OPS(htab_lru),
 };
 
 /* Called from eBPF program */
@@ -1368,6 +1610,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
 	.map_update_elem = htab_percpu_map_update_elem,
 	.map_delete_elem = htab_map_delete_elem,
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
+	BATCH_OPS(htab_percpu),
 };
 
 const struct bpf_map_ops htab_lru_percpu_map_ops = {
@@ -1379,6 +1622,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	.map_update_elem = htab_lru_percpu_map_update_elem,
 	.map_delete_elem = htab_lru_map_delete_elem,
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
+	BATCH_OPS(htab_lru_percpu),
 };
 
 static int fd_htab_map_alloc_check(union bpf_attr *attr)
-- 
2.24.0.432.g9d3f5f5b63-goog