Grabbing the spinlock for every bucket, even if it's empty, was causing a
significant performance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking bucket_cnt first:
if the bucket has entries, grab the spinlock and proceed with the
batching.
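The idea is sketched below as a small, self-contained userspace analogue
(the names toy_bucket, toy_elem and dump_map are invented for this sketch
and are not the kernel code): a bucket that looks empty on an unlocked
peek is skipped entirely, so only non-empty buckets pay for the spinlock.

/* Toy analogue of the "peek before locking" pattern; build with
 * "gcc -pthread".  In the real patch the unlocked peek is an RCU-protected
 * walk that counts the bucket's entries; here it is just a head check.
 */
#include <pthread.h>
#include <stdio.h>

#define N_BUCKETS 64

struct toy_elem {
	int key;
	struct toy_elem *next;
};

struct toy_bucket {
	pthread_spinlock_t lock;
	struct toy_elem *head;
};

static struct toy_bucket buckets[N_BUCKETS];

static void dump_map(void)
{
	int i;

	for (i = 0; i < N_BUCKETS; i++) {
		struct toy_bucket *b = &buckets[i];
		struct toy_elem *e;

		/* Unlocked peek: empty-looking buckets are skipped without
		 * ever touching their spinlock, which is where the win for
		 * sparsely populated tables comes from.
		 */
		if (!b->head)
			continue;

		pthread_spin_lock(&b->lock);
		for (e = b->head; e; e = e->next)
			printf("key=%d\n", e->key);
		pthread_spin_unlock(&b->lock);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < N_BUCKETS; i++)
		pthread_spin_init(&buckets[i].lock, PTHREAD_PROCESS_PRIVATE);

	dump_map();	/* with no entries, no spinlock is ever taken */
	return 0;
}
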
Tested with an htab of size 50K and different numbers of populated entries.
Before:
Benchmark              Time(ns)      CPU(ns)
---------------------------------------------
BM_DumpHashMap/1        2759655      2752033
BM_DumpHashMap/10       2933722      2930825
BM_DumpHashMap/200      3171680      3170265
BM_DumpHashMap/500      3639607      3635511
BM_DumpHashMap/1000     4369008      4364981
BM_DumpHashMap/5k      11171919     11134028
BM_DumpHashMap/20k     69150080     69033496
BM_DumpHashMap/39k    190501036    190226162
After:
Benchmark              Time(ns)      CPU(ns)
---------------------------------------------
BM_DumpHashMap/1         202707       200109
BM_DumpHashMap/10        213441       210569
BM_DumpHashMap/200       478641       472350
BM_DumpHashMap/500       980061       967102
BM_DumpHashMap/1000     1863835      1839575
BM_DumpHashMap/5k       8961836      8902540
BM_DumpHashMap/20k     69761497     69322756
BM_DumpHashMap/39k    187437830    186551111
Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Cc: Yonghong Song <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
---
v1 -> v2: Skip hlist_nulls_for_each_entry_safe if lock is not held
kernel/bpf/hashtab.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2d182c4ee9d99..ea3bf04a0a7b6 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1260,6 +1260,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
struct hlist_nulls_head *head;
struct hlist_nulls_node *n;
unsigned long flags;
+ bool locked = false;
struct htab_elem *l;
struct bucket *b;
int ret = 0;
@@ -1319,15 +1320,25 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
dst_val = values;
b = &htab->buckets[batch];
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ /* do not grab the lock unless we need it (bucket_cnt > 0). */
+ if (locked)
+ raw_spin_lock_irqsave(&b->lock, flags);
bucket_cnt = 0;
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
bucket_cnt++;
+ if (bucket_cnt && !locked) {
+ locked = true;
+ goto again_nocopy;
+ }
+
if (bucket_cnt > (max_count - total)) {
if (total == 0)
ret = -ENOSPC;
+ /* Note that since bucket_cnt > 0 here, it is implicit
+ * that the lock was grabbed, so release it.
+ */
raw_spin_unlock_irqrestore(&b->lock, flags);
rcu_read_unlock();
this_cpu_dec(bpf_prog_active);
@@ -1337,6 +1348,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
if (bucket_cnt > bucket_size) {
bucket_size = bucket_cnt;
+ /* Note that since bucket_cnt > 0 here, it is implicit
+ * that the lock was grabbed, so release it.
+ */
raw_spin_unlock_irqrestore(&b->lock, flags);
rcu_read_unlock();
this_cpu_dec(bpf_prog_active);
@@ -1346,6 +1360,10 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
goto alloc;
}
+ /* The next block is only safe to run if the lock was grabbed. */
+ if (!locked)
+ goto next_batch;
+
hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
memcpy(dst_key, l->key, key_size);
@@ -1380,6 +1398,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
}
raw_spin_unlock_irqrestore(&b->lock, flags);
+ locked = false;
+next_batch:
/* If we are not copying data, we can go to next bucket and avoid
* unlocking the rcu.
*/
--
2.25.0.265.gbab2e86ba0-goog
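
For context, the sketch below shows roughly how this path is driven from
user space, assuming the libbpf batch wrappers that landed together with
the batch ops; map_fd, the __u32 key/value types, max_entries and
dump_htab() are placeholders, not the actual BM_DumpHashMap code.

#include <errno.h>
#include <stdlib.h>
#include <bpf/bpf.h>

/* Drain a hash map with the batch lookup op; the same loop shape applies
 * to bpf_map_lookup_and_delete_batch().
 */
static int dump_htab(int map_fd, __u32 max_entries)
{
	__u32 *keys = calloc(max_entries, sizeof(__u32));
	__u32 *vals = calloc(max_entries, sizeof(__u32));
	__u32 batch, count, total = 0;
	int err = -ENOMEM;

	if (!keys || !vals)
		goto out;

	do {
		count = max_entries - total;
		/* NULL in_batch starts from the first bucket; later calls
		 * resume from the opaque batch token the kernel hands back.
		 * NULL opts selects the default flags.
		 */
		err = bpf_map_lookup_batch(map_fd, total ? &batch : NULL,
					   &batch, keys + total, vals + total,
					   &count, NULL);
		total += count;
	} while (!err && total < max_entries);

	/* ENOENT only signals that the iteration reached the last bucket. */
	if (err && errno == ENOENT)
		err = 0;
out:
	free(keys);
	free(vals);
	return err;
}
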
On 2/18/20 9:25 AM, Brian Vazquez wrote:
> [...]
>
> Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
> Cc: Yonghong Song <[email protected]>
> Signed-off-by: Brian Vazquez <[email protected]>
Thanks for the fix.
Acked-by: Yonghong Song <[email protected]>
> ---
> v1 -> v2: Skip hlist_nulls_for_each_entry_safe if lock is not held
>
> kernel/bpf/hashtab.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
On Tue, Feb 18, 2020 at 1:23 PM Yonghong Song <[email protected]> wrote:
>
> On 2/18/20 9:25 AM, Brian Vazquez wrote:
> > [...]
>
> Thanks for the fix.
> Acked-by: Yonghong Song <[email protected]>
Applied. Thanks