From: Brian Vazquez
Date: Sat, 1 Aug 2020 11:11:35 -0700
Subject: Re: [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty
To: Yonghong Song
Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, "David S. Miller",
 open list, Linux NetDev, bpf, Luigi Rizzo
In-Reply-To: <4cd10805-b056-21a7-fdc2-d3f66e94dcf6@fb.com>
References: <20200801045722.877331-1-brianvv@google.com> <4cd10805-b056-21a7-fdc2-d3f66e94dcf6@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Aug 1, 2020 at 9:59 AM Yonghong Song wrote:
>
> On 7/31/20 9:57 PM, Brian Vazquez wrote:
> > While running some experiments it was observed that map_lookup_batch
> > was much slower than get_next_key + lookup when the syscall overhead
> > is minimal. This was because the map_lookup_batch implementation was
> > more expensive when traversing empty buckets, which can be really
> > costly when the pre-allocated map is too big.
> >
> > This patch optimizes the case when the bucket is empty so we can move
> > quickly to the next bucket.
> >
> > The benchmark to exercise this is as follows:
> >
> >  - The map was populated with a single entry to make sure that the
> >    syscall overhead is not helping the map_batch_lookup.
> >  - The size of the preallocated map was increased to show the effect
> >    of traversing empty buckets.
> >
> > Results:
> >
> > Using get_next_key + lookup:
> >
> >  Benchmark                 Time(ns)   CPU(ns)   Iterations
> >  ---------------------------------------------------------
> >  BM_DumpHashMap/1/1k           3593      3586       192680
> >  BM_DumpHashMap/1/4k           6004      5972       100000
> >  BM_DumpHashMap/1/16k         15755     15710        44341
> >  BM_DumpHashMap/1/64k         59525     59376        10000
>
> I think "BM_DumpHashMap/1/64k" means the program "BM_DumpHashMap",
> the map having only "1" entry, and the map preallocated size is "64k"?
> What is the "Iterations" column here? The number of runs with the same
> dump? The CPU(ns) is the system cpu consumption, right? The Time/CPU
> is for all iterations, not just one, right? It would be good if the
> above results could be described better, so people can understand
> them.

Hi Yonghong, thanks for reviewing it! I'll fix it in the next iteration.
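[Editor's note: to make the scaling in the tables concrete, here is a
back-of-envelope cost model. It is an illustration, not the benchmark's
actual code; the function and the weight values are made up. The point is
that a dump which inspects every bucket pays a fixed per-bucket cost, so
with a single entry the time is dominated by the preallocated size.]

```c
#include <assert.h>

/* Modeled cost of one full dump of a preallocated hash map: every
 * bucket is visited at a fixed per-bucket cost, and each populated
 * entry adds a per-entry cost on top. With n_entries = 1, the total
 * grows linearly in n_buckets, which is why the pre-patch rows above
 * roughly quadruple going from 16k to 64k buckets. */
long dump_cost(long n_buckets, long n_entries,
               long per_bucket, long per_entry)
{
	return n_buckets * per_bucket + n_entries * per_entry;
}
```

With illustrative weights, dump_cost(65536, 1, 2, 50) comes out close to
four times dump_cost(16384, 1, 2, 50), mirroring the near-4x step between
the 16k and 64k rows in the pre-patch htab_lookup_batch table.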
> >
> > Using htab_lookup_batch before this patch:
> >
> >  Benchmark                 Time(ns)   CPU(ns)   Iterations
> >  ---------------------------------------------------------
> >  BM_DumpHashMap/1/1k           3933      3927       177978
> >  BM_DumpHashMap/1/4k           9192      9177        73951
> >  BM_DumpHashMap/1/16k         42011     41970        16789
> >  BM_DumpHashMap/1/64k        117895    117661         6135
> >
> > Using htab_lookup_batch with this patch:
> >
> >  Benchmark                 Time(ns)   CPU(ns)   Iterations
> >  ---------------------------------------------------------
> >  BM_DumpHashMap/1/1k           2809      2803       249212
> >  BM_DumpHashMap/1/4k           5318      5316       100000
> >  BM_DumpHashMap/1/16k         14925     14895        47448
> >  BM_DumpHashMap/1/64k         58870     58674        10000
> >
> > Suggested-by: Luigi Rizzo
> > Cc: Yonghong Song
> > Signed-off-by: Brian Vazquez
> > ---
> >  kernel/bpf/hashtab.c | 23 ++++++++---------------
> >  1 file changed, 8 insertions(+), 15 deletions(-)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 2137e2200d95..150015ea6737 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >  	struct hlist_nulls_head *head;
> >  	struct hlist_nulls_node *n;
> >  	unsigned long flags = 0;
> > -	bool locked = false;
> >  	struct htab_elem *l;
> >  	struct bucket *b;
> >  	int ret = 0;
> > @@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >  	dst_val = values;
> >  	b = &htab->buckets[batch];
> >  	head = &b->head;
> > -	/* do not grab the lock unless need it (bucket_cnt > 0). */
> > -	if (locked)
> > -		flags = htab_lock_bucket(htab, b);
> >
> > +	l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
> > +				   struct htab_elem, hash_node);
> > +	if (!l && (batch + 1 < htab->n_buckets)) {
> > +		batch++;
> > +		goto again_nocopy;
> > +	}
> > +
> > +	flags = htab_lock_bucket(htab, b);
> [...]
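[Editor's note: the shape of the patch's fast path can be sketched in
userspace. This is an analogy, not the kernel code: in the kernel the
unlocked peek is an RCU dereference on an hlist_nulls head and the lock
is the bucket spinlock taken by htab_lock_bucket(); here both are stand-
ins, and all names below are invented for illustration.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy bucketed table; lock_count stands in for htab_lock_bucket(). */
struct elem { struct elem *next; int key; };
struct bucket { struct elem *head; };

static long lock_count;
static void lock_bucket(struct bucket *b)   { (void)b; lock_count++; }
static void unlock_bucket(struct bucket *b) { (void)b; }

/* Dump all keys into out[]. An unlocked peek at the bucket head skips
 * empty buckets, so the (expensive) lock is taken only for buckets
 * that actually hold entries -- the same idea as the patch's
 * "if (!l && ...) { batch++; goto again_nocopy; }" path. */
static size_t dump(struct bucket *tab, size_t n_buckets, int *out)
{
	size_t n = 0;

	for (size_t i = 0; i < n_buckets; i++) {
		if (!tab[i].head)	/* empty: advance without locking */
			continue;
		lock_bucket(&tab[i]);
		for (struct elem *e = tab[i].head; e; e = e->next)
			out[n++] = e->key;
		unlock_bucket(&tab[i]);
	}
	return n;
}
```

With one entry spread over 64 buckets, the sketch takes exactly one lock
per dump instead of one per bucket, which is where the win in the 1-entry
benchmark rows comes from.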