Hello everyone,
We have prepared patches to address an issue from a previous discussion.
The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/
This patch series adds a new field "used_entries" to struct bpf_map_info
and keeps the "count" field in bpf_htab up to date in both the
preallocated and non-preallocated cases.
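For the preallocated case, the accounting boils down to updating the
existing atomic counter when elements move to and from the per-CPU
freelist. A simplified sketch (the helper names below are for
illustration only, not the exact code in the patches):

static struct htab_elem *prealloc_elem_get(struct bpf_htab *htab)
{
	struct pcpu_freelist_node *n;

	n = __pcpu_freelist_pop(&htab->freelist);
	if (!n)
		return NULL;
	atomic_inc(&htab->count);	/* one more entry in use */
	return container_of(n, struct htab_elem, fnode);
}

static void prealloc_elem_put(struct bpf_htab *htab, struct htab_elem *l)
{
	atomic_dec(&htab->count);	/* entry returned to the pool */
	pcpu_freelist_push(&htab->freelist, &l->fnode);
}

The non-preallocated path already maintains htab->count to enforce
max_entries, so it needs no extra atomics.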
bpftool is modified to report the newly added "used_entries" field in
struct bpf_map_info and to mark preallocated htab maps with "*".
Together these make it easier to see how much of a hashmap's memory is
currently in use.
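On the bpftool side the change is small; roughly (simplified sketch,
not the exact hunk in the patch):

/* tools/bpf/bpftool/map.c: print the new field in the plain output
 * and mark preallocated hash maps, i.e. those without BPF_F_NO_PREALLOC.
 */
static void show_used_entries(const struct bpf_map_info *info)
{
	bool prealloc = !(info->map_flags & BPF_F_NO_PREALLOC);

	printf("  used_entries %u%s", info->used_entries,
	       prealloc ? "*" : "");
}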
We have added a new interface function, map_get_used_elem, to
bpf_map_ops as an abstraction layer, so that other map type
implementations can support the "used_entries" attribute in future
changes.
A concurrency test for preallocated and dynamically allocated htab
maps is introduced to check both the correctness and the performance
of the htab map's used-size accounting.
Existing unit tests are extended to verify the correctness of the
htab map's used size.
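For reference, the userspace side of the checks boils down to
something like the following (a simplified standalone example, not
the selftest itself; it assumes uapi headers with the proposed
used_entries field and CAP_BPF/root privileges):

#include <stdio.h>
#include <bpf/bpf.h>

int main(void)
{
	struct bpf_map_info info = {};
	__u32 info_len = sizeof(info);
	int fd, i, key, value = 42;

	fd = bpf_map_create(BPF_MAP_TYPE_HASH, "used_test",
			    sizeof(key), sizeof(value), 128, NULL);
	if (fd < 0)
		return 1;

	for (i = 0; i < 10; i++) {
		key = i;
		bpf_map_update_elem(fd, &key, &value, BPF_ANY);
	}

	if (bpf_obj_get_info_by_fd(fd, &info, &info_len))
		return 1;

	/* 10 of the 128 preallocated slots should be reported in use */
	printf("max_entries=%u used_entries=%u\n",
	       info.max_entries, info.used_entries);
	return 0;
}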
Thank you,
Ho-Ren (Jack) Chuang (4):
bpf: Support reporting BPF htab map's used size for monitoring
bpftool: Add tools support to show BPF htab map's used size
samples/bpf: Add concurrency testing for BPF htab map's used size
selftests/bpf: Add unit tests for BPF htab map's used size
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 1 +
kernel/bpf/hashtab.c | 19 +++
kernel/bpf/syscall.c | 2 +
samples/bpf/Makefile | 4 +
samples/bpf/test_map_used_kern.c | 65 ++++++++
samples/bpf/test_map_used_user.c | 204 ++++++++++++++++++++++++
tools/bpf/bpftool/map.c | 9 +-
tools/include/uapi/linux/bpf.h | 1 +
tools/testing/selftests/bpf/test_maps.c | 74 ++++++++-
10 files changed, 377 insertions(+), 3 deletions(-)
create mode 100644 samples/bpf/test_map_used_kern.c
create mode 100644 samples/bpf/test_map_used_user.c
--
Ho-Ren (Jack) Chuang
On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
<[email protected]> wrote:
>
> Hello everyone,
>
> We have prepared patches to address an issue from a previous discussion.
> The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/
Rephrasing what was said earlier.
We're not keeping the count of elements in a preallocated hash map
and we are not going to add one.
The bpf prog needs to do the accounting on its own if it needs
this kind of statistic.
Keeping the count for non-prealloc is already a significant
performance overhead. We don't trade performance for stats.
Hi Alexei,
We understand the concern about the added performance overhead. We had
some discussion about this while working on the patch and decided to
give it a try (my bad).
Adding some more context: we are leveraging the BPF_OBJ_GET_INFO_BY_FD
syscall to trace CPU usage per prog and memory usage per map. We would
like to use this patch to add an interface for map types to return
their internal "count". For instance, we are thinking of having the
map types below report their "count"; these won't add overhead to the
hot path (rough sketches follow the list).
1. ringbuf to return its "count" by calculating the distance between
producer_pos and consumer_pos
2. queue and stack to return their "count" from the head's position
3. dev map hash to return its "count" from its "items" counter
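(Prototypes below are rough sketches only and would follow whatever
map_get_used_elem ends up looking like.)

/* ringbuf: unconsumed bytes are producer_pos - consumer_pos, the same
 * arithmetic bpf_ringbuf_query() already does for BPF_RB_AVAIL_DATA.
 */
static u64 ringbuf_map_get_used(struct bpf_map *map)
{
	struct bpf_ringbuf *rb =
		container_of(map, struct bpf_ringbuf_map, map)->rb;

	return smp_load_acquire(&rb->producer_pos) -
	       smp_load_acquire(&rb->consumer_pos);
}

/* devmap hash: struct bpf_dtab already maintains "items" to enforce
 * max_entries, so it can simply be reported.
 */
static u32 dev_map_hash_get_used(struct bpf_map *map)
{
	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);

	return READ_ONCE(dtab->items);
}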
There are other map types that, like the hashtab pre-allocation case,
would introduce overhead in the hot path in order to count the stats.
I think we can find alternative solutions for those (e.g., iterate the
map and count, as sketched below, or count only if the
bpf_stats_enabled switch is on). There are cases where this can't be
done at the application level because applications don't see the
internal state needed to do the counting correctly.
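For instance, the "iterate the map and count" fallback can already be
done from userspace with existing APIs, at the cost of walking the
whole map (simplified sketch):

#include <string.h>
#include <bpf/bpf.h>

/* Count a map's live entries by walking its keys with
 * bpf_map_get_next_key(). O(n) syscalls and racy against concurrent
 * updates, but it needs no kernel-side accounting at all; key_size
 * must match the map's key size.
 */
static long count_map_entries(int map_fd, size_t key_size)
{
	char key[key_size], next_key[key_size];
	void *cur = NULL;
	long count = 0;

	while (bpf_map_get_next_key(map_fd, cur, next_key) == 0) {
		memcpy(key, next_key, key_size);
		cur = key;
		count++;
	}
	return count;
}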
We can remove the counting for the pre-allocated case in this patch.
Please let us know what you think.
Thanks, Hao
On Sat, Nov 5, 2022 at 9:20 AM Alexei Starovoitov
<[email protected]> wrote:
>
> On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
> <[email protected]> wrote:
> >
> > Hello everyone,
> >
> > We have prepared patches to address an issue from a previous discussion.
> > The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/
>
> Rephrasing what was said earlier.
> We're not keeping the count of elements in a preallocated hash map
> and we are not going to add one.
> The bpf prog needs to do the accounting on its own if it needs
> this kind of statistic.
> Keeping the count for non-prealloc is already a significant
> performance overhead. We don't trade performance for stats.
Hi Alexei, we can put the added accounting behind the existing
bpf_stats_enabled switch. The switch is off by default, so I believe
there will be no extra overhead once we do that. Could you please give
this a second thought?
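Concretely (rough sketch, not a final patch; the helper names are just
for illustration), the extra accounting could sit behind the existing
bpf_stats_enabled_key static branch, so it compiles down to a
patched-out jump while the sysctl is off:

/* bpf_stats_enabled_key is already declared in include/linux/bpf.h and
 * toggled via the kernel.bpf_stats_enabled sysctl / BPF_ENABLE_STATS.
 */
static void htab_used_inc(struct bpf_htab *htab)
{
	if (static_branch_unlikely(&bpf_stats_enabled_key))
		atomic_inc(&htab->count);
}

static void htab_used_dec(struct bpf_htab *htab)
{
	if (static_branch_unlikely(&bpf_stats_enabled_key))
		atomic_dec(&htab->count);
}

The caveat is that the counter is only meaningful while the switch
stays on, since updates made while it is off are never counted.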
On Mon, Nov 7, 2022 at 4:30 PM Hao Xiang . <[email protected]> wrote:
>
> Hi Alexei,
>
> We understand the concern about the added performance overhead. We had
> some discussion about this while working on the patch and decided to
> give it a try (my bad).
>
> Adding some more context: we are leveraging the BPF_OBJ_GET_INFO_BY_FD
> syscall to trace CPU usage per prog and memory usage per map. We would
> like to use this patch to add an interface for map types to return
> their internal "count". For instance, we are thinking of having the
> map types below report their "count"; these won't add overhead to the
> hot path.
> 1. ringbuf to return its "count" by calculating the distance between
> producer_pos and consumer_pos
> 2. queue and stack to return their "count" from the head's position
> 3. dev map hash to return its "count" from its "items" counter
>
> There are other map types that, like the hashtab pre-allocation case,
> would introduce overhead in the hot path in order to count the stats.
> I think we can find alternative solutions for those (e.g., iterate the
> map and count, or count only if the bpf_stats_enabled switch is on).
> There are cases where this can't be done at the application level
> because applications don't see the internal state needed to do the
> counting correctly.
>
> We can remove the counting for the pre-allocated case in this patch.
> Please let us know what you think.
>
> Thanks, Hao
>
> On Sat, Nov 5, 2022 at 9:20 AM Alexei Starovoitov
> <[email protected]> wrote:
> >
> > On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
> > <[email protected]> wrote:
> > >
> > > Hello everyone,
> > >
> > > We have prepared patches to address an issue from a previous discussion.
> > > The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/
> >
> > Rephrasing what was said earlier.
> > We're not keeping the count of elements in a preallocated hash map
> > and we are not going to add one.
> > The bpf prog needs to do the accounting on its own if it needs
> > this kind of statistic.
> > Keeping the count for non-prealloc is already a significant
> > performance overhead. We don't trade performance for stats.