LinuxLists.cc - [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process

2021-04-29 13:08:32

Subject: [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process_ring

One of our benchmarks running in (Google-internal) CI pushes data
through the ringbuf faster htan than userspace is able to consume
it. In this case it seems we're actually able to get >INT_MAX entries
in a single ringbuf_buffer__consume call. ASAN detected that cnt
overflows in this case.

Fix by using 64-bit counter internally and then capping the result to
INT_MAX before converting to the int return type.

Fixes: bf99c936f947 (libbpf: Add BPF ring buffer support)
Signed-off-by: Brendan Jackman <[email protected]>
---

diff v1->v2: Now we don't break the loop at INT_MAX, we just cap the reported
entry count.

Note: I feel a bit guilty about the fact that this makes the reader
think about implicit conversions. Nobody likes thinking about that.

But explicit casts don't really help with clarity:

return (int)min(cnt, (int64_t)INT_MAX); // ugh

shrug..

tools/lib/bpf/ringbuf.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index e7a8d847161f..2e114c2d0047 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -204,7 +204,9 @@ static inline int roundup_len(__u32 len)

static int ringbuf_process_ring(struct ring* r)
{
- int *len_ptr, len, err, cnt = 0;
+ int *len_ptr, len, err;
+ /* 64-bit to avoid overflow in case of extreme application behavior */
+ int64_t cnt = 0;
unsigned long cons_pos, prod_pos;
bool got_new_data;
void *sample;
@@ -240,7 +242,7 @@ static int ringbuf_process_ring(struct ring* r)
}
} while (got_new_data);
done:
- return cnt;
+ return min(cnt, INT_MAX);
}

/* Consume available ring buffer(s) data without event polling.
@@ -263,8 +265,8 @@ int ring_buffer__consume(struct ring_buffer *rb)
}

/* Poll for available data and consume records, if any are available.
- * Returns number of records consumed, or negative number, if any of the
- * registered callbacks returned error.
+ * Returns number of records consumed (or INT_MAX, whichever is less), or
+ * negative number, if any of the registered callbacks returned error.
*/
int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
{
--
2.31.1.498.g6c1eba8ee3d-goog

2021-04-30 16:34:04

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process_ring

On Thu, Apr 29, 2021 at 6:05 AM Brendan Jackman <[email protected]> wrote:
>
> One of our benchmarks running in (Google-internal) CI pushes data
> through the ringbuf faster htan than userspace is able to consume
> it. In this case it seems we're actually able to get >INT_MAX entries
> in a single ringbuf_buffer__consume call. ASAN detected that cnt
> overflows in this case.
>
> Fix by using 64-bit counter internally and then capping the result to
> INT_MAX before converting to the int return type.
>
> Fixes: bf99c936f947 (libbpf: Add BPF ring buffer support)
> Signed-off-by: Brendan Jackman <[email protected]>
> ---
>
> diff v1->v2: Now we don't break the loop at INT_MAX, we just cap the reported
> entry count.
>
> Note: I feel a bit guilty about the fact that this makes the reader
> think about implicit conversions. Nobody likes thinking about that.
>
> But explicit casts don't really help with clarity:
>
> return (int)min(cnt, (int64_t)INT_MAX); // ugh
>

I'd go with

if (cnt > INT_MAX)
return INT_MAX;

return cnt;

If you don't mind, I can patch it up while applying?

> shrug..
>
> tools/lib/bpf/ringbuf.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index e7a8d847161f..2e114c2d0047 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -204,7 +204,9 @@ static inline int roundup_len(__u32 len)
>
> static int ringbuf_process_ring(struct ring* r)
> {
> - int *len_ptr, len, err, cnt = 0;
> + int *len_ptr, len, err;
> + /* 64-bit to avoid overflow in case of extreme application behavior */
> + int64_t cnt = 0;
> unsigned long cons_pos, prod_pos;
> bool got_new_data;
> void *sample;
> @@ -240,7 +242,7 @@ static int ringbuf_process_ring(struct ring* r)
> }
> } while (got_new_data);
> done:
> - return cnt;
> + return min(cnt, INT_MAX);
> }
>
> /* Consume available ring buffer(s) data without event polling.
> @@ -263,8 +265,8 @@ int ring_buffer__consume(struct ring_buffer *rb)
> }
>
> /* Poll for available data and consume records, if any are available.
> - * Returns number of records consumed, or negative number, if any of the
> - * registered callbacks returned error.
> + * Returns number of records consumed (or INT_MAX, whichever is less), or
> + * negative number, if any of the registered callbacks returned error.
> */
> int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
> {
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>

2021-05-03 13:38:17

by Brendan Jackman

[permalink] [raw]

Subject: Re: [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process_ring

On Fri, 30 Apr 2021 at 18:31, Andrii Nakryiko <[email protected]> wrote:
>
> On Thu, Apr 29, 2021 at 6:05 AM Brendan Jackman <[email protected]> wrote:

> > Note: I feel a bit guilty about the fact that this makes the reader
> > think about implicit conversions. Nobody likes thinking about that.
> >
> > But explicit casts don't really help with clarity:
> >
> > return (int)min(cnt, (int64_t)INT_MAX); // ugh
> >
>
> I'd go with
>
> if (cnt > INT_MAX)
> return INT_MAX;
>
> return cnt;

Sure, it has all the same implicit casts/promotions but I guess it's
clearer anyway.

> If you don't mind, I can patch it up while applying?

Yes please do, thanks!

2021-05-03 17:47:10

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process_ring

On Mon, May 3, 2021 at 5:01 AM Brendan Jackman <[email protected]> wrote:
>
> On Fri, 30 Apr 2021 at 18:31, Andrii Nakryiko <[email protected]> wrote:
> >
> > On Thu, Apr 29, 2021 at 6:05 AM Brendan Jackman <[email protected]> wrote:
>
> > > Note: I feel a bit guilty about the fact that this makes the reader
> > > think about implicit conversions. Nobody likes thinking about that.
> > >
> > > But explicit casts don't really help with clarity:
> > >
> > > return (int)min(cnt, (int64_t)INT_MAX); // ugh
> > >
> >
> > I'd go with
> >
> > if (cnt > INT_MAX)
> > return INT_MAX;
> >
> > return cnt;
>
> Sure, it has all the same implicit casts/promotions but I guess it's
> clearer anyway.

I might be wrong, but given INT_MAX is defined as

# define INT_MAX 2147483647

(notice no suffix specifying which type it is), this constant will be
interpreted by compiler as desired type in the given context. So in

if (cnt > INT_MAX)

INT_MAX should be a uint64_t constant. But even if not, it is
up-converted to int64_t with no loss anyway.

>
> > If you don't mind, I can patch it up while applying?
>
> Yes please do, thanks!

So while doing that I noticed that you didn't fix ring_buffer__poll(),
so I had to fix it up a bit more extensively. Please check the end
result in bpf tree and let me know if there are any problems with it:

2a30f9440640 ("libbpf: Fix signed overflow in ringbuf_process_ring")

2021-05-04 09:19:58

by Brendan Jackman

[permalink] [raw]

Subject: Re: [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process_ring

On Mon, 3 May 2021 at 19:46, Andrii Nakryiko <[email protected]> wrote:
>
> On Mon, May 3, 2021 at 5:01 AM Brendan Jackman <[email protected]> wrote:
> >
> > On Fri, 30 Apr 2021 at 18:31, Andrii Nakryiko <[email protected]> wrote:

> So while doing that I noticed that you didn't fix ring_buffer__poll(),
> so I had to fix it up a bit more extensively. Please check the end
> result in bpf tree and let me know if there are any problems with it:
>
> 2a30f9440640 ("libbpf: Fix signed overflow in ringbuf_process_ring")

Ah, thanks for that. Yep, the additional fix looks good to me.

I think it actually fixes another very niche issue:

int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
{
- int i, cnt, err, res = 0;
+ int i, cnt;
+ int64_t err, res = 0;

cnt = epoll_wait(rb->epoll_fd, rb->events, rb->ring_cnt, timeout_ms);
+ if (cnt < 0)
+ return -errno;
+
for (i = 0; i < cnt; i++) {
__u32 ring_id = rb->events[i].data.fd;
struct ring *ring = &rb->rings[ring_id];
@@ -280,7 +290,9 @@ int ring_buffer__poll(struct ring_buffer *rb, int
timeout_ms)
return err;
res += err;
}
- return cnt < 0 ? -errno : res;

If the callback returns an error but errno is 0 this fails to report the error.

errno(3) says "the value of errno is never set to zero by any system
call or library function" but then describes a scenario where an
application might usefully set it to zero itself. Maybe it can also be
0 in new threads, depending on your metaphysical interpretation of "by
a system call or library function".

+ if (res > INT_MAX)
+ return INT_MAX;
+ return res;

2021-05-05 00:41:41

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [PATCH v2 bpf-next] libbpf: Fix signed overflow in ringbuf_process_ring

On Tue, May 4, 2021 at 2:01 AM Brendan Jackman <[email protected]> wrote:
>
> On Mon, 3 May 2021 at 19:46, Andrii Nakryiko <[email protected]> wrote:
> >
> > On Mon, May 3, 2021 at 5:01 AM Brendan Jackman <[email protected]> wrote:
> > >
> > > On Fri, 30 Apr 2021 at 18:31, Andrii Nakryiko <[email protected]> wrote:
>
> > So while doing that I noticed that you didn't fix ring_buffer__poll(),
> > so I had to fix it up a bit more extensively. Please check the end
> > result in bpf tree and let me know if there are any problems with it:
> >
> > 2a30f9440640 ("libbpf: Fix signed overflow in ringbuf_process_ring")
>
> Ah, thanks for that. Yep, the additional fix looks good to me.
>
> I think it actually fixes another very niche issue:
>
> int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
> {
> - int i, cnt, err, res = 0;
> + int i, cnt;
> + int64_t err, res = 0;
>
> cnt = epoll_wait(rb->epoll_fd, rb->events, rb->ring_cnt, timeout_ms);
> + if (cnt < 0)
> + return -errno;
> +
> for (i = 0; i < cnt; i++) {
> __u32 ring_id = rb->events[i].data.fd;
> struct ring *ring = &rb->rings[ring_id];
> @@ -280,7 +290,9 @@ int ring_buffer__poll(struct ring_buffer *rb, int
> timeout_ms)
> return err;
> res += err;
> }
> - return cnt < 0 ? -errno : res;
>
> If the callback returns an error but errno is 0 this fails to report the error.

Yeah, there was no need to be clever about that. Explicit if (cnt < 0)
check is obvious and correct.

>
> errno(3) says "the value of errno is never set to zero by any system
> call or library function" but then describes a scenario where an
> application might usefully set it to zero itself. Maybe it can also be
> 0 in new threads, depending on your metaphysical interpretation of "by
> a system call or library function".
>
> + if (res > INT_MAX)
> + return INT_MAX;
> + return res;