2021-08-30 14:18:04

by Etienne Martineau

Subject: Question related to ( commit 9f691549f76d "bpf: fix struct htab_elem layout" )

Hi,

I've been staring at this commit for some time and I wonder what the
symptoms were when the issue was reproduced?
"The bug was discovered by manual code analysis and reproducible
only with explicit udelay() in lookup_elem_raw()."

I tried various stress test + timing combinations in lookup_elem_raw() but no
luck.

I believe that one of our production boxes ran into that issue lately with a GPF
in the area of htab_map_lookup_elem(). The crash was seen on an outdated
4.9 stable.

Please CC me as I'm not on the list.

thanks in advance,
Etienne


2021-08-30 16:40:23

by Alexei Starovoitov

Subject: Re: Question related to ( commit 9f691549f76d "bpf: fix struct htab_elem layout" )

On Mon, Aug 30, 2021 at 7:17 AM Etienne Martineau <[email protected]> wrote:
>
> Hi,
>
> I've been staring at this commit for some time and I wonder what the
> symptoms were when the issue was reproduced?
> "The bug was discovered by manual code analysis and reproducible
> only with explicit udelay() in lookup_elem_raw()."
>
> I tried various stress test + timing combinations in lookup_elem_raw() but no
> luck.

That fix was a long time ago :)
afair the issue will not look like a crash, but rather an element
will not be found.
That's what lookup_nulls_elem_raw() is fixing.

> I believe that one of our production boxes ran into that issue lately with a GPF
> in the area of htab_map_lookup_elem(). The crash was seen on an outdated
> 4.9 stable.

Would be great if you can reproduce it on the latest kernel.
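
For readers unfamiliar with the "element will not be found" symptom, here is a
minimal userspace sketch of a nulls-terminated bucket walk that restarts when
it ends in the wrong bucket. It is not the kernel code: there is no RCU and no
real concurrency, and the names (struct elem, race_hook, recycle_a, run_demo)
are hypothetical, chosen for illustration only. The race is simulated by a
hook that recycles an element into another bucket in the middle of the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 4u

struct elem {
    uintptr_t next;  /* next element, or a nulls marker ending the chain */
    uint32_t key;
};

static uintptr_t heads[N_BUCKETS];
static unsigned int restarts;      /* how many times lookup() had to retry */
static void (*race_hook)(void);    /* hypothetical hook simulating a racing writer */

/* A nulls marker has the low bit set and carries the bucket index above it. */
static uintptr_t make_nulls(uint32_t bucket) { return ((uintptr_t)bucket << 1) | 1u; }
static int is_nulls(uintptr_t p) { return (p & 1u) != 0; }
static uint32_t nulls_value(uintptr_t p) { return (uint32_t)(p >> 1); }

static void table_init(void)
{
    for (uint32_t i = 0; i < N_BUCKETS; i++)
        heads[i] = make_nulls(i);
    restarts = 0;
    race_hook = NULL;
}

static void insert_head(struct elem *e, uint32_t key)
{
    uint32_t bucket = key % N_BUCKETS;

    assert(((uintptr_t)e & 1u) == 0);  /* low bit must be free for the marker */
    e->key = key;
    e->next = heads[bucket];
    heads[bucket] = (uintptr_t)e;
}

static struct elem *lookup(uint32_t key)
{
    uint32_t bucket = key % N_BUCKETS;
    uintptr_t p;

again:
    for (p = heads[bucket]; !is_nulls(p); p = ((struct elem *)p)->next) {
        struct elem *e = (struct elem *)p;

        if (e->key == key)
            return e;
        if (race_hook) {               /* fire the simulated race once */
            void (*hook)(void) = race_hook;
            race_hook = NULL;
            hook();
        }
    }
    /* Ended on another bucket's marker: the walk was diverted, so retry. */
    if (nulls_value(p) != bucket) {
        restarts++;
        goto again;
    }
    return NULL;
}

static struct elem a, b;

/* Simulated writer: unlink 'a' from bucket 1 and reuse it in bucket 2. */
static void recycle_a(void)
{
    heads[1] = a.next;
    insert_head(&a, 6);
}

/* Returns 1 if key 1 is still found, after exactly one restart. */
int run_demo(void)
{
    table_init();
    insert_head(&b, 1);
    insert_head(&a, 5);   /* bucket 1 is now: a -> b -> nulls(1) */
    race_hook = recycle_a;
    return lookup(1) == &b && restarts == 1;
}
```

Without the nulls_value() check, the walk in run_demo() would follow the
recycled element into bucket 2, reach the end of that chain, and report key 1
as missing even though it is still in the table.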

2021-08-30 17:42:15

by Etienne Martineau

Subject: Re: Question related to ( commit 9f691549f76d "bpf: fix struct htab_elem layout" )

On Mon, Aug 30, 2021 at 12:39 PM Alexei Starovoitov
<[email protected]> wrote:
>
> On Mon, Aug 30, 2021 at 7:17 AM Etienne Martineau <[email protected]> wrote:
> >
> > Hi,
> >
> > I've been staring at this commit for some time and I wonder what the
> > symptoms were when the issue was reproduced?
> > "The bug was discovered by manual code analysis and reproducible
> > only with explicit udelay() in lookup_elem_raw()."
> >
> > I tried various stress test + timing combinations in lookup_elem_raw() but no
> > luck.
>
> That fix was a long time ago :)
> afair the issue will not look like a crash, but rather an element
> will not be found.
> That's what lookup_nulls_elem_raw() is fixing.

Under that same scenario I wonder if it's also possible to have a
messed up element somehow?

>
> > I believe that one of our production boxes ran into that issue lately with a GPF
> > in the area of htab_map_lookup_elem(). The crash was seen on an outdated
> > 4.9 stable.
>
> Would be great if you can reproduce it on the latest kernel.

We have another deployment on 5.4 stable running the same bpf code so
will let you know.