On 02/09/2018 06:14 AM, Li Zhijian wrote:
> Hi
>
> Intel 0-Day noticed that bpf/test_maps gives different results on different platforms.
> When it fails, the details look like this:

Sorry for the late reply and thanks for reporting! More below:

> ------------------
> 880 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> 881 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> 882 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> 883 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> 884 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> 885 Failed to create hashmap key=16 value=16384 'Cannot allocate memory'
> 886 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> 887 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> 888 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> 889 Failed to create hashmap key=16 value=65536 'Cannot allocate memory'
> 890 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> 891 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> 892 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> 893 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> 894 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> 895 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> 896 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> 897 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> 898 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> 899 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> 900 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> 901 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> 902 Failed to create hashmap key=16 value=262144 'Cannot allocate memory'
> 903 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> 904 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> 905 test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed.
> 906 Aborted
> 907 not ok 1..3 selftests: test_maps [FAIL]
> ------------------
>
> After a quick look at the code, it looks related to the CPU count and system memory.
>
> Below are the results on different platforms:
> 1. Good
> model: Sandy Bridge
> nr_node: 1
> nr_cpu: 4
> memory: 6G
>
> 2. Good
> model: qemu-system-x86_64 -enable-kvm
> nr_cpu: 2
> memory: 4G
>
> 3. Bad
> model: Ivytown Ivy Bridge-EP
> nr_cpu: 48
> memory: 64G
>
> 4. Bad
> model: Skylake
> nr_cpu: 104
> memory: 64G
>
> I tried changing the process count from 100 to 10, and with that it passes on the Skylake machine (4) above.
> ------------
> lizhijian@haswell-OptiPlex-9020:~/lkp/linux/tools/testing/selftests/bpf$ git diff
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index 040356e..b788ca1 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -960,7 +960,7 @@ static void test_map_stress(void)
>  {
>  	run_parallel(100, test_hashmap, NULL);
>  	run_parallel(100, test_hashmap_percpu, NULL);
> -	run_parallel(100, test_hashmap_sizes, NULL);
> +	run_parallel(10, test_hashmap_sizes, NULL);
>  	run_parallel(100, test_hashmap_walk, NULL);
>
>  	run_parallel(100, test_arraymap, NULL);

Unless Alexei has a better idea, I think the stress test should not fail
hard via exit() when the bpf_create_map() error is ENOMEM; for all other
errors it still should. So it probably makes sense to check for
errno == ENOMEM when fd < 0 in test_hashmap_sizes() and then continue, to
keep trying under stress. Feel free to send a patch, Li.
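
A minimal sketch of that idea, for illustration only and not the eventual
patch. It assumes a creator callback in place of bpf_create_map() so it
builds without libbpf; classify_create(), try_hashmap_sizes(), create_fn
and the two fake_create_* stubs are hypothetical names, and the key/value
sizes are simply the ones that show up in the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of one map-creation attempt. */
enum create_result { CREATE_OK, CREATE_SKIPPED, CREATE_FATAL };

/* fd and err stand in for bpf_create_map()'s return value and errno. */
static enum create_result classify_create(int fd, int err)
{
	if (fd >= 0)
		return CREATE_OK;
	if (err == ENOMEM)
		return CREATE_SKIPPED;	/* tolerated under memory pressure */
	return CREATE_FATAL;		/* any other error is a real failure */
}

/* Hypothetical creator: returns fd >= 0, or -1 with errno set. */
typedef int (*create_fn)(int key_size, int value_size);

/*
 * Try every key/value size combination. ENOMEM failures are counted in
 * *skipped and the loop continues; any other error stops with -1.
 */
static int try_hashmap_sizes(create_fn create, int *skipped)
{
	static const int keys[] = { 8, 16 };
	static const int values[] = { 8192, 16384, 32768,
				      65536, 131072, 262144 };
	size_t i, j;

	assert(skipped != NULL);
	*skipped = 0;

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
		for (j = 0; j < sizeof(values) / sizeof(values[0]); j++) {
			int fd = create(keys[i], values[j]);
			int err = errno;	/* save before stdio can clobber it */

			switch (classify_create(fd, err)) {
			case CREATE_OK:
				break;		/* real code would close(fd) here */
			case CREATE_SKIPPED:
				(*skipped)++;
				break;
			case CREATE_FATAL:
				fprintf(stderr,
					"Failed to create hashmap key=%d value=%d '%s'\n",
					keys[i], values[j], strerror(err));
				return -1;
			}
		}
	}
	return 0;
}

/* Stub: pretends large value sizes run out of memory. */
static int fake_create_enomem(int key_size, int value_size)
{
	(void)key_size;
	if (value_size >= 65536) {
		errno = ENOMEM;
		return -1;
	}
	return 3;	/* fake fd, never closed */
}

/* Stub: always fails with an error that must stay fatal. */
static int fake_create_einval(int key_size, int value_size)
{
	(void)key_size;
	(void)value_size;
	errno = EINVAL;
	return -1;
}
```

In the real test the callback would be a thin wrapper around
bpf_create_map(), and the forked child would still exit(1) when
try_hashmap_sizes() returns -1, so only ENOMEM is tolerated.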
Thanks again,
Daniel
On Fri, Feb 09, 2018 at 03:01:57PM +0100, Daniel Borkmann wrote:
> Unless Alexei has a better idea, I think the stress test should not fail
> hard via exit() when the bpf_create_map() error is ENOMEM; for all other
> errors it still should. So it probably makes sense to check for
> errno == ENOMEM when fd < 0 in test_hashmap_sizes() and then continue, to
> keep trying under stress. Feel free to send a patch, Li.

That's probably a good path for now.
I also see test_maps fail with this assert on a freshly booted kernel,
but restarting test_maps then works, and repeated runs succeed too.
I suspect there is a deeper issue here related to memory allocation:
either the slab or the percpu allocator is behaving oddly.
It needs to be debugged further.