From: Alexei Starovoitov
To: "David S . Miller"
CC: Daniel Borkmann, Daniel Wagner, Tom Zanussi, Wang Nan, He Kuang, Martin KaFai Lau, Brendan Gregg
Subject: [PATCH v2 net-next 0/12] bpf: map pre-alloc
Date: Mon, 7 Mar 2016 21:57:12 -0800
Message-ID: <1457416641-306326-1-git-send-email-ast@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

v1->v2:
. fixed a few issues spotted by Daniel
. converted stackmap to pre-allocation as well
. added a workaround for a lockdep false positive
. added pcpu_freelist_populate to be used by hashmap and stackmap

This patch set switches the bpf hash map to use pre-allocation by default
and introduces the BPF_F_NO_PREALLOC flag to keep the old behavior for cases
where full map pre-allocation is too memory expensive.

Some time back Daniel Wagner reported crashes when a bpf hash map is
used to compute time intervals between preempt_disable->preempt_enable,
and recently Tom Zanussi reported a deadlock in the iovisor/bcc/funccount
tool when it is used to count the number of invocations of kernel
'*spin*' functions. Both problems are due to the recursive use of
slub and can only be solved by pre-allocating all map elements.

A lot of different solutions were considered. Many were implemented,
but in the end pre-allocation seems to be the only feasible answer.
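
To illustrate why pre-allocation removes the allocator from the update path,
here is a minimal user-space sketch. It is not the code from this series:
struct pool, pool_init(), pool_get(), pool_put(), pool_destroy() and
SKETCH_F_NO_PREALLOC are hypothetical names made up for the illustration,
malloc() stands in for slub, and there is no locking or per-cpu handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag modelled on the BPF_F_NO_PREALLOC idea; not the uapi value. */
#define SKETCH_F_NO_PREALLOC 1u

struct elem {
	struct elem *next;	/* free-list link */
	long key, value;
};

struct pool {
	struct elem *storage;	/* one block holding every element, or NULL */
	struct elem *free_head;	/* singly linked list of unused elements */
	unsigned int flags;
	size_t allocator_calls;	/* how often the update path called malloc() */
};

/* Map creation: in the default mode every element is allocated here. */
static int pool_init(struct pool *p, size_t max_entries, unsigned int flags)
{
	p->storage = NULL;
	p->free_head = NULL;
	p->flags = flags;
	p->allocator_calls = 0;
	if (flags & SKETCH_F_NO_PREALLOC)
		return 0;
	if (max_entries == 0)
		return -1;
	p->storage = calloc(max_entries, sizeof(*p->storage));
	if (!p->storage)
		return -1;
	for (size_t i = 0; i < max_entries; i++) {
		p->storage[i].next = p->free_head;
		p->free_head = &p->storage[i];
	}
	return 0;
}

/* Update path: the default mode only pops a list node, no allocator call. */
static struct elem *pool_get(struct pool *p)
{
	if (p->flags & SKETCH_F_NO_PREALLOC) {
		/* Old behavior: allocate on demand (this sketch enforces no limit). */
		p->allocator_calls++;
		return malloc(sizeof(struct elem));
	}
	struct elem *e = p->free_head;
	if (e)
		p->free_head = e->next;
	return e;	/* NULL means all pre-allocated elements are in use */
}

/* Delete path: hand the element back instead of freeing it. */
static void pool_put(struct pool *p, struct elem *e)
{
	assert(e != NULL);
	if (p->flags & SKETCH_F_NO_PREALLOC) {
		free(e);
		return;
	}
	e->next = p->free_head;
	p->free_head = e;
}

static void pool_destroy(struct pool *p)
{
	free(p->storage);
	p->storage = NULL;
	p->free_head = NULL;
}
```

With flags == 0 the only allocation happens in pool_init(), so a program
that runs inside the allocator can still take and return elements without
re-entering it. With SKETCH_F_NO_PREALLOC set, every pool_get() goes back to
malloc(), which saves memory for sparsely used maps at the cost of putting
the allocator back on the update path.
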
As far as pre-allocation goes, it was also implemented 4 different ways:
- simple free-list with single lock
- percpu_ida with optimizations
- blk-mq-tag variant customized for bpf use case
- percpu_freelist
For the bpf style of alloc/free patterns percpu_freelist is the best
and is the one implemented in this patch set.
Detailed performance numbers are in patch 3.

Patch 1: fixes simple deadlocks due to missing recursion checks
Patch 2: introduces percpu_freelist
Patch 3: pre-allocates hash map elements
Patch 4: checks for reserved flag bits in array and stack maps
Patch 5: converts stackmap to pre-allocation
Patches 6-9: prepare test infra
Patch 10: stress test for hash map infra. It attaches to spin_lock
functions, and bpf_map_update/delete are called from different contexts
Patch 11: stress test for bpf_get_stackid
Patch 12: map performance test

Reported-by: Daniel Wagner
Reported-by: Tom Zanussi

Alexei Starovoitov (12):
  bpf: prevent kprobe+bpf deadlocks
  bpf: introduce percpu_freelist
  bpf: pre-allocate hash map elements
  bpf: check for reserved flag bits in array and stack maps
  bpf: convert stackmap to pre-allocation
  samples/bpf: make map creation more verbose
  samples/bpf: move ksym_search() into library
  samples/bpf: add map_flags to bpf loader
  samples/bpf: test both pre-alloc and normal maps
  samples/bpf: add bpf map stress test
  samples/bpf: stress test bpf_get_stackid
  samples/bpf: add map performance test

 include/linux/bpf.h              |   6 +
 include/uapi/linux/bpf.h         |   3 +
 kernel/bpf/Makefile              |   2 +-
 kernel/bpf/arraymap.c            |   2 +-
 kernel/bpf/hashtab.c             | 240 +++++++++++++++++++++++++++------------
 kernel/bpf/percpu_freelist.c     | 100 ++++++++++++++++
 kernel/bpf/percpu_freelist.h     |  31 +++++
 kernel/bpf/stackmap.c            |  89 ++++++++++++---
 kernel/bpf/syscall.c             |  30 ++++-
 kernel/trace/bpf_trace.c         |   2 -
 samples/bpf/Makefile             |   8 ++
 samples/bpf/bpf_helpers.h        |   1 +
 samples/bpf/bpf_load.c           |  70 +++++++++++-
 samples/bpf/bpf_load.h           |   6 +
 samples/bpf/fds_example.c        |   2 +-
 samples/bpf/libbpf.c             |   5 +-
 samples/bpf/libbpf.h             |   2 +-
 samples/bpf/map_perf_test_kern.c | 100 ++++++++++++++++
 samples/bpf/map_perf_test_user.c | 155 +++++++++++++++++++++++++
 samples/bpf/offwaketime_user.c   |  67 +----------
 samples/bpf/sock_example.c       |   2 +-
 samples/bpf/spintest_kern.c      |  68 +++++++++++
 samples/bpf/spintest_user.c      |  50 ++++++++
 samples/bpf/test_maps.c          |  29 +++--
 samples/bpf/test_verifier.c      |   4 +-
 25 files changed, 895 insertions(+), 179 deletions(-)
 create mode 100644 kernel/bpf/percpu_freelist.c
 create mode 100644 kernel/bpf/percpu_freelist.h
 create mode 100644 samples/bpf/map_perf_test_kern.c
 create mode 100644 samples/bpf/map_perf_test_user.c
 create mode 100644 samples/bpf/spintest_kern.c
 create mode 100644 samples/bpf/spintest_user.c

-- 
2.8.0.rc1