Date: Sat, 1 Aug 2020 11:09:27 -0700
Message-Id: <20200801180927.1003340-1-brianvv@google.com>
Subject: [PATCH V2 bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty
From: Brian Vazquez
To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, "David S. Miller"
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Luigi Rizzo, Yonghong Song

While running some experiments, it was observed that map_lookup_batch
was 2x slower than get_next_key + lookup when the syscall overhead is
minimal. This was because the map_lookup_batch implementation spent
more time traversing empty buckets, which can be very costly when the
pre-allocated map is much bigger than the number of entries it holds.
This patch optimizes the case where the bucket is empty so we can move
quickly to the next bucket.

The benchmark was generated using the google/benchmark library [1].
When the benchmark is executed, the number of iterations is governed
by how long the benchmark takes: the number of iterations is at least
1 and at most 1e9, and iterations are repeated until the CPU time of
the entire binary (not just the part being measured) exceeds 0.5 s.
The Time and CPU reported are the averages per iteration over the
iteration runs.

The experiments to exercise the empty buckets are as follows:

 - The map was populated with a single entry, to make sure that the
   syscall overhead is not helping map_lookup_batch.
 - The size of the pre-allocated map was increased to show the effect
   of traversing empty buckets.

To interpret the results: Benchmark is the name of the experiment,
where the first number corresponds to the number of elements in the
map and the second to the size of the pre-allocated map. Time and CPU
are averages and correspond to the elapsed time per iteration and the
system time consumed per iteration, respectively.
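For reference, the two dump strategies being compared look roughly as
follows (a minimal user-space sketch against libbpf, not the
benchmark's actual code: the u32 key/value sizes, the buffer sizes,
and the helper names dump_iterate()/dump_batch() are illustrative
assumptions):

#include <bpf/bpf.h>	/* bpf_map_get_next_key, bpf_map_lookup_batch */

/* get_next_key + lookup: two syscalls per element. Passing a NULL
 * key to bpf_map_get_next_key() returns the first key; each later
 * call returns the key following the one passed in, until the kernel
 * reports ENOENT at the end of the map. Error handling is elided.
 */
static void dump_iterate(int map_fd)
{
	__u32 key, next_key, val;
	void *prev = NULL;

	while (!bpf_map_get_next_key(map_fd, prev, &next_key)) {
		bpf_map_lookup_elem(map_fd, &next_key, &val);
		key = next_key;
		prev = &key;
	}
}

/* map_lookup_batch: the kernel walks the hash table buckets itself.
 * The first call passes NULL as in_batch; each call returns an
 * opaque resume token through out_batch that is fed back in, and a
 * return of -1 with errno ENOENT means the traversal completed.
 */
static void dump_batch(int map_fd, __u32 *keys, __u32 *vals, __u32 max)
{
	DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts);
	__u32 batch, count = max;
	int err;

	err = bpf_map_lookup_batch(map_fd, NULL, &batch,
				   keys, vals, &count, &opts);
	while (!err) {
		count = max;
		err = bpf_map_lookup_batch(map_fd, &batch, &batch,
					   keys, vals, &count, &opts);
	}
}

With a single element in the map, dump_batch() does almost all of its
work traversing empty pre-allocated buckets inside the kernel, which
is the cost this patch targets.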
Results:

Using get_next_key + lookup:

Benchmark                Time(ns)        CPU(ns)     Iterations
---------------------------------------------------------------
BM_DumpHashMap/1/1k          3593           3586         192680
BM_DumpHashMap/1/4k          6004           5972         100000
BM_DumpHashMap/1/16k        15755          15710          44341
BM_DumpHashMap/1/64k        59525          59376          10000

Using htab_lookup_batch before this patch:

Benchmark                Time(ns)        CPU(ns)     Iterations
---------------------------------------------------------------
BM_DumpHashMap/1/1k          3933           3927         177978
BM_DumpHashMap/1/4k          9192           9177          73951
BM_DumpHashMap/1/16k        42011          41970          16789
BM_DumpHashMap/1/64k       117895         117661           6135

Using htab_lookup_batch with this patch:

Benchmark                Time(ns)        CPU(ns)     Iterations
---------------------------------------------------------------
BM_DumpHashMap/1/1k          2809           2803         249212
BM_DumpHashMap/1/4k          5318           5316         100000
BM_DumpHashMap/1/16k        14925          14895          47448
BM_DumpHashMap/1/64k        58870          58674          10000

[1] https://github.com/google/benchmark.git

Changelog:

v1 -> v2:
 - Add more information about how to interpret the results

Suggested-by: Luigi Rizzo
Cc: Yonghong Song
Signed-off-by: Brian Vazquez
---
 kernel/bpf/hashtab.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 024276787055..b6d28bd6345b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1349,7 +1349,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	struct hlist_nulls_head *head;
 	struct hlist_nulls_node *n;
 	unsigned long flags = 0;
-	bool locked = false;
 	struct htab_elem *l;
 	struct bucket *b;
 	int ret = 0;
@@ -1408,19 +1407,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	dst_val = values;
 	b = &htab->buckets[batch];
 	head = &b->head;
-	/* do not grab the lock unless need it (bucket_cnt > 0). */
-	if (locked)
-		flags = htab_lock_bucket(htab, b);
+	l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
+				   struct htab_elem, hash_node);
+	if (!l && (batch + 1 < htab->n_buckets)) {
+		batch++;
+		goto again_nocopy;
+	}
+
+	flags = htab_lock_bucket(htab, b);
 
 	bucket_cnt = 0;
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		bucket_cnt++;
 
-	if (bucket_cnt && !locked) {
-		locked = true;
-		goto again_nocopy;
-	}
-
 	if (bucket_cnt > (max_count - total)) {
 		if (total == 0)
 			ret = -ENOSPC;
@@ -1446,10 +1445,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		goto alloc;
 	}
 
-	/* Next block is only safe to run if you have grabbed the lock */
-	if (!locked)
-		goto next_batch;
-
 	hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
 		memcpy(dst_key, l->key, key_size);
 
@@ -1492,7 +1487,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	}
 
 	htab_unlock_bucket(htab, b, flags);
-	locked = false;
 
 	while (node_to_free) {
 		l = node_to_free;
@@ -1500,7 +1494,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		bpf_lru_push_free(&htab->lru, &l->lru_node);
 	}
 
-next_batch:
 	/* If we are not copying data, we can go to next bucket and avoid
 	 * unlocking the rcu.
 	 */
-- 
2.28.0.163.g6104cc2f0b6-goog