From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Daniel Borkmann, Martin KaFai Lau, Alexei Starovoitov
Subject: [PATCH 4.14 76/77] bpf, lru: avoid messing with eviction heuristics upon syscall lookup
Date: Thu, 23 May 2019 21:06:34 +0200
Message-Id: <20190523181730.447456891@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To:
 <20190523181719.982121681@linuxfoundation.org>
References: <20190523181719.982121681@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Daniel Borkmann

commit 50b045a8c0ccf44f76640ac3eea8d80ca53979a3 upstream.

One of the biggest issues we face right now with picking an LRU map
over a regular hash table is that a map walk from user space, for
example to just dump the existing entries or to remove certain ones,
will completely mess up the LRU eviction heuristics, and wrong entries
such as just-created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
the system call lookup side as well. Thus upon a walk, all entries are
being marked, so the information about the actual least recently used
ones is "lost".

In the case of Cilium, where an LRU map can be used (among other
things) as a BPF-based connection tracker, this current behavior causes
disruption upon control plane changes that need to walk the map from
user space to evict certain entries. The result of the discussion at
bpfconf [0] was that we should simply remove marking from the system
call side, as no good use case could be found where it is actually
needed there. Therefore this patch removes marking for the regular LRU
and the per-CPU flavor. If there should ever be a need in the future,
the behavior could be selected via a map creation flag, but for the
reason mentioned we avoid this here.
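To illustrate the effect described above, here is a minimal user-space
sketch. It is not kernel code: the toy_* names, the fixed four-entry
table and the single "ref" bit per element are assumptions made only
for illustration, loosely modelled on what bpf_lru_node_set_ref()
records. It counts how many entries still look "cold" (unreferenced)
after one datapath lookup followed by a full walk from user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_ENTRIES 4

/* Hypothetical stand-in for an LRU map element; not the kernel's struct. */
struct toy_elem {
	int key;
	bool ref;	/* "recently used" bit */
};

struct toy_map {
	struct toy_elem elems[TOY_NR_ENTRIES];
};

static void toy_map_init(struct toy_map *m)
{
	for (size_t i = 0; i < TOY_NR_ENTRIES; i++) {
		m->elems[i].key = (int)i;
		m->elems[i].ref = false;
	}
}

/* Same shape as the patched helper: set the ref bit only when asked to. */
static struct toy_elem *toy_lookup(struct toy_map *m, int key, bool mark)
{
	assert(m != NULL);

	for (size_t i = 0; i < TOY_NR_ENTRIES; i++) {
		if (m->elems[i].key == key) {
			if (mark)
				m->elems[i].ref = true;
			return &m->elems[i];
		}
	}
	return NULL;
}

/*
 * One datapath lookup of key 2 (always marks), then a user-space walk
 * over all keys that marks only if walk_marks is true. Returns the
 * number of entries that are still unreferenced afterwards.
 */
static size_t toy_cold_after_walk(bool walk_marks)
{
	struct toy_map m;
	size_t cold = 0;

	toy_map_init(&m);
	toy_lookup(&m, 2, true);

	for (int key = 0; key < TOY_NR_ENTRIES; key++)
		toy_lookup(&m, key, walk_marks);

	for (size_t i = 0; i < TOY_NR_ENTRIES; i++)
		if (!m.elems[i].ref)
			cold++;

	return cold;
}
```

In this toy model, toy_cold_after_walk(true) leaves no cold entries, so
an evictor can no longer tell the hot entry from the rest, whereas
toy_cold_after_walk(false) leaves the three untouched entries cold and
only key 2 protected.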
[0] http://vger.kernel.org/bpfconf.html

Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Daniel Borkmann
Acked-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov
Signed-off-by: Greg Kroah-Hartman
---
 kernel/bpf/hashtab.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -498,18 +498,30 @@ static u32 htab_map_gen_lookup(struct bp
 	return insn - insn_buf;
 }
 
-static void *htab_lru_map_lookup_elem(struct bpf_map *map, void *key)
+static __always_inline void *__htab_lru_map_lookup_elem(struct bpf_map *map,
+							void *key, const bool mark)
 {
 	struct htab_elem *l = __htab_map_lookup_elem(map, key);
 
 	if (l) {
-		bpf_lru_node_set_ref(&l->lru_node);
+		if (mark)
+			bpf_lru_node_set_ref(&l->lru_node);
 		return l->key + round_up(map->key_size, 8);
 	}
 
 	return NULL;
 }
 
+static void *htab_lru_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	return __htab_lru_map_lookup_elem(map, key, true);
+}
+
+static void *htab_lru_map_lookup_elem_sys(struct bpf_map *map, void *key)
+{
+	return __htab_lru_map_lookup_elem(map, key, false);
+}
+
 static u32 htab_lru_map_gen_lookup(struct bpf_map *map,
 				   struct bpf_insn *insn_buf)
 {
@@ -1160,6 +1172,7 @@ const struct bpf_map_ops htab_lru_map_op
 	.map_free = htab_map_free,
 	.map_get_next_key = htab_map_get_next_key,
 	.map_lookup_elem = htab_lru_map_lookup_elem,
+	.map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys,
 	.map_update_elem = htab_lru_map_update_elem,
 	.map_delete_elem = htab_lru_map_delete_elem,
 	.map_gen_lookup = htab_lru_map_gen_lookup,
@@ -1190,7 +1203,6 @@ static void *htab_lru_percpu_map_lookup_
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value)
 {
-	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct htab_elem *l;
 	void __percpu *pptr;
 	int ret = -ENOENT;
@@ -1206,8 +1218,9 @@ int bpf_percpu_hash_copy(struct bpf_map
 	l = __htab_map_lookup_elem(map, key);
 	if (!l)
 		goto out;
-	if (htab_is_lru(htab))
-		bpf_lru_node_set_ref(&l->lru_node);
+	/* We do not mark LRU map element here in order to not mess up
+	 * eviction heuristics when user space does a map walk.
+	 */
 	pptr = htab_elem_get_ptr(l, map->key_size);
 	for_each_possible_cpu(cpu) {
 		bpf_long_memcpy(value + off,