From: Josh Elsasser
To: "David S . Miller"
Cc: josh@elsasser.ca, Josh Elsasser, Thomas Graf, Herbert Xu,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net] rhashtable: avoid reschedule loop after rapid growth and shrink
Date: Wed, 23 Jan 2019 13:17:58 -0800
Message-Id: <20190123211758.104275-1-jelsasser@appneta.com>
X-Mailer: git-send-email 2.19.1

When running workloads with large bursts of fragmented packets, we've seen
a few machines stuck returning -EEXIST from rhashtable_shrink() and
endlessly rescheduling their hash table's deferred work, pegging a CPU core.

Root cause is commit da20420f83ea ("rhashtable: Add nested tables"), which
stops ignoring the return code of rhashtable_shrink() and the reallocs
used to grow the hashtable.
This uncovers a bug in the shrink logic where the "needs to shrink" check
runs against the last table but the actual shrink operation runs on the
first bucket_table in the hashtable (see below):

+-------+      +--------------+           +---------------+
| ht    |      | "first" tbl  |           | "last" tbl    |
| - tbl -----> | - future_tbl ----------> | - future_tbl ----> NULL
+-------+      +--------------+           +---------------+
               ^^^                        ^^^
               used by                    used by
               rhashtable_shrink()        rht_shrink_below_30()

A rehash then stalls out when the last table needs to shrink and the first
table has more elements than the target size, but rhashtable_shrink() hits
a non-NULL future_tbl and returns -EEXIST. This skips the item rehashing
and kicks off a reschedule loop, as no forward progress can be made while
the rhashtable needs to shrink.

Extend rhashtable_shrink() with a "tbl" parameter to avoid endless
exit-and-reschedules after hitting -EEXIST, allowing it to check a
future_tbl pointer that can actually be NULL and make forward progress
when the hashtable needs to shrink.

Fixes: da20420f83ea ("rhashtable: Add nested tables")
Signed-off-by: Josh Elsasser
---
 lib/rhashtable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..98e91f9544fa 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -377,9 +377,9 @@ static int rhashtable_rehash_alloc(struct rhashtable *ht,
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
-static int rhashtable_shrink(struct rhashtable *ht)
+static int rhashtable_shrink(struct rhashtable *ht,
+			     struct bucket_table *old_tbl)
 {
-	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
 	unsigned int nelems = atomic_read(&ht->nelems);
 	unsigned int size = 0;
 
@@ -412,7 +412,7 @@ static void rht_deferred_worker(struct work_struct *work)
 	if (rht_grow_above_75(ht, tbl))
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size * 2);
 	else if (ht->p.automatic_shrinking && rht_shrink_below_30(ht, tbl))
-		err = rhashtable_shrink(ht);
+		err = rhashtable_shrink(ht, tbl);
 	else if (tbl->nest)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size);
 
-- 
2.19.1
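
For readers following the first/last table diagram in the commit message,
the mismatch can be modelled in a few lines of userspace C. This is only a
minimal sketch, not kernel code: struct model_table, model_last_table() and
model_shrink() are hypothetical stand-ins for struct bucket_table,
rhashtable_last_table() and the entry checks of rhashtable_shrink().
Locking, RCU and the actual table allocation are left out on purpose.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct bucket_table. */
struct model_table {
	unsigned int size;              /* number of buckets */
	struct model_table *future_tbl; /* next table in the rehash chain */
};

/* Follow future_tbl links to the newest table in the chain. */
static struct model_table *model_last_table(struct model_table *tbl)
{
	while (tbl->future_tbl)
		tbl = tbl->future_tbl;
	return tbl;
}

/*
 * Model of the entry checks only (an assumption, not the real function):
 * returns 0 when the table is already small enough or when a new table
 * could be attached, and -EEXIST when old_tbl already has a successor.
 */
static int model_shrink(struct model_table *old_tbl, unsigned int target_size)
{
	if (old_tbl->size <= target_size)
		return 0;
	if (old_tbl->future_tbl)
		return -EEXIST;
	return 0; /* the real code would allocate and attach a table here */
}
```

With a 64-bucket first table chained to a 256-bucket last table and a
target size of 16, model_shrink(&first, 16) returns -EEXIST on every call,
which corresponds to the reschedule loop described above, while
model_shrink(model_last_table(&first), 16) returns 0 and lets the shrink
proceed.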