From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, DJ Gregor, Mikulas Patocka, Arne Welzel, Mike Snitzer
Subject: [PATCH 4.19 146/293] dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
Date: Mon, 20 Sep 2021 18:41:48 +0200
Message-Id: <20210920163938.277392203@linuxfoundation.org>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210920163933.258815435@linuxfoundation.org>
References: <20210920163933.258815435@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

From: Arne Welzel

commit 528b16bfc3ae5f11638e71b3b63a81f9999df727 upstream.

On systems with many cores using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the page allocation limit
for a given device is reached or close to being reached. This is due to
percpu_counter_compare() taking a spinlock to compute an exact result on
potentially many CPUs at the same time.

Switch to non-exact comparison of allocated and allowed pages by using
the value returned by percpu_counter_read_positive() to avoid taking the
percpu_counter spinlock. This may over/under estimate the actual number
of allocated pages by at most (batch - 1) * num_online_cpus(). Currently,
batch is bounded by 32.

The system on which this issue was first observed has 256 CPUs and 512GB
of RAM. With a 4k page size, this change may over/under estimate by 31MB.
With ~10G (2%) allowed dm-crypt allocations, this seems an acceptable
error. It is certainly preferable to running into the spinlock
contention.

This behavior was reproduced on an EC2 c5.24xlarge instance with 96 CPUs
and 192GB RAM as follows, but can be provoked on systems with fewer CPUs
as well:

 * Disable swap
 * Tune vm settings to promote regular writeback
     $ echo 50 > /proc/sys/vm/dirty_expire_centisecs
     $ echo 25 > /proc/sys/vm/dirty_writeback_centisecs
     $ echo $((128 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes
 * Create 8 dm-crypt devices based on files on a tmpfs
 * Create and mount an ext4 filesystem on each crypt device
 * Run stress-ng --hdd 8 within one of the above filesystems

Total %system usage collected from sysstat goes to ~35%. Write
throughput on the underlying loop device is ~2GB/s.
perf profiling an individual kworker kcryptd thread shows the following
profile, indicating spinlock contention in percpu_counter_compare():

    99.98%     0.00%  kworker/u193:46  [kernel.kallsyms]  [k] ret_from_fork
      |
      --ret_from_fork
        kthread
        worker_thread
        |
        --99.92%--process_one_work
            |
            |--80.52%--kcryptd_crypt
            |    |
            |    |--62.58%--mempool_alloc
            |    |    |
            |    |    --62.24%--crypt_page_alloc
            |    |        |
            |    |        --61.51%--__percpu_counter_compare
            |    |            |
            |    |            --61.34%--__percpu_counter_sum
            |    |                |
            |    |                |--58.68%--_raw_spin_lock_irqsave
            |    |                |    |
            |    |                |    --58.30%--native_queued_spin_lock_slowpath
            |    |                |
            |    |                --0.69%--cpumask_next
            |    |                    |
            |    |                    --0.51%--_find_next_bit
            |    |
            |    |--10.61%--crypt_convert
            |    |    |
            |    |    |--6.05%--xts_crypt
            ...

After applying this patch and running the same test, %system usage is
lowered to ~7% and write throughput on the loop device increases to
~2.7GB/s. perf report shows mempool_alloc() as ~8% rather than ~62% in
the profile and not hitting the percpu_counter spinlock anymore.

    |--8.15%--mempool_alloc
    |    |
    |    |--3.93%--crypt_page_alloc
    |    |    |
    |    |    --3.75%--__alloc_pages
    |    |        |
    |    |        --3.62%--get_page_from_freelist
    |    |            |
    |    |            --3.22%--rmqueue_bulk
    |    |                |
    |    |                --2.59%--_raw_spin_lock
    |    |                    |
    |    |                    --2.57%--native_queued_spin_lock_slowpath
    |    |
    |    --3.05%--_raw_spin_lock_irqsave
    |        |
    |        --2.49%--native_queued_spin_lock_slowpath

Suggested-by: DJ Gregor
Reviewed-by: Mikulas Patocka
Signed-off-by: Arne Welzel
Fixes: 5059353df86e ("dm crypt: limit the number of allocated pages")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer
Signed-off-by: Greg Kroah-Hartman
---
 drivers/md/dm-crypt.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2181,7 +2181,12 @@ static void *crypt_page_alloc(gfp_t gfp_
 	struct crypt_config *cc = pool_data;
 	struct page *page;
 
-	if (unlikely(percpu_counter_compare(&cc->n_allocated_pages, dm_crypt_pages_per_client) >= 0) &&
+	/*
+	 * Note, percpu_counter_read_positive() may over (and under) estimate
+	 * the current usage by at most (batch - 1) * num_online_cpus() pages,
+	 * but avoids potential spinlock contention of an exact result.
+	 */
+	if (unlikely(percpu_counter_read_positive(&cc->n_allocated_pages) >= dm_crypt_pages_per_client) &&
 	    likely(gfp_mask & __GFP_NORETRY))
 		return NULL;