Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()
To: Heitor Alves de Siqueira
Cc: kent.overstreet@gmail.com, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, shile.zhang@linux.alibaba.com, vojtech@suse.com
References: <20190814142301.32556-1-halves@canonical.com>
From: Coly Li
Organization: SUSE Labs
Message-ID: <74950e24-245a-c627-0e2e-32ac0b304a6c@suse.de>
Date: Thu, 15 Aug 2019 00:50:03 +0800
In-Reply-To: <20190814142301.32556-1-halves@canonical.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/8/14 10:23 PM, Heitor Alves de Siqueira wrote:
> Hi Coly,
>
> We've had users impacted
> by system stalls and were able to trace it back to the
> bcache priority_stats query. After investigating a bit further, it seems that
> the sorting step in the quantiles calculation can cause heavy CPU
> contention. This has a severe performance impact on any task that is running in
> the same CPU as the sysfs query, and caused issues even for non-bcache
> workloads.
>
> We did some test runs with fio to get a better picture of the impact on
> read/write workloads while a priority_stats query is running, and came up with
> some interesting results. The bucket locking doesn't seem to have that
> much performance impact even in full-write workloads, but the 'sort' can affect
> bcache device throughput and latency pretty heavily (and any other tasks that
> are "unlucky" to be scheduled together with it). In some of our tests, there was
> a performance reduction of almost 90% in random read IOPS to the bcache device
> (refer to the comparison graph at [0]). There's a few more details on the
> Launchpad bug [1] we've created to track this, together with the complete fio
> results + comparison graphs.
>
> The cond_resched() patch suggested by Shile Zhang actually improved performance
> a lot, and eliminated the stalls we've observed during the priority_stats
> query. Even though it may cause the sysfs query to take a bit longer, it seems
> like a decent tradeoff for general performance when running that query on a
> system under heavy load. It's also a cheap short-term solution until we can look
> into a more complex re-write of the priority_stats calculation (perhaps one that
> doesn't require the locking?). Could we revisit Shile's patch, and consider
> merging it?
>
> Thanks!
> Heitor
>
> [0] https://people.canonical.com/~halves/priority_stats/read/4k-iops-2Dsmooth.png
> [1] https://bugs.launchpad.net/bugs/1840043
>

Hi Heitor,

With your very detailed explanation, I have come to understand why people
care about the performance impact of priority_stats.
In the case of system monitoring, how long priority_stats takes to return is
less important than overall system throughput.

Yes, I agree with your opinion, and the benchmark chart makes me confident
in the original patch. I will add this patch to the v5.4 window with
    Tested-by: Heitor Alves de Siqueira

Thanks for your detailed information, and thanks to Shile Zhang for
originally composing this patch.

-- 
Coly Li