Date: Mon, 9 Jan 2023 11:59:16 +0100
From: Jan Kara
To: Tejun Heo
Cc: Jan Kara, Michal Koutný, Jinke Han, josef@toxicpanda.com,
	axboe@kernel.dk, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, yinxin.x@bytedance.com
Subject: Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl
Message-ID: <20230109105916.jvnhjdseqkwejmws@quack3>
References: <20221226130505.7186-1-hanjinke.666@bytedance.com>
 <20230105161854.GA1259@blackbody.suse.cz>
 <20230106153813.4ttyuikzaagkk2sc@quack3>

Hello!

On Fri 06-01-23 06:58:05, Tejun Heo wrote:
> Hello,
>
> On Fri, Jan 06, 2023 at 04:38:13PM +0100, Jan Kara wrote:
> > Generally, problems like this are taken care of by IO schedulers. E.g.
> > BFQ has quite a lot of logic exactly to reduce problems like this. Sync
> > and async queues are one part of this logic inside BFQ (but there's
> > more).
>
> With modern SSDs, even deadline's overhead is too high and a lot (but
> clearly not all) of what the IO schedulers do is no longer necessary. I
> don't see a good way back to elevators.

Yeah, I agree there's no way back :). But actually I think a lot of the
functionality of IO schedulers is not needed (by you ;)) only because the
HW got performant enough and so some issues became less visible. That is
all fine, but if you end up in a configuration where your cgroup's IO
limits are as underprovisioned for its IO demands as the old rotational
disks were for the amount of IO the system needed to do (i.e., you can
easily generate an amount of IO that then takes minutes or tens of minutes
for your IO subsystem to crunch through), you hit all the same problems
the IO schedulers were trying to solve. Maybe these days we incline more
towards the answer "buy more appropriate HW / buy higher limits from your
infrastructure provider", but it is not like the original issues in such
configurations have disappeared.

> > But given the current architecture of the block layer, IO schedulers
> > sit below throttling frameworks such as blk-throtl, so they have no
> > chance of influencing problems like this. So we are bound to reinvent
> > the scheduling logic IO schedulers are already doing. That being said,
> > I don't have a good solution or architecture suggestion, because
> > implementing various throttling frameworks within IO schedulers is
> > cumbersome (complex interactions) and generally the performance is too
> > low for some use cases. We've been there (that's why there's cgroup
> > support in BFQ) and really the current architecture is much easier to
> > reason about.
>
> Another layering problem w/ controlling from elevators is that that's
> after request allocation and the issuer has already moved on. We used to
> have per-cgroup rq pools but ripped that out, so it's pretty easy to
> cause severe priority inversions by depleting the shared request pool,
> and the fact that throttling takes place after the issuing task has
> returned from the issue path makes propagating the throttling operation
> upwards more challenging too.

Well, we do have the .limit_depth IO scheduler callback these days, and
BFQ uses it to solve exactly this exhaustion of the shared request pool,
but I agree it's a bit of a hack on the side.
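For readers not familiar with the hook, a minimal sketch of the idea, not
BFQ's actual code: the signature matches elevator_mq_ops.limit_depth, but
example_limit_depth and the /4 split are made up for illustration.

#include "blk-mq.h"	/* in-tree only: struct blk_mq_alloc_data */

/*
 * Before tag allocation, an IO scheduler's .limit_depth hook can set
 * data->shallow_depth so async writes may only use part of the tag
 * space, leaving room for sync IO even when writers flood the queue.
 */
static void example_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
{
	/* Reads and sync writes keep the full queue depth. */
	if (op_is_sync(opf) && !op_is_write(opf))
		return;

	/* Cap async/background writers to a fraction of nr_requests so
	 * they cannot exhaust the shared request pool. */
	data->shallow_depth = max_t(unsigned int, 1,
				    data->q->nr_requests / 4);
}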
> At least in terms of cgroup control, the new bio-based behavior is a lot
> better. In the fb fleet, iocost is deployed on most (virtually all) of
> the machines and we don't see issues with severe priority inversions.
> Cross-cgroup control is pretty well behaved. Inside each cgroup, sync
> writes aren't prioritized, but nobody seems to be troubled by that.
>
> My bet is that inversion issues are a lot more severe with blk-throttle
> because it's not work-conserving and not doing things like issue-as-root
> or other measures to alleviate issues which can arise from inversions.

Yes, I agree these features of blk-throttle make the problems much more
likely to happen in practice.
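As a side note for readers of the archive, the patch under discussion
splits each throttle queue's pending bios into sync and async lists and
dispatches sync bios preferentially. A toy sketch of that idea follows;
it is not the actual patch code, the names and the 4:1 ratio are made up
for illustration (bio_list and op_is_sync() are real kernel primitives,
and the lists need bio_list_init() before use):

#include <linux/bio.h>

#define EXAMPLE_SYNC_FACTOR 4	/* at most 4 sync bios per async one */

struct example_qnode {
	struct bio_list	sync_bios;
	struct bio_list	async_bios;
	unsigned int	sync_in_a_row;	/* sync bios since last async */
};

static void example_add_bio(struct example_qnode *qn, struct bio *bio)
{
	/* Queue sync and async bios separately so a flood of async
	 * writeback cannot delay sync IO queued behind it. */
	if (op_is_sync(bio->bi_opf))
		bio_list_add(&qn->sync_bios, bio);
	else
		bio_list_add(&qn->async_bios, bio);
}

static struct bio *example_pop_bio(struct example_qnode *qn)
{
	struct bio *bio = NULL;

	/* Prefer sync bios, but after EXAMPLE_SYNC_FACTOR of them in a
	 * row dispatch one async bio so async IO is not starved. */
	if (qn->sync_in_a_row < EXAMPLE_SYNC_FACTOR)
		bio = bio_list_pop(&qn->sync_bios);
	if (bio) {
		qn->sync_in_a_row++;
		return bio;
	}
	qn->sync_in_a_row = 0;
	bio = bio_list_pop(&qn->async_bios);
	if (!bio)	/* nothing async pending, fall back to sync */
		bio = bio_list_pop(&qn->sync_bios);
	return bio;
}

								Honza
--
Jan Kara
SUSE Labs, CR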