From: NeilBrown
To: Peter Zijlstra, Trond Myklebust
Date: Tue, 10 Nov 2020 13:26:27 +1100
Cc: "juri.lelli@redhat.com", "mingo@redhat.com", "jiangshanlai@gmail.com", "tj@kernel.org", "mhocko@suse.com", "linux-kernel@vger.kernel.org", "vincent.guittot@linaro.org"
Subject: Re: [PATCH rfc] workqueue: honour cond_resched() more effectively.
In-Reply-To: <20201109142016.GK2611@hirez.programming.kicks-ass.net>
References: <87v9efp7cs.fsf@notabene.neil.brown.name> <20201109080038.GY2594@hirez.programming.kicks-ass.net> <20201109142016.GK2611@hirez.programming.kicks-ass.net>
Message-ID: <87pn4mosks.fsf@notabene.neil.brown.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 09 2020, Peter Zijlstra wrote:

> On Mon, Nov 09, 2020 at 01:50:40PM +0000, Trond Myklebust wrote:
>> On Mon, 2020-11-09 at 09:00 +0100, Peter Zijlstra wrote:
>
>> > I'm thinking the real problem is that you're abusing workqueues. Just
>> > don't stuff so much work into it that this becomes a problem. Or
>> > rather, if you do, don't lie to it about it.
>>
>> If we can't use workqueues to call iput_final() on an inode, then what
>> is the point of having them at all?
>
> Running short stuff, apparently.

Also running stuff that sleeps. If it only does work in short bursts,
and sleeps between the works, it can run as long as it likes. It is only
sustained bursts that are currently not supported without explicit code.

>
>> Neil's use case is simply a file that has managed to accumulate a
>> seriously large page cache, and is therefore taking a long time to
>> complete the call to truncate_inode_pages_final(). Are you saying we
>> have to allocate a dedicated thread for every case where this happens?
>
> I'm not saying anything, but you're trying to wreck the scheduler
> because of a workqueue 'feature'. The 'new' workqueues limit concurrency
> by design, if you're then relying on concurrency for things, you're
> using it wrong.
>
> I really don't know what the right answer is here, but I thoroughly hate
> the one proposed.
Oh good - plenty of room for improvement then :-)

I feel strongly that this should work transparently. Expecting people
to choose the right option to handle cases that don't often come up in
testing is naive.

A warning whenever a bound, non-CPU-intensive worker calls
cond_resched() is trivial to implement and extremely noisy. As
mentioned, I get twenty just to boot.

One amusing example is rhashtable, which schedules a worker to rehash a
table. This is expected to be cpu-intensive because it calls
cond_resched(), but it is run with schedule_work() - clearly not
realizing that will block other scheduled work on that CPU.

An amusing example for the flip-side is crypto/cryptd.c which creates a
WQ_CPU_INTENSIVE workqueue (cryptd) but the cryptd_queue_worker() has a
comment "Only handle one request at a time to avoid hogging crypto
workqueue." !!! The whole point of WQ_CPU_INTENSIVE is that you cannot
hog the workqueue!!

Anyway, I digress.... warning on every cond_resched() generates lots of
warnings, including some from printk.... so any work item that might
ever print a message needs to be CPU_INTENSIVE??? I don't think that
scales.

Is there some way the scheduler can help? Does the scheduler notice
"time to check on that CPU over there" and then:
 - if it is in user-space - force it to schedule
 - if it is in kernel-space (and preempt is disabled), then leave it
   alone
??
If so, could there be a third case
 - if it is a bound, non-cpu-intensive worker, switch it to
   cpu-intensive
???

I wonder how long workers typically run - do many run long enough that
the scheduler might want to ask them to take a break?
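
To make the "third case" a little more concrete, here is a toy
userspace model of the idea. It is not kernel code and does not use any
kernel API: toy_pool, toy_worker, toy_cond_resched() and
TOY_HOG_THRESHOLD are all hypothetical names invented for this sketch,
the threshold of 10 ticks is arbitrary, and one cond_resched() call per
tick is an assumption. It only shows the bookkeeping: once a bound
worker has run too long, it stops counting against the pool's
concurrency limit so that pending work can start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. Every name below is hypothetical, not a kernel symbol. */

#define TOY_HOG_THRESHOLD 10UL	/* arbitrary tick budget for this sketch */

struct toy_pool {
	int nr_running;		/* workers counted against the concurrency limit */
	int nr_pending;		/* queued work items waiting for a worker */
};

struct toy_worker {
	struct toy_pool *pool;
	bool cpu_intensive;	/* true: no longer counted in nr_running */
	unsigned long ran_ticks;	/* ticks used by the current work item */
};

/* If no counted worker is running, hand one pending item to a new worker. */
static void toy_pool_kick(struct toy_pool *pool)
{
	if (pool->nr_running == 0 && pool->nr_pending > 0) {
		pool->nr_pending--;
		pool->nr_running++;	/* the new worker counts as running */
	}
}

/* A bound, non-cpu-intensive worker starts executing a work item. */
static void toy_worker_begin(struct toy_worker *w, struct toy_pool *pool)
{
	w->pool = pool;
	w->cpu_intensive = false;
	w->ran_ticks = 0;
	pool->nr_running++;
}

/*
 * Model of a cond_resched() call made from inside a work item, assumed
 * to happen once per tick. Returns true only on the call that switches
 * the worker to cpu-intensive.
 */
static bool toy_cond_resched(struct toy_worker *w)
{
	assert(w->pool != NULL);

	w->ran_ticks++;
	if (w->cpu_intensive || w->ran_ticks < TOY_HOG_THRESHOLD)
		return false;

	/* Stop counting this worker, then let pending work start. */
	w->cpu_intensive = true;
	w->pool->nr_running--;
	toy_pool_kick(w->pool);
	return true;
}
```

In this model a worker that keeps calling toy_cond_resched() blocks
pending items for its first nine ticks; on the tenth call it is
reclassified and one pending item is started alongside it. Whether the
real scheduler could drive such a switch, and at what threshold, is
exactly the open question above.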
Thanks,
NeilBrown