Date: Mon, 7 Mar 2022 18:04:43 +0100
From: Michal Hocko
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, pmladek@suse.com,
    peterz@infradead.org, guro@fb.com, shakeelb@google.com,
    minchan@kernel.org, timmurray@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [RFC 1/1] mm: page_alloc: replace mm_percpu_wq with kthreads in drain_all_pages
References: <20220225012819.1807147-1-surenb@google.com>
In-Reply-To: <20220225012819.1807147-1-surenb@google.com>

On Thu 24-02-22 17:28:19, Suren Baghdasaryan wrote:
> Sending as an RFC to confirm if this is the right direction and to
> clarify if other tasks currently executed on mm_percpu_wq should also
> be moved to kthreads. The patch seems stable in testing, but I want to
> collect more performance data before submitting a non-RFC version.
>
> Currently drain_all_pages uses mm_percpu_wq to drain pages from the
> pcp lists during direct reclaim. Tasks on a workqueue can be delayed
> by other tasks in workqueues sharing the same per-cpu worker pool.
> This results in sizable delays in drain_all_pages when cpus are
> highly contended.

This is not about cpus being highly contended. It is about too much
work piled onto the WQ context.

> Memory management operations designed to relieve memory pressure
> should not be blocked by other tasks, especially if the task in
> direct reclaim has higher priority than the blocking tasks.

Agreed here.

> Replace the usage of mm_percpu_wq with per-cpu low priority FIFO
> kthreads to execute draining tasks.
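IIUC, the change essentially replaces queue_work_on(cpu, mm_percpu_wq,
...) in drain_all_pages with queueing to dedicated per-cpu workers. A
minimal sketch of what that could look like, assuming the
kthread_worker API (names and details are mine, not taken from the
patch; cpu hotplug handling omitted):

#include <linux/gfp.h>
#include <linux/kthread.h>
#include <linux/percpu.h>
#include <linux/sched.h>

/*
 * One low-priority FIFO worker per cpu, so that draining is not
 * starved by unrelated work sharing the per-cpu WQ worker pool.
 */
static DEFINE_PER_CPU(struct kthread_worker *, pcp_drain_worker);
static DEFINE_PER_CPU(struct kthread_work, pcp_drain_work);

static void pcp_drain_fn(struct kthread_work *work)
{
	/* the moral equivalent of drain_local_pages_wq() */
	drain_local_pages(NULL);
}

static int __init pcp_drain_workers_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct kthread_worker *w;

		w = kthread_create_worker_on_cpu(cpu, 0, "pcp_drain/%d", cpu);
		if (IS_ERR(w))
			return PTR_ERR(w);

		/* lowest SCHED_FIFO priority, still above all CFS tasks */
		sched_set_fifo_low(w->task);
		kthread_init_work(per_cpu_ptr(&pcp_drain_work, cpu),
				  pcp_drain_fn);
		per_cpu(pcp_drain_worker, cpu) = w;
	}
	return 0;
}

drain_all_pages would then kthread_queue_work() on each target cpu's
worker and wait with kthread_flush_work(), rather than queue_work_on()
plus flush_work().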
This looks like a natural thing to do when the WQ context is not
suitable, but I am not sure the additional resources are really
justified. Large machines with a lot of cpus would create a lot of
kernel threads. Can we do better than that?

Would it be possible to have fewer workers (e.g. one, or one per NUMA
node) that perform the work on the target cpu by changing their
affinity? Or would that introduce an unacceptable overhead?

Or would it be possible to update the existing WQ code to use the
rescuer well before the WQ is completely clogged?
-- 
Michal Hocko
SUSE Labs