From: Suren Baghdasaryan
Date: Mon, 7 Mar 2022 08:48:53 -0800
Subject: Re: [RFC 1/1] mm: page_alloc: replace mm_percpu_wq with kthreads in drain_all_pages
To: Petr Mladek
Cc: Hillf Danton, linux-mm, LKML, Mel Gorman, Michal Hocko, Tim Murray, Minchan Kim, Johannes Weiner
On Mon, Mar 7, 2022 at 8:35 AM Petr Mladek wrote:
>
> On Wed 2022-03-02 15:06:24, Suren Baghdasaryan wrote:
> > On Tue, Mar 1, 2022 at 4:22 PM Hillf Danton wrote:
> > >
> > > On Thu, 24 Feb 2022 17:28:19 -0800 Suren Baghdasaryan wrote:
> > > > Sending as an RFC to confirm if this is the right direction and to
> > > > clarify whether other tasks currently executed on mm_percpu_wq should
> > > > also be moved to kthreads. The patch seems stable in testing but I
> > > > want to collect more performance data before submitting a non-RFC
> > > > version.
> > > >
> > > > Currently drain_all_pages uses mm_percpu_wq to drain pages from the
> > > > pcp lists during direct reclaim. The tasks on a workqueue can be
> > > > delayed by other tasks in the workqueues using the same per-cpu
> > > > worker pool.
> > >
> > > The pending works may be freeing a couple of slabs/pages each. Who knows?
> >
> > If we are talking about work specifically scheduled on mm_percpu_wq,
> > then apart from drain_all_pages, mm_percpu_wq is used to execute
> > vmstat_update and lru_add_drain_cpu for draining pagevecs. If OTOH what
> > you mean is that the work might be blocked by, say, kswapd, which is
> > freeing memory, then sure, who knows...
>
> Note that the same worker pool is used by many workqueues, and
> work items in per-cpu workqueues are serialized on a single worker.
> Another worker is used only when a work item goes into a sleeping wait.
>
> I want to say that "drain_all_pages" is blocked not only by other
> works using "mm_percpu_wq" but also by works from many other
> workqueues, including "system_wq".
>
> These works might do anything, including memory allocation and freeing.

Ah, I didn't know this (I think you mentioned it in one of your
previous replies but I missed it). Thank you for clarifying!

> > > > This results in sizable delays in drain_all_pages when CPUs are
> > > > highly contended.
> > > > Memory management operations designed to relieve memory pressure
> > > > should not be allowed to be blocked by other tasks, especially if
> > > > the task in direct reclaim has higher priority than the blocking
> > > > tasks.
> > >
> > > I wonder why priority is the right cure for tight memory - otherwise
> > > it would not be a problem, given a direct reclaimer of higher priority.
> > >
> > > Off-topic question - why does it make sense in the first place for a
> > > task of lower priority to peel pages off from another of higher
> > > priority, if priority is considered in direct reclaim?
> >
> > The way I understood your question, you are asking why we have
> > to use workqueues of potentially lower priority to drain pages for a
> > potentially higher-priority process in direct reclaim (which is
> > blocked waiting for the workqueues to complete the draining)?
> > If so, IIUC this mechanism was introduced in
> > https://lore.kernel.org/all/20170117092954.15413-4-mgorman@techsingularity.net
> > to avoid draining from IPI context (CC'ing Mel Gorman to correct me if
> > I'm wrong).
> > I think the issue here is that in the process we lose the information
> > about the priority of the process in direct reclaim, which might lead
> > to priority inversion.
>
> Note that the priority of workqueue workers is static. It is defined
> by the workqueue parameters.
>
> The kthread_worker API allows creating custom kthreads. The user can
> modify their priority as needed, which makes it possible to prevent
> priority inversion. That can hardly be achieved with workqueues, where
> the workers are heavily shared by unrelated tasks.
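[Editorial note: a minimal sketch of the kthread_worker pattern Petr
describes. A kthread_worker is backed by a single dedicated kthread, so
whoever creates it controls its scheduling policy and priority; the
worker name and init function below are illustrative, not taken from
the patch:]

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

static struct kthread_worker *drain_worker;

static int __init drain_worker_init(void)
{
	drain_worker = kthread_create_worker(0, "pg_drain");
	if (IS_ERR(drain_worker))
		return PTR_ERR(drain_worker);

	/*
	 * The worker is an ordinary kthread, so its scheduling class
	 * and priority can be changed at will, e.g. raised to SCHED_FIFO
	 * so that queued drain work cannot be starved by CFS tasks.
	 * Workqueue workers, by contrast, keep the priority of the
	 * shared pool for their whole lifetime.
	 */
	sched_set_fifo_low(drain_worker->task);
	return 0;
}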
Yes, but I suspect we would not want to dynamically change the priority
of the kthreads performing drain_all_pages? I guess we could adjust it
based on the highest priority among the tasks waiting for
drain_all_pages, and that would eliminate the priority inversion.
However, I'm not sure about the possible overhead associated with such
dynamic priority adjustments.

My RFC sets the kthreads up as low-priority FIFO threads. That is a
simplification, and I'm not sure it is the right approach here...

>
> Best Regards,
> Petr
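[Editorial note: for illustration, the per-CPU low-priority FIFO
arrangement Suren describes might look roughly like the sketch below.
This is a guess at the shape of the approach, not the actual RFC patch;
all identifiers are hypothetical, and CPU-hotplug handling is omitted
for brevity:]

#include <linux/cpu.h>
#include <linux/err.h>
#include <linux/gfp.h>
#include <linux/kthread.h>
#include <linux/percpu.h>
#include <linux/sched.h>

struct pcpu_drain {
	struct zone *zone;
	struct kthread_work work;
};

static DEFINE_PER_CPU(struct kthread_worker *, pg_drain_worker);
static DEFINE_PER_CPU(struct pcpu_drain, pg_drain_work);

/* Runs on the dedicated per-CPU kthread instead of a workqueue worker. */
static void drain_local_pages_func(struct kthread_work *work)
{
	struct pcpu_drain *drain = container_of(work, struct pcpu_drain, work);

	drain_local_pages(drain->zone);
}

static int __init pg_drain_init(void)
{
	unsigned int cpu;

	for_each_online_cpu(cpu) {
		struct kthread_worker *worker;

		worker = kthread_create_worker_on_cpu(cpu, 0, "pg_drain/%u", cpu);
		if (IS_ERR(worker))
			return PTR_ERR(worker);

		/* Lowest RT priority: above all CFS tasks, below other RT users. */
		sched_set_fifo_low(worker->task);

		kthread_init_work(&per_cpu_ptr(&pg_drain_work, cpu)->work,
				  drain_local_pages_func);
		per_cpu(pg_drain_worker, cpu) = worker;
	}
	return 0;
}

/* Caller side: queue drain work on each CPU's kthread, then wait for it. */
static void drain_all_pages_sketch(struct zone *zone)
{
	unsigned int cpu;

	for_each_online_cpu(cpu) {
		struct pcpu_drain *drain = per_cpu_ptr(&pg_drain_work, cpu);

		drain->zone = zone;
		kthread_queue_work(per_cpu(pg_drain_worker, cpu), &drain->work);
	}
	for_each_online_cpu(cpu)
		kthread_flush_work(&per_cpu_ptr(&pg_drain_work, cpu)->work);
}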