Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp8328629rwb; Wed, 23 Nov 2022 20:09:31 -0800 (PST) X-Google-Smtp-Source: AA0mqf4WNdKo+50ERBMW629xvkEeywjNPieKbOESQAT1HQ6bigYUPVQx0RQQcu3zwm5qXRe7Pstn X-Received: by 2002:a17:90b:378d:b0:218:9ea5:b081 with SMTP id mz13-20020a17090b378d00b002189ea5b081mr22841739pjb.105.1669262971078; Wed, 23 Nov 2022 20:09:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669262971; cv=none; d=google.com; s=arc-20160816; b=R8m3KVTHGaMGnDFJqOFEfVw4kcnmwqVuk33O8e/E+Rdx+6jpPP0UvpMtmcSgmiTeM7 KWFz5Z08Zpr415gE+0xUEJF/02SP/HHdtb896MKWwY79Gsi8FRcbG44bZOyNWN86EkHQ wE4UdfjfAxuZct84y4NnVWLsYkPW1X8SlEBRptvl+hwwjQPG27UnU6EOmDguqiqe4NQO JXNb6jggB9X6xhjBS7RrdPkldmsQp/afHnmmUvnceFRiDnYwxtWnTiLQSjxV8tbpkJt3 /DNAdoShqumaW++eIGL/DLlTxr1TjlKrqikzp7OzJi3yVPnf33Of3kbhqNYzvk2DrDfc w59w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id :dkim-signature; bh=0A6IGdis8UFb2G4Z4L2JyFM1cHNL6BHSqh/4YAf0zi4=; b=e9o2mpJeuEOtKsJbaFLES9sIzyfb98UGsceRcqdMkjlajzWxctqbMjWmohmDODfOHw nfsK94UVTzJaye6ocSSfGWC2ngzDe2aLfXRdP4xSxnzOlE7h5Vm3KE2wjIrcVRJ88mm8 CPuoCjrMin07R2vkOMqIN63NcBOwUaxX59JVvKrmF+WnahaV6bt2zZEgrKgHHMduxLap aAFLEVdeLWuvfV9LHr7+DPjsEU9MvIjOA6+x98WpNsBQunkkYEx8YBV6FVGiMnZ0cdKh IbVCUHWOgaQR0LAVwaaDSbqdGPHSfztlg4P4iDlZWtg3k89z6wEUsGFHb+2FiPBUN4c1 Z53Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@shopee.com header.s=shopee.com header.b=kujSbxlT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=shopee.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id rj13-20020a17090b3e8d00b001fd70129092si311690pjb.15.2022.11.23.20.09.18; Wed, 23 Nov 2022 20:09:31 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@shopee.com header.s=shopee.com header.b=kujSbxlT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=shopee.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229640AbiKXDdy (ORCPT + 88 others); Wed, 23 Nov 2022 22:33:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229700AbiKXDdx (ORCPT ); Wed, 23 Nov 2022 22:33:53 -0500 Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com [IPv6:2607:f8b0:4864:20::102e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29A82C6579 for ; Wed, 23 Nov 2022 19:33:50 -0800 (PST) Received: by mail-pj1-x102e.google.com with SMTP id ci10so369428pjb.1 for ; Wed, 23 Nov 2022 19:33:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shopee.com; s=shopee.com; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=0A6IGdis8UFb2G4Z4L2JyFM1cHNL6BHSqh/4YAf0zi4=; b=kujSbxlTKyrW4i9i1/6mTJGpSMZE0X6OJ5IPMp3RRmF7qSCPjYv+2bVxe8n5Z8h/1g bJGk7Cy/wlCFbrYiAvpjMcNitplpF2Irh4BVshd+J0kP2wCcUImQPf7nRdWE0KFTBCBA lUI3rkT+qdxxs4B5g9tz4i2utI3kBEdq2Z51ez6CUhh/TLH5lZGFJlLxdgiXfecaVzJR TOJx8T7q5eFaM7U0Lm/aFpSPHRnULvKj6FlOBrG6m8NdlPkNBKxV+tDtgNgcv0Kf4Fpg 9O/nQtv9++vzQ8l/n6hQMNKzXkEgHwAAHgN6yKPHGneCTt+7qdcfdgrJ8uepOPM1Q7kM RAdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=0A6IGdis8UFb2G4Z4L2JyFM1cHNL6BHSqh/4YAf0zi4=; b=pdJvhUsm5yiRLimMNGTfKY6uLGK1TjG05Y7hOiFr2F1iLf8M9tzvoTtxruMXzH2x7N CN0POdR/WH6xaeCYw7hkjgIxZu6xlSx5pWfjqoIvPQZcJ7IsWeca7crP6Gvk2tARgtRH DIK1uv63IMcaNZ0UQ1K9/NeGZs3ptnb+EmcWqUpAC8lOoAHiRvLWHLcrFNneGjDPSoSy NRv2nbfVuHXNzSjE3vfLNO9OILOu3QQQZEjBOz5vzO3wKYR51oHYOVZ9CDPhxb6YeuLr 4EbOZzLesWmyruVyDWt3WTAfw/+2Ga/mKKbmvjOpHTFm5qaVEfLapb197m3p4VLuxZuk 3VaQ== X-Gm-Message-State: ANoB5pnmio58ZffjjpG9XSpjwYHJzwlSp/0o+V0ccKKdyPdVUzkSVsxX gDtvzzMyy/UiLsJxflEyosMyRw== X-Received: by 2002:a17:902:6b89:b0:188:bb79:4892 with SMTP id p9-20020a1709026b8900b00188bb794892mr14662401plk.60.1669260829624; Wed, 23 Nov 2022 19:33:49 -0800 (PST) Received: from [10.54.24.49] (static-ip-147-99-134-202.rev.dyxnet.com. [202.134.99.147]) by smtp.gmail.com with ESMTPSA id ik18-20020a170902ab1200b001767f6f04efsm4416322plb.242.2022.11.23.19.33.45 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 23 Nov 2022 19:33:48 -0800 (PST) Message-ID: <4de8821b-e0c0-bf63-4d76-b0ce208cce3b@shopee.com> Date: Thu, 24 Nov 2022 11:33:39 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.5.0 Subject: Re: [PATCH] cgroup/cpuset: Optimize update_tasks_nodemask() To: Waiman Long Cc: lizefan.x@bytedance.com, tj@kernel.org, hannes@cmpxchg.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org References: <20221123082157.71326-1-haifeng.xu@shopee.com> <2ac6f207-e08a-2a7f-01ae-dfaf15eefaf6@redhat.com> From: Haifeng Xu In-Reply-To: <2ac6f207-e08a-2a7f-01ae-dfaf15eefaf6@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/11/24 04:23, Waiman Long wrote: > On 11/23/22 03:21, haifeng.xu wrote: >> When change the 'cpuset.mems' under some cgroup, system will hung >> for a long time. From the dmesg, many processes or theads are >> stuck in fork/exit. The reason is show as follows. >> >> thread A: >> cpuset_write_resmask /* takes cpuset_rwsem */ >>    ... >>      update_tasks_nodemask >>        mpol_rebind_mm /* waits mmap_lock */ >> >> thread B: >> worker_thread >>    ... >>      cpuset_migrate_mm_workfn >>        do_migrate_pages /* takes mmap_lock */ >> >> thread C: >> cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */ >>    ... >>      cpuset_can_attach >>        percpu_down_write /* waits cpuset_rwsem */ >> >> Once update the nodemasks of cpuset, thread A wakes up thread B to >> migrate mm. But when thread A iterates through all tasks, including >> child threads and group leader, it has to wait the mmap_lock which >> has been take by thread B. Unfortunately, thread C wants to migrate >> tasks into cgroup at this moment, it must wait thread A to release >> cpuset_rwsem. If thread B spends much time to migrate mm, the >> fork/exit which acquire cgroup_threadgroup_rwsem also need to >> wait for a long time. >> >> There is no need to migrate the mm of child threads which is >> shared with group leader. Just iterate through the group >> leader only. >> >> Signed-off-by: haifeng.xu >> --- >>   kernel/cgroup/cpuset.c | 3 +++ >>   1 file changed, 3 insertions(+) >> >> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >> index 589827ccda8b..43cbd09546d0 100644 >> --- a/kernel/cgroup/cpuset.c >> +++ b/kernel/cgroup/cpuset.c >> @@ -1968,6 +1968,9 @@ static void update_tasks_nodemask(struct cpuset >> *cs) >>             cpuset_change_task_nodemask(task, &newmems); >>   +        if (!thread_group_leader(task)) >> +            continue; >> + >>           mm = get_task_mm(task); >>           if (!mm) >>               continue; > > Could you try the attached test patch to see if it can fix your problem? > Something along the line of this patch will be more acceptable. > > Thanks, > Longman > Hi, Longman. Thanks for your patch, but there are still some problems. 1) (group leader, node: 0,1) cgroup0 / \ / \ cgroup1 cgroup2 (threads) (threads) If set node 0 in cgroup1 and node 1 in cgroup2, both of them will update the mm. And the nodemask of mm depends on who set the node last. 2) (process, node: 0,1) cgroup0 / \ / \ cgroup1 cgroup2 (node: 0) (node: 1) If migrate thread from cgroup0 to cgroup1 or cgroup2, cpuset_attach won't update the mm. So the nodemask of thread, including mems_allowed and mempolicy(updated in cpuset_change_task_nodemask), is different from the vm_policy in vma(updated in mpol_rebind_mm). In a word, if threads have different cpusets with different nodemask, it will cause inconsistent memory behavior. Thanks, Haifeng.