Message-ID: <5fccf438-fdbe-1bc8-6460-b3911cc51566@redhat.com>
Date: Wed, 23 Nov 2022 13:48:46 -0500
Subject: Re: [PATCH] cgroup/cpuset: Optimize update_tasks_nodemask()
To: Tejun Heo, "haifeng.xu"
Cc: lizefan.x@bytedance.com, hannes@cmpxchg.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20221123082157.71326-1-haifeng.xu@shopee.com>
From: Waiman Long
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/23/22 12:05, Tejun Heo wrote:
> On Wed, Nov 23, 2022 at 08:21:57AM +0000, haifeng.xu wrote:
>> When changing 'cpuset.mems' under some cgroup, the system will hang
>> for a long time. From dmesg, many processes or threads are
>> stuck in fork/exit. The reason is shown as follows.
>>
>> thread A:
>> cpuset_write_resmask /* takes cpuset_rwsem */
>> ...
>> update_tasks_nodemask
>> mpol_rebind_mm /* waits mmap_lock */
>>
>> thread B:
>> worker_thread
>> ...
>> cpuset_migrate_mm_workfn
>> do_migrate_pages /* takes mmap_lock */
>>
>> thread C:
>> cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */
>> ...
>> cpuset_can_attach
>> percpu_down_write /* waits cpuset_rwsem */
>>
>> Once the nodemasks of the cpuset are updated, thread A wakes up thread B to
>> migrate the mm. But when thread A iterates through all tasks, including
>> child threads and the group leader, it has to wait for the mmap_lock, which
>> has been taken by thread B. Unfortunately, thread C wants to migrate
>> tasks into the cgroup at this moment, so it must wait for thread A to release
>> cpuset_rwsem. If thread B spends a long time migrating the mm, the
>> fork/exit paths that acquire cgroup_threadgroup_rwsem also need to
>> wait for a long time.
>>
>> There is no need to migrate the mm of child threads, which is
>> shared with the group leader.
> This is only a problem in cgroup1 and cgroup1 doesn't require the threads of
> a given task to be in the same cgroup. I don't think you can optimize it
> this way.

I think it is an issue anyway if different threads of a process are in
different cpusets with different node masks. It is not a configuration
that should be used at all.

This patch makes update_tasks_nodemask() somewhat similar to
cpuset_attach(), where all tasks are iterated to update the node mask but
only the task leaders are required to update the mm.

For a non-group-leader task, maybe we can check if the group leader is in
the same cpuset. If so, we can skip the mm update. Do we need a similar
change in cpuset_attach()?

I do think the "migrate = is_memory_migrate(cs);" line can be moved
outside of the loop, though. Of course, that won't help much in this case.

Cheers,
Longman
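
To make the suggestion above concrete, here is a rough userspace sketch of
the check, not kernel code. The types task_model, cpuset_model and
update_stats and both functions are hypothetical stand-ins made up for
illustration. The counters stand in for the mpol_rebind_mm() call and the
queued migration work. It assumes every task gets the new node mask, and
that a non-leader thread skips the mm update only when its group leader is
in the same cpuset. The migrate flag is read once, before the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not kernel structures. */
struct task_model {
	int cpuset_id;                         /* cpuset the task is in */
	const struct task_model *group_leader; /* itself for a leader */
	unsigned long mems_allowed;            /* stand-in for the node mask */
};

struct cpuset_model {
	int id;
	bool memory_migrate;                   /* models the memory_migrate flag */
};

struct update_stats {
	size_t mm_rebinds;                     /* models mpol_rebind_mm() calls */
	size_t mm_migrations;                  /* models queued mm migrations */
};

/*
 * A group leader always updates the mm. A non-leader thread can skip the
 * update only if its leader is in the same cpuset, because the leader
 * will then update the shared mm.
 */
static bool needs_mm_update(const struct task_model *t)
{
	if (t->group_leader == t)
		return true;
	return t->group_leader->cpuset_id != t->cpuset_id;
}

static struct update_stats
update_tasks_nodemask_model(const struct cpuset_model *cs,
			    struct task_model *tasks, size_t n,
			    unsigned long new_mems)
{
	struct update_stats stats = { 0, 0 };
	/* Loop-invariant, so it is read once outside the loop. */
	const bool migrate = cs->memory_migrate;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct task_model *t = &tasks[i];

		if (t->cpuset_id != cs->id)
			continue;

		/* Every task in the cpuset gets the new node mask. */
		t->mems_allowed = new_mems;

		if (!needs_mm_update(t))
			continue;

		stats.mm_rebinds++;
		if (migrate)
			stats.mm_migrations++;
	}
	return stats;
}
```

With one leader and one thread in the same cpuset, plus a second thread
whose leader sits in another cpuset (the cgroup1 case Tejun mentions), the
model updates the mm twice: once for the leader and once for the thread
with the foreign leader.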