Received: by 10.223.164.221 with SMTP id h29csp783845wrb; Thu, 26 Oct 2017 07:06:11 -0700 (PDT) X-Google-Smtp-Source: ABhQp+RkHkWcRNZgQGQRpcRsZabIGLEpz89oy+UlBJ+fGf0C6KzjsRaKbL0Zqxlnlki8M7VeJzEv X-Received: by 10.101.72.1 with SMTP id h1mr5142833pgs.249.1509026770937; Thu, 26 Oct 2017 07:06:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1509026770; cv=none; d=google.com; s=arc-20160816; b=IFo7cZxJNqvEHTEUX7OyvTrVG6yAbWMAEHv3lWcpmAJL6oBsv/SUSHMnf/71T0UTc8 pzQljkB7W6lvJA0sdExaEme/w8c2lL8ZtFJG5R8+KTM+PWRgjhT9ZuG+AxUEwxnmbHPr Cz6KPfcifLJQge93FZFtHAsM1t3jKtRlN9ejY9ltez5RITIWnKQlRxOkRg0YMbPN/4IQ 9ntjiLK5rOiS1ExR4vdlTShda3vahgIq4vFPxccpgKppaLz45seVf1fK2fegEoeO55xS 3PVk0dT7WLX7URdIK1/3HQyTXeOm0nnIzexQnN3Ie9SO0V3tT9MZGIQV4f0ZB4+Y9p8D /kkQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:organization:from:references:cc:to:subject:dmarc-filter :arc-authentication-results; bh=uUi+Qg8dN1J8Pgfggo4LRXOYiV6yTxzvpmTUnpZ9fFo=; b=cOOIVcuXCZbPyQHO8idWxqbt3Rkek5racdR6+38Oz41J7htQ30YOfRaeXI69tQ73/j pa4zbX4T98NH940YtFUFvpsuu9EsqgfWT8gmNcs1LYL8WHy0aphlIUUloqA0L6032z1u eboztHu2JzgaVN+NPIIr/4TtxJoMvJar5EXRTaagg228kygUrlkw/kUnf/vFXMjgZyFX mZokN1OwnD+cHIZvYb9J8/fF62gVBwxVW6mpPaJvprd2rp87rI79uxNPWXIPG8zpVpb/ QaFOOeCc49ckuSWX2/T+oMGWAKKtvCOR//DhKIF12hXXc1whJUDOEwGpEn2FuyobKwPe SGWQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e90si3709097pfb.291.2017.10.26.07.05.56; Thu, 26 Oct 2017 07:06:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932443AbdJZOFQ convert rfc822-to-8bit (ORCPT + 99 others); Thu, 26 Oct 2017 10:05:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47790 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932234AbdJZOFN (ORCPT ); Thu, 26 Oct 2017 10:05:13 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8B6827CB95; Thu, 26 Oct 2017 14:05:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 8B6827CB95 Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=longman@redhat.com Received: from llong.remote.csb (dhcp-17-140.bos.redhat.com [10.18.17.140]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6F4117FD3B; Thu, 26 Oct 2017 14:05:12 +0000 (UTC) Subject: Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock To: Prateek Sood , peterz@infradead.org, tj@kernel.org, lizefan@huawei.com, mingo@kernel.org, boqun.feng@gmail.com, tglx@linutronix.de Cc: cgroups@vger.kernel.org, sramana@codeaurora.org, linux-kernel@vger.kernel.org References: <20171025093041.GO3165@worktop.lehotels.local> <1509018722-30359-1-git-send-email-prsood@codeaurora.org> From: Waiman Long Organization: Red Hat Message-ID: Date: Thu, 26 Oct 2017 10:05:12 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: <1509018722-30359-1-git-send-email-prsood@codeaurora.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Thu, 26 Oct 2017 14:05:13 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/26/2017 07:52 AM, Prateek Sood wrote: > Remove circular dependency deadlock in a scenario where hotplug of CPU is > being done while there is updation in cgroup and cpuset triggered from > userspace. > > Process A => kthreadd => Process B => Process C => Process A > > Process A > cpu_subsys_offline(); > cpu_down(); > _cpu_down(); > percpu_down_write(&cpu_hotplug_lock); //held > cpuhp_invoke_callback(); > workqueue_offline_cpu(); > wq_update_unbound_numa(); > kthread_create_on_node(); > wake_up_process(); //wakeup kthreadd > flush_work(); > wait_for_completion(); > > kthreadd > kthreadd(); > kernel_thread(); > do_fork(); > copy_process(); > percpu_down_read(&cgroup_threadgroup_rwsem); > __rwsem_down_read_failed_common(); //waiting > > Process B > kernfs_fop_write(); > cgroup_file_write(); > cgroup_procs_write(); > percpu_down_write(&cgroup_threadgroup_rwsem); //held > cgroup_attach_task(); > cgroup_migrate(); > cgroup_migrate_execute(); > cpuset_can_attach(); > mutex_lock(&cpuset_mutex); //waiting > > Process C > kernfs_fop_write(); > cgroup_file_write(); > cpuset_write_resmask(); > mutex_lock(&cpuset_mutex); //held > update_cpumask(); > update_cpumasks_hier(); > rebuild_sched_domains_locked(); > get_online_cpus(); > percpu_down_read(&cpu_hotplug_lock); //waiting > > Eliminating deadlock by reversing the locking order for cpuset_mutex and > cpu_hotplug_lock. General comments: Please add a version number of your patch. I have seen multiple versions of this patch and have lost track how many are there as there is no version number information. In addition, there are changes beyond just swapping the lock order and they are not documented in this change log. I would like to see you discuss about those additional changes here as well. > void rebuild_sched_domains(void) > { > + cpus_read_lock(); > mutex_lock(&cpuset_mutex); > - rebuild_sched_domains_locked(); > + rebuild_sched_domains_cpuslocked(); > mutex_unlock(&cpuset_mutex); > + cpus_read_unlock(); > } I saw a lot of instances where cpus_read_lock() and mutex_lock() come together. Maybe some new lock/unlock helper functions may help. > @@ -2356,25 +2354,29 @@ static void cpuset_hotplug_workfn(struct work_struct *work) > } > > /* rebuild sched domains if cpus_allowed has changed */ > - if (cpus_updated || force_rebuild) { > - force_rebuild = false; > - rebuild_sched_domains(); > + if (cpus_updated) { > + if (use_cpu_hp_lock) > + rebuild_sched_domains(); > + else { > + /* When called during cpu hotplug cpu_hotplug_lock > + * is held by the calling thread, not > + * not cpuhp_thread_fun > + */ ??? The comment is not clear. Cheers, Longman From 1582320954128924822@xxx Thu Oct 26 11:54:14 +0000 2017 X-GM-THRID: 1577859769769316492 X-Gmail-Labels: Inbox,Category Forums