Received: by 2002:a05:6358:700f:b0:131:369:b2a3 with SMTP id 15csp2837583rwo; Thu, 3 Aug 2023 16:14:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGlIaB1uuMfA3J0a5FvRl5vnba2qQW1uHZqBzYOxyO1WA2EaNNQuzDJb3PN3/FyZdJrdrnc X-Received: by 2002:a05:6a00:a1b:b0:687:1be4:46d9 with SMTP id p27-20020a056a000a1b00b006871be446d9mr120593pfh.1.1691104463607; Thu, 03 Aug 2023 16:14:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691104463; cv=none; d=google.com; s=arc-20160816; b=vY27iTgKm4LgIWxAe+GUoQnRSw/dPBuWMHxIjCg5w1wcGgIvUjmnwMOP5mLZe5uHxE 5QZ4OvZ40hxmeXX0Dj4BT0FWzGVXRzy2VRQzRUmRcOEsqkcvlgrv/vPpd+YP9V0hL6qL hUQCp7SBZtMrXMez/+P2f6eJg2dP8XcEs+pHAzhcl5M06xHYSI9YMSJKPFVcRSbyMG/5 iUw0PswfCCy+cAm8VOhryRazHqe47waS5ccWYefd7Tp7Z3oRnR38ePhqVB4FGg/K5mzx qjKj9ctoBNZzD+2tz1kLbHTm/7K+eUcmYHrA8tLeBXuTyKZwLXXRW6O05U+lCbtdxJ8h n5JA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=u9SzfK8f8benMkFPZZn5M0dDI6Vik7V2Z9qVmakjIP4=; fh=yeanF0Me5rcnWmeP26dtKBrrHsbjrBGVEKTkD1EBVBI=; b=hVztQRrPLGBYYxxzRw73YTbsCcSSqXhICYkyNSHZFZBxm9361jkG+AAPtag8377SIp g9Abr6WVuINli+69KTfuZP0ooQk6DGrMji/rM89froIYiulssRhaKEYcWFTXRJfL/+uK iDDHkN9zTZ7mcT7f1UxJFjY8+hBb7VOXEoi/2MFa6gwj6GEMOFIqAA9F6pYowkJfTOCo MF7Xjb22+HdAgRdIRVhNYhYcKl+SXiSIk+DRt12G+pjBDgzlbDKTC3ORS8CEFxv0VvgU nDo+Qx+PdeARpqJ3b3x1sNHoqDimQHROUOvcNE1al7BWUT5Y5gufQ0oGGwn06+mqHlXN 70Tg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=NXFC0SgM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bw27-20020a056a00409b00b006875398611csi666166pfb.80.2023.08.03.16.14.11; Thu, 03 Aug 2023 16:14:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=NXFC0SgM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232536AbjHCWC5 (ORCPT + 99 others); Thu, 3 Aug 2023 18:02:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229935AbjHCWCm (ORCPT ); Thu, 3 Aug 2023 18:02:42 -0400 Received: from mail-io1-xd2b.google.com (mail-io1-xd2b.google.com [IPv6:2607:f8b0:4864:20::d2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 096874686; Thu, 3 Aug 2023 15:02:31 -0700 (PDT) Received: by mail-io1-xd2b.google.com with SMTP id ca18e2360f4ac-790ab117bd5so50733039f.0; Thu, 03 Aug 2023 15:02:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691100150; x=1691704950; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=u9SzfK8f8benMkFPZZn5M0dDI6Vik7V2Z9qVmakjIP4=; b=NXFC0SgMR6lPcGAuZ/t4OFEGCGcljI+NjqkglYS6burKBeyQa3c/KPDvGFw/b8XhZ5 3Bnn3wRzAdzocpWpqmG8UBKZQJ69ZDqNy3UfGr42OSlem4HHZhZozxf7zQC5RQ3C3EYP 935LW07FwynLc+PUcp2FMfYZLLY3cOZCtAmy7ns6XY7slbsobF4RRiU3Beev+RVQz1aR ycPInvnsfP8VS+2KOcfQOvgppfKGHwcvcdqDgWBckEaP4zIzdZ4UN63yf3NEYyC34WCP pMU0e4SxOVDvedCdyvh9nvrls0BQ6i/8QbtCcXyXaCEWLVrgYD+yBIklV3fTdTuZJieS jy2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691100150; x=1691704950; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=u9SzfK8f8benMkFPZZn5M0dDI6Vik7V2Z9qVmakjIP4=; b=hbFvQN7mLTHsa5mK9AJZixweWHSZnBb2NyE2tCsEhPAaoAuyzZTj0PKAwHObgnHmFt BgiNje4doUaH40iLc0NsrPD6IPiQQ4/7nggIrg8w/cMVFaLmIBlNnTs2tGRE9KbSvyIN AtUmnjVbxOeUdpuXjqnfj9E7tjzZkph2g3P5teQk9lIP60ZJOLvdJL8Zmm1DaSZ7n/Pp Tqyixbzt5cWdy2ALm8RTJlfwmMwTSs/B/DYhNjyPFqKjSxFMTC1Xikoq5PkSPltlFnMS AGQIaZgAB0W+/2ltIIUQI2x530FPyDvzHtyaTVZZdY+Av5BVYTYT9g3ZziIS66bf2KMV LayA== X-Gm-Message-State: AOJu0YyKU2agF3Uh95k5OLuOGHg9kD1k4QCot5/D0qe7MXagXJaETv2f DfaVwhFwarjvV7MtfdbuMvc= X-Received: by 2002:a92:cda9:0:b0:346:779a:ff70 with SMTP id g9-20020a92cda9000000b00346779aff70mr87434ild.28.1691100150510; Thu, 03 Aug 2023 15:02:30 -0700 (PDT) Received: from localhost ([2a00:79e1:abd:4a00:6c80:7c10:75a0:44f4]) by smtp.gmail.com with ESMTPSA id bt26-20020a63291a000000b0056438ae5e29sm294747pgb.43.2023.08.03.15.02.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Aug 2023 15:02:29 -0700 (PDT) From: Rob Clark To: dri-devel@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org, freedreno@lists.freedesktop.org, Rob Clark , Georgi Djakov , linux-pm@vger.kernel.org (open list:INTERCONNECT API), linux-kernel@vger.kernel.org (open list) Subject: [PATCH v3 6/9] interconnect: Fix locking for runpm vs reclaim Date: Thu, 3 Aug 2023 15:01:54 -0700 Message-ID: <20230803220202.78036-7-robdclark@gmail.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230803220202.78036-1-robdclark@gmail.com> References: <20230803220202.78036-1-robdclark@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Rob Clark For cases where icc_bw_set() can be called in callbaths that could deadlock against shrinker/reclaim, such as runpm resume, we need to decouple the icc locking. Introduce a new icc_bw_lock for cases where we need to serialize bw aggregation and update to decouple that from paths that require memory allocation such as node/link creation/ destruction. Fixes this lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.2.0-rc8-debug+ #554 Not tainted ------------------------------------------------------ ring0/132 is trying to acquire lock: ffffff80871916d0 (&gmu->lock){+.+.}-{3:3}, at: a6xx_pm_resume+0xf0/0x234 but task is already holding lock: ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (dma_fence_map){++++}-{0:0}: __dma_fence_might_wait+0x74/0xc0 dma_resv_lockdep+0x1f4/0x2f4 do_one_initcall+0x104/0x2bc kernel_init_freeable+0x344/0x34c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x80/0xa8 slab_pre_alloc_hook.constprop.0+0x40/0x25c __kmem_cache_alloc_node+0x60/0x1cc __kmalloc+0xd8/0x100 topology_parse_cpu_capacity+0x8c/0x178 get_cpu_for_node+0x88/0xc4 parse_cluster+0x1b0/0x28c parse_cluster+0x8c/0x28c init_cpu_topology+0x168/0x188 smp_prepare_cpus+0x24/0xf8 kernel_init_freeable+0x18c/0x34c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #2 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire+0x3c/0x48 fs_reclaim_acquire+0x54/0xa8 slab_pre_alloc_hook.constprop.0+0x40/0x25c __kmem_cache_alloc_node+0x60/0x1cc __kmalloc+0xd8/0x100 kzalloc.constprop.0+0x14/0x20 icc_node_create_nolock+0x4c/0xc4 icc_node_create+0x38/0x58 qcom_icc_rpmh_probe+0x1b8/0x248 platform_probe+0x70/0xc4 really_probe+0x158/0x290 __driver_probe_device+0xc8/0xe0 driver_probe_device+0x44/0x100 __driver_attach+0xf8/0x108 bus_for_each_dev+0x78/0xc4 driver_attach+0x2c/0x38 bus_add_driver+0xd0/0x1d8 driver_register+0xbc/0xf8 __platform_driver_register+0x30/0x3c qnoc_driver_init+0x24/0x30 do_one_initcall+0x104/0x2bc kernel_init_freeable+0x344/0x34c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #1 (icc_lock){+.+.}-{3:3}: __mutex_lock+0xcc/0x3c8 mutex_lock_nested+0x30/0x44 icc_set_bw+0x88/0x2b4 _set_opp_bw+0x8c/0xd8 _set_opp+0x19c/0x300 dev_pm_opp_set_opp+0x84/0x94 a6xx_gmu_resume+0x18c/0x804 a6xx_pm_resume+0xf8/0x234 adreno_runtime_resume+0x2c/0x38 pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x15c/0x174 rpm_callback+0x78/0x7c rpm_resume+0x318/0x524 __pm_runtime_resume+0x78/0xbc adreno_load_gpu+0xc4/0x17c msm_open+0x50/0x120 drm_file_alloc+0x17c/0x228 drm_open_helper+0x74/0x118 drm_open+0xa0/0x144 drm_stub_open+0xd4/0xe4 chrdev_open+0x1b8/0x1e4 do_dentry_open+0x2f8/0x38c vfs_open+0x34/0x40 path_openat+0x64c/0x7b4 do_filp_open+0x54/0xc4 do_sys_openat2+0x9c/0x100 do_sys_open+0x50/0x7c __arm64_sys_openat+0x28/0x34 invoke_syscall+0x8c/0x128 el0_svc_common.constprop.0+0xa0/0x11c do_el0_svc+0xac/0xbc el0_svc+0x48/0xa0 el0t_64_sync_handler+0xac/0x13c el0t_64_sync+0x190/0x194 -> #0 (&gmu->lock){+.+.}-{3:3}: __lock_acquire+0xe00/0x1060 lock_acquire+0x1e0/0x2f8 __mutex_lock+0xcc/0x3c8 mutex_lock_nested+0x30/0x44 a6xx_pm_resume+0xf0/0x234 adreno_runtime_resume+0x2c/0x38 pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x15c/0x174 rpm_callback+0x78/0x7c rpm_resume+0x318/0x524 __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 msm_gpu_submit+0x58/0x178 msm_job_run+0x78/0x150 drm_sched_main+0x290/0x370 kthread+0xf0/0x100 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: &gmu->lock --> mmu_notifier_invalidate_range_start --> dma_fence_map Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(dma_fence_map); lock(mmu_notifier_invalidate_range_start); lock(dma_fence_map); lock(&gmu->lock); *** DEADLOCK *** 2 locks held by ring0/132: #0: ffffff8087191170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150 #1: ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150 stack backtrace: CPU: 7 PID: 132 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #554 Hardware name: Google Lazor (rev1 - 2) with LTE (DT) Call trace: dump_backtrace.part.0+0xb4/0xf8 show_stack+0x20/0x38 dump_stack_lvl+0x9c/0xd0 dump_stack+0x18/0x34 print_circular_bug+0x1b4/0x1f0 check_noncircular+0x78/0xac __lock_acquire+0xe00/0x1060 lock_acquire+0x1e0/0x2f8 __mutex_lock+0xcc/0x3c8 mutex_lock_nested+0x30/0x44 a6xx_pm_resume+0xf0/0x234 adreno_runtime_resume+0x2c/0x38 pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x15c/0x174 rpm_callback+0x78/0x7c rpm_resume+0x318/0x524 __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 msm_gpu_submit+0x58/0x178 msm_job_run+0x78/0x150 drm_sched_main+0x290/0x370 kthread+0xf0/0x100 ret_from_fork+0x10/0x20 Signed-off-by: Rob Clark --- drivers/interconnect/core.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c index 5fac448c28fd..e15a92a79df1 100644 --- a/drivers/interconnect/core.c +++ b/drivers/interconnect/core.c @@ -28,6 +28,7 @@ static LIST_HEAD(icc_providers); static int providers_count; static bool synced_state; static DEFINE_MUTEX(icc_lock); +static DEFINE_MUTEX(icc_bw_lock); static struct dentry *icc_debugfs_dir; static void icc_summary_show_one(struct seq_file *s, struct icc_node *n) @@ -631,7 +632,7 @@ int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw) if (WARN_ON(IS_ERR(path) || !path->num_nodes)) return -EINVAL; - mutex_lock(&icc_lock); + mutex_lock(&icc_bw_lock); old_avg = path->reqs[0].avg_bw; old_peak = path->reqs[0].peak_bw; @@ -663,7 +664,7 @@ int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw) apply_constraints(path); } - mutex_unlock(&icc_lock); + mutex_unlock(&icc_bw_lock); trace_icc_set_bw_end(path, ret); @@ -872,6 +873,7 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider) return; mutex_lock(&icc_lock); + mutex_lock(&icc_bw_lock); node->provider = provider; list_add_tail(&node->node_list, &provider->nodes); @@ -900,6 +902,7 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider) node->avg_bw = 0; node->peak_bw = 0; + mutex_unlock(&icc_bw_lock); mutex_unlock(&icc_lock); } EXPORT_SYMBOL_GPL(icc_node_add); @@ -1025,6 +1028,7 @@ void icc_sync_state(struct device *dev) return; mutex_lock(&icc_lock); + mutex_lock(&icc_bw_lock); synced_state = true; list_for_each_entry(p, &icc_providers, provider_list) { dev_dbg(p->dev, "interconnect provider is in synced state\n"); -- 2.41.0