Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp641734rwb; Tue, 29 Nov 2022 03:33:39 -0800 (PST) X-Google-Smtp-Source: AA0mqf5vwV6Mr6N5BPQsJ05ZGPCMjR8ixB5xo2wQDAU78z+4hbsm+qGl4OMkyFzYmhpXv7OWuB9Z X-Received: by 2002:a17:902:b611:b0:189:1c94:99c3 with SMTP id b17-20020a170902b61100b001891c9499c3mr36722819pls.140.1669721618814; Tue, 29 Nov 2022 03:33:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669721618; cv=none; d=google.com; s=arc-20160816; b=x97ZaBL5QHa5YSFX4luO2axN4kCmrW0Dhhgapxdhj7LyeLJLBue7kqO5EVBMr+fuRx UeCRixw49hr9np4qAwE6BAmTFVOG1tqRB9Ge4AgrNzzEGHPwB2+GGXWfnW4DujUAirEv pCvUfsB3lFygplmXn6sdBvenX/B7K1tD3tZGGyyM/hlkrQ1whSFmHj95dbjJGJ2XL+n0 8R/e88mnXyysIfsuGq+EmxFCYXZTPxpbNf/X07zF3AIUR5rAih1EmFTodcUHgRAQd7UD q3tVs7WcjzcGkfeAUHBntTlWiT7uuHAUqb2w80E4OsUpPu3BmzeeD2qy6IbC2hgAhd0R HcSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=CMHRDKTEunINPo71b13+tNusCsC0jxBYJhiaETYoUJo=; b=OJPD4lqIMN9IYSbehQUh/hfU5u4WfZvCSnGXan+Xu6qdlcrdnsbTiahFY1YAE/s3/a 5h/HE/hbwWAEMRvv1KJQl/7ssEeKNAB/tOjJXQpwhb9FgqUNZPunjTKbDEdp0DmxHO7S BkM1jdXGRt1EN6Da1F3FW4XdHDzyRoFBnCi1soRTK5PGB5EWLsWtVwzwFrDacDrwEgQc X4Yz0BdFZF41RCPT+DRhDLgiyA1G+LE3VjyRSmCCU1CuB+uo1f2vCbQm1q3fd/+f0F41 fxsAZL6WUwmFiiEgsHaHPgLleTBBG0KH3jSkIycsEf/kGP78oajf/VuPDOFYhMgqj+Ve ccZQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cirrus.com header.s=PODMain02222019 header.b=DuY8md47; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cirrus.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b7-20020a63d807000000b00476deba079fsi14345566pgh.507.2022.11.29.03.33.28; Tue, 29 Nov 2022 03:33:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cirrus.com header.s=PODMain02222019 header.b=DuY8md47; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cirrus.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232049AbiK2Kr2 (ORCPT + 83 others); Tue, 29 Nov 2022 05:47:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229712AbiK2Kr0 (ORCPT ); Tue, 29 Nov 2022 05:47:26 -0500 Received: from mx0b-001ae601.pphosted.com (mx0b-001ae601.pphosted.com [67.231.152.168]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D3D628E3C for ; Tue, 29 Nov 2022 02:47:24 -0800 (PST) Received: from pps.filterd (m0077474.ppops.net [127.0.0.1]) by mx0b-001ae601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2ATAMhQk001998; Tue, 29 Nov 2022 04:47:07 -0600 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cirrus.com; h=message-id : date : mime-version : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding; s=PODMain02222019; bh=CMHRDKTEunINPo71b13+tNusCsC0jxBYJhiaETYoUJo=; b=DuY8md47tso5kvSNNyb91PSw1WdCQ+bZOCOmrXxy3EzEQoqS9uwVnUemIsMo8THdZgWq hLX5B7r/u6sO9lSwZM7wtvA7IZJMai8Bag8kR0tFd65jfYo8UL0Z60nxkgSdOHwP6Va1 i0X1jznhz/gWaYIaEtYvKLtR4ylii6YrAEN3c7SgzWj0bNs6odNQDPz8nemxEwK8r7WY U4nK8h1+bJV5ZF6082ZRPBEbTW4QZIlMUJ9BjE5XZftWkeh3S0SSYx2RQxKMcTREnPYX koohuD64X1GZ5Kofyh4waCGC2N3Ea6Hix/95WpnzR6Q3d9vKJJH/yN6ZW03gIkGRU4ws ng== Received: from ediex01.ad.cirrus.com ([84.19.233.68]) by mx0b-001ae601.pphosted.com (PPS) with ESMTPS id 3m5g9200yt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 29 Nov 2022 04:47:06 -0600 Received: from ediex02.ad.cirrus.com (198.61.84.81) by ediex01.ad.cirrus.com (198.61.84.80) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.20; Tue, 29 Nov 2022 04:47:05 -0600 Received: from ediswmail.ad.cirrus.com (198.61.86.93) by anon-ediex02.ad.cirrus.com (198.61.84.81) with Microsoft SMTP Server id 15.2.1118.20 via Frontend Transport; Tue, 29 Nov 2022 04:47:05 -0600 Received: from [198.90.251.111] (edi-sw-dsktp-006.ad.cirrus.com [198.90.251.111]) by ediswmail.ad.cirrus.com (Postfix) with ESMTP id E2483B10; Tue, 29 Nov 2022 10:47:04 +0000 (UTC) Message-ID: Date: Tue, 29 Nov 2022 10:47:04 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.12.0 Subject: Re: [PATCH] soundwire: bus_type: Avoid lockdep assert in sdw_drv_probe() Content-Language: en-US To: Pierre-Louis Bossart , , , CC: , , References: <20221121162453.1834170-1-rf@opensource.cirrus.com> <2d207a51-d415-726b-3bc1-8788df2f06fd@linux.intel.com> From: Richard Fitzgerald In-Reply-To: <2d207a51-d415-726b-3bc1-8788df2f06fd@linux.intel.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Proofpoint-GUID: 4sidqpR4KEjP7fvhBtiiqglQJoYDPJVH X-Proofpoint-ORIG-GUID: 4sidqpR4KEjP7fvhBtiiqglQJoYDPJVH X-Proofpoint-Spam-Reason: safe X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 28/11/2022 17:18, Pierre-Louis Bossart wrote: > > > On 11/21/22 10:24, Richard Fitzgerald wrote: >> Don't hold sdw_dev_lock while calling the peripheral driver >> probe() and remove() callbacks. >> >> Holding sdw_dev_lock around the probe() and remove() calls >> causes a theoretical mutex inversion which lockdep will >> assert on. The peripheral driver probe will probably register >> a soundcard, which will take ALSA and ASoC locks. During > > It's extremely unlikely that a peripheral driver would register a sound > card, this is what machine drivers do. > > Which leads me to the question: is this a real problem? > Yes, try turning on lockdep checking and you will get an assert. During probe the existing code takes sdw_dev_lock and then calls the codec driver probe, so you will get a mutex sequence like: sdw_dev_lock -> controls_rw_sem -> pcm_mutex but in normal operation the ALSA/ASoC code will take its mutexes first and call runtime_resume which then takes the sdw_dev_lock, so you get pcm_mutex -> sdw_dev_lock and lockdep will assert on that opposite ordering. The full assert is at the end of this email. > Or did you mean 'register components', and if yes what would the problem > with lockdep be? > Argh. Yes, I meant snd_soc_register_component(). I'll re-send this with the comment fixed. >> normal operation a runtime resume suspend can be triggered >> while these locks are held and will then take sdw_dev_lock. >> >> It's not necessary to hold sdw_dev_lock when calling the >> probe() and remove(), it is only used to prevent the bus core >> calling the driver callbacks if there isn't a driver or the >> driver is removing. > > >> If sdw_dev_lock is held while setting and clearing the >> 'probed' flag this is sufficient to guarantee the safety of >> callback functions. > > not really, the 'probed' flag was kept for convenience. what this lock > really protects is the dereferencing of ops after the driver .remove > happens. > Yes, I thought about removing it but that's a larger code change that's not necessary for the fix. The point is that we don't need to hold the mutex around the remove call, only around clearing 'probed' (or, if probed was removed, NULLing the callback pointers) >> The potential race of a bus event happening while probe() is >> executing is the same as the existing race of the bus event >> handler taking the mutex first and processing the event >> before probe() can run. In both cases the event has already >> happened before the driver is probed and ready to accept >> callbacks. > > Sorry, I wasn't able to parse the first sentence in this paragraph. what > 'existing race' are you referring to? > I will re-write that. The point was that without the lock the probe() can run in parallel with a status update. But that is already a race with the existing code because the status update could take the sdw_dev_lock first. So I think that this change isn't introducing any new races. The lockdep assert looks like this: [   46.098514] ====================================================== [   46.104736] WARNING: possible circular locking dependency detected [   46.110961] 6.1.0-rc4-jamerson #1 Tainted: G            E [   46.116842] ------------------------------------------------------ [   46.123063] mpg123/1130 is trying to acquire lock: [   46.127883] ffff8b445031fb80 (&slave->sdw_dev_lock){+.+.}-{3:3}, at: sdw_update_slave_status+0x26/0x70 [   46.137225]                but task is already holding lock: [   46.143074] ffffffffc1455310 (&card->pcm_mutex){+.+.}-{3:3}, at: dpcm_fe_dai_open+0x49/0x830 [   46.151536]                which lock already depends on the new lock.[   46.159732]                the existing dependency chain (in reverse order) is: [   46.167231]                -> #4 (&card->pcm_mutex){+.+.}-{3:3}: [   46.173428]        __mutex_lock+0x94/0x920 [   46.177542]        snd_soc_dpcm_runtime_update+0x2e/0x100 [   46.182958]        snd_soc_dapm_put_enum_double+0x1c2/0x200 [   46.188548]        snd_ctl_elem_write+0x10c/0x1d0 [   46.193268]        snd_ctl_ioctl+0x126/0x850 [   46.197556]        __x64_sys_ioctl+0x87/0xc0 [   46.201845]        do_syscall_64+0x38/0x90 [   46.205959]        entry_SYSCALL_64_after_hwframe+0x63/0xcd [   46.211553]                -> #3 (&card->controls_rwsem){++++}-{3:3}: [   46.218188]        down_write+0x2b/0xd0 [   46.222038]        snd_ctl_add_replace+0x39/0xb0 [   46.226672]        snd_soc_add_controls+0x53/0x80 [   46.231393]        soc_probe_component+0x1e4/0x2a0 [   46.236202]        snd_soc_bind_card+0x51a/0xc80 [   46.240836]        devm_snd_soc_register_card+0x43/0x90 [   46.246079]        mc_probe+0x982/0xfe0 [snd_soc_sof_sdw] [   46.251500]        platform_probe+0x3c/0xa0 [   46.255700]        really_probe+0xde/0x390 [   46.259814]        __driver_probe_device+0x78/0x180 [   46.264710]        driver_probe_device+0x1e/0x90 [   46.269347]        __driver_attach+0x9f/0x1f0 [   46.273721]        bus_for_each_dev+0x78/0xc0 [   46.278098]        bus_add_driver+0x1ac/0x200 [   46.282473]        driver_register+0x8f/0xf0 [   46.286759]        do_one_initcall+0x58/0x310 [   46.291136]        do_init_module+0x4c/0x1f0 [   46.295422]        __do_sys_finit_module+0xb4/0x130 [   46.300321]        do_syscall_64+0x38/0x90 [   46.304434]        entry_SYSCALL_64_after_hwframe+0x63/0xcd [   46.310027]                -> #2 (&card->mutex){+.+.}-{3:3}: [   46.315883]        __mutex_lock+0x94/0x920 [   46.320000]        snd_soc_bind_card+0x3e/0xc80 [   46.324551]        devm_snd_soc_register_card+0x43/0x90 [   46.329798]        mc_probe+0x982/0xfe0 [snd_soc_sof_sdw] [   46.335219]        platform_probe+0x3c/0xa0 [   46.339420]        really_probe+0xde/0x390 [   46.343532]        __driver_probe_device+0x78/0x180 [   46.348430]        driver_probe_device+0x1e/0x90 [   46.353065]        __driver_attach+0x9f/0x1f0 [   46.357437]        bus_for_each_dev+0x78/0xc0 [   46.361812]        bus_add_driver+0x1ac/0x200 [   46.366716]        driver_register+0x8f/0xf0 [   46.371528]        do_one_initcall+0x58/0x310 [   46.376424]        do_init_module+0x4c/0x1f0 [   46.381239]        __do_sys_finit_module+0xb4/0x130 [   46.386665]        do_syscall_64+0x38/0x90 [   46.391299]        entry_SYSCALL_64_after_hwframe+0x63/0xcd [   46.397416]                -> #1 (client_mutex){+.+.}-{3:3}: [   46.404307]        __mutex_lock+0x94/0x920 [   46.408941]        snd_soc_add_component+0x24/0x2c0 [   46.414345]        devm_snd_soc_register_component+0x54/0xa0 [   46.420522]        cs35l56_common_probe+0x280/0x370 [snd_soc_cs35l56] [   46.427487]        cs35l56_sdw_probe+0xf4/0x170 [snd_soc_cs35l56_sdw] [   46.434442]        sdw_drv_probe+0x80/0x1a0 [   46.439136]        really_probe+0xde/0x390 [   46.443738]        __driver_probe_device+0x78/0x180 [   46.449120]        driver_probe_device+0x1e/0x90 [   46.454247]        __driver_attach+0x9f/0x1f0 [   46.459106]        bus_for_each_dev+0x78/0xc0 [   46.463971]        bus_add_driver+0x1ac/0x200 [   46.468825]        driver_register+0x8f/0xf0 [   46.473592]        do_one_initcall+0x58/0x310 [   46.478441]        do_init_module+0x4c/0x1f0 [   46.483202]        __do_sys_finit_module+0xb4/0x130 [   46.488572]        do_syscall_64+0x38/0x90 [   46.493158]        entry_SYSCALL_64_after_hwframe+0x63/0xcd [   46.499229]                -> #0 (&slave->sdw_dev_lock){+.+.}-{3:3}: [   46.506737]        __lock_acquire+0x1121/0x1df0 [   46.511765]        lock_acquire+0xd5/0x300 [   46.516360]        __mutex_lock+0x94/0x920 [   46.520949]        sdw_update_slave_status+0x26/0x70 [   46.526409]        sdw_clear_slave_status+0xd8/0xe0 [   46.531783]        intel_resume_runtime+0x139/0x2a0 [   46.537155]        __rpm_callback+0x41/0x120 [   46.541919]        rpm_callback+0x5d/0x70 [   46.546422]        rpm_resume+0x531/0x7e0 [   46.550920]        __pm_runtime_resume+0x4a/0x80 [   46.556024]        snd_soc_pcm_component_pm_runtime_get+0x2f/0xc0 [   46.562611]        __soc_pcm_open+0x62/0x520 [   46.567375]        dpcm_be_dai_startup+0x116/0x210 [   46.572661]        dpcm_fe_dai_open+0xf7/0x830 [   46.577597]        snd_pcm_open_substream+0x54a/0x8b0 [   46.583145]        snd_pcm_open.part.0+0xdc/0x200 [   46.588341]        snd_pcm_playback_open+0x51/0x80 [   46.593625]        chrdev_open+0xc0/0x250 [   46.598129]        do_dentry_open+0x15f/0x430 [   46.602981]        path_openat+0x75e/0xa80 [   46.607575]        do_filp_open+0xb2/0x160 [   46.612162]        do_sys_openat2+0x9a/0x160 [   46.616922]        __x64_sys_openat+0x53/0xa0 [   46.621767]        do_syscall_64+0x38/0x90 [   46.626352]        entry_SYSCALL_64_after_hwframe+0x63/0xcd [   46.632414]                other info that might help us debug this:[   46.641862] Chain exists of:                  &slave->sdw_dev_lock --> &card->controls_rwsem --> &card->pcm_mutex[   46.655145]  Possible unsafe locking scenario:[   46.662048]        CPU0                    CPU1 [   46.667080]        ----                    ---- [   46.672108]   lock(&card->pcm_mutex); [   46.676267]                                lock(&card->controls_rwsem); [   46.683382]                                lock(&card->pcm_mutex); [   46.690063]   lock(&slave->sdw_dev_lock); [   46.694574]                 *** DEADLOCK ***[   46.701942] 2 locks held by mpg123/1130: [   46.706356]  #0: ffff8b4457b22b90 (&pcm->open_mutex){+.+.}-{3:3}, at: snd_pcm_open.part.0+0xc9/0x200 [   46.715999]  #1: ffffffffc1455310 (&card->pcm_mutex){+.+.}-{3:3}, at: dpcm_fe_dai_open+0x49/0x830 [   46.725390]                stack backtrace: [   46.730752] CPU: 0 PID: 1130 Comm: mpg123 Tainted: G            E      6.1.0-rc4-jamerson #1 [   46.739703] Hardware name: AAEON UP-WHL01/UP-WHL01, BIOS UPW1AM19 11/10/2020 [   46.747270] Call Trace: [   46.750239]   [   46.752857]  dump_stack_lvl+0x56/0x73 [   46.757045]  check_noncircular+0x102/0x120 [   46.761664]  __lock_acquire+0x1121/0x1df0 [   46.766197]  lock_acquire+0xd5/0x300 [   46.770292]  ? sdw_update_slave_status+0x26/0x70 [   46.775432]  ? lock_is_held_type+0xe2/0x140 [   46.780143]  __mutex_lock+0x94/0x920 [   46.784241]  ? sdw_update_slave_status+0x26/0x70 [   46.789387]  ? find_held_lock+0x2b/0x80 [   46.793750]  ? sdw_update_slave_status+0x26/0x70 [   46.798894]  ? lock_release+0x147/0x2f0 [   46.803262]  ? lockdep_init_map_type+0x47/0x250 [   46.808315]  ? sdw_update_slave_status+0x26/0x70 [   46.813456]  sdw_update_slave_status+0x26/0x70 [   46.818422]  sdw_clear_slave_status+0xd8/0xe0 [   46.823302]  ? pm_generic_runtime_suspend+0x30/0x30 [   46.828706]  intel_resume_runtime+0x139/0x2a0 [   46.833583]  ? _raw_spin_unlock_irq+0x24/0x50 [   46.838462]  ? pm_generic_runtime_suspend+0x30/0x30 [   46.843866]  __rpm_callback+0x41/0x120 [   46.848142]  ? pm_generic_runtime_suspend+0x30/0x30 [   46.853550]  rpm_callback+0x5d/0x70 [   46.857568]  rpm_resume+0x531/0x7e0 [   46.861578]  ? _raw_spin_lock_irqsave+0x62/0x70 [   46.866634]  __pm_runtime_resume+0x4a/0x80 [   46.871258]  snd_soc_pcm_component_pm_runtime_get+0x2f/0xc0 [   46.877358]  __soc_pcm_open+0x62/0x520 [   46.881634]  ? dpcm_add_paths.isra.0+0x35d/0x4c0 [   46.886784]  dpcm_be_dai_startup+0x116/0x210 [   46.891592]  dpcm_fe_dai_open+0xf7/0x830 [   46.896046]  ? debug_mutex_init+0x33/0x50 [   46.900591]  snd_pcm_open_substream+0x54a/0x8b0 [   46.905658]  snd_pcm_open.part.0+0xdc/0x200 [   46.910376]  ? wake_up_q+0x90/0x90 [   46.914312]  snd_pcm_playback_open+0x51/0x80 [   46.919118]  chrdev_open+0xc0/0x250 [   46.923147]  ? cdev_device_add+0x90/0x90 [   46.927608]  do_dentry_open+0x15f/0x430 [   46.931976]  path_openat+0x75e/0xa80 [   46.936086]  do_filp_open+0xb2/0x160 [   46.940194]  ? lock_release+0x147/0x2f0 [   46.944563]  ? _raw_spin_unlock+0x29/0x50 [   46.949101]  do_sys_openat2+0x9a/0x160 [   46.953377]  __x64_sys_openat+0x53/0xa0 [   46.957733]  do_syscall_64+0x38/0x90 [   46.961829]  entry_SYSCALL_64_after_hwframe+0x63/0xcd [   46.967402] RIP: 0033:0x7fa6397ccd3b [   46.971506] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 [   46.991413] RSP: 002b:00007fff838e8990 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 [   46.999580] RAX: ffffffffffffffda RBX: 0000000000080802 RCX: 00007fa6397ccd3b [   47.007311] RDX: 0000000000080802 RSI: 00007fff838e8b50 RDI: 00000000ffffff9c [   47.015047] RBP: 00007fff838e8b50 R08: 0000000000000000 R09: 0000000000000011 [   47.022787] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080802 [   47.030539] R13: 0000000000000004 R14: 0000000000000000 R15: 00007fff838e8b50 [   47.038289]