Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1670191ybt; Thu, 2 Jul 2020 10:51:08 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzuTAKUfzfGmcOOh+8OZjg5JnEI7RaJwlqazq1eYld5z5sxyDNs1xIQNsrqQx/3gCwZSMc5 X-Received: by 2002:a50:ee84:: with SMTP id f4mr35019698edr.183.1593712268308; Thu, 02 Jul 2020 10:51:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593712268; cv=none; d=google.com; s=arc-20160816; b=zJOL3cLkaKCyjpdZyFflN1JlA7jJPd2YhH18+DlZy5jIwhstxCcz6ISDFnz2KraaDR zaQG4tdkIIw+VHIe/YcvdEFLCHKUfjIM7BpdQ25XMnujhPbqKD5oKTV/ztr8TT+WyhWa D1cyBvl/rnQ7Z4ToHUR2pzheVe8qK/dXqFyJHx9Sjm53UDMvfyO4L8d1FNU+bK3UhyHS AfT/i9Y5A2WRzJoPNC0QFnJiOd4bDRacoo3cp0zPWK+ZyibOMtHZM4h0MPJXk6+LbDrO 6wRo7uOdS3APmZItCXQI2ClQXNdIHo/zIGbNE8Exa9A8tfM7Ncybnrk5X5i1hMGpkJX+ oq9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=xQ/jbEIoi0E5v/8Oaxv5C/iCmVriuxnyUPKYmwbLhJI=; b=xhbZjTrtiJVqCZ4GxYD0jvOFmEkbLVpcUsfSrbfTUn7CzL2H6OkCyNYWW5Me4l8EHb uF9FJA3kAS2q3mjbR6I6/b6cLa4jfUM7H2kpq9MUb7usHH4LostnGii1FenfXETBbU/b Vdb4Pfnq4nsVybuQXGdUTw1KWs+XAfOnU2hhuj6pupjlp0bwpFIhJ7/X+QAg4aesrcqo mtAYpHI9gAlZBPRKtVvMNz28bgFwOCFqjKF8gkYTKR5zuHMDkJej8nqMqBMo7JysGIaj V8cG1zGJEQHiDICXdyPR15B6OS44DHCe0DXJSJcodFssZyoTOCr1gLdCnAZv2xls0gtc ElVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Su8z7mc7; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g3si6377424ejm.90.2020.07.02.10.50.35; Thu, 02 Jul 2020 10:51:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Su8z7mc7; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727983AbgGBRt5 (ORCPT + 99 others); Thu, 2 Jul 2020 13:49:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727977AbgGBRt5 (ORCPT ); Thu, 2 Jul 2020 13:49:57 -0400 Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A794DC08C5E0 for ; Thu, 2 Jul 2020 10:49:56 -0700 (PDT) Received: by mail-wm1-x342.google.com with SMTP id 22so28067078wmg.1 for ; Thu, 02 Jul 2020 10:49:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=xQ/jbEIoi0E5v/8Oaxv5C/iCmVriuxnyUPKYmwbLhJI=; b=Su8z7mc7dYnPY7oZG6Qfy2JYbC+KUSrWJlUQisUIJi7j2laFEpLi08Hzg9aJEscsNx CNgySZxbqQ8zHjlIkbChmzaDwLiFGQwPsO3X5dNSPEv/D5J2XxENVkxE3rCu+0kw3Es7 gAu0S3jeVQa36FEZB8+0pnpQLob/I5kajgBvYbdEAUC2zXC03tOVuW0syjSnQ5dcFOLr ZhU6sAuB6QYQCHmMi/h06UeD2lfNsx6eXpsJmeVe2jge71IivDIZvo67ktv2TyZILyZG EmVXGKoi77uzrMMN0wg4385tboOrgBWpkgPF2IGUOOhSAlQOZN0/WykOwi01hI1ZnfMr vsSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=xQ/jbEIoi0E5v/8Oaxv5C/iCmVriuxnyUPKYmwbLhJI=; b=JgcLyideMUOXo1Pd6xedvCTCBN2m7AKr/n+yr1wC/Vkmn5uzTV7Yps9nfVSuGf9HXz MYJgRAhrKRAx2S1bcMenJwvxQXiMDhzlr9PzJDpg7Hg/qI4/hQSWI4k7Gg6zPRoHpHns PNfEU+D4qNL+2XA+c6IVmIBZ772oxbnBRfNOiczwlIZp09JnSZcBeJauOg3vh+zU2a3W cCeAK+G6M8lj+40spCKSZnp0EcVPxH3056F3x8x7mVox8HyA3brrFq5gLrOlfNE3kWpK LHtFj4RENvuEejSZFxi5c4a6kuoc0XYDNVlQv7QjdVi36B4wzrVRgr2PZ00eOnpiaiGj vPQA== X-Gm-Message-State: AOAM530YAWcPaAFKsWuMg9OFc+nC2Xcxy06BfHWMSJk+98v+dmw6uXJs qCKi9ZfZCo7NkwbFdNrPdDU1Dw== X-Received: by 2002:a1c:f203:: with SMTP id s3mr31694303wmc.126.1593712194925; Thu, 02 Jul 2020 10:49:54 -0700 (PDT) Received: from ?IPv6:2a01:e34:ed2f:f020:c88a:7b2b:a4a1:46d0? ([2a01:e34:ed2f:f020:c88a:7b2b:a4a1:46d0]) by smtp.googlemail.com with ESMTPSA id b18sm2353454wrs.46.2020.07.02.10.49.52 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 02 Jul 2020 10:49:54 -0700 (PDT) Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices To: Andrzej Pietrasiewicz , linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org, netdev@vger.kernel.org, linux-wireless@vger.kernel.org, platform-driver-x86@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-renesas-soc@vger.kernel.org, linux-rockchip@lists.infradead.org Cc: "Rafael J . Wysocki" , Len Brown , Vishal Kulkarni , "David S . Miller" , Jiri Pirko , Ido Schimmel , Johannes Berg , Emmanuel Grumbach , Luca Coelho , Intel Linux Wireless , Kalle Valo , Peter Kaestle , Darren Hart , Andy Shevchenko , Sebastian Reichel , Miquel Raynal , Amit Kucheria , Support Opensource , Shawn Guo , Sascha Hauer , Pengutronix Kernel Team , Fabio Estevam , NXP Linux Team , =?UTF-8?Q?Niklas_S=c3=b6derlund?= , Heiko Stuebner , Orson Zhai , Baolin Wang , Chunyan Zhang , Zhang Rui , Allison Randal , Enrico Weigelt , Gayatri Kammela , Thomas Gleixner , Bartlomiej Zolnierkiewicz , kernel@collabora.com References: <20200629122925.21729-1-andrzej.p@collabora.com> <3d03d1a2-ac06-b69b-93cb-e0203be62c10@collabora.com> <47111821-d691-e71d-d740-e4325e290fa4@linaro.org> <4353a939-3f5e-8369-5bc0-ad8162b5ffc7@linaro.org> <73942aea-ae79-753c-fe90-d4a99423d548@collabora.com> <374dddd9-b600-3a30-d6c3-8cfcefc944d9@linaro.org> <5a28deb7-f307-8b03-faad-ab05cb8095d1@collabora.com> <8aeb4f51-1813-63c1-165b-06640af5968f@linaro.org> <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com> From: Daniel Lezcano Message-ID: Date: Thu, 2 Jul 2020 19:49:51 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-wireless-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote: > Hi, > > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze: >> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote: >>> Hi Daniel, >>> >>> >>> >>>>>>>> >>>>>>>> I did reproduce: >>>>>>>> >>>>>>>> v5.8-rc3 + series => imx6 hang at boot time >>>>>>>> v5.8-rc3 => imx6 boots correctly >>>> >>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor >>>> was failing to initialize for another reason related to the legacy >>>> cooling device, this is why it is not appearing on the imx7. >>>> >>>> I can now git-bisect :) >>>> >>> >>> That would be very kind of you, thank you! >> >> With the lock correctness option enabled: >> >> [    4.179223] imx_thermal tempmon: Extended Commercial CPU temperature >> grade - max:105C critical:100C passive:95C >> [    4.189557] >> [    4.191060] ============================================ >> [    4.196378] WARNING: possible recursive locking detected >> [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted >> [    4.207102] -------------------------------------------- >> [    4.212421] kworker/0:3/54 is trying to acquire lock: >> [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_device_is_enabled+0x18/0x34 >> [    4.225777] >> [    4.225777] but task is already holding lock: >> [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_get_temp+0x38/0x6c >> [    4.239121] >> [    4.239121] other info that might help us debug this: >> [    4.245655]  Possible unsafe locking scenario: >> [    4.245655] >> [    4.251579]        CPU0 >> [    4.254031]        ---- >> [    4.256481]   lock(&tz->lock); >> [    4.259544]   lock(&tz->lock); >> [    4.262608] >> [    4.262608]  *** DEADLOCK *** >> [    4.262608] >> [    4.268533]  May be due to missing lock nesting notation >> [    4.268533] >> [    4.275329] 4 locks held by kworker/0:3/54: >> [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at: >> process_one_work+0x224/0x808 >> [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at: >> process_one_work+0x224/0x808 >> [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at: >> __device_attach+0x30/0x140 >> [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_get_temp+0x38/0x6c >> [    4.312408] >> [    4.312408] stack backtrace: >> [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted >> 5.8.0-rc3-00011-gf5e50bf4d3ef #42 >> [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree) >> [    4.330809] Workqueue: events deferred_probe_work_func >> [    4.335973] [] (unwind_backtrace) from [] >> (show_stack+0x10/0x14) >> [    4.343734] [] (show_stack) from [] >> (dump_stack+0xe8/0x114) >> [    4.351062] [] (dump_stack) from [] >> (__lock_acquire+0xbfc/0x2cb4) >> [    4.358909] [] (__lock_acquire) from [] >> (lock_acquire+0xf4/0x4e4) >> [    4.366758] [] (lock_acquire) from [] >> (__mutex_lock+0xb0/0xaa8) >> [    4.374431] [] (__mutex_lock) from [] >> (mutex_lock_nested+0x1c/0x24) >> [    4.382452] [] (mutex_lock_nested) from [] >> (thermal_zone_device_is_enabled+0x18/0x34) >> [    4.392036] [] (thermal_zone_device_is_enabled) from >> [] (imx_get_temp+0x30/0x208) >> [    4.401271] [] (imx_get_temp) from [] >> (thermal_zone_get_temp+0x4c/0x6c) >> [    4.409640] [] (thermal_zone_get_temp) from [] >> (thermal_zone_device_update+0x8c/0x258) >> [    4.419310] [] (thermal_zone_device_update) from >> [] (thermal_zone_device_set_mode+0x60/0x88) >> [    4.429500] [] (thermal_zone_device_set_mode) from >> [] (imx_thermal_probe+0x3e4/0x578) >> [    4.439082] [] (imx_thermal_probe) from [] >> (platform_drv_probe+0x48/0x98) >> [    4.447622] [] (platform_drv_probe) from [] >> (really_probe+0x218/0x348) >> [    4.455903] [] (really_probe) from [] >> (driver_probe_device+0x5c/0xb4) >> [    4.464098] [] (driver_probe_device) from [] >> (bus_for_each_drv+0x58/0xb8) >> [    4.472638] [] (bus_for_each_drv) from [] >> (__device_attach+0xd4/0x140) >> [    4.480919] [] (__device_attach) from [] >> (bus_probe_device+0x88/0x90) >> [    4.489112] [] (bus_probe_device) from [] >> (deferred_probe_work_func+0x68/0x98) >> [    4.498088] [] (deferred_probe_work_func) from [] >> (process_one_work+0x2e0/0x808) >> [    4.507237] [] (process_one_work) from [] >> (worker_thread+0x2a0/0x59c) >> [    4.515432] [] (worker_thread) from [] >> (kthread+0x16c/0x178) >> [    4.522843] [] (kthread) from [] >> (ret_from_fork+0x14/0x20) >> [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8) >> [    4.535138] 5fa0:                                     00000000 >> 00000000 00000000 00000000 >> [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000 >> 00000000 00000000 00000000 >> [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 >> 00000000 >> >> >> > > Thanks! > > That confirms your suspicions. > > So the reason is that ->get_temp() is called while the mutex is held and > thermal_zone_device_is_enabled() wants to take the same mutex. Yes, that's correct. > Is adding a comment to thermal_zone_device_is_enabled() to never call > it while the mutex is held and adding another version of it which does > not take the mutex ok? The thermal_zone_device_is_enabled() is only used in two places, acpi and this imx driver, and given: 1. as soon as the mutex is released, there is no guarantee the thermal zone won't be changed right after, the lock is pointless, thus the information also. 2. from a design point of view, I don't see why a driver should know if a thermal zone is disabled or not It would make sense to end with this function and do not give the different drivers an opportunity to access this information. Why not add change_mode for the acpi in order to enable or disable the events and for imx_thermal use irq_enabled flag instead of the thermal zone mode? Moreover it is very unclear why this function is needed in imx_get_temp(), and I suspect we should be able to get rid of it. -- Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog