Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp4158075ybz; Tue, 28 Apr 2020 06:45:08 -0700 (PDT) X-Google-Smtp-Source: APiQypIPJETw8+t0XPwAVW2xGa9aDW6yb9g/L8OtATqkrqm54k6thWdJoDzIc0N7V+GcMicnumPP X-Received: by 2002:a17:906:3004:: with SMTP id 4mr25062637ejz.381.1588081507838; Tue, 28 Apr 2020 06:45:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588081507; cv=none; d=google.com; s=arc-20160816; b=WGnCfKi63GnhbsZ1B6GZAYgrEdD2Lp2EttyJuaf/FeU+Xoti1AQPlwwRjz6EOOOlY8 9rDB/NAt4+ae3JTK9wDHg9d09rk+Gwm5qKaBButnIxEYhYlSemShRXAcb45JMlfZ7ji3 s01FEh0/C5MkUpjTLMewFrkKYWRIUIgPIkjVlaq+ylGaTRGJkS4dNapg4RfjhI5cKr1x IQmNA4xstUBbt01cSYNO9fAwmkcadGRRYLEHmb0rLCc+bPO0VEoW3tS62m/fqwZVBZSv Bhl99CYiQcJjg4qq63UBqKphiczifBMHBnkFc1eo/QvsV8PZ398ZWgE8dpYenUazAxg+ 7BZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=HbjQTjr+XRHj+ktYZXBrRQzqPApgTg8d5bCxhxNKr5w=; b=jl/6UceTcBLE1PJOMqmPhEvq/yIBN4OTL06+KE07nbl9tJ/o7iOiNyzS2JpfK+nvEN daXxuSqstPQX9Hu5K0LdG7kZRSPtzZjMb0/Xqf6tqv9Sczr0mTjtRbQ90XX/jFy0L4wz 1x4ppNk9xdFoBfYDQP9RtsoAruHp++c0tw5TEigsATzHR7xJdNREKLd6/V86pYVF76l2 eMe/bdjOGEpeKNr5XrEvz46A/mpGtv4YhdS5xO26sM6jdziakiTXz7F1NxvwTsqIWU9Y 9/oP2gjhj4cTwEBJ0LscY5xpqf3TFBLVyngyJk1v2MLLYqX2YEoOtMXpvxdMds5kCH8x odfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=TaSC623P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f3si1613350edw.598.2020.04.28.06.44.44; Tue, 28 Apr 2020 06:45:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=TaSC623P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727828AbgD1NnS (ORCPT + 99 others); Tue, 28 Apr 2020 09:43:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55078 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727057AbgD1NnR (ORCPT ); Tue, 28 Apr 2020 09:43:17 -0400 Received: from mail-lf1-x142.google.com (mail-lf1-x142.google.com [IPv6:2a00:1450:4864:20::142]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F5EEC03C1A9; Tue, 28 Apr 2020 06:43:16 -0700 (PDT) Received: by mail-lf1-x142.google.com with SMTP id u10so16917784lfo.8; Tue, 28 Apr 2020 06:43:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=HbjQTjr+XRHj+ktYZXBrRQzqPApgTg8d5bCxhxNKr5w=; b=TaSC623PEBBl8tTV4vGF1t5aBVqKcJtupPhQBYdV0CBB7K0xS4Ym9Qtn3BhmsAOdda iXCeioBHNMXj+6PQuk80d2agWwsEFWFBrd5A1I1gHvQjstT5ODGmD7ikSsdXOTlhfIFI ABKazAgfZhAUsYUZjqsgtBul7F+oOjCFjqJzSdk6mBlYFaQz30/jj1PXOGOtBUfkgVgg oQrhUVu1ZdsW7qmHZX6jbKUF/zoH609JxbjkMdQ13Pb02/oiZZLgk9isKJw6WrLBi49x TK9n8oJSzG3TT8UXD58XuRVz3aY12uoNUZnK0l1NQ9nnd2TFbVZksdQjJXSGxpjXsNQl lUUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=HbjQTjr+XRHj+ktYZXBrRQzqPApgTg8d5bCxhxNKr5w=; b=UZlUgJAJivmZjLyogPjrBW3CamcneNUSiwwzfiFCGStAbVlvB+w8EV4S5XIrWQTOp0 +efJsTPVSdcGE/yfTcYN9GRMAU6Ldr9pxmaPiY7uInJEkfMgEU9pSGxRuoPUTYZgml0R 9UGJ6PWmF8pP1qxImk/FqQoZQepq88OcImZH+J9fOFZwAh7yzKlme+cIrQi6LlPUdqqE 4DaStcdm93M/zk0iRzPhZNoUmlnwrZdvqohivvMzG9F+51SBCHH1AJx8u7bmWawG8d3D n+CPoX6N98OCtnJRj2wksu0Z8OGw4UXd+nEqbTZwiaZTkHKFBIIcSXg4XoDFGG//JZlg LibQ== X-Gm-Message-State: AGi0PuZq+fYYwyE5K+le4v+aL68pzUTYe6DvDYt0XHmmW1aXK9JKFhlh MDJfKO2ZZ2hV+MUNFaX7tZ3fvL78 X-Received: by 2002:a19:4a05:: with SMTP id x5mr2294327lfa.17.1588081394531; Tue, 28 Apr 2020 06:43:14 -0700 (PDT) Received: from [192.168.2.145] (ppp91-78-208-152.pppoe.mtu-net.ru. [91.78.208.152]) by smtp.googlemail.com with ESMTPSA id b28sm12732526ljo.1.2020.04.28.06.43.13 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 28 Apr 2020 06:43:13 -0700 (PDT) Subject: Re: [PATCH v2 1/2] i2c: tegra: Better handle case where CPU0 is busy for a long time To: Jon Hunter , Thierry Reding , Laxman Dewangan , Wolfram Sang Cc: linux-i2c@vger.kernel.org, linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org References: <20200324191217.1829-1-digetx@gmail.com> <20200324191217.1829-2-digetx@gmail.com> <1e259e22-c300-663a-e537-18d854e0f478@nvidia.com> From: Dmitry Osipenko Message-ID: <9c6a358e-3334-faa9-d198-55a8aed88922@gmail.com> Date: Tue, 28 Apr 2020 16:43:12 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <1e259e22-c300-663a-e537-18d854e0f478@nvidia.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 20.04.2020 22:53, Jon Hunter пишет: > Hi Dmitry, > > On 24/03/2020 19:12, Dmitry Osipenko wrote: >> Boot CPU0 always handle I2C interrupt and under some rare circumstances >> (like running KASAN + NFS root) it may stuck in uninterruptible state for >> a significant time. In this case we will get timeout if I2C transfer is >> running on a sibling CPU, despite of IRQ being raised. In order to handle >> this rare condition, the IRQ status needs to be checked after completion >> timeout. >> >> Signed-off-by: Dmitry Osipenko >> --- >> drivers/i2c/busses/i2c-tegra.c | 27 +++++++++++++++------------ >> 1 file changed, 15 insertions(+), 12 deletions(-) > > > I have noticed a regression on tegra30-cardhu-a04 when testing system > suspend. Git bisect is pointing to this commit and reverting it fixes > the problem. In the below console log I2C fails to resume ... >> [ 40.888512] usb1_vbus: supplied by 5v0 > [ 40.892408] vddio_sdmmc,avdd_vdac: supplied by 5v0 > [ 40.897401] cam_1v8: disabling > [ 40.900548] modem_3v3: disabling > [ 40.903875] vdd_cam1_ldo: disabling > [ 40.907501] vdd_cam2_ldo: disabling > [ 40.911092] vdd_cam3_ldo: disabling > [ 40.914714] vdd_fuse_3v3: disabling > [ 40.918305] vddio_vid: disabling > [ 40.921623] usb1_vbus: disabling > [ 59.445032] PM: suspend entry (deep) > [ 59.448852] Filesystems sync: 0.000 seconds > [ 59.456161] Freezing user space processes ... (elapsed 0.001 seconds) done. > [ 59.457645] OOM killer disabled. > [ 59.457649] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. 1. Previously, without my patch, I2C was failing here: [ 59.543528] tegra-i2c 7000d000.i2c: i2c transfer timed out [ 59.549036] vdd_sata,avdd_plle: failed to disable [ 59.553778] Failed to disable avdd-plle: -110 [ 59.558150] tegra-pcie 3000.pcie: failed to disable regulators: -110 The I2C was failing because interrupts are force-disabled in the NOIRQ suspend phase. This means that the regulator_bulk_disable() of the PCIe driver failed on suspend and regulators were kept to the "enabled" state. https://elixir.bootlin.com/linux/v5.7-rc3/source/drivers/regulator/core.c#L4526 Although, the vdd_sata,avdd_plle should be disabled actually, since I2C returned a spurious error. 2. Now, with my patch, the I2C also times out, but I2C interrupt is manually handled by the I2C driver after the timeout. Hence regulators are getting disabled successfully on the PCIe suspend. > [ 59.764926] Disabling non-boot CPUs ... > [ 59.769540] IRQ 18: no longer affine to CPU1 > [ 59.789070] IRQ 19: no longer affine to CPU2 > [ 59.808049] IRQ 20: no longer affine to CPU3 > [ 59.827113] Entering suspend state LP1 > [ 59.827163] Enabling non-boot CPUs ... > [ 59.834797] CPU1 is up > [ 59.840943] CPU2 is up > [ 59.847378] CPU3 is up > [ 59.850577] tegra-i2c 7000d000.i2c: runtime resume failed -13 > [ 59.856432] tegra-i2c 7000d000.i2c: runtime resume failed -13 > [ 59.862231] tegra-i2c 7000d000.i2c: runtime resume failed -13 > [ 59.868070] vdd_pexa,vdd_pexb: is_enabled() failed: -13 > [ 59.873334] tegra-i2c 7000d000.i2c: runtime resume failed -13 > [ 59.879143] vdd_pexa,vdd_pexb: is_enabled() failed: -13 > [ 59.884420] Failed to enable avdd-pex-pll: -13 > [ 59.888877] Failed to enable avdd-plle: -13 > [ 59.893061] Failed to enable avdd-pexb: -13 > [ 59.897279] Failed to enable vdd-pexb: -13 3. This error didn't happen before because regulators were in the enabled state on resume from suspend. Hence on each resume from suspend the PCIe regulator's use-count should be bumped by one. 4. Now, the regulators are in the disabled state on resume from suspend. Hence regulator_bulk_enable() of PCIe tries to enable them on resume, but fails because I2C RPM can't be resumed by that time. I'm not sure why RPM enable/disable behavior is inconsistent for suspend/resume. Apparently the I2C RPM is getting disabled late on suspend and also getting enabled late on resume. This needs a clarification. > [ 59.901383] tegra-pcie 3000.pcie: failed to enable regulators: -13 > [ 60.434185] clk_plle_training: timeout waiting for PLLE > [ 60.439565] tegra-pcie 3000.pcie: failed to enable CML clock: -16 5. The PCIe driver ignores the regulator_bulk_enable/disable() errors, hence hardware can't power up properly. https://elixir.bootlin.com/linux/v5.7-rc3/source/drivers/pci/controller/pci-tegra.c#L1231 > [ 60.445700] ------------[ cut here ]------------ > [ 60.450346] WARNING: CPU: 0 PID: 653 at /home/jonathanh/workdir/tegra/mlt-linux_next/kernel/drivers/regulator/core.c:2603 _regulator_disable+0xb8/0x1b4 > [ 60.463959] unbalanced disables for vdd_pexa,vdd_pexb 6. The CML clock failed to enable and PCIe tries to disable regulators. https://elixir.bootlin.com/linux/v5.7-rc3/source/drivers/pci/controller/pci-tegra.c#L1258 But regulators failed to be enabled before and that error was ignored! Hence now there is the unbalanced disable error. > [ 60.469038] Modules linked in: > [ 60.472107] CPU: 0 PID: 653 Comm: rtcwake Tainted: G W 5.7.0-rc2-next-20200420 #2 > [ 60.480892] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) > [ 60.487190] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) > [ 60.494951] [] (show_stack) from [] (dump_stack+0xc0/0xd4) > [ 60.502189] [] (dump_stack) from [] (__warn+0xe0/0xf8) > [ 60.509073] [] (__warn) from [] (warn_slowpath_fmt+0x74/0xb8) > [ 60.516568] [] (warn_slowpath_fmt) from [] (_regulator_disable+0xb8/0x1b4) > [ 60.525191] [] (_regulator_disable) from [] (regulator_disable+0x34/0xd0) > [ 60.533729] [] (regulator_disable) from [] (regulator_bulk_disable+0x28/0xb4) > [ 60.542619] [] (regulator_bulk_disable) from [] (tegra_pcie_pm_resume+0xbb0/0x107c) > [ 60.552032] [] (tegra_pcie_pm_resume) from [] (dpm_run_callback+0x38/0x1d4) > [ 60.560741] [] (dpm_run_callback) from [] (device_resume_noirq+0x110/0x248) > [ 60.569451] [] (device_resume_noirq) from [] (dpm_resume_noirq+0x10c/0x36c) > [ 60.578162] [] (dpm_resume_noirq) from [] (suspend_devices_and_enter+0x27c/0x9dc) > [ 60.587393] [] (suspend_devices_and_enter) from [] (pm_suspend+0x308/0x370) > [ 60.596110] [] (pm_suspend) from [] (state_store+0x6c/0xc8) > [ 60.603440] [] (state_store) from [] (kernfs_fop_write+0xf8/0x210) > [ 60.611379] [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0x1c4) > [ 60.619310] [] (__vfs_write) from [] (vfs_write+0xa4/0x188) > [ 60.626632] [] (vfs_write) from [] (ksys_write+0xa4/0xd4) > [ 60.633778] [] (ksys_write) from [] (ret_fast_syscall+0x0/0x54) > [ 60.641437] Exception stack(0xeda91fa8 to 0xeda91ff0) > [ 60.646497] 1fa0: 0000006c 00498438 00000004 00498438 00000004 00000000 > [ 60.654683] 1fc0: 0000006c 00498438 00497228 00000004 00000004 00000004 0048478c 00497228 > [ 60.662866] 1fe0: 00000004 be9029b8 b6ec8c0b b6e53206 > [ 60.668007] ---[ end trace 5453317048e46ae9 ]--- > [ 60.672632] Failed to disable vdd-pexb: -5 > [ 60.676761] tegra-pcie 3000.pcie: tegra pcie power on fail: -16 > [ 60.682694] PM: dpm_run_callback(): tegra_pcie_pm_resume+0x0/0x107c returns -16 > [ 60.690035] PM: Device 3000.pcie failed to resume noirq: error -16 > [ 60.696859] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: EMEM address decode error (SMMU translation error [--S]) > [ 60.708876] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: Page fault (SMMU translation error [--S]) 7. This MC error may happen because the PCIe regulators were not enabled during the resume process and then ungating PCIe powerdomain and clocks caused the SoC hardware malfunction.