Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752320AbdIVMLL (ORCPT ); Fri, 22 Sep 2017 08:11:11 -0400 Received: from esa6.microchip.iphmx.com ([216.71.154.253]:21951 "EHLO esa6.microchip.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752148AbdIVMLI (ORCPT ); Fri, 22 Sep 2017 08:11:08 -0400 X-IronPort-AV: E=Sophos;i="5.42,427,1500966000"; d="scan'208";a="4673051" Subject: Re: [PATCH v1 01/10] clk: at91: pmc: Wait for clocks when resuming To: Romain Izard CC: Boris Brezillon , Michael Turquette , Stephen Boyd , Ludovic Desroches , Jonathan Cameron , Wenyou Yang , Josh Wu , David Woodhouse , Brian Norris , Marek Vasut , Cyrille Pitchen , Thierry Reding , Richard Genoud , Greg Kroah-Hartman , Alan Stern , , LKML , , linux-mtd , , , , linux-arm-kernel References: <20170908153604.28383-1-romain.izard.pro@gmail.com> <20170908153604.28383-2-romain.izard.pro@gmail.com> <14fb5261-bb03-a6bd-6653-2a2c655b04ee@microchip.com> From: Nicolas Ferre Organization: microchip Message-ID: <5ffaec74-db4a-d285-1241-ba47e74fb0b8@microchip.com> Date: Fri, 22 Sep 2017 14:12:13 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3837 Lines: 94 On 14/09/2017 at 18:15, Romain Izard wrote: > 2017-09-13 14:15 GMT+02:00 Nicolas Ferre : >> On 08/09/2017 at 17:35, Romain Izard wrote: >>> Wait for the syncronization of all clocks when resuming, not only the >>> UPLL clock. Do not use regmap_read_poll_timeout, as it will call BUG() >>> when interrupts are masked, which is the case in here. >>> >>> Signed-off-by: Romain Izard >>> --- >>> drivers/clk/at91/pmc.c | 24 ++++++++++++++++-------- >>> 1 file changed, 16 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/clk/at91/pmc.c b/drivers/clk/at91/pmc.c >>> index 775af473fe11..5c2b26de303e 100644 >>> --- a/drivers/clk/at91/pmc.c >>> +++ b/drivers/clk/at91/pmc.c >>> @@ -107,10 +107,20 @@ static int pmc_suspend(void) >>> return 0; >>> } >>> >>> +static bool pmc_ready(unsigned int mask) >>> +{ >>> + unsigned int status; >>> + >>> + regmap_read(pmcreg, AT91_PMC_SR, &status); >>> + >>> + return ((status & mask) == mask) ? 1 : 0; >>> +} >>> + >>> static void pmc_resume(void) >>> { >>> - int i, ret = 0; >>> + int i; >>> u32 tmp; >>> + u32 mask = AT91_PMC_MCKRDY | AT91_PMC_LOCKA; >>> >>> regmap_read(pmcreg, AT91_PMC_MCKR, &tmp); >>> if (pmc_cache.mckr != tmp) >>> @@ -134,13 +144,11 @@ static void pmc_resume(void) >>> AT91_PMC_PCR_CMD); >>> } >>> >>> - if (pmc_cache.uckr & AT91_PMC_UPLLEN) { >>> - ret = regmap_read_poll_timeout(pmcreg, AT91_PMC_SR, tmp, >>> - !(tmp & AT91_PMC_LOCKU), >>> - 10, 5000); >>> - if (ret) >>> - pr_crit("USB PLL didn't lock when resuming\n"); >>> - } >>> + if (pmc_cache.uckr & AT91_PMC_UPLLEN) >>> + mask |= AT91_PMC_LOCKU; >>> + >>> + while (!pmc_ready(mask)) >>> + cpu_relax(); >> >> Okay, but I would prefer to keep the timeout property in it. So we may >> need to re-implement a timeout way-out here. >> > > We need to have a reference clock to measure the timeout delay. If we use > the kernel's timekeeping, it relies on the clocks that we are configuring in > this code. Moreover, my experience with the mainline code is that when > something goes wrong, nothing will work. No oops or panic will be reported, > the device will just stop working. > > In my case, I had obvious failures (it just stopped working unless I removed > USB wakeup or activated the console during suspend) but also very rare > failures, that occurred in the bootloader. Those issues were detected when > testing repeated suspend cycles for a night: the memory controller would > never enter the self-refresh mode during the resume sequence. > > This led me to question the bootloader's code first, and I set up 4 boards > with the backup prototype code on v4.9 to verify that it was stable on > suspend. I've reached 1.5 million sleep cycles over 3 weeks without > failure, so this hinted towards the difference between the prototype and the > backup code provided for v4.12 (which contained the patch that got in > v4.13). Once I integrated this patch, I've run the v4.12 code for 2 weeks > without issue as well. > > In the end, I don't want to touch this code if I do not have to, as checking > that it does not regress is really cumbersome. The timeout was more for PLL like the one use for USB. I didn't want to block only for USB PLL failure (which is kind of hypothetical, I admit). Anyway, I understand your arguments and taking into account the extensive tests that you've run, I agree with your approach. I'm adding my Ack to the v2. Thanks for having take the time to describe your debugging session: it's valuable information for everybody. Best regards, -- Nicolas Ferre