Received: by 10.223.185.116 with SMTP id b49csp1033834wrg; Fri, 16 Feb 2018 11:11:04 -0800 (PST) X-Google-Smtp-Source: AH8x225L6cSgDKjD0c0RHKelRdFyZESXhAxZ6C1x8mr4rbh9rLHIBUdNylyb/xLZ4YnP8PVOHftO X-Received: by 10.98.236.193 with SMTP id e62mr1373144pfm.33.1518808264465; Fri, 16 Feb 2018 11:11:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518808264; cv=none; d=google.com; s=arc-20160816; b=JNmA5D2IDrUptz0+bp15ZU5Mx0LA//ifvwIVDT6pUdSdejggMQu4gKVEvlqlCyIg/+ w2bp+7BpUCApbJP/5gdTLcoonb73h15BB/wk3om7IFwmQtitQYty8P3HRCXDBJ44bVQ8 koTddGOs5Abm6MM3TL3xCmX7NUEU5/1AsQB/bEKp1GvSLo/gvr7GKHT6lBk5zuahtK4T atybGeFVT1/GyhlngBsqS2ve1fSSC8YkVHFBoXO1YPOMRq886bJYwoLXMO8OLQA7aSij YWSk0nNEDuC2Cd1diGecFa1Ef0XdS7dd1vLcM/VJyQGPJb6xH0rr4p6IyD63NdDcutEC mU3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=/MzLazAUuQ+vb6URMT1dg6Ju4AeOLwNs56U4KFc3nw4=; b=h3surpWrd1LQwDyAugxfFQ5lD2L32tqG+KKdJwlYs6PV5j9m3/cqLu50ZEY/aJdrC4 bSo8FAjqxNhKnQvKoOwBrTymnewfeFBdiwsbx6ORNrqVC5KwTmW08dbl+ZCxdKBO98CB ooHvOQ5495K9uILpuB7uc0tfEeNrl6/oCCEqPMY4BmE80+qM2Q3f+icR8VVquglhpkE+ hCQI3W/C9R8euM+gUyedlZH3MYNVJsbb/B7oCszKjJsjzi0Rwd5SUIs7OWbeGzN7FrdZ vMJKULoLwkiAuNVEE1ILxjkihwEZXdh+Gx82rwURb8svBcheqM8mljhU/SEvD1D9Sw9F 5yQA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Nul2A0eI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v6si2864067pgv.764.2018.02.16.11.10.49; Fri, 16 Feb 2018 11:11:04 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Nul2A0eI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1032968AbeBPNx7 (ORCPT + 99 others); Fri, 16 Feb 2018 08:53:59 -0500 Received: from mail-wr0-f195.google.com ([209.85.128.195]:46537 "EHLO mail-wr0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1032862AbeBPNx5 (ORCPT ); Fri, 16 Feb 2018 08:53:57 -0500 Received: by mail-wr0-f195.google.com with SMTP id 34so2956874wre.13; Fri, 16 Feb 2018 05:53:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=/MzLazAUuQ+vb6URMT1dg6Ju4AeOLwNs56U4KFc3nw4=; b=Nul2A0eIliV2i9T8ulwWNwYueWtcyaMXp6MTmP3qdt25tK7LZUf5q9VxTwHTRbc1Ka ROKxAUA8WhkjrwlVuQ1qvLWav9yNd27AFESZ0SRcluXKu+jH2TN3OoxS3bo8FOwBG/wE Al3tNLPYkB6QAKBr2QgfW1foww1/hx138fV2DqwJozLw3cXqmfYIzhyWg/h7GmKg+ou6 6niP+AZ+ohXRPzI3ddXPfJ77deIsRkxtAiRSo0bPeZbtHIE/NMrRkjiDV/Hz9OTyGVxe M+CbuXJpm6opJMVij3foFx7F3O7NSH5ojQ4n+8vjoamutkO0vbK8ZtpSH2ySnbbLOgx3 ShoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=/MzLazAUuQ+vb6URMT1dg6Ju4AeOLwNs56U4KFc3nw4=; b=XMyGkLLEe95Bw1HuRa5zj5Yt+3n1BrmlB+1PAFJ4NouSIlHmw55lRzGH2MjxkrlX+m lZ8rt9oon8oo3tNfic1MKPrhkIWzFTZ2MQuaciGGLsSoSMHKtusonTXejf5JGIMsn5cv PuwnV0t9DdhI4NEaIZoEvkabUQ1eTuVrgn45LCkrNBrmBSaol7meHGWOJ4nSKDF+F/cw LQO3+dD8HJHAuatwQ22r5bZy2rC7JWAskHTPExQM/rOTBj4YJif0X33Aih3EQvVr2mYI nRubquRbnvC2UqSr/CwOPLRD1vBjopWEbICo3SaD/hTeGSg//5B+lGEchKFoCrVtxLKE 9oDA== X-Gm-Message-State: APf1xPADZWzuRKk/aehikAQ7R9o5ij/9JqZnClfOkqUXc4fLNMZkSUaF zrnHkPuVKle6t0EIvIviO0Y= X-Received: by 10.223.161.15 with SMTP id o15mr5408921wro.274.1518789235127; Fri, 16 Feb 2018 05:53:55 -0800 (PST) Received: from [192.168.2.62] (p578F04D2.dip0.t-ipconnect.de. [87.143.4.210]) by smtp.gmail.com with ESMTPSA id d200sm1026227wmd.24.2018.02.16.05.53.53 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 16 Feb 2018 05:53:54 -0800 (PST) Subject: Re: [PATCH v3] rtc: ac100: Fix ac100 determine rate bug To: Chen-Yu Tsai Cc: Maxime Ripard , Alessandro Zummo , alexandre.belloni@bootlin.com, linux-kernel , linux-sunxi , linux-rtc@vger.kernel.org, linux-clk , Mike Turquette , sboyd@kernel.org, linux-arm-kernel References: <20180214135612.21356-1-embed3d@gmail.com> <20180215141154.monvwx25awwetqrc@flea.lan> <00ece3d8-52ac-c51d-1867-97f7fd84dadf@gmail.com> <0af79515-ed4f-a4c6-6358-2053e8fd1745@gmail.com> From: Philipp Rossak Message-ID: <447afd58-7e9e-40cc-2a58-3ce85ebc5bd2@gmail.com> Date: Fri, 16 Feb 2018 14:53:52 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 16.02.2018 13:59, Chen-Yu Tsai wrote: > On Fri, Feb 16, 2018 at 8:49 PM, Philipp Rossak wrote: >> >> >> On 16.02.2018 05:10, Chen-Yu Tsai wrote: >>> >>> On Fri, Feb 16, 2018 at 1:53 AM, Philipp Rossak wrote: >>>> >>>> >>>> >>>> On 15.02.2018 15:11, Maxime Ripard wrote: >>>>> >>>>> >>>>> On Wed, Feb 14, 2018 at 02:56:12PM +0100, Philipp Rossak wrote: >>>>>> >>>>>> >>>>>> This patch fixes a bug, that prevents the Allwinner A83T and the A80 >>>>>> from a successful boot. >>>>>> >>>>>> The bug is there since v4.16-rc1 and appeared after the clk branch was >>>>>> merged. >>>>> >>>>> >>>>> >>>>> Out of curiosity, which patch has introduced this? I couldn't find any >>>>> obvious match. >>>>> >>>> >>>> I wasn't also n >>> >>> >>> To be honest, I'm not sure why this is hitting you and not me. >>> I have both A83T boards that have assigned-clock-rates set for >>> the ac100 clock outputs for WiFi. I have them running 4.16-rc1 >>> and have not seen this. The device tree patches that add these >>> are in 4.15. >>> >> >> Now it is getting curious ... . >> I already mentioned that bug in the sunxi-irc and someone else was hitting >> that problem also... >> I tested it also with the same toolchain you are using (gcc 7.3.0-1 Debian), >> but that didn't made any difference. >> >> I don't think that issue is related with the Hardware, but to be on the save >> side: Which Hardware version of the BPI-M3 do you have? I have version 1.2. >> >> Can someone else can confirm this bug? > > So I might have remembered wrong, as I just realized I have your > patch in my a83t branches. I don't hit this on the A80, which also > has the AC100, but doesn't use assigned-clock-rates in the device > tree. > > Could you try rolling back to 4.15 and see if you still hit it? > Clean 4.15, no patches, just cloned before building: No issues. >> >> >>>> >>>>>> You can find the shortend trace below: >>>>>> >>>>>> Unable to handle kernel NULL pointer dereference at virtual address >>>>>> 00000000 >>>>>> pgd = (ptrval) >>>>>> [00000000] *pgd=00000000 >>>>>> Internal error: Oops: 5 [#1] SMP ARM >>>>>> Modules linked in: >>>>>> CPU: 0 PID: 49 Comm: kworker/0:1 Not tainted 4.15.0-10190-gb89e32ccd1be >>>>>> #2 >>>>>> Hardware name: Allwinner sun8i Family >>>>>> Workqueue: events deferred_probe_work_func >>>>>> PC is at clk_hw_get_rate+0x0/0x34 >>>>>> LR is at ac100_clkout_determine_rate+0x48/0x19c >>>>>> >>>>>> [ ... ] >>>>>> >>>>>> (clk_hw_get_rate) from (ac100_clkout_determine_rate+0x48/0x19c) >>>>>> (ac100_clkout_determine_rate) from >>>>>> (clk_core_set_rate_nolock+0x3c/0x1a0) >>>>>> (clk_core_set_rate_nolock) from (clk_set_rate+0x30/0x88) >>>>>> (clk_set_rate) from (of_clk_set_defaults+0x200/0x364) >>>>>> (of_clk_set_defaults) from (platform_drv_probe+0x18/0xb0) >>>>>> >>>>>> To fix that bug, we first check if the return of the >>>>>> clk_hw_get_parent_by_index is non zero. If it is zero we skip that >>>>>> clock parent. >>>>>> >>>>>> The BUG report could be found here: https://lkml.org/lkml/2018/2/10/198 >>>>>> >>>>>> Fixes: 04940631b8d2 ("rtc: ac100: Add clk output support") >>>>>> >>>>>> Signed-off-by: Philipp Rossak >>>>>> --- >>>>>> >>>>>> Changes in v3: >>>>>> * add information when the bug appeared >>>>>> * make the comment more clear >>>>>> Changes in v2: >>>>>> * add tag Fixes: ... to commit message >>>>>> * add comment to if statement why we are doing this check >>>>>> >>>>>> drivers/rtc/rtc-ac100.c | 19 ++++++++++++++++++- >>>>>> 1 file changed, 18 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/rtc/rtc-ac100.c b/drivers/rtc/rtc-ac100.c >>>>>> index 8ff9dc3fe5bf..2412aa2e8399 100644 >>>>>> --- a/drivers/rtc/rtc-ac100.c >>>>>> +++ b/drivers/rtc/rtc-ac100.c >>>>>> @@ -183,7 +183,24 @@ static int ac100_clkout_determine_rate(struct >>>>>> clk_hw >>>>>> *hw, >>>>>> for (i = 0; i < num_parents; i++) { >>>>>> struct clk_hw *parent = clk_hw_get_parent_by_index(hw, >>>>>> i); >>>>>> - unsigned long tmp, prate = clk_hw_get_rate(parent); >>>>>> + unsigned long tmp, prate; >>>>>> + >>>>>> + /* >>>>>> + * The clock has two parents, one is a fixed clock >>>>>> which >>>>>> is >>>>>> + * internally registered by the ac100 driver. The other >>>>>> parent >>>>>> + * is a clock from the codec side of the chip, which we >>>>>> + * properly declare and reference in the devicetree and >>>>>> is >>>>>> + * not implemented in any driver right now. >>>>>> + * If the clock core looks for the parent of that >>>>>> second >>>>>> + * missing clock, it can't one that is registered and >>>>>> + * returns NULL. >>>>>> + * Thus we need to check if the parent exists before >>>>>> + * we get the parent rate. >>>>>> + */ >>>>>> + if (!parent) >>>>>> + continue; >>>>> >>>>> >>>>> >>>>> I'm sorry, but I still don't get it. When you register that clock, you >>>>> will give it two parents. Why would that change during the life of the >>>>> clock? >>>>> >>>>> This really looks like a workaround rather than an actual fix. >>>>> >>>>> Maxime >>>>> >>>> I agree this is more a workaround! >>>> A proper solution/fix would be to define the devicetree correct like >>>> this: >>>> >>>> diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>>> b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>>> index 6550bf0e594b..6f56d429f17e 100644 >>>> --- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>>> +++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>>> @@ -175,11 +175,18 @@ >>>> compatible = "x-powers,ac100-rtc"; >>>> interrupt-parent = <&r_intc>; >>>> interrupts = <0 IRQ_TYPE_LEVEL_LOW>; >>>> - clocks = <&ac100_codec>; >>>> + clocks = <&ac100_rtc_32k>; >>>> #clock-cells = <1>; >>>> clock-output-names = "cko1_rtc", >>>> "cko2_rtc", >>>> "cko3_rtc"; >>>> + >>>> + ac100_rtc_32k: rtc-32k-oscillator { >>>> + compatible = "fixed-clock"; >>>> + #clock-cells = <0>; >>>> + clock-frequency = <32768>; >>>> + clock-output-names = "ac100-rtc-32k"; >>>> + }; >>>> }; >>>> }; >>>> }; >>>> >>>> What do you think about that solution? >>> >>> >>> That's not quite right either. As I mentioned before, the >>> RTC block has two clock inputs, one 4MHz signal from the >>> codec block, and one 32.768 kHz signal from an external >>> crystal. The original device tree binding describes the >>> first one, and the 32.768 kHz clock was registered by >>> the RTC driver internally. >>> >>> If you're going to add the crystal clock, you still need >>> to keep the codec one. Note that this does not fix what >>> Maxime is asking you. I've already provided an explanation: >>> >>> The clock core allows registering clocks with not-yet-existing >>> clock parents. Parents are matches by string names. If no >>> clock by that name is registered yet, the clock core simply >>> orphans the new clock if the unregistered parent is its >>> current parent or simply ignores that parent if its not the >>> current parent. This is entirely valid and is what we are >>> counting on here, as we haven't implemented the codec-side >>> driver. >>> >>> Note we also sort of depend on this behavior for sunxi-ng >>> clocks, where in some cases we get circular dependencies >>> between clock blocks. >>> >>> BTW, since this is mostly related to the clk subsystem, >>> please CC the clk maintainers as well. >>> >> >> Ok, I added also now the clk mailing list and maintainers. >> As reference this here is the boot log [1]. >> >> Im not sure if this in relation to this bug: >> But, already with the 4.15 kernel I get from time to time some oops during >> runtime on the a31s (bpi-m2) and all are clock related and causes a crash of >> the system. > > Are you referring to the A31 CLK_OUT bug? That is an entirely separate > bug in the A31/A31s CCU driver. The bug was there from the beginning, > but did not cause a crash as the incompatible data structures had lined > up in a way that the embedded data structure that was used by the incorrect > clk ops was actually compatible. This changed with 4.16-rc1, as a new field > was added to some of the clock types, breaking the alignment. I made some > changes but have yet to write the commit log for it. Also I don't have any > A31/A31s boards to test it on right now. I can send you the patch later. > No I'm talking about an other bug. But, right now I'm not sure if this is related to the CLK_OUT bug. My first plan was to wait until you submitted the CLK_OUT patch, to clarify that. But I thought it would be good to mention. That one I'm talking about is very strange. So you can run your hardware for hours and nothing happens. Then when it appears the system crashs and reliable booting afterwards is not possible (for a few minutes). But that is not related to the thermal behavior of the soc. I'm still trying to find out under which conditions it crashes... I also ran some test with cpuburn and cpufreq-ljt-stress-test but everything looks good, related to those tests. Philipp