Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp13695ybl; Tue, 3 Dec 2019 20:18:25 -0800 (PST) X-Google-Smtp-Source: APXvYqz/74CvF8doEAdJSkO3vaTZ2YXJMaw+0dNymNVevbvm3FrJK9FY5VJP8bBgqU59bXdPM4GC X-Received: by 2002:a05:6808:996:: with SMTP id a22mr967628oic.146.1575433105369; Tue, 03 Dec 2019 20:18:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575433105; cv=none; d=google.com; s=arc-20160816; b=Pqp9aZGRlLGopp5lBgdTULnum8i49aeagmjoIb+e48bB3xs7YinaS5z2ZmnxqL4eof HRmRLDhXPHXpwu9NsorCN4l1TUuEuCzmhfjfHCYCH/8nOhQVtgc8PIwTOoWa4oeyIXSj wFRtOhDRy5eHmfVnZ323NbscpfFKXpBJV0ai2CAV1+eZt2VzZYH1+4pZxsygAPy+zRbm 2qkMPIWmJ6gu/urCbYLwHGbanPF+WHE5rIQ9qYccAXuTKYJnP0giDIwPnoPVaqCdqzfX MNVERK1E/+FsmxVv1zDVQTO1Xpjwj2xXwOQLjJMHM8J/nIuSnGqCBSq2lebUES7VcSB6 j+4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=BZZl+nxbwr7fP0x4IN+8uQwfbG13ju41LnpcMidPMaU=; b=Ze/rDQVVbEUT+GdjyiiseFq1NnccG465KcazeCYTHJn6x//HOkfgUCs+cJbV3+E7+g 5S+xCXvia3kKda13M7Hr4ZVpe2SfEUwyXKiwQMz1kmz0RGrBHnMBHt9ROE3PasKpIU66 4ranoQR7KclVIWdT15KK+jh+duLOwVoIUkQxtO0JmfA0MaIr6Am0gVqfjbry3aPxpUNb 3+DThPfdUKXPBttk8ucpMLyAJorFquT8n/6/c7onRluWuLEf8bkIQUApXQmOItfvNEov AyqoTR6bFWlPOEX4fmhLLefW0+PNqwh1OHU7tkhGnuo83/WW9O1AFlhpg1TdXDbqNqis LuXA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=p2ovPSWb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q126si2532983oia.8.2019.12.03.20.18.11; Tue, 03 Dec 2019 20:18:25 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=p2ovPSWb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726925AbfLDERm (ORCPT + 99 others); Tue, 3 Dec 2019 23:17:42 -0500 Received: from mail-pl1-f195.google.com ([209.85.214.195]:35730 "EHLO mail-pl1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726834AbfLDERm (ORCPT ); Tue, 3 Dec 2019 23:17:42 -0500 Received: by mail-pl1-f195.google.com with SMTP id s10so2595785plp.2; Tue, 03 Dec 2019 20:17:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=BZZl+nxbwr7fP0x4IN+8uQwfbG13ju41LnpcMidPMaU=; b=p2ovPSWb/E/cAo7fmM5XjU1B8d8NrSArymp9Q+uczj5fzm7yq+G54UR+dlyGIzX067 DNjJ61R8cL1SI6+e0M9ccFR9uQfTldodyekC1VeG1k/sajgcBKSkkRclwWRgrljtRZZu djvYcyi3pp0+KcJS055JAqQs7y8xAfBXntwNsHxutGqbUa26zq3jy7crUoC2QN0eBMfY rAGLi/p6KF04DA6C4mnY+E4BRSLL1ScLUciQMgU03NF35ttF5y1MrrVHR6m5nKtMsqCZ 9QQrOiF/clX5YhDlDx+J5s6VJjuzpv3Q1BK8/wxDSo/MAsahT3r2pz3U+a9FbET8pEVx 6lYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=BZZl+nxbwr7fP0x4IN+8uQwfbG13ju41LnpcMidPMaU=; b=UURrwIQMZJTNo2e3gx03A1KNHzK6ejHuTDp6ND9NCMfHjeHrhtkr6CLOkPRJNHBMmK ra1hjoa3Lgqyl4mXrXDcGtGAvxgUV/YZBVGyV/6DMj/vD7e9A3JWAvHJ+96k+s9KAAgz BwceVvgIGi0BUnTQsZ19CDObMXr4xrxxY+12GQZHBmUZ/3emd+Kbt6QrgIBu4oQ15n8c MmGunKujSE56ebD6+gwASE0tV1zpAXoDK+M9MnnCVy9Jxmr4RkQPJ65pZXonLCWmsNe9 acni7q2hfgZcqglGDfO1q64LmTvvguOjkLO9yl8RPjabNMPcftA4A+TFYeGZ+AWtUiQ4 oLwA== X-Gm-Message-State: APjAAAVjXmklhYppfV31k0Zse6K4LoqmbVxHVKoOXyXjzEaZWFVvcYfK dho4wsYxXpZhzzwJRUNpJNo= X-Received: by 2002:a17:90a:374f:: with SMTP id u73mr1200433pjb.22.1575433061004; Tue, 03 Dec 2019 20:17:41 -0800 (PST) Received: from ?IPv6:240d:1a:90a:7900:fc86:4c94:c9d5:bfda? ([240d:1a:90a:7900:fc86:4c94:c9d5:bfda]) by smtp.gmail.com with ESMTPSA id p21sm5385729pfn.103.2019.12.03.20.17.36 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 03 Dec 2019 20:17:40 -0800 (PST) Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume To: Saravana Kannan , Rob Herring Cc: Greg Kroah-Hartman , "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" , LKML , LKP , kernel test robot References: <20191201150015.GC18573@shao2-debian> <7e13b7f9-6c0f-0ab5-a6f9-5fb9b41257c9@gmail.com> From: Frank Rowand Message-ID: <1a30fb19-dfac-6b6c-1e92-8d972fc52b10@gmail.com> Date: Tue, 3 Dec 2019 22:17:33 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/3/19 4:50 PM, Saravana Kannan wrote: > On Tue, Dec 3, 2019 at 1:10 PM Rob Herring wrote: >> >> On Tue, Dec 3, 2019 at 2:05 PM Saravana Kannan wrote: >>> >>> On Tue, Dec 3, 2019 at 1:01 AM Frank Rowand wrote: >>>> >>>> On 12/2/19 3:19 PM, Saravana Kannan wrote: >>>>> On Sun, Dec 1, 2019 at 7:00 AM kernel test robot wrote: >>>>>> >>>>>> Greetings, >>>>>> >>>>>> 0day kernel testing robot got the below dmesg and the first bad commit is >>>>>> >>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master >>>>>> >>>>>> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e >>>>>> Author: Saravana Kannan >>>>>> AuthorDate: Wed Sep 4 14:11:24 2019 -0700 >>>>>> Commit: Greg Kroah-Hartman >>>>>> CommitDate: Fri Oct 4 17:30:19 2019 +0200 >>>>>> >>>>>> of/platform: Pause/resume sync state during init and of_platform_populate() >>>>>> >>>>>> When all the top level devices are populated from DT during kernel >>>>>> init, the supplier devices could be added and probed before the >>>>>> consumer devices are added and linked to the suppliers. To avoid the >>>>>> sync_state() callback from being called prematurely, pause the >>>>>> sync_state() callbacks before populating the devices and resume them >>>>>> at late_initcall_sync(). >>>>>> >>>>>> Similarly, when children devices are populated from a module using >>>>>> of_platform_populate(), there could be supplier-consumer dependencies >>>>>> between the children devices that are populated. To avoid the same >>>>>> problem with sync_state() being called prematurely, pause and resume >>>>>> sync_state() callbacks across of_platform_populate(). >>>>>> >>>>>> Signed-off-by: Saravana Kannan >>>>>> Link: https://lore.kernel.org/r/20190904211126.47518-6-saravanak@google.com >>>>>> Signed-off-by: Greg Kroah-Hartman >>>>>> >>>>>> fc5a251d0f driver core: Add sync_state driver/bus callback >>>>>> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate() >>>>>> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping >>>>>> +-------------------------------------------------------------------------+------------+------------+------------+ >>>>>> | | fc5a251d0f | 5e6669387e | 81b6b96475 | >>>>>> +-------------------------------------------------------------------------+------------+------------+------------+ >>>>>> | boot_successes | 30 | 0 | 0 | >>>>>> | boot_failures | 1 | 11 | 22 | >>>>>> | Oops:#[##] | 1 | | | >>>>>> | EIP:unmap_vmas | 1 | | | >>>>>> | PANIC:double_fault | 1 | | | >>>>>> | Kernel_panic-not_syncing:Fatal_exception | 1 | | | >>>>>> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 | >>>>>> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 | >>>>>> +-------------------------------------------------------------------------+------------+------------+------------+ >>>>>> >>>>>> If you fix the issue, kindly add following tag >>>>>> Reported-by: kernel test robot >>>>>> >>>>>> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1 >>>>>> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found >>>>>> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed >>>>>> [ 3.191932] ------------[ cut here ]------------ >>>>>> [ 3.192571] Unmatched sync_state pause/resume! >>>>>> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0 >>>>>> [ 3.195084] Modules linked in: >>>>>> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1 >>>>>> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 >>>>>> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0 >>>>>> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b >>>>>> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000 >>>>>> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20 >>>>>> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282 >>>>>> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690 >>>>>> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 >>>>>> [ 3.205919] DR6: fffe0ff0 DR7: 00000400 >>>>>> [ 3.206529] Call Trace: >>>>>> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16 >>>>>> [ 3.207719] ? do_one_initcall+0xda/0x260 >>>>>> [ 3.208247] ? kernel_init_freeable+0x110/0x197 >>>>>> [ 3.208906] ? rest_init+0x120/0x120 >>>>>> [ 3.209369] ? kernel_init+0xa/0x100 >>>>>> [ 3.209775] ? ret_from_fork+0x19/0x24 >>>>>> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]--- >>>>>> [ 3.210955] ALSA device list: >>>>> >>>>> Rob/Frank, >>>>> >>>>> This seems to be an issue with the unit test code not properly >>>>> cleaning up the state after it's done. >>>>> >>>>> Specifically, unittest_data_add() setting up of_root on systems where >>>>> there's no device tree (of_root == NULL). It doesn't clean up of_root >>>>> after the tests are done. This affects the of_have_populated_dt() API >>>>> that in turn affects calls to >>>>> device_links_supplier_sync_state_pause/resume(). I think unittests >>>>> shouldn't affect the of_have_populated_dt() API. >>>> There are at least a couple of reasons why the unittest devicetree data >>>> needs to remain after the point where devicetree unittests currently >>>> complete. So cleaning up (removing the data) is not an option. >>>> >>>> I depend on the unittest devicetree entries still existing after the system >>>> boots and I can log into a shell for some validation of the final result of >>>> the devicetree data. >>> >>> IMHO unittests shouldn't have a residual impact on the system after >>> they are done. So, I'll agree to disagree on this one. >> >> They shouldn't be enabled in a production system either. Why would you >> want the extra boot time? > > Should we ask the kernel test robot folks to not enable OF unittest No. If unittests are breaking other code I want to know that. > then? It broke my patch, but I wouldn't be surprised if it's silently > breaking other stuff too. I think we need to do option 4 below. > >> >>>> There is also a desire for the devicetree unittests to be able to be loaded >>>> as a module. That work is not yet scheduled, but I do not want to preclude >>>> the possibility. If unittests are loaded from a module then they will >>>> need some devicetree data to exist that is created in early boot. That >>>> data will be in the devicetree when of_platform_sync_state_init() is >>>> invoked. >>> >>> On a normal system, FDT is parsed and of_root is set (or not set) very >>> early on during setup_arch() before any of the initcall levels are >>> run. The return value of of_have_populated_dt() isn't expected to >>> change across initcall levels. But because of the way the unittest is >>> written (the of_root is changed at late_initcall() level) the return >>> value of of_have_populated_dt() changes across initcall levels. I >>> think that's a real problem with the unittest -- it's breaking API >>> semantics. >> >> I think what's really desired here is a 'Am I booting using DT' call. > > I think the community has decided to use of_have_populated_dt() as > that call. So, we shouldn't break it. Have you analyzed each and every use of of_have_populated_dt() to verify that? I have not yet looked at each of them. The function was created with one user for a specific purpose and the use of it has grown over the years. I was not going to modify it to have the specific meaning of "Am I booting using DT" (thus being able to ignore the existence of the unittest data in the devicetree) without first examining each of the users of the of_have_populated_dt(). [[ This possible change was one of the solutions I considered before I examined what the actual problem leading to the WARNing was. ]] > >> >>> of_have_populated_dt() is being used to check if DT is present in the >>> system and different things are done based on that. We can't have that >>> value change across initcall levels. >>> >>> Couple of thoughts: >>> 1. Don't run unit test if there is no live DT in the system? >> >> That's pretty much the only case I do run. I use UML to run the tests. > > Ah, makes sense. > >>> 2. If you don't want to do (1), then at least set up the unit test >>> data during setup_arch() instead of doing it at some initcall level? >> >> That further breaks making it a module. The plan is also to move to >> kunit which probably will preclude some hacky hook into setup_arch(). >> Side effects may need to be fixed for kunit though. > > Yup. > >>> 3. Can you use overlays for the unit tests if they are loaded as a module? >> >> That was the idea, yes. >> >> >> 4. Make running the unittests a command line option instead of running >> if enabled. Still has side effects, but you have to explicitly run it. I am assuming "command line option" means the kernel boot command line, not a command line interface. I would prefer not. It is a debug option. There is no need to add the extra complexity of an additional switch to control it. Configure it in or configure it out. > > Hmm... this is another good option. I think this should be done. Do we > have a consensus on this? Why would you even ask if there was consensus on something that has not even been discussed? >> A module would still be my preference. If only there was someone >> interested in making everything a module... ;)> > :) > >>>>> I was looking into writing a unittest patch to fix this, but I don't >>>>> know enough about the FDT parsing code to make sure I don't leak any >>>>> memory or free stuff that's in use. I'm not sure I can simply set >>>>> of_root = NULL if it was NULL before the unittest started. Let me know >>>>> how I should proceed or if you plan to write up a patch for this. >>>> >>>> Based on the above, "clean up" of the unittest data is not the solution. >>>> >>>> I haven't looked at the mechanism in device_links_supplier_sync_state_resume() >>>> that leads to the WARN yet. But is does not seem reasonable for that code >>>> to be so sensitive to what valid data is in the devicetree that a WARN results. >>> >>> Sure, I could easily fix it to work around this. But this seems to be >>> a genuine problem with the unittest setup IMO. > > I'll go ahead and do this (basically always doing it instead of > checking on of_have_populated_dt()) but I don't want us to forget this > unittest issue. Thank you for planning to do this fix. The unittest issue will not be forgotten. The possible impacts of unittest on other users of devicetree is something I am very sensitive to and have thought about quite a bit. -Frank > > -Saravana >