Received: by 2002:a05:6a10:eb17:0:0:0:0 with SMTP id hx23csp1238153pxb; Fri, 10 Sep 2021 01:01:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxxnHaymjz2uEQuR34hOILYY3vt5yaDgYchKiNr54iKIQO/DaYH5F6uuwGNoHmvOpW9n69x X-Received: by 2002:adf:f00f:: with SMTP id j15mr8362417wro.265.1631260903256; Fri, 10 Sep 2021 01:01:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631260903; cv=none; d=google.com; s=arc-20160816; b=WbTCpAFkhhljAPIBAKAjNZAyjf1Y/BL9J1EFXoC6YoTIIdZZLI1jb/RoXEyX0APGHY qI7Zsjo0DKuwesrKBOV2HoneeUtaVekVuALdn9G/+MRGUqNl2VpN7/qHdtokx57XXi0r D4106XHCjQ3oRTUWXq6WOC/8BvfSoFfJa9lc/BGnIyAed9yi2HTryOJlkNTdMQLLNSS0 I5rnBg+KmRfORv4NslPWRB7KEss6x5X8hGWdp3z8tbMtYY/e0U7jUJmnq3LKvpEIEOdv rspKpMQsf1BEeEKeQz5YlNtLfl8TeTgzbWbgyMGrGk79NqZg/ZBzB2+0h4SZA6qSfmrx K2wQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=rR66QSJoj79xa/pu58qQlU+23lXaiMDCg2yYckaeEqM=; b=RRESrrMjycthgkoT4Q2LE73KfMCL0Hc/cyKm7EIG/tlDDHe9sLdOSo4wW6sXi4ZUT/ KNGfDQ+o0wHT2Ga9Z/V7Pt57z1gxCm8UEKZ7Z9oaAFSolmFfKfVd7urLzGpwBCGBlIXQ TdY6C3l6TDaHJp48ch0A+MrW00v8BWGg0aXkgqYcU5n38PkBrH6FTdfW+zEvR2DIgsTb ZYO/tYIn5SkMJ68n9bp3mI48UWJQFt47mxpg5GEN0dJAeqnvrMQnMD/vq/YScA5NFc+Z vvd3wwwiqntvB5vfpqH2YVOIhZ2JLSFm27rGwICqN5vHv4wUWeaAfJqWoQeXCT1ncPWz AzDA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s13si4312817edh.555.2021.09.10.01.01.19; Fri, 10 Sep 2021 01:01:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231665AbhIJIA6 (ORCPT + 99 others); Fri, 10 Sep 2021 04:00:58 -0400 Received: from szxga02-in.huawei.com ([45.249.212.188]:9412 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231502AbhIJIA6 (ORCPT ); Fri, 10 Sep 2021 04:00:58 -0400 Received: from dggemv704-chm.china.huawei.com (unknown [172.30.72.56]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4H5Spg0MfRz8yG8; Fri, 10 Sep 2021 15:55:23 +0800 (CST) Received: from dggpemm500001.china.huawei.com (7.185.36.107) by dggemv704-chm.china.huawei.com (10.3.19.47) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.8; Fri, 10 Sep 2021 15:59:44 +0800 Received: from [10.174.177.243] (10.174.177.243) by dggpemm500001.china.huawei.com (7.185.36.107) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.8; Fri, 10 Sep 2021 15:59:44 +0800 Subject: Re: [BUG] amba: Remove deferred device addition To: Saravana Kannan CC: Rob Herring , "linux-kernel@vger.kernel.org" , Frank Rowand , , Russell King , "Linus Walleij" , linux-arm-kernel , Dmitry Torokhov , References: <20210816074619.177383-1-wangkefeng.wang@huawei.com> <20210816074619.177383-4-wangkefeng.wang@huawei.com> <85b28900-5f42-b997-2ded-0b952bc2a03e@huawei.com> <265bb783-10da-a7c1-2625-055dec5643a3@huawei.com> From: Kefeng Wang Message-ID: <4a8b0a6d-b1d5-ffe9-8e31-61844cb9bd89@huawei.com> Date: Fri, 10 Sep 2021 15:59:43 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Originating-IP: [10.174.177.243] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpemm500001.china.huawei.com (7.185.36.107) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/9/9 11:30, Saravana Kannan wrote: > On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang wrote: >> >> On 2021/8/28 3:09, Saravana Kannan wrote: >>> On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang wrote: >>>> On 2021/8/27 8:04, Saravana Kannan wrote: >>>>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang wrote: >>>>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe >>>>>>>>> solution that we have for amba devices. That causes a bunch of other >>>>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by >>>>>>>>> adding more reasons for delaying the addition of the device. >>>>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot >>>>>> (qemu-system-arm -M vexpress-a15), >>> I'm assuming it's this one? >>> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts >> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts. >> >> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu >> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd >> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0 >> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev >> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown >> -nographic >> >>>>> Hi, >>>>> >>>>> It's hard to make sense of the logs. Looks like two different threads >>>>> might be printing to the log at the same time? Can you please enable >>>>> the config that prints the thread ID (forgot what it's called) and >>>>> collect this again? With what I could tell the crash seems to be >>>>> happening somewhere in platform_match(), but that's not related to >>>>> this patch at all? >>>> Can you reproduce it? it is very likely related(without your patch, the >>>> boot is fine), >>> Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help. >>> >>>> the NULL ptr is about serio, it is registed from amba driver. >>>> >>>> ambakmi_driver_init >>>> >>>> -- amba_kmi_probe >>>> >>>> -- __serio_register_port >>> Thanks for the pointer. I took a look at the logs and the code. It's >>> very strange. As you can see from the backtrace, platform_match() is >>> being called for the device_add() from serio_handle_event(). But the >>> device that gets added there is on the serio_bus which obviously >>> should be using the serio_bus_match. >> Yes, I am confused too. >>>> +Dmitry and input maillist, is there some known issue about serio ? >>>> >>>> I add some debug, the full log is attached. >>>> >>>> [ 2.958355][ T41] input: AT Raw Set 2 keyboard as >>>> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0 >>>> [ 2.977441][ T41] serio serio1: pdev c1e05508, pdev->name (null), >>>> drv c1090fc0, drv->name vexpress-reset >>> Based on the logs you added, it's pretty clear we are getting to >>> platform_match(). It's also strange that the drv->name is >>> vexpress-reset >> ... >>>> [ 3.003113][ T41] Backtrace: >>>> [ 3.003451][ T41] [] (strcmp) from [] (platform_match+0xdc/0xf0) >>>> [ 3.003963][ T41] [] (platform_match) from [] (__device_attach_driver+0x3c/0xf4) >>>> [ 3.004769][ T41] [] (__device_attach_driver) from [] (bus_for_each_drv+0x68/0xc8) >>>> [ 3.005481][ T41] [] (bus_for_each_drv) from [] (__device_attach+0xf0/0x16c) >>>> [ 3.006152][ T41] [] (__device_attach) from [] (device_initial_probe+0x1c/0x20) >>>> [ 3.006853][ T41] [] (device_initial_probe) from [] (bus_probe_device+0x94/0x9c) >>>> [ 3.007259][ T41] [] (bus_probe_device) from [] (device_add+0x408/0x8b8) >>>> [ 3.007900][ T41] [] (device_add) from [] (serio_handle_event+0x1b8/0x234) >>>> [ 3.008824][ T41] [] (serio_handle_event) from [] (process_one_work+0x238/0x594) >>>> [ 3.009737][ T41] [] (process_one_work) from [] (worker_thread+0x5c/0x5f4) >>>> [ 3.010638][ T41] [] (worker_thread) from [] (kthread+0x178/0x194) >>>> [ 3.011496][ T41] [] (kthread) from [] (ret_from_fork+0x14/0x24) >>>> [ 3.011860][ T41] Exception stack(0xc1675fb0 to 0xc1675ff8) >>> But the platform_match() is happening for the device_add() from >>> serio_event_handle() that's adding a device to the serio_bus and it >>> should be using serio_bus_match(). >>> >>> I haven't reached any conclusion yet, but my current thought process >>> is that it's either: >>> 1. My patch is somehow causing list corruption. But I don't directly >>> touch any list in my change (other than deleting a list entirely), so >>> it's not clear how that would be happening. >> Maybe some concurrent driver load? >> >>> 2. Without my patch, these AMBA device's probe would be delayed at >>> least until 5 seconds or possibly later. I'm wondering if my patch is >>> catching some bad timing assumptions in other code. >> After Rob's patch, It will retry soon. >> >> commit 039599c92d3b2e73689e8b6e519d653fd4770abb >> Author: Rob Herring >> Date: Wed Apr 29 15:58:12 2020 -0500 >> >> amba: Retry adding deferred devices at late_initcall >> >> If amba bus devices defer when adding, the amba bus code simply retries >> adding the devices every 5 seconds. This doesn't work well as it >> completely unsynchronized with starting the init process which can >> happen in less than 5 secs. Add a retry during late_initcall. If the >> amba devices are added, then deferred probe takes over. If the >> dependencies have not probed at this point, then there's no improvement >> over previous behavior. To completely solve this, we'd need to retry >> after every successful probe as deferred probe does. >> >> The list_empty() check now happens outside the mutex, but the mutex >> wasn't necessary in the first place. >> >> This needed to use deferred probe instead of fragile initcall ordering >> on 32-bit VExpress systems where the apb_pclk has a number of probe >> dependencies (vexpress-sysregs, vexpress-config). >> >> >>> You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to >>> a much smaller number. Say 500ms or 100ms. If it doesn't crash, it >>> doesn't mean it's not (2), but if it does, then we know for sure it's >>> (2). >> ok, I will try this one, but due to above patch, it may not work. > Were you able to find anything more? I can't find any clue, and have no time to check this for now, is there any news from your side? > > -Saravana > . >