2023-03-17 10:02:26

by Rafał Miłecki

[permalink] [raw]
Subject: Probing devices by their less-specific "compatible" bindings (here: brcmnand)

Hi, I just spent few hours debugging hidden hw lockup and I need to
consult driver core code behaviour.

I have a BCM4908 SoC based board with a NAND controller on it.


### Hardware binding

Hardware details:
arch/arm64/boot/dts/broadcom/bcmbca/bcm4908.dtsi

Relevant part:
nand-controller@1800 {
compatible = "brcm,nand-bcm63138", "brcm,brcmnand-v7.1", "brcm,brcmnand";
reg = <0x1800 0x600>, <0x2000 0x10>;
reg-names = "nand", "nand-int-base";
}:

Above binding is based on the documentation:
Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml


### Linux drivers

Linux has separated drivers for few Broadcom's NAND controller bindings:

1. drivers/mtd/nand/raw/brcmnand/bcm63138_nand.c for:
brcm,nand-bcm63138

2. drivers/mtd/nand/raw/brcmnand/brcmnand.c for:
brcm,brcmnand-v2.1
brcm,brcmnand-v2.2
brcm,brcmnand-v4.0
brcm,brcmnand-v5.0
brcm,brcmnand-v6.0
brcm,brcmnand-v6.1
brcm,brcmnand-v6.2
brcm,brcmnand-v7.0
brcm,brcmnand-v7.1
brcm,brcmnand-v7.2
brcm,brcmnand-v7.3

3. drivers/mtd/nand/raw/brcmnand/brcmstb_nand.c for:
brcm,brcmnand


### Problem

As first Linux probes my hardware using the "brcm,nand-bcm63138"
compatibility string driver bcm63138_nand.c. That's good.

It that fails however (.probe() returns an error) then Linux core starts
probing using drivers for less specific bindings.

In my case probing with the "brcm,brcmnand" string driver brcmstb_nand.c
results in ignoring SoC specific bits and causes a hardware lockup. Hw
isn't initialized properly and writel_relaxed(0x00000009, base + 0x04)
just make it hang.

That obviously isn't an acceptable behavior for me. So I'm wondering
what's going on wrong here.

Should Linux avoid probing with less-specific compatible strings?
Or should I not claim hw to be "brcm,brcmnand" compatible if it REQUIRES
SoC-specific handling?

An extra note: that fallback probing happens even with .probe()
returning -EPROBE_DEFER. This actually smells fishy for me on the Linux
core part.
I'm not an expect but I think core should wait for actual error without
trying less-specific compatible strings & drivers.


2023-03-17 14:19:55

by Tudor Ambarus

[permalink] [raw]
Subject: Re: Probing devices by their less-specific "compatible" bindings (here: brcmnand)



On 3/17/23 10:02, Rafał Miłecki wrote:
> Hi, I just spent few hours debugging hidden hw lockup and I need to
> consult driver core code behaviour.
>
> I have a BCM4908 SoC based board with a NAND controller on it.
>
>
> ### Hardware binding
>
> Hardware details:
> arch/arm64/boot/dts/broadcom/bcmbca/bcm4908.dtsi
>
> Relevant part:
> nand-controller@1800 {
>     compatible = "brcm,nand-bcm63138", "brcm,brcmnand-v7.1",
> "brcm,brcmnand";
>     reg = <0x1800 0x600>, <0x2000 0x10>;
>     reg-names = "nand", "nand-int-base";
> }:
>
> Above binding is based on the documentation:
> Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml
>
>
> ### Linux drivers
>
> Linux has separated drivers for few Broadcom's NAND controller bindings:
>
> 1. drivers/mtd/nand/raw/brcmnand/bcm63138_nand.c for:
> brcm,nand-bcm63138
>
> 2. drivers/mtd/nand/raw/brcmnand/brcmnand.c for:
> brcm,brcmnand-v2.1
> brcm,brcmnand-v2.2
> brcm,brcmnand-v4.0
> brcm,brcmnand-v5.0
> brcm,brcmnand-v6.0
> brcm,brcmnand-v6.1
> brcm,brcmnand-v6.2
> brcm,brcmnand-v7.0
> brcm,brcmnand-v7.1
> brcm,brcmnand-v7.2
> brcm,brcmnand-v7.3
>
> 3. drivers/mtd/nand/raw/brcmnand/brcmstb_nand.c for:
> brcm,brcmnand
>
>
> ### Problem
>
> As first Linux probes my hardware using the "brcm,nand-bcm63138"
> compatibility string driver bcm63138_nand.c. That's good.
>
> It that fails however (.probe() returns an error) then Linux core starts
> probing using drivers for less specific bindings.
>
> In my case probing with the "brcm,brcmnand" string driver brcmstb_nand.c
> results in ignoring SoC specific bits and causes a hardware lockup. Hw
> isn't initialized properly and writel_relaxed(0x00000009, base + 0x04)
> just make it hang.
>
> That obviously isn't an acceptable behavior for me. So I'm wondering
> what's going on wrong here.
>
> Should Linux avoid probing with less-specific compatible strings?

No. It's quite common to have a list of compatibles. An use case is that
you have a new IP that works with an existing compatible, but this IP
also has new features that are not supported by the existing compatible
and by the hardware for which the existing compatible was created. So
people introduce a new compatible for the existing IP in order to
differentiate from the older versions. If the new compatible is not yet
defined in the driver, the second compatible from the list is used and
you get the IP working but with a reduced function set.

> Or should I not claim hw to be "brcm,brcmnand" compatible if it REQUIRES
> SoC-specific handling?

Correct.

Cheers,
ta

2023-03-17 14:35:41

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: Probing devices by their less-specific "compatible" bindings (here: brcmnand)

On 17/03/2023 11:02, Rafał Miłecki wrote:
> Hi, I just spent few hours debugging hidden hw lockup and I need to
> consult driver core code behaviour.
>
> I have a BCM4908 SoC based board with a NAND controller on it.
>
>
> ### Hardware binding
>
> Hardware details:
> arch/arm64/boot/dts/broadcom/bcmbca/bcm4908.dtsi
>
> Relevant part:
> nand-controller@1800 {
> compatible = "brcm,nand-bcm63138", "brcm,brcmnand-v7.1", "brcm,brcmnand";

(...)

> ### Problem
>
> As first Linux probes my hardware using the "brcm,nand-bcm63138"
> compatibility string driver bcm63138_nand.c. That's good.
>
> It that fails however (.probe() returns an error) then Linux core starts
> probing using drivers for less specific bindings.
>
> In my case probing with the "brcm,brcmnand" string driver brcmstb_nand.c
> results in ignoring SoC specific bits and causes a hardware lockup. Hw
> isn't initialized properly and writel_relaxed(0x00000009, base + 0x04)
> just make it hang.
>
> That obviously isn't an acceptable behavior for me. So I'm wondering
> what's going on wrong here.
>
> Should Linux avoid probing with less-specific compatible strings?

Why? If less-specific compatible is there, it means device is compatible
with it and it should work.

> Or should I not claim hw to be "brcm,brcmnand" compatible if it REQUIRES
> SoC-specific handling?

As you pointed this compatible does not work for your device, so they
are not compatible.


Best regards,
Krzysztof


2023-03-17 21:54:47

by Florian Fainelli

[permalink] [raw]
Subject: Re: Probing devices by their less-specific "compatible" bindings (here: brcmnand)

+William,

On 3/17/23 03:02, Rafał Miłecki wrote:
> Hi, I just spent few hours debugging hidden hw lockup and I need to
> consult driver core code behaviour.
>
> I have a BCM4908 SoC based board with a NAND controller on it.
>
>
> ### Hardware binding
>
> Hardware details:
> arch/arm64/boot/dts/broadcom/bcmbca/bcm4908.dtsi
>
> Relevant part:
> nand-controller@1800 {
>     compatible = "brcm,nand-bcm63138", "brcm,brcmnand-v7.1",
> "brcm,brcmnand";
>     reg = <0x1800 0x600>, <0x2000 0x10>;
>     reg-names = "nand", "nand-int-base";
> }:
>
> Above binding is based on the documentation:
> Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml
>
>
> ### Linux drivers
>
> Linux has separated drivers for few Broadcom's NAND controller bindings:
>
> 1. drivers/mtd/nand/raw/brcmnand/bcm63138_nand.c for:
> brcm,nand-bcm63138
>
> 2. drivers/mtd/nand/raw/brcmnand/brcmnand.c for:
> brcm,brcmnand-v2.1
> brcm,brcmnand-v2.2
> brcm,brcmnand-v4.0
> brcm,brcmnand-v5.0
> brcm,brcmnand-v6.0
> brcm,brcmnand-v6.1
> brcm,brcmnand-v6.2
> brcm,brcmnand-v7.0
> brcm,brcmnand-v7.1
> brcm,brcmnand-v7.2
> brcm,brcmnand-v7.3
>
> 3. drivers/mtd/nand/raw/brcmnand/brcmstb_nand.c for:
> brcm,brcmnand
>
>
> ### Problem
>
> As first Linux probes my hardware using the "brcm,nand-bcm63138"
> compatibility string driver bcm63138_nand.c. That's good.
>
> It that fails however (.probe() returns an error) then Linux core starts
> probing using drivers for less specific bindings.

Why does it fail?

>
> In my case probing with the "brcm,brcmnand" string driver brcmstb_nand.c
> results in ignoring SoC specific bits and causes a hardware lockup. Hw
> isn't initialized properly and writel_relaxed(0x00000009, base + 0x04)
> just make it hang.

Well, the missing piece here is that brcmnand.c is a library driver,
therefore it needs an entry point, the next one that matches is
brcmstb_nand.c.

>
> That obviously isn't an acceptable behavior for me. So I'm wondering
> what's going on wrong here.
>
> Should Linux avoid probing with less-specific compatible strings?
> Or should I not claim hw to be "brcm,brcmnand" compatible if it REQUIRES
> SoC-specific handling?
>
> An extra note: that fallback probing happens even with .probe()
> returning -EPROBE_DEFER. This actually smells fishy for me on the Linux
> core part.
> I'm not an expect but I think core should wait for actual error without
> trying less-specific compatible strings & drivers.
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/

--
Florian


2023-03-18 00:24:39

by William Zhang

[permalink] [raw]
Subject: Re: Probing devices by their less-specific "compatible" bindings (here: brcmnand)



On 03/17/2023 02:54 PM, Florian Fainelli wrote:
> +William,
>
> On 3/17/23 03:02, Rafał Miłecki wrote:
>> Hi, I just spent few hours debugging hidden hw lockup and I need to
>> consult driver core code behaviour.
>>
>> I have a BCM4908 SoC based board with a NAND controller on it.
>>
>>
>> ### Hardware binding
>>
>> Hardware details:
>> arch/arm64/boot/dts/broadcom/bcmbca/bcm4908.dtsi
>>
>> Relevant part:
>> nand-controller@1800 {
>>      compatible = "brcm,nand-bcm63138", "brcm,brcmnand-v7.1",
>> "brcm,brcmnand";
>>      reg = <0x1800 0x600>, <0x2000 0x10>;
>>      reg-names = "nand", "nand-int-base";
>> }:
>>
>> Above binding is based on the documentation:
>> Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml
>>
>>
>> ### Linux drivers
>>
>> Linux has separated drivers for few Broadcom's NAND controller bindings:
>>
>> 1. drivers/mtd/nand/raw/brcmnand/bcm63138_nand.c for:
>> brcm,nand-bcm63138
>>
>> 2. drivers/mtd/nand/raw/brcmnand/brcmnand.c for:
>> brcm,brcmnand-v2.1
>> brcm,brcmnand-v2.2
>> brcm,brcmnand-v4.0
>> brcm,brcmnand-v5.0
>> brcm,brcmnand-v6.0
>> brcm,brcmnand-v6.1
>> brcm,brcmnand-v6.2
>> brcm,brcmnand-v7.0
>> brcm,brcmnand-v7.1
>> brcm,brcmnand-v7.2
>> brcm,brcmnand-v7.3
>>
>> 3. drivers/mtd/nand/raw/brcmnand/brcmstb_nand.c for:
>> brcm,brcmnand
>>
>>
>> ### Problem
>>
>> As first Linux probes my hardware using the "brcm,nand-bcm63138"
>> compatibility string driver bcm63138_nand.c. That's good.
>>
>> It that fails however (.probe() returns an error) then Linux core starts
>> probing using drivers for less specific bindings.
>
> Why does it fail?
>
Same question here. I just tried latest linux master code on my 4908
reference board and the Micron NAND on my board works fine. Can you post
the log from the brcmnand driver?

>>
>> In my case probing with the "brcm,brcmnand" string driver brcmstb_nand.c
>> results in ignoring SoC specific bits and causes a hardware lockup. Hw
>> isn't initialized properly and writel_relaxed(0x00000009, base + 0x04)
>> just make it hang.
>
> Well, the missing piece here is that brcmnand.c is a library driver,
> therefore it needs an entry point, the next one that matches is
> brcmstb_nand.c.
>
>>
>> That obviously isn't an acceptable behavior for me. So I'm wondering
>> what's going on wrong here.
>>
>> Should Linux avoid probing with less-specific compatible strings?
>> Or should I not claim hw to be "brcm,brcmnand" compatible if it REQUIRES
>> SoC-specific handling?
>>
>> An extra note: that fallback probing happens even with .probe()
>> returning -EPROBE_DEFER. This actually smells fishy for me on the Linux
>> core part.
>> I'm not an expect but I think core should wait for actual error without
>> trying less-specific compatible strings & drivers.
>>
Are you saying the bcm63138_nand.c probe function return -EPROBE_DEFER
and late on kernel call brcmstb_nand.c probe instead of bcm63138_nand's
probe again?

>> ______________________________________________________
>> Linux MTD discussion mailing list
>> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature