2022-12-13 00:53:57

by Peter Delevoryas

[permalink] [raw]
Subject: [PATCH] net/ncsi: Always use unicast source MAC address

I use QEMU for development, and I noticed that NC-SI packets get dropped by
the Linux software bridge[1] because we use a broadcast source MAC address
for the first few NC-SI packets.

The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
but it doesn't require anything about the source MAC address as far as I
know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
CX4, CX6) I don't think it matters to the network card. I mean, Meta has
been using this in mass production with millions of BMC's [2].

In general, I think it's probably just a good idea to use a unicast MAC.

This might have the effect of causing the NIC to learn 2 MAC addresses from
an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
initial MAC address, but it shouldn't really matter. Who knows if NIC's
even have MAC learning enabled from the out-of-band BMC link, lol.

[1]: https://tinyurl.com/4933mhaj
[2]: https://tinyurl.com/mr3tyadb

Signed-off-by: Peter Delevoryas <[email protected]>
---
net/ncsi/ncsi-cmd.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index dda8b76b7798..fd090156cf0d 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -377,15 +377,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
eh = skb_push(nr->cmd, sizeof(*eh));
eh->h_proto = htons(ETH_P_NCSI);
eth_broadcast_addr(eh->h_dest);
-
- /* If mac address received from device then use it for
- * source address as unicast address else use broadcast
- * address as source address
- */
- if (nca->ndp->gma_flag == 1)
- memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
- else
- eth_broadcast_addr(eh->h_source);
+ memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);

/* Start the timer for the request that might not have
* corresponding response. Given NCSI is an internal
--
2.30.2


2022-12-13 16:54:57

by Alexander Duyck

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address

On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
> I use QEMU for development, and I noticed that NC-SI packets get dropped by
> the Linux software bridge[1] because we use a broadcast source MAC address
> for the first few NC-SI packets.

Normally NC-SI packets should never be seen by a bridge. Isn't NC-SI
really supposed to just be between the BMC and the NIC firmware?
Depending on your setup it might make more sense to use something like
macvtap or a socket connection to just bypass the need for the bridge
entirely.

> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
> but it doesn't require anything about the source MAC address as far as I
> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
> been using this in mass production with millions of BMC's [2].
>
> In general, I think it's probably just a good idea to use a unicast MAC.

I'm not sure I agree there. What is the initial value of the address?
If I am not mistaken the gma_flag is used to indicate that the MAC
address has been acquired isn't it? If using the broadcast is an issue
the maybe an all 0's MAC address might be more appropriate. My main
concern would be that the dev_addr is not initialized for those first
few messages so you may be leaking information.

> This might have the effect of causing the NIC to learn 2 MAC addresses from
> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
> initial MAC address, but it shouldn't really matter. Who knows if NIC's
> even have MAC learning enabled from the out-of-band BMC link, lol.
>
> [1]: https://tinyurl.com/4933mhaj
> [2]: https://tinyurl.com/mr3tyadb

The thing is the OpenBMC approach initializes the value themselves to
broadcast[3]. As a result the two code bases are essentially doing the
same thing since mac_addr is defaulted to the broadcast address when
the ncsi interface is registered.

[3]: tinyurl.com/mr3cxf3b

>
> Signed-off-by: Peter Delevoryas <[email protected]>
> ---
> net/ncsi/ncsi-cmd.c | 10 +---------
> 1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
> index dda8b76b7798..fd090156cf0d 100644
> --- a/net/ncsi/ncsi-cmd.c
> +++ b/net/ncsi/ncsi-cmd.c
> @@ -377,15 +377,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
> eh = skb_push(nr->cmd, sizeof(*eh));
> eh->h_proto = htons(ETH_P_NCSI);
> eth_broadcast_addr(eh->h_dest);
> -
> - /* If mac address received from device then use it for
> - * source address as unicast address else use broadcast
> - * address as source address
> - */
> - if (nca->ndp->gma_flag == 1)
> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
> - else
> - eth_broadcast_addr(eh->h_source);
> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>
> /* Start the timer for the request that might not have
> * corresponding response. Given NCSI is an internal

2022-12-16 01:28:20

by Peter Delevoryas

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address



> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <[email protected]> wrote:
>
> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>> I use QEMU for development, and I noticed that NC-SI packets get dropped by
>> the Linux software bridge[1] because we use a broadcast source MAC address
>> for the first few NC-SI packets.
>
> Normally NC-SI packets should never be seen by a bridge.

True, and it’s good to keep this in context. I’m trying to make this change
to support simulation environments, but any change in NC-SI could easily
result in the out-of-band network connection to BMC’s in real data centers
failing to come up, which can be really bad and usually impossible to
recover remotely.

> Isn't NC-SI
> really supposed to just be between the BMC and the NIC firmware?

Yep

> Depending on your setup it might make more sense to use something like
> macvtap or a socket connection to just bypass the need for the bridge
> entirely.

For unicast, yes, but I want to test multiple NIC’s sharing an RMII
link and verifying the broadcast behavior, and the failover behavior
when an RX or TX channel goes down.

The multicast UDP socket backend _does_ work, but I was getting some
recirculation problems or some kind of buffering thing. I managed
to get tap0 + tap1 + br0 working faster.

>
>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
>> but it doesn't require anything about the source MAC address as far as I
>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
>> been using this in mass production with millions of BMC's [2].
>>
>> In general, I think it's probably just a good idea to use a unicast MAC.
>
> I'm not sure I agree there. What is the initial value of the address?

Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
came with addresses provisioned from the factory, and that we were just
discarding that value and using an address provisioned through the NIC,
because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
that the MAC address register I thought was a read-only manufacturing
value is actually 8 different MAC address r/w registers for filtering.
*facepalm*

It suddenly makes a lot more sense why all these OEM Get MAC Address
commands exist: the BMC chip doesn’t come with any MAC addresses from
manufacturing. It’s a necessity, not some convenience artifact/etc.

So, tracing some example systems to see what shows up:

One example:
INIT: Entering runlevel: 5
Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
[ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
[ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
[ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
[ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
[ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
done.
[ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
.
[ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
Setup dhclient for IPv6... done.
[ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff

Oddly, due to that code you mentioned, all NC-SI packets are using
a broadcast source MAC address, even after the Get MAC Address sequence
gets the MAC provisioned for the BMC from the Broadcom NIC.

root@bmc-oob:~# ifconfig
eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE
inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX
inet6 addr: XXXXXXXX Scope:Global
inet6 addr: XXXXXXXX Scope:Link
inet6 addr: XXXXXXXX Scope:Global
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB)
Interrupt:19

But, that’s a system using the 5.0 kernel with lots of old hacks
on top. A system using a 5.15 kernel with this change included:

INIT: Entering runlevel: 5
Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
[ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
[ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
[ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
done.
[ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
.
[ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
Setup dhclient for IPv6... done.
[ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff

So, this BMC already had the provisioned MAC address somehow,
even before the Nvidia Get MAC Address command towards the bottom.

Adding tracing to ftgmac100:

[ 2.018672] ftgmac100_initial_mac
[ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
[ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
[ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
[ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)

Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:

root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
0x0000FACE
root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
0xB00C2022

[ 2.001304] ftgmac100_initial_mac
[ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
[ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
[ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface

[ 6.581239] ftgmac100_reset_mac
[ 6.589193] ftgmac100_reset_mac
[ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
[ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
[ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
[ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
[ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
[ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
[ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff

So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
the address we use for the NCSI queries initially.

The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
and in that case the ftgmac100 driver initializes it to something random
with eth_hw_addr_random().

So, I mean correct me if I’m wrong, but I think it all seems fine?

On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
have reset the peripherals), maybe it would actually be zero, and get initialized
to the random value. I’ll test that, need to do some more debug image building to do it
remotely.

> If I am not mistaken the gma_flag is used to indicate that the MAC
> address has been acquired isn't it?

That’s correct.

> If using the broadcast is an issue
> the maybe an all 0's MAC address might be more appropriate.

Possibly yeah, although that would also be dropped by the Linux bridge lol,
so it wouldn’t solve my simulation problem.

> My main
> concern would be that the dev_addr is not initialized for those first
> few messages so you may be leaking information.
>
>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>
>> [1]: https://tinyurl.com/4933mhaj
>> [2]: https://tinyurl.com/mr3tyadb
>
> The thing is the OpenBMC approach initializes the value themselves to
> broadcast[3]. As a result the two code bases are essentially doing the
> same thing since mac_addr is defaulted to the broadcast address when
> the ncsi interface is registered.

That’s a very good point, thanks for pointing that out, I hadn’t
even noticed that!

Anyways, let me know what you think of the traces I added above.
Sorry for the delay, I’ve just been busy with some other stuff,
but I do really actually care about upstreaming this (and several
other NC-SI changes I’ll submit after this one, which are unrelated
but more useful).

Thanks,
Peter

>
> [3]: tinyurl.com/mr3cxf3b
>
>>
>> Signed-off-by: Peter Delevoryas <[email protected]>
>> ---
>> net/ncsi/ncsi-cmd.c | 10 +---------
>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
>> index dda8b76b7798..fd090156cf0d 100644
>> --- a/net/ncsi/ncsi-cmd.c
>> +++ b/net/ncsi/ncsi-cmd.c
>> @@ -377,15 +377,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
>> eh = skb_push(nr->cmd, sizeof(*eh));
>> eh->h_proto = htons(ETH_P_NCSI);
>> eth_broadcast_addr(eh->h_dest);
>> -
>> - /* If mac address received from device then use it for
>> - * source address as unicast address else use broadcast
>> - * address as source address
>> - */
>> - if (nca->ndp->gma_flag == 1)
>> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>> - else
>> - eth_broadcast_addr(eh->h_source);
>> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>>
>> /* Start the timer for the request that might not have
>> * corresponding response. Given NCSI is an internal
>

2022-12-16 02:02:39

by Peter Delevoryas

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address



> On Dec 15, 2022, at 5:07 PM, Peter Delevoryas <[email protected]> wrote:
>
>
>
>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <[email protected]> wrote:
>>
>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>>> I use QEMU for development, and I noticed that NC-SI packets get dropped by
>>> the Linux software bridge[1] because we use a broadcast source MAC address
>>> for the first few NC-SI packets.
>>
>> Normally NC-SI packets should never be seen by a bridge.
>
> True, and it’s good to keep this in context. I’m trying to make this change
> to support simulation environments, but any change in NC-SI could easily
> result in the out-of-band network connection to BMC’s in real data centers
> failing to come up, which can be really bad and usually impossible to
> recover remotely.
>
>> Isn't NC-SI
>> really supposed to just be between the BMC and the NIC firmware?
>
> Yep
>
>> Depending on your setup it might make more sense to use something like
>> macvtap or a socket connection to just bypass the need for the bridge
>> entirely.
>
> For unicast, yes, but I want to test multiple NIC’s sharing an RMII
> link and verifying the broadcast behavior, and the failover behavior
> when an RX or TX channel goes down.
>
> The multicast UDP socket backend _does_ work, but I was getting some
> recirculation problems or some kind of buffering thing. I managed
> to get tap0 + tap1 + br0 working faster.
>
>>
>>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
>>> but it doesn't require anything about the source MAC address as far as I
>>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
>>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
>>> been using this in mass production with millions of BMC's [2].
>>>
>>> In general, I think it's probably just a good idea to use a unicast MAC.
>>
>> I'm not sure I agree there. What is the initial value of the address?
>
> Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
> came with addresses provisioned from the factory, and that we were just
> discarding that value and using an address provisioned through the NIC,
> because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
> that the MAC address register I thought was a read-only manufacturing
> value is actually 8 different MAC address r/w registers for filtering.
> *facepalm*
>
> It suddenly makes a lot more sense why all these OEM Get MAC Address
> commands exist: the BMC chip doesn’t come with any MAC addresses from
> manufacturing. It’s a necessity, not some convenience artifact/etc.
>
> So, tracing some example systems to see what shows up:
>
> One example:
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
> [ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
> [ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> done.
> [ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
> Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> .
> [ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> Setup dhclient for IPv6... done.
> [ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>
> Oddly, due to that code you mentioned, all NC-SI packets are using
> a broadcast source MAC address, even after the Get MAC Address sequence
> gets the MAC provisioned for the BMC from the Broadcom NIC.
>
> root@bmc-oob:~# ifconfig
> eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE
> inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX
> inet6 addr: XXXXXXXX Scope:Global
> inet6 addr: XXXXXXXX Scope:Link
> inet6 addr: XXXXXXXX Scope:Global
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
> TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB)
> Interrupt:19
>
> But, that’s a system using the 5.0 kernel with lots of old hacks
> on top. A system using a 5.15 kernel with this change included:
>
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> done.
> [ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> .
> [ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
> Setup dhclient for IPv6... done.
> [ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>
> So, this BMC already had the provisioned MAC address somehow,
> even before the Nvidia Get MAC Address command towards the bottom.
>
> Adding tracing to ftgmac100:
>
> [ 2.018672] ftgmac100_initial_mac
> [ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
> [ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
> [ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
> [ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)
>
> Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:
>
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
> 0x0000FACE
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
> 0xB00C2022
>
> [ 2.001304] ftgmac100_initial_mac
> [ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
> [ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
> [ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface
>
> [ 6.581239] ftgmac100_reset_mac
> [ 6.589193] ftgmac100_reset_mac
> [ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>
> So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
> the address we use for the NCSI queries initially.
>
> The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
> and in that case the ftgmac100 driver initializes it to something random
> with eth_hw_addr_random().
>
> So, I mean correct me if I’m wrong, but I think it all seems fine?
>
> On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
> have reset the peripherals), maybe it would actually be zero, and get initialized
> to the random value. I’ll test that, need to do some more debug image building to do it
> remotely.

Oh, that didn’t take as long as I expected. Here’s the results from a real
power cycle:

[ 5.248154] ftgmac100_initial_mac
[ 5.255470] Read MAC Address from FTGMAC100 register: 00:00:00:00:00:00
[ 5.255482] Generated random MAC Address: 4e:c7:78:ec:cd:4a
[ 5.282434] ftgmac100 1e690000.ftgmac: Generated random MAC address 4e:c7:78:ec:cd:4a

So yeah, in full power-cycles, it’ll be some random address.

>
>> If I am not mistaken the gma_flag is used to indicate that the MAC
>> address has been acquired isn't it?
>
> That’s correct.
>
>> If using the broadcast is an issue
>> the maybe an all 0's MAC address might be more appropriate.
>
> Possibly yeah, although that would also be dropped by the Linux bridge lol,
> so it wouldn’t solve my simulation problem.
>
>> My main
>> concern would be that the dev_addr is not initialized for those first
>> few messages so you may be leaking information.
>>
>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>>
>>> [1]: https://tinyurl.com/4933mhaj
>>> [2]: https://tinyurl.com/mr3tyadb
>>
>> The thing is the OpenBMC approach initializes the value themselves to
>> broadcast[3]. As a result the two code bases are essentially doing the
>> same thing since mac_addr is defaulted to the broadcast address when
>> the ncsi interface is registered.
>
> That’s a very good point, thanks for pointing that out, I hadn’t
> even noticed that!
>
> Anyways, let me know what you think of the traces I added above.
> Sorry for the delay, I’ve just been busy with some other stuff,
> but I do really actually care about upstreaming this (and several
> other NC-SI changes I’ll submit after this one, which are unrelated
> but more useful).
>
> Thanks,
> Peter
>
>>
>> [3]: tinyurl.com/mr3cxf3b
>>
>>>
>>> Signed-off-by: Peter Delevoryas <[email protected]>
>>> ---
>>> net/ncsi/ncsi-cmd.c | 10 +---------
>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
>>> index dda8b76b7798..fd090156cf0d 100644
>>> --- a/net/ncsi/ncsi-cmd.c
>>> +++ b/net/ncsi/ncsi-cmd.c
>>> @@ -377,15 +377,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
>>> eh = skb_push(nr->cmd, sizeof(*eh));
>>> eh->h_proto = htons(ETH_P_NCSI);
>>> eth_broadcast_addr(eh->h_dest);
>>> -
>>> - /* If mac address received from device then use it for
>>> - * source address as unicast address else use broadcast
>>> - * address as source address
>>> - */
>>> - if (nca->ndp->gma_flag == 1)
>>> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>>> - else
>>> - eth_broadcast_addr(eh->h_source);
>>> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>>>
>>> /* Start the timer for the request that might not have
>>> * corresponding response. Given NCSI is an internal


2022-12-16 18:52:33

by Alexander Duyck

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address

On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <[email protected]> wrote:
>
>
>
> > On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <[email protected]> wrote:
> >
> > On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
> >> I use QEMU for development, and I noticed that NC-SI packets get dropped by
> >> the Linux software bridge[1] because we use a broadcast source MAC address
> >> for the first few NC-SI packets.
> >
> > Normally NC-SI packets should never be seen by a bridge.
>
> True, and it’s good to keep this in context. I’m trying to make this change
> to support simulation environments, but any change in NC-SI could easily
> result in the out-of-band network connection to BMC’s in real data centers
> failing to come up, which can be really bad and usually impossible to
> recover remotely.
>
> > Isn't NC-SI
> > really supposed to just be between the BMC and the NIC firmware?
>
> Yep
>
> > Depending on your setup it might make more sense to use something like
> > macvtap or a socket connection to just bypass the need for the bridge
> > entirely.
>
> For unicast, yes, but I want to test multiple NIC’s sharing an RMII
> link and verifying the broadcast behavior, and the failover behavior
> when an RX or TX channel goes down.
>
> The multicast UDP socket backend _does_ work, but I was getting some
> recirculation problems or some kind of buffering thing. I managed
> to get tap0 + tap1 + br0 working faster.

Right, but I think part of the issue is that things are being extended
in a way that may actually hurt the maintainability of it.
Specifically it seems like the general idea is that the NCSI interface
should be using either broadcast or the assigned unicast address as
its source MAC address.

My main concern with just using the raw MAC address from the device is
that it may be something that would be more problematic than just
being broadcast. My suggestion at a minimum would be to verify it is
valid before we just use it or to do something like in the code I
referenced where if the device doesn't have a valid MAC address we
just overwrite it with broadcast.

> >
> >> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
> >> but it doesn't require anything about the source MAC address as far as I
> >> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
> >> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
> >> been using this in mass production with millions of BMC's [2].
> >>
> >> In general, I think it's probably just a good idea to use a unicast MAC.
> >
> > I'm not sure I agree there. What is the initial value of the address?
>
> Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
> came with addresses provisioned from the factory, and that we were just
> discarding that value and using an address provisioned through the NIC,
> because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
> that the MAC address register I thought was a read-only manufacturing
> value is actually 8 different MAC address r/w registers for filtering.
> *facepalm*
>
> It suddenly makes a lot more sense why all these OEM Get MAC Address
> commands exist: the BMC chip doesn’t come with any MAC addresses from
> manufacturing. It’s a necessity, not some convenience artifact/etc.
>
> So, tracing some example systems to see what shows up:
>
> One example:
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
> [ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
> [ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> done.
> [ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
> Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> .
> [ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> Setup dhclient for IPv6... done.
> [ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>
> Oddly, due to that code you mentioned, all NC-SI packets are using
> a broadcast source MAC address, even after the Get MAC Address sequence
> gets the MAC provisioned for the BMC from the Broadcom NIC.
>
> root@bmc-oob:~# ifconfig
> eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE
> inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX
> inet6 addr: XXXXXXXX Scope:Global
> inet6 addr: XXXXXXXX Scope:Link
> inet6 addr: XXXXXXXX Scope:Global
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
> TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB)
> Interrupt:19
>
> But, that’s a system using the 5.0 kernel with lots of old hacks
> on top. A system using a 5.15 kernel with this change included:
>
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> done.
> [ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> .
> [ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
> Setup dhclient for IPv6... done.
> [ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>
> So, this BMC already had the provisioned MAC address somehow,
> even before the Nvidia Get MAC Address command towards the bottom.

It is probably a hold over from the last boot. I suspect you would
need a clean power-cycle to reset it back to non-modified values.

> Adding tracing to ftgmac100:
>
> [ 2.018672] ftgmac100_initial_mac
> [ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
> [ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
> [ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
> [ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)
>
> Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:
>
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
> 0x0000FACE
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
> 0xB00C2022
>
> [ 2.001304] ftgmac100_initial_mac
> [ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
> [ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
> [ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface
>
> [ 6.581239] ftgmac100_reset_mac
> [ 6.589193] ftgmac100_reset_mac
> [ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>
> So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
> the address we use for the NCSI queries initially.
>
> The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
> and in that case the ftgmac100 driver initializes it to something random
> with eth_hw_addr_random().
>
> So, I mean correct me if I’m wrong, but I think it all seems fine?
>
> On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
> have reset the peripherals), maybe it would actually be zero, and get initialized
> to the random value. I’ll test that, need to do some more debug image building to do it
> remotely.

So we know now that it is using a random MAC address on reboot.

> > My main
> > concern would be that the dev_addr is not initialized for those first
> > few messages so you may be leaking information.
> >
> >> This might have the effect of causing the NIC to learn 2 MAC addresses from
> >> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
> >> initial MAC address, but it shouldn't really matter. Who knows if NIC's
> >> even have MAC learning enabled from the out-of-band BMC link, lol.
> >>
> >> [1]: https://tinyurl.com/4933mhaj
> >> [2]: https://tinyurl.com/mr3tyadb
> >
> > The thing is the OpenBMC approach initializes the value themselves to
> > broadcast[3]. As a result the two code bases are essentially doing the
> > same thing since mac_addr is defaulted to the broadcast address when
> > the ncsi interface is registered.
>
> That’s a very good point, thanks for pointing that out, I hadn’t
> even noticed that!
>
> Anyways, let me know what you think of the traces I added above.
> Sorry for the delay, I’ve just been busy with some other stuff,
> but I do really actually care about upstreaming this (and several
> other NC-SI changes I’ll submit after this one, which are unrelated
> but more useful).
>
> Thanks,
> Peter

So the NC-SI spec says any value can be used for the source MAC and
that broadcast "may" be used. I would say there are some debugging
advantages to using broadcast that will be obvious in a packet trace.
I wonder if we couldn't look at doing something like requiring
broadcast or LAA if the gma_flag isn't set. With that we could at
least advertise that we don't expect this packet to be going out in a
real network as we cannot guarantee the MAC is unique.

2022-12-17 04:48:22

by Peter Delevoryas

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address



> On Dec 16, 2022, at 10:29 AM, Alexander Duyck <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <[email protected]> wrote:
>>
>>
>>
>>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <[email protected]> wrote:
>>>
>>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>>>> I use QEMU for development, and I noticed that NC-SI packets get dropped by
>>>> the Linux software bridge[1] because we use a broadcast source MAC address
>>>> for the first few NC-SI packets.
>>>
>>> Normally NC-SI packets should never be seen by a bridge.
>>
>> True, and it’s good to keep this in context. I’m trying to make this change
>> to support simulation environments, but any change in NC-SI could easily
>> result in the out-of-band network connection to BMC’s in real data centers
>> failing to come up, which can be really bad and usually impossible to
>> recover remotely.
>>
>>> Isn't NC-SI
>>> really supposed to just be between the BMC and the NIC firmware?
>>
>> Yep
>>
>>> Depending on your setup it might make more sense to use something like
>>> macvtap or a socket connection to just bypass the need for the bridge
>>> entirely.
>>
>> For unicast, yes, but I want to test multiple NIC’s sharing an RMII
>> link and verifying the broadcast behavior, and the failover behavior
>> when an RX or TX channel goes down.
>>
>> The multicast UDP socket backend _does_ work, but I was getting some
>> recirculation problems or some kind of buffering thing. I managed
>> to get tap0 + tap1 + br0 working faster.
>
> Right, but I think part of the issue is that things are being extended
> in a way that may actually hurt the maintainability of it.

Hmmmm yeah you might be right.

> Specifically it seems like the general idea is that the NCSI interface
> should be using either broadcast or the assigned unicast address as
> its source MAC address.

Yeah, most likely.

>
> My main concern with just using the raw MAC address from the device is
> that it may be something that would be more problematic than just
> being broadcast. My suggestion at a minimum would be to verify it is
> valid before we just use it or to do something like in the code I
> referenced where if the device doesn't have a valid MAC address we
> just overwrite it with broadcast.

+1

>
>>>
>>>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
>>>> but it doesn't require anything about the source MAC address as far as I
>>>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
>>>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
>>>> been using this in mass production with millions of BMC's [2].
>>>>
>>>> In general, I think it's probably just a good idea to use a unicast MAC.
>>>
>>> I'm not sure I agree there. What is the initial value of the address?
>>
>> Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
>> came with addresses provisioned from the factory, and that we were just
>> discarding that value and using an address provisioned through the NIC,
>> because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
>> that the MAC address register I thought was a read-only manufacturing
>> value is actually 8 different MAC address r/w registers for filtering.
>> *facepalm*
>>
>> It suddenly makes a lot more sense why all these OEM Get MAC Address
>> commands exist: the BMC chip doesn’t come with any MAC addresses from
>> manufacturing. It’s a necessity, not some convenience artifact/etc.
>>
>> So, tracing some example systems to see what shows up:
>>
>> One example:
>> INIT: Entering runlevel: 5
>> Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
>> [ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
>> [ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
>> [ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
>> [ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
>> [ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> done.
>> [ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
>> Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> .
>> [ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> Setup dhclient for IPv6... done.
>> [ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>>
>> Oddly, due to that code you mentioned, all NC-SI packets are using
>> a broadcast source MAC address, even after the Get MAC Address sequence
>> gets the MAC provisioned for the BMC from the Broadcom NIC.
>>
>> root@bmc-oob:~# ifconfig
>> eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE
>> inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX
>> inet6 addr: XXXXXXXX Scope:Global
>> inet6 addr: XXXXXXXX Scope:Link
>> inet6 addr: XXXXXXXX Scope:Global
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB)
>> Interrupt:19
>>
>> But, that’s a system using the 5.0 kernel with lots of old hacks
>> on top. A system using a 5.15 kernel with this change included:
>>
>> INIT: Entering runlevel: 5
>> Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
>> [ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
>> [ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
>> [ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> done.
>> [ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> .
>> [ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
>> Setup dhclient for IPv6... done.
>> [ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>>
>> So, this BMC already had the provisioned MAC address somehow,
>> even before the Nvidia Get MAC Address command towards the bottom.
>
> It is probably a hold over from the last boot. I suspect you would
> need a clean power-cycle to reset it back to non-modified values.

+1

>
>> Adding tracing to ftgmac100:
>>
>> [ 2.018672] ftgmac100_initial_mac
>> [ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
>> [ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
>> [ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
>> [ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)
>>
>> Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:
>>
>> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
>> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
>> 0x0000FACE
>> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
>> root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
>> 0xB00C2022
>>
>> [ 2.001304] ftgmac100_initial_mac
>> [ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
>> [ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
>> [ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface
>>
>> [ 6.581239] ftgmac100_reset_mac
>> [ 6.589193] ftgmac100_reset_mac
>> [ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
>> [ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
>> [ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
>> [ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
>> [ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
>> [ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
>> [ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>>
>> So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
>> the address we use for the NCSI queries initially.
>>
>> The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
>> and in that case the ftgmac100 driver initializes it to something random
>> with eth_hw_addr_random().
>>
>> So, I mean correct me if I’m wrong, but I think it all seems fine?
>>
>> On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
>> have reset the peripherals), maybe it would actually be zero, and get initialized
>> to the random value. I’ll test that, need to do some more debug image building to do it
>> remotely.
>
> So we know now that it is using a random MAC address on reboot.

+1

>
>>> My main
>>> concern would be that the dev_addr is not initialized for those first
>>> few messages so you may be leaking information.
>>>
>>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>>>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>>>
>>>> [1]: https://tinyurl.com/4933mhaj
>>>> [2]: https://tinyurl.com/mr3tyadb
>>>
>>> The thing is the OpenBMC approach initializes the value themselves to
>>> broadcast[3]. As a result the two code bases are essentially doing the
>>> same thing since mac_addr is defaulted to the broadcast address when
>>> the ncsi interface is registered.
>>
>> That’s a very good point, thanks for pointing that out, I hadn’t
>> even noticed that!
>>
>> Anyways, let me know what you think of the traces I added above.
>> Sorry for the delay, I’ve just been busy with some other stuff,
>> but I do really actually care about upstreaming this (and several
>> other NC-SI changes I’ll submit after this one, which are unrelated
>> but more useful).
>>
>> Thanks,
>> Peter
>
> So the NC-SI spec says any value can be used for the source MAC and
> that broadcast "may" be used. I would say there are some debugging
> advantages to using broadcast that will be obvious in a packet trace.

Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure,
a broadcast source MAC is pretty unique too.

> I wonder if we couldn't look at doing something like requiring
> broadcast or LAA if the gma_flag isn't set.

What is LAA? I’m out of the loop

But also: aren’t we already using broadcast if the gma_flag isn’t set?

- if (nca->ndp->gma_flag == 1)
- memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
- else
- eth_broadcast_addr(eh->h_source);
+ memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);


> With that we could at
> least advertise that we don't expect this packet to be going out in a
> real network as we cannot guarantee the MAC is unique.

Yeah, but it probably wouldn’t help my simulation scenario.

I guess it sounds like this patch is not a good idea, which to be fair,
is totally reasonable.

I can just add some iptables rules to tunnel these packets with a different
source MAC, or fix the multicast socket issue I was having. It’s really
not a big deal, and like you’re saying, we probably don’t want to make
it harder to maintain _forever_.

I would just suggest praying for the next guy that tries to test NC-SI
stuff with QEMU and finds out NC-SI traffic gets dropped by bridges.
I had to resort to reading the source code and printing stuff with
BPF to identify this. Maybe it’s more obvious to other people this wouldn’t
work though.

2022-12-17 21:43:28

by Alexander Duyck

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address

On Fri, Dec 16, 2022 at 8:20 PM Peter Delevoryas <[email protected]> wrote:
>
>
>
> > On Dec 16, 2022, at 10:29 AM, Alexander Duyck <[email protected]> wrote:
> >
> > On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <[email protected]> wrote:
> >>
> >>
> >>
> >>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <[email protected]> wrote:
> >>>
> >>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:

<...>

> >
> >>> My main
> >>> concern would be that the dev_addr is not initialized for those first
> >>> few messages so you may be leaking information.
> >>>
> >>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
> >>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
> >>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
> >>>> even have MAC learning enabled from the out-of-band BMC link, lol.
> >>>>
> >>>> [1]: https://tinyurl.com/4933mhaj
> >>>> [2]: https://tinyurl.com/mr3tyadb
> >>>
> >>> The thing is the OpenBMC approach initializes the value themselves to
> >>> broadcast[3]. As a result the two code bases are essentially doing the
> >>> same thing since mac_addr is defaulted to the broadcast address when
> >>> the ncsi interface is registered.
> >>
> >> That’s a very good point, thanks for pointing that out, I hadn’t
> >> even noticed that!
> >>
> >> Anyways, let me know what you think of the traces I added above.
> >> Sorry for the delay, I’ve just been busy with some other stuff,
> >> but I do really actually care about upstreaming this (and several
> >> other NC-SI changes I’ll submit after this one, which are unrelated
> >> but more useful).
> >>
> >> Thanks,
> >> Peter
> >
> > So the NC-SI spec says any value can be used for the source MAC and
> > that broadcast "may" be used. I would say there are some debugging
> > advantages to using broadcast that will be obvious in a packet trace.
>
> Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure,
> a broadcast source MAC is pretty unique too.
>
> > I wonder if we couldn't look at doing something like requiring
> > broadcast or LAA if the gma_flag isn't set.
>
> What is LAA? I’m out of the loop

Locally administered MAC address[4]. Basically it is a MAC address
that is generated locally such as your random MAC address. Assuming
the other end of the NC-SI link is using a MAC address with a vendor
OUI there should be no risk of collisions on a point-to-point link.
Essentially if you wanted to you could probably just generate a random
MAC address for the NCSI protocol and then use that in place of the
broadcast address.

> But also: aren’t we already using broadcast if the gma_flag isn’t set?
>
> - if (nca->ndp->gma_flag == 1)
> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
> - else
> - eth_broadcast_addr(eh->h_source);
> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);

That I am not sure about. You were using this kernel without your
patch right? With your patch it would make sense to see that behavior,
but without I am not sure why you would see that address for any NC-SI
commands before the gma_flag is set.

>
> > With that we could at
> > least advertise that we don't expect this packet to be going out in a
> > real network as we cannot guarantee the MAC is unique.
>
> Yeah, but it probably wouldn’t help my simulation scenario.
>
> I guess it sounds like this patch is not a good idea, which to be fair,
> is totally reasonable.
>
> I can just add some iptables rules to tunnel these packets with a different
> source MAC, or fix the multicast socket issue I was having. It’s really
> not a big deal, and like you’re saying, we probably don’t want to make
> it harder to maintain _forever_.

Like I said before I would be good with either a Broadcast address OR
a LAA address. The one thing we need to watch out for though is any
sort of leak. One possible concern would be if for example you had 4
ports using 4 different MAC addresses but one BMC. You don't want to
accidently leak the MAC address from one port onto the other one. With
a LAA address if it were to leak and screw up ARP tables somewhere it
wouldn't be a big deal since it isn't expected to be switched in the
first place.

> I would just suggest praying for the next guy that tries to test NC-SI
> stuff with QEMU and finds out NC-SI traffic gets dropped by bridges.
> I had to resort to reading the source code and printing stuff with
> BPF to identify this. Maybe it’s more obvious to other people this wouldn’t
> work though.

Well it seems like NC-SI isn't meant to be bridged based on the fact
that it is using a broadcast MAC address as a source. If nothing else
I suppose you could try to work with the standards committee on that
to see what can be done to make the protocol more portable.. :-)

[4]: https://macaddress.io/faq/what-are-a-universal-address-and-a-local-administered-address

2022-12-19 04:24:21

by Peter Delevoryas

[permalink] [raw]
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address



> On Dec 17, 2022, at 12:57 PM, Alexander Duyck <[email protected]> wrote:
>
> On Fri, Dec 16, 2022 at 8:20 PM Peter Delevoryas <[email protected]> wrote:
>>
>>
>>
>>> On Dec 16, 2022, at 10:29 AM, Alexander Duyck <[email protected]> wrote:
>>>
>>> On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <[email protected]> wrote:
>>>>>
>>>>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>
> <...>
>
>>>
>>>>> My main
>>>>> concern would be that the dev_addr is not initialized for those first
>>>>> few messages so you may be leaking information.
>>>>>
>>>>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>>>>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>>>>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>>>>>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>>>>>
>>>>>> [1]: https://tinyurl.com/4933mhaj
>>>>>> [2]: https://tinyurl.com/mr3tyadb
>>>>>
>>>>> The thing is the OpenBMC approach initializes the value themselves to
>>>>> broadcast[3]. As a result the two code bases are essentially doing the
>>>>> same thing since mac_addr is defaulted to the broadcast address when
>>>>> the ncsi interface is registered.
>>>>
>>>> That’s a very good point, thanks for pointing that out, I hadn’t
>>>> even noticed that!
>>>>
>>>> Anyways, let me know what you think of the traces I added above.
>>>> Sorry for the delay, I’ve just been busy with some other stuff,
>>>> but I do really actually care about upstreaming this (and several
>>>> other NC-SI changes I’ll submit after this one, which are unrelated
>>>> but more useful).
>>>>
>>>> Thanks,
>>>> Peter
>>>
>>> So the NC-SI spec says any value can be used for the source MAC and
>>> that broadcast "may" be used. I would say there are some debugging
>>> advantages to using broadcast that will be obvious in a packet trace.
>>
>> Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure,
>> a broadcast source MAC is pretty unique too.
>>
>>> I wonder if we couldn't look at doing something like requiring
>>> broadcast or LAA if the gma_flag isn't set.
>>
>> What is LAA? I’m out of the loop
>
> Locally administered MAC address[4]. Basically it is a MAC address
> that is generated locally such as your random MAC address. Assuming
> the other end of the NC-SI link is using a MAC address with a vendor
> OUI there should be no risk of collisions on a point-to-point link.
> Essentially if you wanted to you could probably just generate a random
> MAC address for the NCSI protocol and then use that in place of the
> broadcast address.
>
>> But also: aren’t we already using broadcast if the gma_flag isn’t set?
>>
>> - if (nca->ndp->gma_flag == 1)
>> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>> - else
>> - eth_broadcast_addr(eh->h_source);
>> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>
> That I am not sure about. You were using this kernel without your
> patch right? With your patch it would make sense to see that behavior,
> but without I am not sure why you would see that address for any NC-SI
> commands before the gma_flag is set.
>
>>
>>> With that we could at
>>> least advertise that we don't expect this packet to be going out in a
>>> real network as we cannot guarantee the MAC is unique.
>>
>> Yeah, but it probably wouldn’t help my simulation scenario.
>>
>> I guess it sounds like this patch is not a good idea, which to be fair,
>> is totally reasonable.
>>
>> I can just add some iptables rules to tunnel these packets with a different
>> source MAC, or fix the multicast socket issue I was having. It’s really
>> not a big deal, and like you’re saying, we probably don’t want to make
>> it harder to maintain _forever_.
>
> Like I said before I would be good with either a Broadcast address OR
> a LAA address. The one thing we need to watch out for though is any
> sort of leak. One possible concern would be if for example you had 4
> ports using 4 different MAC addresses but one BMC. You don't want to
> accidently leak the MAC address from one port onto the other one. With
> a LAA address if it were to leak and screw up ARP tables somewhere it
> wouldn't be a big deal since it isn't expected to be switched in the
> first place.
>
>> I would just suggest praying for the next guy that tries to test NC-SI
>> stuff with QEMU and finds out NC-SI traffic gets dropped by bridges.
>> I had to resort to reading the source code and printing stuff with
>> BPF to identify this. Maybe it’s more obvious to other people this wouldn’t
>> work though.
>
> Well it seems like NC-SI isn't meant to be bridged based on the fact
> that it is using a broadcast MAC address as a source. If nothing else
> I suppose you could try to work with the standards committee on that
> to see what can be done to make the protocol more portable.. :-)

Well, I started preparing some of my other patches to send, and while
digging up the history for that, I happened to notice this commit
completely by chance while browsing Github:

https://github.com/facebook/openbmc-linux/commit/933b5bd024d28f48a6359e6a9db631f778ba9ea7

[openbmc.quanta][PR] FBAL:Fixed NCSI can't work when import BR function

Summary:
As title.
Pull Request resolved: https://github.com/facebookexternal/openbmc.quanta/pull/1668
GitHub Author: Peter <[email protected]>

diff --git a/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c b/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c
index 5ea7e56119c1..8ef0b627f5ec 100644
--- a/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c
+++ b/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c
@@ -220,6 +220,9 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
if (unlikely(skb->pkt_type == PACKET_LOOPBACK))
return RX_HANDLER_PASS;

+ if (skb->protocol == cpu_to_be16(ETH_P_NCSI))
+ return RX_HANDLER_PASS;
+
if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
goto drop;

Which is accomplishing the same thing I suggested in my patch, except
that it’s modifying the Linux bridge code instead of changing the NC-SI
packets’ source MAC address.

To explain what I *think* this person was doing...

Meta has a system called Zion that’s described here:

https://engineering.fb.com/2019/03/14/data-center-engineering/accelerating-infrastructure/

It consists of two chassis, “Angel's Landing” and “Emerald Pools”.

Together, it’s kinda like an Nvidia DGX A100 system, but with generic
PCIe switches, and “OCP Accelerators”. There’s like an AMD GPU or an
Intel accelerator that can fit there. Maybe an A100 can fit too? I’m
not really completely clear on how its being used compared to GrandTeton,
announced at OCP 2022, which is even closer to the DGX architecture,
but yeah.

Angel’s Landing is 4 dual-socket boards stacked together, each board
with a BMC and NIC supporting NC-SI. I think in practice we reduced
this to 1-2 dual-socket boards, each with 2 NIC’s (presumably cause
we don't need that many CPU's but still need the network bandwidth).

Emerald Pools is a single board and 8 accelerator modules, and
the board has a BMC on it. To get network connectivity to the BMC,
there’s a USB from Emerald Pools to one of the Angel’s Landing BMC's
and the Angel’s Landing BMC bridges Emerald Pools traffic through
its NIC. If this doesn’t make sense, I think this is the whole setup
(Omitting the device tree and some MAC filtering stuff):

On an Angel’s Landing BMC:

$ ip link add br0 type bridge
$ ip link set eth0 master br0
$ ip link set eth1 master br0
$ ip link set usb0 master br0

And on the Emerald Pools BMC, there’s just a usb net intf:

$ ifconfig
lo ….

usb0 Link encap:Ethernet HWaddr xxxxxxxxxxx
inet6 addr: xxxxxx Scope:Link
inet6 addr: xxxxxx Scope:Global
inet6 addr: xxxxxx Scope:Global
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:999332 errors:0 dropped:0 overruns:0 frame:0
TX packets:594253 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:211829527 (202.0 MiB) TX bytes:150569888 (143.5 MiB)

Anyways, so then my question was: is Zion actually relying on NC-SI
packets traversing a bridge?

The Emerald Pools BMC doesn’t have NC-SI enabled at all, not even a
userspace daemon or utility of any kind.

NC-SI *is* enabled and used on the Angel's Landing BMC, so I checked
to see if they traverse the bridge (in QEMU, I didn’t check on a real
system):

root@bmc-oob:~# tcpdump -i br0 -v "ether proto 0x88f8" &
[1] 12045
root@bmc-oob:~# [ 1434.520314] device br0 entered promiscuous mode
tcpdump: listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
ifconfig eth0 down
[ 1442.863305] br0: port 1(eth0) entered disabled state
root@bmc-oob:~# ifconfig eth0 up
[ 1445.978424] br0: port 1(eth0) entered blocking state
[ 1445.978743] br0: port 1(eth0) entered forwarding state
[ 1445.979131] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1445.979814] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
root@bmc-oob:~# tcpdump -i eth0 -v "ether proto 0x88f8" &
[2] 12258
root@bmc-oob:~# tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
ifcon04:58:49.464810 fa:ce:b0:02:20:22 (oui Unknown) > Broadcast, ethertype Unknown (0x88f8), length 60:
0x0000: 0001 0068 0a00 0000 0000 0000 0000 0000 ...h............
0x0010: ffff f597 0000 0000 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
04:58:49.465099 Broadcast > Broadcast, ethertype Unknown (0x88f8), length 64:
0x0000: 0001 0068 8a00 0010 0000 0000 0000 0000 ...h............
0x0010: 0000 0000 0000 0001 0000 0000 0000 0000 ................
0x0020: ffff 7586 0000 0000 0000 0000 0000 d8cd ..u.............
0x0030: c6bc ..
04:58:49.471206 fa:ce:b0:02:20:22 (oui Unknown) > Broadcast, ethertype Unknown (0x88f8), length 60:
0x0000: 0001 0069 1500 0000 0000 0000 0000 0000 ...i............
0x0010: ffff ea96 0000 0000 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
04:58:49.471432 Broadcast > Broadcast, ethertype Unknown (0x88f8), length 78:
0x0000: 0001 0069 9500 0028 0000 0000 0000 0000 ...i...(........
0x0010: 0000 0000 f1f0 f000 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0000 8119 fffd 0765 84e0 9fa4 ...........e….

So, I’m able to see packets on eth0, but so far I haven’t really seen
anything hitting the bridge. ¯\_(ツ)_/¯

Perhaps if there’s some cross-interface NC-SI traffic (eth0 <-> eth1), then
yes this would occur. But I don’t know why that would even happen? Regular
NC-SI failover or bonding (eth0, eth1) would be the actual solution? not sure.

The original commit was very vague, so perhaps I’ll follow up with
the author and reviewer to see if this patch was actually necessary.

>
> [4]: https://macaddress.io/faq/what-are-a-universal-address-and-a-local-administered-address