On 1/18/2023 2:11 PM, Daniel Vacek wrote:
> On Wed, Jan 18, 2023 at 9:59 PM Jacob Keller <[email protected]> wrote:
>> On 1/18/2023 7:14 AM, Daniel Vacek wrote:
>> 1) request tx timestamp
>> 2) timestamp occurs
>> 3) link goes down while processing
>
> I was thinking this is the case we got reported. But then again, I'm
> not really experienced in this field.
>
I think it might be, or at least something similar to this.

I think that can be fixed with the link check you added. I think we
actually have a copy of the current link status in the ice_ptp or
ice_ptp_tx structure which could be used instead of having to check back
to the other structure.

I'm just hoping not to re-introduce bugs related to the hardware
interrupt counter that we had which results in preventing all future
timestamp interrupts.
> --nX
>
>> 1) link down
>> 2) request tx timestamp rejected
>>
>> Thanks!
>>
>> -Jake
>
On Wed, Jan 18, 2023 at 11:22 PM Jacob Keller <[email protected]> wrote:
> On 1/18/2023 2:11 PM, Daniel Vacek wrote:
> > On Wed, Jan 18, 2023 at 9:59 PM Jacob Keller <[email protected]> wrote:
> >> On 1/18/2023 7:14 AM, Daniel Vacek wrote:
> >> 1) request tx timestamp
> >> 2) timestamp occurs
> >> 3) link goes down while processing
> >
> > I was thinking this is the case we got reported. But then again, I'm
> > not really experienced in this field.
> >
>
> I think it might be, or at least something similar to this.
>
> I think that can be fixed with the link check you added. I think we
> actually have a copy of the current link status in the ice_ptp or
> ice_ptp_tx structure which could be used instead of having to check back
> to the other structure.

If you're talking about ptp_port->link_up, that one is always false no
matter the actual NIC link status. I wanted to use it at first, but
checking all 8 devices available in the dump data, it just does not
match the net_dev->state or the port_info->phy.link_info.link_info:

crash> net_device.name,state 0xff48df6f0c553000
name = "ens1f1",
state = 0x7, // DOWN
crash> ice_port_info.phy.link_info.link_info 0xff48df6f05dca018
phy.link_info.link_info = 0xc0, // DOWN
crash> ice_ptp_port.port_num,link_up 0xff48df6f05dd44e0
port_num = 0x1
link_up = 0x0, // False

crash> net_device.name,state 0xff48df6f25e3f000
name = "ens1f0",
state = 0x3, // UP
crash> ice_port_info.phy.link_info.link_info 0xff48df6f070a3018
phy.link_info.link_info = 0xe1, // UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f063184e0
port_num = 0x0
link_up = 0x0, // False

crash> ice_ptp_port.port_num,link_up 0xff48df6f25b844e0
port_num = 0x2
link_up = 0x0, // False even though this device is UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f140384e0
port_num = 0x3
link_up = 0x0, // False even though this device is UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f055044e0
port_num = 0x0
link_up = 0x0, // False even though this device is UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f251cc4e0
port_num = 0x1
link_up = 0x0,
crash> ice_ptp_port.port_num,link_up 0xff48df6f33a9c4e0
port_num = 0x2
link_up = 0x0,
crash> ice_ptp_port.port_num,link_up 0xff48df6f3bb7c4e0
port_num = 0x3
link_up = 0x0,

In other words, ice_ptp_port.link_up is always false in this dump and
cannot be used. That's why I had to fall back to
hw->port_info->phy.link_info.link_info.
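
For reference, the two raw link_info values from the dump can be decoded
with a small standalone sketch. It assumes the link-up flag is bit 0 of
link_info, which is how I understand ICE_AQ_LINK_UP to be defined in the
upstream driver; LINK_UP_BIT, link_info_is_up() and check_dump_values()
are hypothetical names for illustration only, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for ICE_AQ_LINK_UP, taken here to be bit 0. */
#define LINK_UP_BIT 0x01u

/* Decode the link state from a raw link_info byte (hypothetical helper). */
static bool link_info_is_up(uint8_t link_info)
{
	return (link_info & LINK_UP_BIT) != 0;
}

/* Cross-check the helper against the two values seen in the dump. */
static void check_dump_values(void)
{
	assert(!link_info_is_up(0xc0)); /* ens1f1, reported DOWN */
	assert(link_info_is_up(0xe1));  /* ens1f0, reported UP */
}
```

Under that assumption both values agree with net_device.state, which is
why link_info looks like the trustworthy source on this kernel.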
--nX
> I'm just hoping not to re-introduce bugs related to the hardware
> interrupt counter that we had which results in preventing all future
> timestamp interrupts.
>
> > --nX
> >
> >> 1) link down
> >> 2) request tx timestamp rejected
> >>
> >> Thanks!
> >>
> >> -Jake
> >
>
On 1/19/2023 1:38 AM, Daniel Vacek wrote:
> On Wed, Jan 18, 2023 at 11:22 PM Jacob Keller <[email protected]> wrote:
>> On 1/18/2023 2:11 PM, Daniel Vacek wrote:
>>> On Wed, Jan 18, 2023 at 9:59 PM Jacob Keller <[email protected]> wrote:
>>>> On 1/18/2023 7:14 AM, Daniel Vacek wrote:
>>>> 1) request tx timestamp
>>>> 2) timestamp occurs
>>>> 3) link goes down while processing
>>>
>>> I was thinking this is the case we got reported. But then again, I'm
>>> not really experienced in this field.
>>>
>>
>> I think it might be, or at least something similar to this.
>>
>> I think that can be fixed with the link check you added. I think we
>> actually have a copy of the current link status in the ice_ptp or
>> ice_ptp_tx structure which could be used instead of having to check back
>> to the other structure.
>
> If you're talking about ptp_port->link_up that one is always false no
> matter the actual NIC link status. First I wanted to use it but
> checking all the 8 devices available in the dump data it just does not
> match the net_dev->state or the port_info->phy.link_info.link_info
>
> crash> net_device.name,state 0xff48df6f0c553000
> name = "ens1f1",
> state = 0x7, // DOWN
> crash> ice_port_info.phy.link_info.link_info 0xff48df6f05dca018
> phy.link_info.link_info = 0xc0, // DOWN
> crash> ice_ptp_port.port_num,link_up 0xff48df6f05dd44e0
> port_num = 0x1
> link_up = 0x0, // False
>
> crash> net_device.name,state 0xff48df6f25e3f000
> name = "ens1f0",
> state = 0x3, // UP
> crash> ice_port_info.phy.link_info.link_info 0xff48df6f070a3018
> phy.link_info.link_info = 0xe1, // UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f063184e0
> port_num = 0x0
> link_up = 0x0, // False
>
> crash> ice_ptp_port.port_num,link_up 0xff48df6f25b844e0
> port_num = 0x2
> link_up = 0x0, // False even this device is UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f140384e0
> port_num = 0x3
> link_up = 0x0, // False even this device is UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f055044e0
> port_num = 0x0
> link_up = 0x0, // False even this device is UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f251cc4e0
> port_num = 0x1
> link_up = 0x0,
> crash> ice_ptp_port.port_num,link_up 0xff48df6f33a9c4e0
> port_num = 0x2
> link_up = 0x0,
> crash> ice_ptp_port.port_num,link_up 0xff48df6f3bb7c4e0
> port_num = 0x3
> link_up = 0x0,
>
> In other words, the ice_ptp_port.link_up is always false and cannot be
> used. That's why I had to fall back to
> hw->port_info->phy.link_info.link_info
>

Hmm. We call ice_ptp_link_change in ice_link_event which is called from
ice_handle_link_event...

In ice_link_event, a local link_up field is set based on
phy_info->link_info.link_info & ICE_AQ_LINK_UP

What kernel are you testing on? Does it include 6b1ff5d39228 ("ice:
always call ice_ptp_link_change and make it void")?

Prior to this commit the field was only valid for E822 devices, but I
fixed that as it was used for other checks as well.

I am guessing that the Red Hat kernel you are using lacks several of
these clean ups and fixes.

For the current code in the net-next kernel I believe we can safely use
the ptp_port->link_up field.
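
To illustrate the idea of relying on the cached field, here is a rough
standalone sketch rather than the actual driver code. Every sketch_* name
is hypothetical, and the bit 0 mask is an assumption standing in for
ICE_AQ_LINK_UP. The link event handler refreshes the cached flag, and a
Tx timestamp request is rejected while the flag says the link is down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for ICE_AQ_LINK_UP, taken here to be bit 0. */
#define SKETCH_LINK_UP 0x01u

/* Hypothetical stand-in for the per-port PTP state. */
struct sketch_ptp_port {
	bool link_up; /* cached copy of the last reported link state */
};

/* Would be called on every link event so the cached flag stays valid. */
static void sketch_link_change(struct sketch_ptp_port *port, uint8_t link_info)
{
	assert(port != NULL);
	port->link_up = (link_info & SKETCH_LINK_UP) != 0;
}

/* Returns 0 if the request is accepted, -1 if rejected (link down). */
static int sketch_request_tx_tstamp(const struct sketch_ptp_port *port)
{
	assert(port != NULL);
	if (!port->link_up)
		return -1;
	return 0;
}
```

This only works if the link change handler runs for every device type,
which is the point of the commit mentioned above.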
Thanks,
Jake
On Thu, Jan 19, 2023 at 8:25 PM Jacob Keller <[email protected]> wrote:
> On 1/19/2023 1:38 AM, Daniel Vacek wrote:
> > On Wed, Jan 18, 2023 at 11:22 PM Jacob Keller <[email protected]> wrote:
> >> On 1/18/2023 2:11 PM, Daniel Vacek wrote:
> >>> On Wed, Jan 18, 2023 at 9:59 PM Jacob Keller <[email protected]> wrote:
> >>>> On 1/18/2023 7:14 AM, Daniel Vacek wrote:
> >>>> 1) request tx timestamp
> >>>> 2) timestamp occurs
> >>>> 3) link goes down while processing
> >>>
> >>> I was thinking this is the case we got reported. But then again, I'm
> >>> not really experienced in this field.
> >>>
> >>
> >> I think it might be, or at least something similar to this.
> >>
> >> I think that can be fixed with the link check you added. I think we
> >> actually have a copy of the current link status in the ice_ptp or
> >> ice_ptp_tx structure which could be used instead of having to check back
> >> to the other structure.
> >
> > If you're talking about ptp_port->link_up that one is always false no
> > matter the actual NIC link status. First I wanted to use it but
> > checking all the 8 devices available in the dump data it just does not
> > match the net_dev->state or the port_info->phy.link_info.link_info
> >
> > crash> net_device.name,state 0xff48df6f0c553000
> > name = "ens1f1",
> > state = 0x7, // DOWN
> > crash> ice_port_info.phy.link_info.link_info 0xff48df6f05dca018
> > phy.link_info.link_info = 0xc0, // DOWN
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f05dd44e0
> > port_num = 0x1
> > link_up = 0x0, // False
> >
> > crash> net_device.name,state 0xff48df6f25e3f000
> > name = "ens1f0",
> > state = 0x3, // UP
> > crash> ice_port_info.phy.link_info.link_info 0xff48df6f070a3018
> > phy.link_info.link_info = 0xe1, // UP
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f063184e0
> > port_num = 0x0
> > link_up = 0x0, // False
> >
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f25b844e0
> > port_num = 0x2
> > link_up = 0x0, // False even this device is UP
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f140384e0
> > port_num = 0x3
> > link_up = 0x0, // False even this device is UP
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f055044e0
> > port_num = 0x0
> > link_up = 0x0, // False even this device is UP
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f251cc4e0
> > port_num = 0x1
> > link_up = 0x0,
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f33a9c4e0
> > port_num = 0x2
> > link_up = 0x0,
> > crash> ice_ptp_port.port_num,link_up 0xff48df6f3bb7c4e0
> > port_num = 0x3
> > link_up = 0x0,
> >
> > In other words, the ice_ptp_port.link_up is always false and cannot be
> > used. That's why I had to fall back to
> > hw->port_info->phy.link_info.link_info
> >
>
> Hmm. We call ice_ptp_link_change in ice_link_event which is called from
> ice_handle_link_event...
>
> In ice_link_event, a local link_up field is set based on
> phy_info->link_info.link_info & ICE_AQ_LINK_UP
>
> What kernel are you testing on? Does it include 6b1ff5d39228 ("ice:
> always call ice_ptp_link_change and make it void")?
>
> Prior to this commit the field was only valid for E822 devices, but I
> fixed that as it was used for other checks as well.
>
> I am guessing that the Red Hat kernel you are using lacks several of
> these clean ups and fixes.

Yeah, makes perfect sense. We don't have that commit in 8.4. All the data
I have presented here is from 4.18.0-305.49.1.rt7.121.el8_4.x86_64.

> For the current code in the net-next kernel I believe we can safely use
> the ptp_port->link_up field.

I'll fix that up and send you a v3. Thank you for the review.

--nX
> Thanks,
> Jake