2018-05-10 07:30:39

by Benjamin Poirier

Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().
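
The failure mode described above can be modelled in user space. The sketch below is not driver code: the `model_*` names and `fake_systim_read()` are hypothetical stand-ins, and it assumes the crash comes from the timecounter's cyclecounter pointer never being set when initialization is skipped. Where the kernel would dereference the pointer and crash, the model returns an error so the state can be inspected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's cyclecounter/timecounter. */
struct cyclecounter {
	uint64_t (*read)(const struct cyclecounter *cc);
};

struct timecounter {
	const struct cyclecounter *cc;	/* stays NULL until init runs */
	uint64_t cycle_last;
};

/* Hypothetical counter source; the value is arbitrary. */
static uint64_t fake_systim_read(const struct cyclecounter *cc)
{
	(void)cc;
	return 42;
}

static void model_timecounter_init(struct timecounter *tc,
				   const struct cyclecounter *cc)
{
	tc->cc = cc;
	tc->cycle_last = cc->read(cc);
}

/*
 * Models the reset path: when the timinca lookup fails, the
 * timecounter is left uninitialized.
 */
static int model_systim_reset(struct timecounter *tc,
			      const struct cyclecounter *cc,
			      int timinca_ret)
{
	if (timinca_ret)
		return timinca_ret;	/* init skipped, tc->cc still NULL */

	model_timecounter_init(tc, cc);
	return 0;
}

/*
 * Models the later read from the overflow work. The model reports
 * -EFAULT instead of dereferencing a NULL pointer.
 */
static int model_timecounter_read(const struct timecounter *tc,
				  uint64_t *out)
{
	if (!tc->cc)
		return -EFAULT;

	*out = tc->cc->read(tc->cc);
	return 0;
}
```

Calling `model_systim_reset()` with `-EINVAL` and then `model_timecounter_read()` on a zeroed `struct timecounter` yields `-EFAULT`, which corresponds to the point where the reported traces crash.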

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.
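
For illustration only, the kind of experiment described above (re-reading the register until the bit shows up) might look like the sketch below. Everything in it is hypothetical: `MODEL_SYSCFI_BIT` is a placeholder value rather than the driver's definition, and `flaky_reg()` fakes a register whose bit is clear for the first few reads. The patch itself does not retry; it drops the read entirely.

```c
#include <stdint.h>

/* Placeholder bit, not taken from the driver headers. */
#define MODEL_SYSCFI_BIT 0x20u

/* Hypothetical register accessor: returns the current register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Returns how many reads were needed before the bit was seen set,
 * or -1 if it was never seen within max_reads attempts.
 */
static int reads_until_syscfi(reg_read_fn read_reg, void *ctx, int max_reads)
{
	for (int i = 1; i <= max_reads; i++) {
		if (read_reg(ctx) & MODEL_SYSCFI_BIT)
			return i;
	}
	return -1;
}

/*
 * Fake register for the model: ctx points to an int holding the number
 * of reads that should still return the bit as clear.
 */
static uint32_t flaky_reg(void *ctx)
{
	int *remaining_clear = ctx;

	if (*remaining_clear > 0) {
		(*remaining_clear)--;
		return 0;
	}
	return MODEL_SYSCFI_BIT;
}
```

A retry loop of this kind needs an arbitrary bound and still has to handle the case where the bit never appears, which is one reason removing the check is the simpler fix when the clock works either way.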

Reported-by: Achim Mildenberger <[email protected]>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <[email protected]>
---
drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
 		}
 		break;
 	case e1000_pch_spt:
-		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-			/* Stable 24MHz frequency */
-			incperiod = INCPERIOD_24MHZ;
-			incvalue = INCVALUE_24MHZ;
-			shift = INCVALUE_SHIFT_24MHZ;
-			adapter->cc.shift = shift;
-			break;
-		}
-		return -EINVAL;
+		/* Stable 24MHz frequency */
+		incperiod = INCPERIOD_24MHZ;
+		incvalue = INCVALUE_24MHZ;
+		shift = INCVALUE_SHIFT_24MHZ;
+		adapter->cc.shift = shift;
+		break;
 	case e1000_pch_cnp:
 		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
 			/* Stable 24MHz frequency */
--
2.16.3



2018-05-10 18:44:36

by Jacob Keller

Subject: RE: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

> -----Original Message-----
> From: Benjamin Poirier [mailto:[email protected]]
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <[email protected]>
> Cc: Keller, Jacob E <[email protected]>; Achim Mildenberger
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
>
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
>
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
>
> Reported-by: Achim Mildenberger <[email protected]>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <[email protected]>
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ec4a9759a6f2..3afb1f3b6f91 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
> *adapter, u32 *timinca)
> }
> break;
> case e1000_pch_spt:
> - if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
> - /* Stable 24MHz frequency */
> - incperiod = INCPERIOD_24MHZ;
> - incvalue = INCVALUE_24MHZ;
> - shift = INCVALUE_SHIFT_24MHZ;
> - adapter->cc.shift = shift;
> - break;
> - }
> - return -EINVAL;
> + /* Stable 24MHz frequency */
> + incperiod = INCPERIOD_24MHZ;
> + incvalue = INCVALUE_24MHZ;
> + shift = INCVALUE_SHIFT_24MHZ;
> + adapter->cc.shift = shift;
> + break;
> case e1000_pch_cnp:
> if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
> /* Stable 24MHz frequency */
> --
> 2.16.3

Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.

Thanks,
Jake
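
As background on why the clock frequency matters for these parameters: a counter that advances in whole nanoseconds needs an increment that matches the clock exactly. The sketch below is plain arithmetic, not driver code. For a given frequency it finds the smallest number of clock cycles that spans a whole number of nanoseconds. The names are hypothetical, and the actual values behind INCPERIOD_24MHZ, INCVALUE_24MHZ and INCVALUE_SHIFT_24MHZ are not shown in this thread, so no claim is made that they match.

```c
#include <stdint.h>

/* "Add `nanoseconds` to the counter every `period_cycles` clock cycles." */
struct ns_increment {
	uint32_t period_cycles;
	uint32_t nanoseconds;
};

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * One cycle lasts 1e9 / freq_hz nanoseconds. Reducing that fraction
 * gives the smallest cycle count whose duration is an integer number
 * of nanoseconds.
 */
static struct ns_increment smallest_exact_increment(uint64_t freq_hz)
{
	const uint64_t ns_per_sec = 1000000000ULL;
	uint64_t g = gcd_u64(ns_per_sec, freq_hz);
	struct ns_increment inc = {
		.period_cycles = (uint32_t)(freq_hz / g),
		.nanoseconds = (uint32_t)(ns_per_sec / g),
	};

	return inc;
}
```

For 24 MHz this gives 125 ns every 3 cycles, while 25 MHz gives 40 ns every cycle. Programming the parameters for the wrong frequency would therefore make the clock run at the wrong rate, which is presumably why the register was consulted in the first place.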

2018-05-13 06:56:29

by Sasha Neftin

Subject: Re: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

On 5/10/2018 21:42, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Benjamin Poirier [mailto:[email protected]]
>> Sent: Thursday, May 10, 2018 12:29 AM
>> To: Kirsher, Jeffrey T <[email protected]>
>> Cc: Keller, Jacob E <[email protected]>; Achim Mildenberger
>> <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>>
>> There have been multiple reports of crashes that look like
>> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
>> [...]
>> kernel: Call Trace:
>> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
>> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
>> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
>> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
>> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
>> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>>
>> These can be traced back to the fact that e1000e_systim_reset() skips the
>> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
>> leads to a null deref in timecounter_read().
>>
>> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
>> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
>> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>>
>> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
>> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
>> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
>> finds the bit to be set. This was observed at boot (probe) but also link up
>> and link down.
>>
>> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
>> reads where SYSCFI=0. Therefore, remove this register read and
>> unconditionally set the clock parameters.
>>
>> Reported-by: Achim Mildenberger <[email protected]>
>> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
>> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
>> Signed-off-by: Benjamin Poirier <[email protected]>
>> ---
>> drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>> 1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ec4a9759a6f2..3afb1f3b6f91 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
>> *adapter, u32 *timinca)
>> }
>> break;
>> case e1000_pch_spt:
>> - if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>> - /* Stable 24MHz frequency */
>> - incperiod = INCPERIOD_24MHZ;
>> - incvalue = INCVALUE_24MHZ;
>> - shift = INCVALUE_SHIFT_24MHZ;
>> - adapter->cc.shift = shift;
>> - break;
>> - }
>> - return -EINVAL;
>> + /* Stable 24MHz frequency */
>> + incperiod = INCPERIOD_24MHZ;
>> + incvalue = INCVALUE_24MHZ;
>> + shift = INCVALUE_SHIFT_24MHZ;
>> + adapter->cc.shift = shift;
>> + break;
>> case e1000_pch_cnp:
>> if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>> /* Stable 24MHz frequency */
>> --
>> 2.16.3
>
> Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.
>
> Thanks,
> Jake
>
I've checked our specification; it looks like only 24MHz is used for this
product. Hopefully no platform with a different clock has been
distributed. So, let's pick up this change.

2018-05-23 00:44:57

by Brown, Aaron F

Subject: RE: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

> From: Intel-wired-lan [mailto:[email protected]] On
> Behalf Of Benjamin Poirier
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; Achim Mildenberger
> <[email protected]>; [email protected];
> [email protected]
> Subject: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting
> I219 clock attributes
>
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80
> [e1000e]
> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
>
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
>
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even
> after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
>
> Reported-by: Achim Mildenberger <[email protected]>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <[email protected]>
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)

Tested-by: Aaron Brown <[email protected]>