There is a 64-bit division in iwl_mvm_get_crosstimestamp_fw(), which
results in a link failure when building 32-bit architectures with clang:
ld.lld: error: undefined symbol: __udivdi3
>>> referenced by ptp.c
>>> drivers/net/wireless/intel/iwlwifi/mvm/ptp.o:(iwl_mvm_phc_get_crosstimestamp) in archive vmlinux.a
GCC has optimizations for division by a constant that clang does not
implement, so this issue is not visible when building with GCC.
Using div_u64() would resolve this issue, but Arnd points out that this
can be quite expensive and the timestamp is being read at nanosecond
granularity. Nick pointed out that the result of this division is being
stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
division, which elides the need for libcalls.
Fixes: 21fb8da6ebe4 ("wifi: iwlwifi: mvm: read synced time from firmware if supported")
Reported-by: Arnd Bergmann <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/1826
Reported-by: "kernelci.org bot" <[email protected]>
Link: https://lore.kernel.org/[email protected]/
Suggested-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
---
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c b/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c
index 5c2bfc8ed88d..cdd6d69c5b68 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ptp.c
@@ -116,7 +116,7 @@ iwl_mvm_get_crosstimestamp_fw(struct iwl_mvm *mvm, u32 *gp2, u64 *sys_time)
gp2_10ns = (u64)le32_to_cpu(resp->gp2_timestamp_hi) << 32 |
le32_to_cpu(resp->gp2_timestamp_lo);
- *gp2 = gp2_10ns / 100;
+ *gp2 = (u32)gp2_10ns / 100;
*sys_time = (u64)le32_to_cpu(resp->platform_timestamp_hi) << 32 |
le32_to_cpu(resp->platform_timestamp_lo);
---
base-commit: 2af3b2a631b194a43551ce119cb71559d8f6b54b
change-id: 20230329-iwlwifi-ptp-avoid-64-bit-div-1c4717f73f8a
Best regards,
--
Nathan Chancellor <[email protected]>
On Wed, Mar 29, 2023 at 07:20:43PM +0200, Johannes Berg wrote:
> On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote:
> >
> > GCC has optimizations for division by a constant that clang does not
> > implement, so this issue is not visible when building with GCC.
>
> Huh yeah, we did 32-bit builds with gcc ...
>
> > Using div_u64() would resolve this issue, but Arnd points out that this
> > can be quite expensive and the timestamp is being read at nanosecond
> > granularity.?
>
> Doesn't matter though, all the calculations are based on just the
> command response from the firmware, which (tries to) take it in a
> synchronised fashion.
Okay, that is good information, thanks for providing it!
> So taking more time here would be fine, as far as I can tell.
>
> > Nick pointed out that the result of this division is being
> > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> > division, which elides the need for libcalls.
>
> That loses ~7 top bits though, no? I'd be more worried about that, than
> the time div_u64() takes.
Right, I sent this version of the fix to spur discussion around whether
or not this was an acceptable approach, rather than having the question
sit unanswered in our issue tracker :) I have no problems sending a v2
to use div_u64() and be done with it, which I will do shortly.
Thanks for the quick input, cheers!
Nathan
On Wed, Mar 29, 2023 at 10:20 AM Johannes Berg
<[email protected]> wrote:
>
> On Wed, 2023-03-29 at 10:05 -0700, Nathan Chancellor wrote:
> >
> > GCC has optimizations for division by a constant that clang does not
> > implement, so this issue is not visible when building with GCC.
>
> Huh yeah, we did 32-bit builds with gcc ...
Right, GCC is better about turning division by double-word constant
into multiplication by reciprocal. Craig has been improving LLVM, but
it seems that some divisors still aren't supported (in this case 100).
>
> > Using div_u64() would resolve this issue, but Arnd points out that this
> > can be quite expensive and the timestamp is being read at nanosecond
> > granularity.
>
> Doesn't matter though, all the calculations are based on just the
> command response from the firmware, which (tries to) take it in a
> synchronised fashion.
>
> So taking more time here would be fine, as far as I can tell.
div_u64() it is then.
>
> > Nick pointed out that the result of this division is being
> > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> > division, which elides the need for libcalls.
>
> That loses ~7 top bits though, no? I'd be more worried about that, than
> the time div_u64() takes.
The result is still stored in a u32; there is a loss of precision
regardless of use of div_u64 or open coded binary operator /. So is
the loss of precision before the division as tolerable as after the
division?
--
Thanks,
~Nick Desaulniers
On Wed, 2023-03-29 at 10:30 -0700, Nick Desaulniers wrote:
> >
> > > Nick pointed out that the result of this division is being
> > > stored to a 32-bit type anyways, so truncate gp2_10ns first then do the
> > > division, which elides the need for libcalls.
> >
> > That loses ~7 top bits though, no? I'd be more worried about that, than
> > the time div_u64() takes.
>
> The result is still stored in a u32; there is a loss of precision
> regardless of use of div_u64 or open coded binary operator /.
>
Right, obviously.
> So is
> the loss of precision before the division as tolerable as after the
> division?
For all I can tell this is meant to be 'gp2' with an additional lower
bits to reach a unit/granularity of 10ns, basically in FW something like
gp2_10ns = gp2 * 100 + subsampling_10ns_unit
(and gp2 in FW is a 32-bit value, so it rolls over eventually).
But I _think_ we want to make a proper division by 100 to obtain back
the original 'gp2' value here.
johannes