2024-01-24 04:18:01

by Stephen Rothwell

[permalink] [raw]
Subject: linux-next: build failure after merge of the rcu tree

Hi all,

After merging the rcu tree, today's linux-next build (i386 defconfig)
failed like this:

In file included from include/linux/dev_printk.h:14,
from include/linux/device.h:15,
from kernel/time/clocksource.c:10:
kernel/time/clocksource.c: In function 'clocksource_watchdog':
kernel/time/clocksource.c:103:34: error: integer overflow in expression of type 'long int' results in '-1619276800' [-Werror=overflow]
103 | * NSEC_PER_SEC / HZ)
| ^
include/linux/compiler.h:77:45: note: in definition of macro 'unlikely'
77 | # define unlikely(x) __builtin_expect(!!(x), 0)
| ^
kernel/time/clocksource.c:486:41: note: in expansion of macro 'WATCHDOG_INTR_MAX_NS'
486 | if (unlikely(interval > WATCHDOG_INTR_MAX_NS)) {
| ^~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Caused by commit

1a4545025600 ("clocksource: Skip watchdog check for large watchdog intervals")

I have used the rcu tree from next-20240123 for today.

--
Cheers,
Stephen Rothwell


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2024-01-24 09:50:34

by Jiri Wiesner

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rcu tree

On Wed, Jan 24, 2024 at 03:17:43PM +1100, Stephen Rothwell wrote:
> After merging the rcu tree, today's linux-next build (i386 defconfig)
> failed like this:
> In file included from include/linux/dev_printk.h:14,
> from include/linux/device.h:15,
> from kernel/time/clocksource.c:10:
> kernel/time/clocksource.c: In function 'clocksource_watchdog':
> kernel/time/clocksource.c:103:34: error: integer overflow in expression of type 'long int' results in '-1619276800' [-Werror=overflow]
> 103 | * NSEC_PER_SEC / HZ)
> | ^
> Caused by commit
> 1a4545025600 ("clocksource: Skip watchdog check for large watchdog intervals")
> I have used the rcu tree from next-20240123 for today.

This particular patch is still beging discussed on the LKML. This is the
latest submission with improved variable naming, increased threshold and
changes to the log and the warning message (as proposed by tglx):
https://lore.kernel.org/lkml/20240122172350.GA740@incl/
Especially the change to the message is important. I think this message
will be commonplace on 8 NUMA node (and larger) machines. If there is
anything else I can do to assist please let me know.
--
Jiri Wiesner
SUSE Labs

2024-01-24 12:13:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rcu tree

On Wed, Jan 24, 2024 at 10:49:54AM +0100, Jiri Wiesner wrote:
> On Wed, Jan 24, 2024 at 03:17:43PM +1100, Stephen Rothwell wrote:
> > After merging the rcu tree, today's linux-next build (i386 defconfig)
> > failed like this:
> > In file included from include/linux/dev_printk.h:14,
> > from include/linux/device.h:15,
> > from kernel/time/clocksource.c:10:
> > kernel/time/clocksource.c: In function 'clocksource_watchdog':
> > kernel/time/clocksource.c:103:34: error: integer overflow in expression of type 'long int' results in '-1619276800' [-Werror=overflow]
> > 103 | * NSEC_PER_SEC / HZ)
> > | ^
> > Caused by commit
> > 1a4545025600 ("clocksource: Skip watchdog check for large watchdog intervals")
> > I have used the rcu tree from next-20240123 for today.
>
> This particular patch is still beging discussed on the LKML. This is the
> latest submission with improved variable naming, increased threshold and
> changes to the log and the warning message (as proposed by tglx):
> https://lore.kernel.org/lkml/20240122172350.GA740@incl/
> Especially the change to the message is important. I think this message
> will be commonplace on 8 NUMA node (and larger) machines. If there is
> anything else I can do to assist please let me know.

Here is the offending #define:

#define WATCHDOG_INTR_MAX_NS ((WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1))\
* NSEC_PER_SEC / HZ)

The problem is that these things are int or long, and on i386, that
is only 32 bits. NSEC_PER_SEC is one billion, and WATCHDOG_INTERVAL
is often 1000, which overflows. The division by HZ gets this back in
range at about 1.5x10^9.

So this computation must be done in 64 bits even on 32-bit systems.
My thought would be a cast to u64, then back to long for the result.

Whatever approach, Jiri, would you like to send an updated patch?

In the meantime, I will rebase to exclude this one from -next.

Thanx, Paul

2024-01-24 13:31:55

by Jiri Wiesner

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rcu tree

On Wed, Jan 24, 2024 at 04:12:23AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 24, 2024 at 10:49:54AM +0100, Jiri Wiesner wrote:
> > On Wed, Jan 24, 2024 at 03:17:43PM +1100, Stephen Rothwell wrote:
> > > After merging the rcu tree, today's linux-next build (i386 defconfig)
> > > failed like this:
> > > In file included from include/linux/dev_printk.h:14,
> > > from include/linux/device.h:15,
> > > from kernel/time/clocksource.c:10:
> > > kernel/time/clocksource.c: In function 'clocksource_watchdog':
> > > kernel/time/clocksource.c:103:34: error: integer overflow in expression of type 'long int' results in '-1619276800' [-Werror=overflow]
> > > 103 | * NSEC_PER_SEC / HZ)
> > > | ^
> > > Caused by commit
> > > 1a4545025600 ("clocksource: Skip watchdog check for large watchdog intervals")
> > > I have used the rcu tree from next-20240123 for today.
> >
> > This particular patch is still beging discussed on the LKML. This is the
> > latest submission with improved variable naming, increased threshold and
> > changes to the log and the warning message (as proposed by tglx):
> > https://lore.kernel.org/lkml/20240122172350.GA740@incl/
> > Especially the change to the message is important. I think this message
> > will be commonplace on 8 NUMA node (and larger) machines. If there is
> > anything else I can do to assist please let me know.
>
> Here is the offending #define:
>
> #define WATCHDOG_INTR_MAX_NS ((WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1))\
> * NSEC_PER_SEC / HZ)
>
> The problem is that these things are int or long, and on i386, that
> is only 32 bits. NSEC_PER_SEC is one billion, and WATCHDOG_INTERVAL
> is often 1000, which overflows. The division by HZ gets this back in
> range at about 1.5x10^9.

Exactly.

> So this computation must be done in 64 bits even on 32-bit systems.
> My thought would be a cast to u64, then back to long for the result.

This will be a more precise solution than enclosing NSEC_PER_SEC / HZ in
brackets, which I chose to do in the v2 of this patch.

> Whatever approach, Jiri, would you like to send an updated patch?

Yes, I can incorporate the casting to u64 and back to long into the patch.
At this point, I am not sure which version to use. There are:
* v1 (submitted to the LKML on Jan 3rd): the patch that got merged into linux-next
* v2 (submitted to the LKML on Jan 10th): that has an alternative fix for the interger overflow
* v3 (submitted to the LKML on Jan 22nd): that incoporates suggestions by Thomas Gleixner

I could update the v3 of this patch with casting to u64 and back to long.
WATCHDOG_INTERVAL_MAX_NS got set to 2 * WATCHDOG_INTERVAL in v3 - a change
I do not entirely agree with. I think WATCHDOG_INTERVAL_MAX_NS should be
kept narrow so as not to impose a limit on time skew that is too strict
for readout intervals approaching 2 * WATCHDOG_INTERVAL in their length.
The question is what is too strict.
--
Jiri Wiesner
SUSE Labs

2024-01-24 14:20:52

by Paul E. McKenney

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rcu tree

On Wed, Jan 24, 2024 at 02:31:05PM +0100, Jiri Wiesner wrote:
> On Wed, Jan 24, 2024 at 04:12:23AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 24, 2024 at 10:49:54AM +0100, Jiri Wiesner wrote:
> > > On Wed, Jan 24, 2024 at 03:17:43PM +1100, Stephen Rothwell wrote:
> > > > After merging the rcu tree, today's linux-next build (i386 defconfig)
> > > > failed like this:
> > > > In file included from include/linux/dev_printk.h:14,
> > > > from include/linux/device.h:15,
> > > > from kernel/time/clocksource.c:10:
> > > > kernel/time/clocksource.c: In function 'clocksource_watchdog':
> > > > kernel/time/clocksource.c:103:34: error: integer overflow in expression of type 'long int' results in '-1619276800' [-Werror=overflow]
> > > > 103 | * NSEC_PER_SEC / HZ)
> > > > | ^
> > > > Caused by commit
> > > > 1a4545025600 ("clocksource: Skip watchdog check for large watchdog intervals")
> > > > I have used the rcu tree from next-20240123 for today.
> > >
> > > This particular patch is still beging discussed on the LKML. This is the
> > > latest submission with improved variable naming, increased threshold and
> > > changes to the log and the warning message (as proposed by tglx):
> > > https://lore.kernel.org/lkml/20240122172350.GA740@incl/
> > > Especially the change to the message is important. I think this message
> > > will be commonplace on 8 NUMA node (and larger) machines. If there is
> > > anything else I can do to assist please let me know.
> >
> > Here is the offending #define:
> >
> > #define WATCHDOG_INTR_MAX_NS ((WATCHDOG_INTERVAL + (WATCHDOG_INTERVAL >> 1))\
> > * NSEC_PER_SEC / HZ)
> >
> > The problem is that these things are int or long, and on i386, that
> > is only 32 bits. NSEC_PER_SEC is one billion, and WATCHDOG_INTERVAL
> > is often 1000, which overflows. The division by HZ gets this back in
> > range at about 1.5x10^9.
>
> Exactly.
>
> > So this computation must be done in 64 bits even on 32-bit systems.
> > My thought would be a cast to u64, then back to long for the result.
>
> This will be a more precise solution than enclosing NSEC_PER_SEC / HZ in
> brackets, which I chose to do in the v2 of this patch.
>
> > Whatever approach, Jiri, would you like to send an updated patch?
>
> Yes, I can incorporate the casting to u64 and back to long into the patch.
> At this point, I am not sure which version to use. There are:
> * v1 (submitted to the LKML on Jan 3rd): the patch that got merged into linux-next
> * v2 (submitted to the LKML on Jan 10th): that has an alternative fix for the interger overflow
> * v3 (submitted to the LKML on Jan 22nd): that incoporates suggestions by Thomas Gleixner
>
> I could update the v3 of this patch with casting to u64 and back to long.
> WATCHDOG_INTERVAL_MAX_NS got set to 2 * WATCHDOG_INTERVAL in v3 - a change
> I do not entirely agree with. I think WATCHDOG_INTERVAL_MAX_NS should be
> kept narrow so as not to impose a limit on time skew that is too strict
> for readout intervals approaching 2 * WATCHDOG_INTERVAL in their length.
> The question is what is too strict.

Please accept my apologies! I should have caught your updates.

I will drop my current version of your patch and queue your v3 for review
and testing.

Thanx, Paul