LinuxLists.cc - Re: [PATCH v2] ntp: remove accidental integer wrap-around

2024-05-24 12:09:51

Subject: Re: [PATCH v2] ntp: remove accidental integer wrap-around

On Fri, May 17 2024 at 20:22, Justin Stitt wrote:
> time_maxerror is unconditionally incremented and the result is checked
> against NTP_PHASE_LIMIT, but the increment itself can overflow,
> resulting in wrap-around to negative space.
>
> The user can supply some crazy values which is causing the overflow. Add
> an extra validation step checking that maxerror is reasonable.

The user can supply any value which can cause an overflow as the input
is unchecked. Add ...

Hmm?

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index b58dffc58a8f..321f251c02aa 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -2388,6 +2388,11 @@ static int timekeeping_validate_timex(const struct __kernel_timex *txc)
>  		}
>  	}
>
> +	if (txc->modes & ADJ_MAXERROR) {
> +		if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT)
> +			return -EINVAL;
> +	}

I dug into history to find a Fixes tag. That unearthed something
interesting. Exactly this check used to be there until commit
eea83d896e31 ("ntp: NTP4 user space bits update") which landed in
2.6.30. The change log says:

"If some values for adjtimex() are outside the acceptable range, they
are now simply normalized instead of letting the syscall fail."

The problem with that commit is that it did not do any normalization at
all and just relied on the actual time_maxerror handling in
second_overflow(), which is both insufficient and also prone to that
overflow issue.
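
[Editorial illustration, not part of the original mail.] The overflow described above can be sketched in user space. This is not the kernel code: `unchecked_tick()`, `wrapping_add()` and the `SKETCH_*` constants are hypothetical names, and the values are assumptions modeled on NTP_PHASE_LIMIT (16 s expressed in microseconds) and a 500 us per-second growth of the maximum error. Signed overflow is undefined in standard C, so the wrap is modeled explicitly through unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values in microseconds, modeled on the kernel's constants. */
#define SKETCH_PHASE_LIMIT INT64_C(16000000) /* stand-in for NTP_PHASE_LIMIT */
#define SKETCH_INCREMENT   INT64_C(500)      /* assumed per-second growth */

/*
 * Two's complement wrapping add, done in unsigned arithmetic so the demo
 * itself has no undefined behaviour. The conversion back to int64_t is
 * implementation-defined before C23; GCC and Clang wrap.
 */
static int64_t wrapping_add(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* Models "increment first, compare against the limit afterwards". */
static int64_t unchecked_tick(int64_t maxerror)
{
    maxerror = wrapping_add(maxerror, SKETCH_INCREMENT);
    if (maxerror > SKETCH_PHASE_LIMIT)
        maxerror = SKETCH_PHASE_LIMIT;
    return maxerror;
}

/* Small self-check showing the three interesting cases. */
static void demo_wraparound(void)
{
    /* Sane value: grows by the increment. */
    assert(unchecked_tick(1000) == 1500);
    /* Too large but not near the type limit: capped as intended. */
    assert(unchecked_tick(INT64_C(20000000)) == SKETCH_PHASE_LIMIT);
    /* Near the type limit: wraps negative, so the cap never triggers. */
    assert(unchecked_tick(INT64_MAX) < 0);
}
```

The last case is the problematic one: the limit check runs after the increment, so a value that has already wrapped negative passes it unnoticed.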

So instead of turning the clock back, we might be better off to actually
put the normalization in place at the assignment:

time_maxerror = min(max(0, txc->maxerror), NTP_PHASE_LIMIT);

or something like that.
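
[Editorial illustration, not part of the original mail.] A minimal user-space sketch of the suggested normalization, assuming the limit is 16 s in microseconds. `clamp_long()`, `normalize_maxerror()` and `SKETCH_PHASE_LIMIT` are hypothetical names for this example; in the kernel the same effect could presumably be had with the existing `clamp()` helper instead of nested `min()`/`max()`.

```c
#include <assert.h>

/* Assumed stand-in for NTP_PHASE_LIMIT: 16 s in microseconds. */
#define SKETCH_PHASE_LIMIT 16000000L

/* Restrict val to the closed interval [lo, hi]. */
static long clamp_long(long val, long lo, long hi)
{
    assert(lo <= hi);
    if (val < lo)
        return lo;
    if (val > hi)
        return hi;
    return val;
}

/*
 * Normalize a user-supplied maxerror at assignment time, so that a later
 * per-second increment starts from a bounded value and cannot overflow.
 */
static long normalize_maxerror(long user_maxerror)
{
    return clamp_long(user_maxerror, 0, SKETCH_PHASE_LIMIT);
}
```

With this in place, out-of-range input is silently bounded rather than rejected, which matches the intent stated in the old change log.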

Miroslav: Any opinion on that?

Thanks,

tglx

2024-05-24 12:44:33

by Thomas Gleixner

Subject: Re: [PATCH v2] ntp: remove accidental integer wrap-around

On Fri, May 24 2024 at 14:09, Thomas Gleixner wrote:
> On Fri, May 17 2024 at 20:22, Justin Stitt wrote:
> I dug into history to find a Fixes tag. That unearthed something
> interesting. Exactly this check used to be there until commit
> eea83d896e31 ("ntp: NTP4 user space bits update") which landed in
> 2.6.30. The change log says:
>
> "If some values for adjtimex() are outside the acceptable range, they
> are now simply normalized instead of letting the syscall fail."
>
> The problem with that commit is that it did not do any normalization at
> all and just relied on the actual time_maxerror handling in
> second_overflow(), which is both insufficient and also prone to that
> overflow issue.
>
> So instead of turning the clock back, we might be better off to actually
> put the normalization in place at the assignment:
>
> time_maxerror = min(max(0, txc->maxerror), NTP_PHASE_LIMIT);
>
> or something like that.

So that commit also removed the sanity check for time_esterror, but
that's not doing anything in the kernel other than being reset in
clear_ntp() and being handed back to user space. No idea what this is
actually used for.

Thanks,

tglx

2024-05-27 08:26:48

by Miroslav Lichvar

Subject: Re: [PATCH v2] ntp: remove accidental integer wrap-around

On Fri, May 24, 2024 at 02:44:19PM +0200, Thomas Gleixner wrote:
> On Fri, May 24 2024 at 14:09, Thomas Gleixner wrote:
> > So instead of turning the clock back, we might be better off to actually
> > put the normalization in place at the assignment:
> >
> > time_maxerror = min(max(0, txc->maxerror), NTP_PHASE_LIMIT);
> >
> > or something like that.

Yes, I think that's a better approach. Failing the system call could
break existing applications, e.g. ntpd can be configured to accept a
large root distance and it doesn't clamp the maxerror value, while
updating the PLL offset in the same adjtimex() call.

> So that commit also removed the sanity check for time_esterror, but
> that's not doing anything in the kernel other than being reset in
> clear_ntp() and being handed back to user space. No idea what this is
> actually used for.

It's a lower-bound estimate of the clock error, which applications can
check if it's acceptable for them. I think it should be clamped too.
It doesn't make much sense for it to be larger than the maximum error.

Another possible improvement of adjtimex() would be to set the UNSYNC
flag immediately in the call if maxerror >= 16s to avoid the delay of
up to 1 second for applications which check only that flag instead of
the maxerror value.
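
[Editorial illustration, not part of the original mail.] The two suggestions above, clamping esterror as well and raising the unsynchronized flag at call time, could look roughly like the following user-space sketch. Everything here is an assumption made for illustration: the struct, the function names, the `SKETCH_*` constants (modeled on NTP_PHASE_LIMIT and STA_UNSYNC), and in particular the choice to bound esterror by the same limit as maxerror, which the thread does not specify.

```c
#include <assert.h>

/* Assumed stand-ins for NTP_PHASE_LIMIT (16 s in us) and STA_UNSYNC. */
#define SKETCH_PHASE_LIMIT 16000000L
#define SKETCH_STA_UNSYNC  0x0040

/* Hypothetical container for the state touched by the call. */
struct sketch_ntp_state {
    long maxerror; /* maximum error, microseconds */
    long esterror; /* estimated error, microseconds */
    int  status;   /* status flag bits */
};

/* Restrict val to the closed interval [lo, hi]. */
static long clamp_long(long val, long lo, long hi)
{
    assert(lo <= hi);
    if (val < lo)
        return lo;
    if (val > hi)
        return hi;
    return val;
}

/*
 * Store both error values in normalized form and flag the clock as
 * unsynchronized right away when maxerror is already at the limit,
 * instead of waiting for the next once-per-second update.
 */
static void sketch_set_errors(struct sketch_ntp_state *st,
                              long user_maxerror, long user_esterror)
{
    st->maxerror = clamp_long(user_maxerror, 0, SKETCH_PHASE_LIMIT);
    st->esterror = clamp_long(user_esterror, 0, SKETCH_PHASE_LIMIT);
    if (st->maxerror >= SKETCH_PHASE_LIMIT)
        st->status |= SKETCH_STA_UNSYNC;
}
```

An application that only checks the flag would then see the unsynchronized state in the same call that supplied the large maxerror.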

--
Miroslav Lichvar

2024-05-29 08:27:13

by Thomas Gleixner

Subject: Re: [PATCH v2] ntp: remove accidental integer wrap-around

On Mon, May 27 2024 at 10:26, Miroslav Lichvar wrote:
> On Fri, May 24, 2024 at 02:44:19PM +0200, Thomas Gleixner wrote:
>> On Fri, May 24 2024 at 14:09, Thomas Gleixner wrote:
>> > So instead of turning the clock back, we might be better off to actually
>> > put the normalization in place at the assignment:
>> >
>> > time_maxerror = min(max(0, txc->maxerror), NTP_PHASE_LIMIT);
>> >
>> > or something like that.
>
> Yes, I think that's a better approach. Failing the system call could
> break existing applications, e.g. ntpd can be configured to accept a
> large root distance and it doesn't clamp the maxerror value, while
> updating the PLL offset in the same adjtimex() call.

Thanks for confirming. I suspected that, but the original change logs
are pretty useless in that regard.

>> So that commit also removed the sanity check for time_esterror, but
>> that's not doing anything in the kernel other than being reset in
>> clear_ntp() and being handed back to user space. No idea what this is
>> actually used for.
>
> It's a lower-bound estimate of the clock error, which applications can
> check if it's acceptable for them. I think it should be clamped too.
> It doesn't make much sense for it to be larger than the maximum error.

Ok.

> Another possible improvement of adjtimex() would be to set the UNSYNC
> flag immediately in the call if maxerror >= 16s to avoid the delay of
> up to 1 second for applications which check only that flag instead of
> the maxerror value.

That needs to be a separate change.

Thanks,

tglx