LinuxLists.cc - [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Hi Sergey,

On Mon, Jul 06, 2020 at 05:26:14PM +0300, Sergey Organov wrote:
> Initializing with 0 makes it much easier to identify time stamps from
> otherwise uninitialized clock.
>
> Initialization of PTP clock with current kernel time makes little sense as
> PTP time scale differs from UTC time scale that kernel time represents.
> It only leads to confusion when no actual PTP initialization happens, as
> these time scales differ in a small integer number of seconds (37 at the
> time of writing.)
>
> Signed-off-by: Sergey Organov <[email protected]>
> ---

Reading your patch, I got reminded of my own attempt of making an
identical change to the ptp_qoriq driver:

https://www.spinics.net/lists/netdev/msg601625.html

Could we have some sort of kernel-wide convention, I wonder (even though
it might be too late for that)? After your patch, I can see equal
amounts of confusion of users expecting some boot-time output of
$(phc_ctl /dev/ptp0 get) as it used to be, and now getting something
else.

There's no correct answer, I'm afraid. Whatever the default value of the
clock may be, it's bound to be confusing for some reason, _if_ the
reason why you're investigating it in the first place is a driver bug.
Also, I don't really see how your change to use Jan 1st 1970 makes it
any less confusing.

> drivers/net/ethernet/freescale/fec_ptp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
> index 4a12086..e455343 100644
> --- a/drivers/net/ethernet/freescale/fec_ptp.c
> +++ b/drivers/net/ethernet/freescale/fec_ptp.c
> @@ -264,7 +264,7 @@ void fec_ptp_start_cyclecounter(struct net_device *ndev)
> fep->cc.mult = FEC_CC_MULT;
>
> /* reset the ns time counter */
> - timecounter_init(&fep->tc, &fep->cc, ktime_to_ns(ktime_get_real()));
> + timecounter_init(&fep->tc, &fep->cc, 0);
>
> spin_unlock_irqrestore(&fep->tmreg_lock, flags);
> }
> --
> 2.10.0.1.g57b01a3
>

Thanks,
-Vladimir

2020-07-06 18:25:25

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Vladimir Oltean <[email protected]> writes:

> Hi Sergey,
>
> On Mon, Jul 06, 2020 at 05:26:14PM +0300, Sergey Organov wrote:
>> Initializing with 0 makes it much easier to identify time stamps from
>> otherwise uninitialized clock.
>>
>> Initialization of PTP clock with current kernel time makes little sense as
>> PTP time scale differs from UTC time scale that kernel time represents.
>> It only leads to confusion when no actual PTP initialization happens, as
>> these time scales differ in a small integer number of seconds (37 at the
>> time of writing.)
>>
>> Signed-off-by: Sergey Organov <[email protected]>
>> ---
>
> Reading your patch, I got reminded of my own attempt of making an
> identical change to the ptp_qoriq driver:
>
> https://www.spinics.net/lists/netdev/msg601625.html
>
> Could we have some sort of kernel-wide convention, I wonder (even though
> it might be too late for that)? After your patch, I can see equal
> amounts of confusion of users expecting some boot-time output of
> $(phc_ctl /dev/ptp0 get) as it used to be, and now getting something
> else.
>
> There's no correct answer, I'm afraid.

IMHO, the correct answer would be keep non-initialized clock at 0. No
ticking.

> Whatever the default value of the clock may be, it's bound to be
> confusing for some reason, _if_ the reason why you're investigating it
> in the first place is a driver bug. Also, I don't really see how your
> change to use Jan 1st 1970 makes it any less confusing.

When I print the clocks in application, I see seconds and milliseconds
part since epoch. With this patch seconds count from 0, that simply
match uptime. Easy to tell from any other (malfunctioning) clock.

Here is the description of confusion and improvement. I spent half a day
not realizing that I sometimes get timestamps from the wrong PTP clock.
Part of the problem is that kernel time at startup, when it is used for
initialization of the PTP clock, is in fact somewhat random, and it
could be off by a few seconds. Now, when in application I get time stamp
that is almost right, and then another one that is, say, 9 seconds off,
what should I think? Right, that I drive PTP clock wrongly.

Now, when one of those timestamps is almost 0, I see immediately I got
time from wrong PTP clock, rather than wrong time from correct PTP
clock.

Thanks,
-- Sergey

2020-07-07 06:37:30

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Mon, Jul 06, 2020 at 09:24:24PM +0300, Sergey Organov wrote:
> Vladimir Oltean <[email protected]> writes:
>
> > Hi Sergey,
> >
> > On Mon, Jul 06, 2020 at 05:26:14PM +0300, Sergey Organov wrote:
> >> Initializing with 0 makes it much easier to identify time stamps from
> >> otherwise uninitialized clock.
> >>
> >> Initialization of PTP clock with current kernel time makes little sense as
> >> PTP time scale differs from UTC time scale that kernel time represents.
> >> It only leads to confusion when no actual PTP initialization happens, as
> >> these time scales differ in a small integer number of seconds (37 at the
> >> time of writing.)
> >>
> >> Signed-off-by: Sergey Organov <[email protected]>
> >> ---
> >
> > Reading your patch, I got reminded of my own attempt of making an
> > identical change to the ptp_qoriq driver:
> >
> > https://www.spinics.net/lists/netdev/msg601625.html
> >
> > Could we have some sort of kernel-wide convention, I wonder (even though
> > it might be too late for that)? After your patch, I can see equal
> > amounts of confusion of users expecting some boot-time output of
> > $(phc_ctl /dev/ptp0 get) as it used to be, and now getting something
> > else.
> >
> > There's no correct answer, I'm afraid.
>
> IMHO, the correct answer would be keep non-initialized clock at 0. No
> ticking.
>

What do you mean 'no ticking', and what do you mean by 'non-initialized
clock' exactly? I don't know if the fec driver is special in any way, do
you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
all return 0? That is not at all what is to be expected, I think. The
PHC is always ticking. Its time is increasing. What would be that
initialization procedure that makes it tick, and who is doing it (and
when)?

> > Whatever the default value of the clock may be, it's bound to be
> > confusing for some reason, _if_ the reason why you're investigating it
> > in the first place is a driver bug. Also, I don't really see how your
> > change to use Jan 1st 1970 makes it any less confusing.
>
> When I print the clocks in application, I see seconds and milliseconds
> part since epoch. With this patch seconds count from 0, that simply
> match uptime. Easy to tell from any other (malfunctioning) clock.
>

It doesn't really match uptime (CLOCK_MONOTONIC). Instead, it is just
initialized with zero. If you have fec built as module and you insmod it
after a few days of uptime, it will not track CLOCK_MONOTONIC at all.

Not to say that there's anything wrong with initializing it with 0. It's
just that I don't see why it would be objectively better.

> Here is the description of confusion and improvement. I spent half a day
> not realizing that I sometimes get timestamps from the wrong PTP clock.

There is a suite of tests in tools/testing/selftests/ptp/ which is
useful in debugging problems like this.

Alternatively, you can write to each individual clock using $(phc_ctl
/dev/ptpN set 0) and check your timestamps again. If the timestamps
don't nudge, it's clear that the timestamps you're getting are not from
the PHC you've written to. Much simpler.

> Part of the problem is that kernel time at startup, when it is used for
> initialization of the PTP clock, is in fact somewhat random, and it
> could be off by a few seconds.

Yes, the kernel time at startup is exactly random (not traceable to any
clock reference). And so is the PHC.

> Now, when in application I get time stamp
> that is almost right, and then another one that is, say, 9 seconds off,
> what should I think? Right, that I drive PTP clock wrongly.
>
> Now, when one of those timestamps is almost 0, I see immediately I got
> time from wrong PTP clock, rather than wrong time from correct PTP
> clock.
>

There are 2 points to be made here:

1. There are simpler ways to debug your issue than to leave a patch in
the kernel, like the "phc_ctl set 0" I mentioned above. This can be
considered a debugging patch which is also going to have consequences
for the other users of the driver, if applied. We need to consider
whether the change in behavior is useful in general.

2. There are boards out there which don't have any battery-backed RTC,
so CLOCK_REALTIME could be ticking in Jan 1970 already, and therefore
the PHC would be initialized with a time in 1970. Or your GM might be
configured to be ticking in Jan 1970 (there are some applications
that only require the network to be synchronized, but not for the
time to be traceable to TAI). How does your change make a difference
to eliminate confusion there, when all of your clocks are going to be
in 1970? It doesn't make a net difference. Bottom line, a clock
initialized with 0 doesn't mean it's special in any way. You _could_
make that change in your debugging environment, and it _could_ be
useful to your debugging, but if it's not universally useful, I
wouldn't try to patch the kernel with this change.

Please note that, although my comments appear to be in disagreement with
your idea, they are in fact not at all. It's just that, if there's a a
particular answer to "what time to initialize a PHC with" that is more
favourable than the rest (even though the question itself is a bit
irrelevant overall), then that answer ought to be enforced kernel-wide,
I think.

> Thanks,
> -- Sergey

Cheers,
-Vladimir

2020-07-07 16:10:13

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Vladimir Oltean <[email protected]> writes:

> On Mon, Jul 06, 2020 at 09:24:24PM +0300, Sergey Organov wrote:
>> Vladimir Oltean <[email protected]> writes:
>>
>> > Hi Sergey,
>> >
>> > On Mon, Jul 06, 2020 at 05:26:14PM +0300, Sergey Organov wrote:
>> >> Initializing with 0 makes it much easier to identify time stamps from
>> >> otherwise uninitialized clock.
>> >>
>> >> Initialization of PTP clock with current kernel time makes little sense as
>> >> PTP time scale differs from UTC time scale that kernel time represents.
>> >> It only leads to confusion when no actual PTP initialization happens, as
>> >> these time scales differ in a small integer number of seconds (37 at the
>> >> time of writing.)
>> >>
>> >> Signed-off-by: Sergey Organov <[email protected]>
>> >> ---
>> >
>> > Reading your patch, I got reminded of my own attempt of making an
>> > identical change to the ptp_qoriq driver:
>> >
>> > https://www.spinics.net/lists/netdev/msg601625.html
>> >
>> > Could we have some sort of kernel-wide convention, I wonder (even though
>> > it might be too late for that)? After your patch, I can see equal
>> > amounts of confusion of users expecting some boot-time output of
>> > $(phc_ctl /dev/ptp0 get) as it used to be, and now getting something
>> > else.
>> >
>> > There's no correct answer, I'm afraid.
>>
>> IMHO, the correct answer would be keep non-initialized clock at 0. No
>> ticking.
>>
>
> What do you mean 'no ticking', and what do you mean by 'non-initialized
> clock' exactly? I don't know if the fec driver is special in any way, do
> you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
> all return 0? That is not at all what is to be expected, I think. The
> PHC is always ticking. Its time is increasing.

That's how it is right now. My point is that it likely shouldn't. Why is
it ticking when nobody needs it? Does it draw more power due to that?

> What would be that initialization procedure that makes it tick, and
> who is doing it (and when)?

The user space code that cares, obviously. Most probably some PTP stack
daemon. I'd say that any set clock time ioctl() should start the clock,
or yet another ioctl() that enables/disables the clock, whatever.

>
>> > Whatever the default value of the clock may be, it's bound to be
>> > confusing for some reason, _if_ the reason why you're investigating it
>> > in the first place is a driver bug. Also, I don't really see how your
>> > change to use Jan 1st 1970 makes it any less confusing.
>>
>> When I print the clocks in application, I see seconds and milliseconds
>> part since epoch. With this patch seconds count from 0, that simply
>> match uptime. Easy to tell from any other (malfunctioning) clock.
>>
>
> It doesn't really match uptime (CLOCK_MONOTONIC). Instead, it is just
> initialized with zero. If you have fec built as module and you insmod it
> after a few days of uptime, it will not track CLOCK_MONOTONIC at all.
>
> Not to say that there's anything wrong with initializing it with 0. It's
> just that I don't see why it would be objectively better.

Well, it would have been better for me in my particular quest to find
the problem, so it rather needs to be shown where initializing with
kernel time is objectively better.

Moreover, everything else being equal, 0 is always better, just because
of simplicity.

>
>> Here is the description of confusion and improvement. I spent half a day
>> not realizing that I sometimes get timestamps from the wrong PTP clock.
>
> There is a suite of tests in tools/testing/selftests/ptp/ which is
> useful in debugging problems like this.
>
> Alternatively, you can write to each individual clock using $(phc_ctl
> /dev/ptpN set 0) and check your timestamps again. If the timestamps
> don't nudge, it's clear that the timestamps you're getting are not from
> the PHC you've written to. Much simpler.

Maybe. Once you do figure there is another clock in the system and/or
that that clock is offending. In my case /that/ was the hard part, not
changing that offending clock, once found, to whatever.

>
>> Part of the problem is that kernel time at startup, when it is used for
>> initialization of the PTP clock, is in fact somewhat random, and it
>> could be off by a few seconds.
>
> Yes, the kernel time at startup is exactly random (not traceable to any
> clock reference). And so is the PHC.
>
>> Now, when in application I get time stamp
>> that is almost right, and then another one that is, say, 9 seconds off,
>> what should I think? Right, that I drive PTP clock wrongly.
>>
>> Now, when one of those timestamps is almost 0, I see immediately I got
>> time from wrong PTP clock, rather than wrong time from correct PTP
>> clock.
>>
>
> There are 2 points to be made here:
>
> 1. There are simpler ways to debug your issue than to leave a patch in
> the kernel, like the "phc_ctl set 0" I mentioned above. This can be
> considered a debugging patch which is also going to have consequences
> for the other users of the driver, if applied. We need to consider
> whether the change in behavior is useful in general.

This does not apply to my particular case as I explained above, and then
ease with debug is just a nice side-effect of code simplification.

>
> 2. There are boards out there which don't have any battery-backed RTC,
> so CLOCK_REALTIME could be ticking in Jan 1970 already, and therefore
> the PHC would be initialized with a time in 1970. Or your GM might be
> configured to be ticking in Jan 1970 (there are some applications
> that only require the network to be synchronized, but not for the
> time to be traceable to TAI). How does your change make a difference
> to eliminate confusion there, when all of your clocks are going to be
> in 1970? It doesn't make a net difference. Bottom line, a clock
> initialized with 0 doesn't mean it's special in any way. You _could_
> make that change in your debugging environment, and it _could_ be
> useful to your debugging, but if it's not universally useful, I
> wouldn't try to patch the kernel with this change.

If there is nothing special about any value, 0 is the value to choose,
because of simplicity. Once again, I only explained debugging advantages
because you've asked about it. It's just a nice side-effect, as it
often happens to be when one keeps things as simple as possible.

> Please note that, although my comments appear to be in disagreement with
> your idea, they are in fact not at all. It's just that, if there's a a
> particular answer to "what time to initialize a PHC with" that is more
> favourable than the rest (even though the question itself is a bit
> irrelevant overall), then that answer ought to be enforced kernel-wide,
> I think.

As everybody, I believe in a set of generic programming principles that
are not to be violated lightly. KISS is one of the principles I believe,
and trying to be clever with no apparent reason is one way of violating
it.

Overall, here is my argument: 0 is simpler than kernel time, so how is
it useful to initialize PTP with kernel time that is as wrong as a value
for PTP time as 0?

Thanks,
-- Sergey.

2020-07-07 16:46:09

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Tue, Jul 07, 2020 at 07:07:08PM +0300, Sergey Organov wrote:
> Vladimir Oltean <[email protected]> writes:
> >
> > What do you mean 'no ticking', and what do you mean by 'non-initialized
> > clock' exactly? I don't know if the fec driver is special in any way, do
> > you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
> > all return 0? That is not at all what is to be expected, I think. The
> > PHC is always ticking. Its time is increasing.
>
> That's how it is right now. My point is that it likely shouldn't. Why is
> it ticking when nobody needs it? Does it draw more power due to that?
>
> > What would be that initialization procedure that makes it tick, and
> > who is doing it (and when)?
>
> The user space code that cares, obviously. Most probably some PTP stack
> daemon. I'd say that any set clock time ioctl() should start the clock,
> or yet another ioctl() that enables/disables the clock, whatever.
>

That ioctl doesn't exist, at least not in PTP land. This also addresses
your previous point.

> >
> >> > Whatever the default value of the clock may be, it's bound to be
> >> > confusing for some reason, _if_ the reason why you're investigating it
> >> > in the first place is a driver bug. Also, I don't really see how your
> >> > change to use Jan 1st 1970 makes it any less confusing.
> >>
> >> When I print the clocks in application, I see seconds and milliseconds
> >> part since epoch. With this patch seconds count from 0, that simply
> >> match uptime. Easy to tell from any other (malfunctioning) clock.
> >>
> >
> > It doesn't really match uptime (CLOCK_MONOTONIC). Instead, it is just
> > initialized with zero. If you have fec built as module and you insmod it
> > after a few days of uptime, it will not track CLOCK_MONOTONIC at all.
> >
> > Not to say that there's anything wrong with initializing it with 0. It's
> > just that I don't see why it would be objectively better.
>
> Well, it would have been better for me in my particular quest to find
> the problem, so it rather needs to be shown where initializing with
> kernel time is objectively better.
>
> Moreover, everything else being equal, 0 is always better, just because
> of simplicity.
>
> >
> >> Here is the description of confusion and improvement. I spent half a day
> >> not realizing that I sometimes get timestamps from the wrong PTP clock.
> >
> > There is a suite of tests in tools/testing/selftests/ptp/ which is
> > useful in debugging problems like this.
> >
> > Alternatively, you can write to each individual clock using $(phc_ctl
> > /dev/ptpN set 0) and check your timestamps again. If the timestamps
> > don't nudge, it's clear that the timestamps you're getting are not from
> > the PHC you've written to. Much simpler.
>
> Maybe. Once you do figure there is another clock in the system and/or
> that that clock is offending. In my case /that/ was the hard part, not
> changing that offending clock, once found, to whatever.
>

And my point was that you could have been in a different situation, when
all of your clocks could have been ticking in 1970, so this wouldn't
have been a distiguishing point. So this argument is poor. Using
phc_ctl, or scripts around that, is much more dynamic.

> >
> >> Part of the problem is that kernel time at startup, when it is used for
> >> initialization of the PTP clock, is in fact somewhat random, and it
> >> could be off by a few seconds.
> >
> > Yes, the kernel time at startup is exactly random (not traceable to any
> > clock reference). And so is the PHC.
> >
> >> Now, when in application I get time stamp
> >> that is almost right, and then another one that is, say, 9 seconds off,
> >> what should I think? Right, that I drive PTP clock wrongly.
> >>
> >> Now, when one of those timestamps is almost 0, I see immediately I got
> >> time from wrong PTP clock, rather than wrong time from correct PTP
> >> clock.
> >>
> >
> > There are 2 points to be made here:
> >
> > 1. There are simpler ways to debug your issue than to leave a patch in
> > the kernel, like the "phc_ctl set 0" I mentioned above. This can be
> > considered a debugging patch which is also going to have consequences
> > for the other users of the driver, if applied. We need to consider
> > whether the change in behavior is useful in general.
>
> This does not apply to my particular case as I explained above, and then
> ease with debug is just a nice side-effect of code simplification.
>
> >
> > 2. There are boards out there which don't have any battery-backed RTC,
> > so CLOCK_REALTIME could be ticking in Jan 1970 already, and therefore
> > the PHC would be initialized with a time in 1970. Or your GM might be
> > configured to be ticking in Jan 1970 (there are some applications
> > that only require the network to be synchronized, but not for the
> > time to be traceable to TAI). How does your change make a difference
> > to eliminate confusion there, when all of your clocks are going to be
> > in 1970? It doesn't make a net difference. Bottom line, a clock
> > initialized with 0 doesn't mean it's special in any way. You _could_
> > make that change in your debugging environment, and it _could_ be
> > useful to your debugging, but if it's not universally useful, I
> > wouldn't try to patch the kernel with this change.
>
> If there is nothing special about any value, 0 is the value to choose,
> because of simplicity. Once again, I only explained debugging advantages
> because you've asked about it. It's just a nice side-effect, as it
> often happens to be when one keeps things as simple as possible.
>
> > Please note that, although my comments appear to be in disagreement with
> > your idea, they are in fact not at all. It's just that, if there's a a
> > particular answer to "what time to initialize a PHC with" that is more
> > favourable than the rest (even though the question itself is a bit
> > irrelevant overall), then that answer ought to be enforced kernel-wide,
> > I think.
>
> As everybody, I believe in a set of generic programming principles that
> are not to be violated lightly. KISS is one of the principles I believe,
> and trying to be clever with no apparent reason is one way of violating
> it.
>
> Overall, here is my argument: 0 is simpler than kernel time, so how is
> it useful to initialize PTP with kernel time that is as wrong as a value
> for PTP time as 0?
>

And overall, my argument is: you are making a user-visible change, for
basically no strong reason, other than the fact that you like zero
better. You're trying to reduce confusion, not increase it, right?

I agree with the basic fact that zero is a simpler and more consistent
value to initialize a PHC with, than the system time. As I've already
shown to you, I even attempted to make a similar change to the ptp_qoriq
driver which was rejected. So I hoped that you could bring some better
arguments than "I believe 0 is simpler". Since no value is right, no
value is wrong either, so why make a change in the first place? The only
value in _changing_ to zero would be if all drivers were changed to use
it consistently, IMO.

But I will stop here and let the PTP maintainer make a choice. I only
intervened because I knew what the default answer was going to be.

> Thanks,
> -- Sergey.

Thanks,
-Vladimir

2020-07-07 17:10:04

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Vladimir Oltean <[email protected]> writes:

> On Tue, Jul 07, 2020 at 07:07:08PM +0300, Sergey Organov wrote:
>> Vladimir Oltean <[email protected]> writes:
>> >
>> > What do you mean 'no ticking', and what do you mean by 'non-initialized
>> > clock' exactly? I don't know if the fec driver is special in any way, do
>> > you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
>> > all return 0? That is not at all what is to be expected, I think. The
>> > PHC is always ticking. Its time is increasing.
>>
>> That's how it is right now. My point is that it likely shouldn't. Why is
>> it ticking when nobody needs it? Does it draw more power due to that?
>>
>> > What would be that initialization procedure that makes it tick, and
>> > who is doing it (and when)?
>>
>> The user space code that cares, obviously. Most probably some PTP stack
>> daemon. I'd say that any set clock time ioctl() should start the clock,
>> or yet another ioctl() that enables/disables the clock, whatever.
>>
>
> That ioctl doesn't exist, at least not in PTP land. This also addresses
> your previous point.

struct timespec ts;
...
clock_settime(clkid, &ts)

That's the starting point of my own code, and I bet it's there in PTP
for Linux, as well as in PTPD, as I fail to see how it could possibly
work without it.

>
>> >
>> >> > Whatever the default value of the clock may be, it's bound to be
>> >> > confusing for some reason, _if_ the reason why you're investigating it
>> >> > in the first place is a driver bug. Also, I don't really see how your
>> >> > change to use Jan 1st 1970 makes it any less confusing.
>> >>
>> >> When I print the clocks in application, I see seconds and milliseconds
>> >> part since epoch. With this patch seconds count from 0, that simply
>> >> match uptime. Easy to tell from any other (malfunctioning) clock.
>> >>
>> >
>> > It doesn't really match uptime (CLOCK_MONOTONIC). Instead, it is just
>> > initialized with zero. If you have fec built as module and you insmod it
>> > after a few days of uptime, it will not track CLOCK_MONOTONIC at all.
>> >
>> > Not to say that there's anything wrong with initializing it with 0. It's
>> > just that I don't see why it would be objectively better.
>>
>> Well, it would have been better for me in my particular quest to find
>> the problem, so it rather needs to be shown where initializing with
>> kernel time is objectively better.
>>
>> Moreover, everything else being equal, 0 is always better, just because
>> of simplicity.
>>
>> >
>> >> Here is the description of confusion and improvement. I spent half a day
>> >> not realizing that I sometimes get timestamps from the wrong PTP clock.
>> >
>> > There is a suite of tests in tools/testing/selftests/ptp/ which is
>> > useful in debugging problems like this.
>> >
>> > Alternatively, you can write to each individual clock using $(phc_ctl
>> > /dev/ptpN set 0) and check your timestamps again. If the timestamps
>> > don't nudge, it's clear that the timestamps you're getting are not from
>> > the PHC you've written to. Much simpler.
>>
>> Maybe. Once you do figure there is another clock in the system and/or
>> that that clock is offending. In my case /that/ was the hard part, not
>> changing that offending clock, once found, to whatever.
>>
>
> And my point was that you could have been in a different situation, when
> all of your clocks could have been ticking in 1970, so this wouldn't
> have been a distiguishing point. So this argument is poor. Using
> phc_ctl, or scripts around that, is much more dynamic.
>
>> >
>> >> Part of the problem is that kernel time at startup, when it is used for
>> >> initialization of the PTP clock, is in fact somewhat random, and it
>> >> could be off by a few seconds.
>> >
>> > Yes, the kernel time at startup is exactly random (not traceable to any
>> > clock reference). And so is the PHC.
>> >
>> >> Now, when in application I get time stamp
>> >> that is almost right, and then another one that is, say, 9 seconds off,
>> >> what should I think? Right, that I drive PTP clock wrongly.
>> >>
>> >> Now, when one of those timestamps is almost 0, I see immediately I got
>> >> time from wrong PTP clock, rather than wrong time from correct PTP
>> >> clock.
>> >>
>> >
>> > There are 2 points to be made here:
>> >
>> > 1. There are simpler ways to debug your issue than to leave a patch in
>> > the kernel, like the "phc_ctl set 0" I mentioned above. This can be
>> > considered a debugging patch which is also going to have consequences
>> > for the other users of the driver, if applied. We need to consider
>> > whether the change in behavior is useful in general.
>>
>> This does not apply to my particular case as I explained above, and then
>> ease with debug is just a nice side-effect of code simplification.
>>
>> >
>> > 2. There are boards out there which don't have any battery-backed RTC,
>> > so CLOCK_REALTIME could be ticking in Jan 1970 already, and therefore
>> > the PHC would be initialized with a time in 1970. Or your GM might be
>> > configured to be ticking in Jan 1970 (there are some applications
>> > that only require the network to be synchronized, but not for the
>> > time to be traceable to TAI). How does your change make a difference
>> > to eliminate confusion there, when all of your clocks are going to be
>> > in 1970? It doesn't make a net difference. Bottom line, a clock
>> > initialized with 0 doesn't mean it's special in any way. You _could_
>> > make that change in your debugging environment, and it _could_ be
>> > useful to your debugging, but if it's not universally useful, I
>> > wouldn't try to patch the kernel with this change.
>>
>> If there is nothing special about any value, 0 is the value to choose,
>> because of simplicity. Once again, I only explained debugging advantages
>> because you've asked about it. It's just a nice side-effect, as it
>> often happens to be when one keeps things as simple as possible.
>>
>> > Please note that, although my comments appear to be in disagreement with
>> > your idea, they are in fact not at all. It's just that, if there's a a
>> > particular answer to "what time to initialize a PHC with" that is more
>> > favourable than the rest (even though the question itself is a bit
>> > irrelevant overall), then that answer ought to be enforced kernel-wide,
>> > I think.
>>
>> As everybody, I believe in a set of generic programming principles that
>> are not to be violated lightly. KISS is one of the principles I believe,
>> and trying to be clever with no apparent reason is one way of violating
>> it.
>>
>> Overall, here is my argument: 0 is simpler than kernel time, so how is
>> it useful to initialize PTP with kernel time that is as wrong as a value
>> for PTP time as 0?
>>
>
> And overall, my argument is: you are making a user-visible change, for
> basically no strong reason, other than the fact that you like zero
> better. You're trying to reduce confusion, not increase it, right?

No, not to increase, and I believe there is no way to increase it by
using the value that at least some of the drivers already use.

> I agree with the basic fact that zero is a simpler and more consistent
> value to initialize a PHC with, than the system time. As I've already
> shown to you, I even attempted to make a similar change to the ptp_qoriq
> driver which was rejected. So I hoped that you could bring some better
> arguments than "I believe 0 is simpler". Since no value is right, no
> value is wrong either, so why make a change in the first place? The only
> value in _changing_ to zero would be if all drivers were changed to use
> it consistently, IMO.
>
> But I will stop here and let the PTP maintainer make a choice. I only
> intervened because I knew what the default answer was going to be.

Thanks, so will I, as I won't care much either way, just wanted to make
life of somebody who might travel my path less painful. That person is
not who is already fluent in all the different opportunities to debug
PTP clocks, for sure.

Thanks,
-- Sergey

2020-07-07 17:13:24

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Tue, Jul 07, 2020 at 08:09:07PM +0300, Sergey Organov wrote:
> Vladimir Oltean <[email protected]> writes:
>
> > On Tue, Jul 07, 2020 at 07:07:08PM +0300, Sergey Organov wrote:
> >> Vladimir Oltean <[email protected]> writes:
> >> >
> >> > What do you mean 'no ticking', and what do you mean by 'non-initialized
> >> > clock' exactly? I don't know if the fec driver is special in any way, do
> >> > you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
> >> > all return 0? That is not at all what is to be expected, I think. The
> >> > PHC is always ticking. Its time is increasing.
> >>
> >> That's how it is right now. My point is that it likely shouldn't. Why is
> >> it ticking when nobody needs it? Does it draw more power due to that?
> >>
> >> > What would be that initialization procedure that makes it tick, and
> >> > who is doing it (and when)?
> >>
> >> The user space code that cares, obviously. Most probably some PTP stack
> >> daemon. I'd say that any set clock time ioctl() should start the clock,
> >> or yet another ioctl() that enables/disables the clock, whatever.
> >>
> >
> > That ioctl doesn't exist, at least not in PTP land. This also addresses
> > your previous point.
>
> struct timespec ts;
> ...
> clock_settime(clkid, &ts)
>
> That's the starting point of my own code, and I bet it's there in PTP
> for Linux, as well as in PTPD, as I fail to see how it could possibly
> work without it.
>

This won't stop it from ticking, which is what we were talking about,
will it?

Thanks,
-Vladimir

2020-07-07 17:57:17

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Vladimir Oltean <[email protected]> writes:

> On Tue, Jul 07, 2020 at 08:09:07PM +0300, Sergey Organov wrote:
>> Vladimir Oltean <[email protected]> writes:
>>
>> > On Tue, Jul 07, 2020 at 07:07:08PM +0300, Sergey Organov wrote:
>> >> Vladimir Oltean <[email protected]> writes:
>> >> >
>> >> > What do you mean 'no ticking', and what do you mean by 'non-initialized
>> >> > clock' exactly? I don't know if the fec driver is special in any way, do
>> >> > you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
>> >> > all return 0? That is not at all what is to be expected, I think. The
>> >> > PHC is always ticking. Its time is increasing.
>> >>
>> >> That's how it is right now. My point is that it likely shouldn't. Why is
>> >> it ticking when nobody needs it? Does it draw more power due to that?
>> >>
>> >> > What would be that initialization procedure that makes it tick, and
>> >> > who is doing it (and when)?
>> >>
>> >> The user space code that cares, obviously. Most probably some PTP stack
>> >> daemon. I'd say that any set clock time ioctl() should start the clock,
>> >> or yet another ioctl() that enables/disables the clock, whatever.
>> >>
>> >
>> > That ioctl doesn't exist, at least not in PTP land. This also addresses
>> > your previous point.
>>
>> struct timespec ts;
>> ...
>> clock_settime(clkid, &ts)
>>
>> That's the starting point of my own code, and I bet it's there in PTP
>> for Linux, as well as in PTPD, as I fail to see how it could possibly
>> work without it.
>>
>
> This won't stop it from ticking, which is what we were talking about,
> will it?

It won't. Supposedly it'd force clock (that doesn't tick by default and
stays at 0) to start ticking.

If the ability to stop the clock is in fact useful (I don't immediately
see how it could), and how to do it if it is, is a separate issue. I'd
then expect clock_stop(clkid), or clock_settime(clkid, NULL) to do the
job.

Thanks,
-- Sergey

2020-07-08 11:08:04

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Mon, Jul 06, 2020 at 06:27:21PM +0300, Vladimir Oltean wrote:
> There's no correct answer, I'm afraid. Whatever the default value of the
> clock may be, it's bound to be confusing for some reason, _if_ the
> reason why you're investigating it in the first place is a driver bug.
> Also, I don't really see how your change to use Jan 1st 1970 makes it
> any less confusing.

+1

For a PHC, the user of the clock must check the PTP stack's
synchronization flags via the management interface to know the status
of the time signal.

Thanks,
Richard

2020-07-08 11:12:31

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Tue, Jul 07, 2020 at 07:43:29PM +0300, Vladimir Oltean wrote:
> And overall, my argument is: you are making a user-visible change, for
> basically no strong reason, other than the fact that you like zero
> better. You're trying to reduce confusion, not increase it, right?

;^)

> I agree with the basic fact that zero is a simpler and more consistent
> value to initialize a PHC with, than the system time. As I've already
> shown to you, I even attempted to make a similar change to the ptp_qoriq
> driver which was rejected. So I hoped that you could bring some better
> arguments than "I believe 0 is simpler". Since no value is right, no
> value is wrong either, so why make a change in the first place? The only
> value in _changing_ to zero would be if all drivers were changed to use
> it consistently, IMO.

Right.

I would not appose making all PHCs start with zero. If you feel
strongly about starting all PHCs at zero, please prepare a patch set
and get the ACKs of the appropriate driver maintainers.

(The effort seems pointless to me because the user needs to consult
the synchronization flags in any case.)

Thanks,
Richard

2020-07-08 11:15:54

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Tue, Jul 07, 2020 at 08:56:41PM +0300, Sergey Organov wrote:
> It won't. Supposedly it'd force clock (that doesn't tick by default and
> stays at 0) to start ticking.

No existing clockid_t has this behavior. Consider CLOCK_REALTIME or
CLOCK_MONOTONIC.

The PHC must act the same as the other POSIX clocks.

Thanks,
Richard

2020-07-08 12:18:03

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Richard Cochran <[email protected]> writes:

> On Tue, Jul 07, 2020 at 08:56:41PM +0300, Sergey Organov wrote:
>> It won't. Supposedly it'd force clock (that doesn't tick by default and
>> stays at 0) to start ticking.
>
> No existing clockid_t has this behavior. Consider CLOCK_REALTIME or
> CLOCK_MONOTONIC.
>
> The PHC must act the same as the other POSIX clocks.

Yeah, that's a good argument!

Thanks,
-- Sergey

2020-07-08 12:25:49

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Richard Cochran <[email protected]> writes:

> On Mon, Jul 06, 2020 at 06:27:21PM +0300, Vladimir Oltean wrote:
>> There's no correct answer, I'm afraid. Whatever the default value of the
>> clock may be, it's bound to be confusing for some reason, _if_ the
>> reason why you're investigating it in the first place is a driver bug.
>> Also, I don't really see how your change to use Jan 1st 1970 makes it
>> any less confusing.
>
> +1
>
> For a PHC, the user of the clock must check the PTP stack's
> synchronization flags via the management interface to know the status
> of the time signal.

Sorry, but I'm looking at it from the POV of PTP stack, so I don't have
another one to check.

If PTP clock itself had some means of checking for its quality (like
last time is was synchronized, skew and offset estimations at that time,
etc.), then it'd render its initial value mostly unimportant indeed.

But even then, the original problem was that Ethernet packets were time
stamped with the wrong (different) PTP clock, and there is no any flags
or clock ID in the time stamp of Ethernet packet, so any flags attached
to PTP clock are still useless.

Thanks,
-- Sergey

2020-07-08 12:38:07

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

Richard Cochran <[email protected]> writes:

> On Mon, Jul 06, 2020 at 06:27:21PM +0300, Vladimir Oltean wrote:
>> There's no correct answer, I'm afraid. Whatever the default value of the
>> clock may be, it's bound to be confusing for some reason, _if_ the
>> reason why you're investigating it in the first place is a driver bug.
>> Also, I don't really see how your change to use Jan 1st 1970 makes it
>> any less confusing.
>
> +1
>
> For a PHC, the user of the clock must check the PTP stack's
> synchronization flags via the management interface to know the status
> of the time signal.

Actually, as I just realized, the right solution for my original problem
would rather be adding PTP clock ID that time stamped Ethernet packet to
the Ethernet hardware time stamp (see my previous reply as well).

Thanks,
-- Sergey

2020-07-08 14:50:16

[permalink] [raw]

Subject: Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

On Wed, Jul 08, 2020 at 03:37:29PM +0300, Sergey Organov wrote:
> Richard Cochran <[email protected]> writes:
>
> > On Mon, Jul 06, 2020 at 06:27:21PM +0300, Vladimir Oltean wrote:
> >> There's no correct answer, I'm afraid. Whatever the default value of the
> >> clock may be, it's bound to be confusing for some reason, _if_ the
> >> reason why you're investigating it in the first place is a driver bug.
> >> Also, I don't really see how your change to use Jan 1st 1970 makes it
> >> any less confusing.
> >
> > +1
> >
> > For a PHC, the user of the clock must check the PTP stack's
> > synchronization flags via the management interface to know the status
> > of the time signal.
>
> Actually, as I just realized, the right solution for my original problem
> would rather be adding PTP clock ID that time stamped Ethernet packet to
> the Ethernet hardware time stamp (see my previous reply as well).

I think you misunderstood my point. I wasn't commenting on the
"stacked" MAC/PHY/DSA time stamp issue in the kernel.

I am talking about the question of whether to initialize the PHC time
to zero (decades off from TAI) or ktime (likely about 37 seconds off
from TAI). It does not really matter, because the user must not guess
whether the time is valid based on the value. Instead, the user
should query the relevant PTP data sets in a "live" online manner.

For example, to tell whether a PHC is synchronized to anything at all,
you need to check PORT_DATA_SET.portState and probably also
CURRENT_DATA_SET.offsetFromMaster depending on your needs.

In addition, if you care about global time, you need to check:

TIME_PROPERTIES_DATA_SET
currentUtcOffset
currentUtcOffsetValid
ptpTimescale
timeTraceable
frequencyTraceable

Thanks,
Richard

2020-07-08 17:40:19