2014-10-20 09:51:45

by Thomas Meyer

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Am 20.10.2014 10:27 schrieb Richard Weinberger <[email protected]>:
>
> On Sun, Oct 19, 2014 at 2:39 PM, Thomas Meyer <[email protected]> wrote:
> > Hello,
> >
> > in UML kernel I get a long cpu using loop in __getnstimeofday()
> > (kernel/time/timekeeping.c:315) in the call of timespec_add_ns(),
> > when I left the host kernel suspended to ram for a few hours and resume
> > again.
> > this is because it seems like the tk->xtime_sec wasn't updated yet, but
> > the nsecs were. nsecs can be as high as 8111000111000111000l
> >
> > the function timespec_add_ns() (include/linux/time.h:266) will call
> > __iter_div_u64_rem() which has an optimized loop for the case
> > that the dividend is not much bigger as the divisior.
> > but this isn't the case for resume from ram on the host kernel.
> >
> > any ideas how to fix this? is it possible to intercept the resume from
> > ram and update the timekeeper->xtime_sec somehow?
> > or can the um arch somehow overwrite timespec_add_ns() to always use
> > div_u64_rem() instead?
>
> Hmm, does this always happen?

Yes, my single core system seems to trigger this every time after resume from ram.

> At least on my notebook it did not happen. I've started an UML yesterday
> suspended it and after more than 12h it worked fine today.
>
> BTW: Do you see the issue also then freezing UML using the freezer cgroup?

I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?

> Would be easier to debug. :)
>
> --
> Thanks,
> //richard
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?


2014-10-20 09:56:22

by Richard Weinberger

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Am 20.10.2014 um 11:51 schrieb Thomas Meyer:
>> Hmm, does this always happen?
>
> Yes, my single core system seems to trigger this every time after resume from ram.

What is your host kernel?

>> At least on my notebook it did not happen. I've started an UML yesterday
>> suspended it and after more than 12h it worked fine today.
>>
>> BTW: Do you see the issue also then freezing UML using the freezer cgroup?
>
> I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?

Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state

Thanks,
//richard

2014-10-20 19:19:32

by Thomas Meyer

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Am Montag, den 20.10.2014, 11:56 +0200 schrieb Richard Weinberger:
> Am 20.10.2014 um 11:51 schrieb Thomas Meyer:
> >> Hmm, does this always happen?
> >
> > Yes, my single core system seems to trigger this every time after resume from ram.
>
> What is your host kernel?

The standard Fedora kernel: 3.16.4-200.fc20.x86_64

>
> >> At least on my notebook it did not happen. I've started an UML yesterday
> >> suspended it and after more than 12h it worked fine today.
> >>
> >> BTW: Do you see the issue also then freezing UML using the freezer cgroup?
> >
> > I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?
>
> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>

I'll try this.
Thanks for this tip.

with kind regards
thomas

2015-04-24 19:58:53

by Thomas Meyer

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Am Montag, den 20.10.2014, 11:56 +0200 schrieb Richard Weinberger:
> Am 20.10.2014 um 11:51 schrieb Thomas Meyer:
> >> Hmm, does this always happen?
> >
> > Yes, my single core system seems to trigger this every time after resume from ram.
>
> What is your host kernel?
>
> >> At least on my notebook it did not happen. I've started an UML yesterday
> >> suspended it and after more than 12h it worked fine today.
> >>
> >> BTW: Do you see the issue also then freezing UML using the freezer cgroup?
> >
> > I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?
>
> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>

Sadly, this also happens with a cgroup freezer group :-(

bt
#0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:12
7
#1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
#2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
#3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
#4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
#5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
#6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
#7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
#8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Lin
ux/skas/process.c:174
#9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
#10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
#11 0x0000000000000000 in ?? ()

It seems as only very few people running UML kernels and suspend their host systems...

Any ideas?

I'll go with this patch so long:
diff --git a/include/linux/time.h b/include/linux/time.h
index beebe3a..3486050 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -232,7 +232,7 @@ extern struct timeval ns_to_timeval(const s64 nsec);
*/
static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
- a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
+ a->tv_sec += div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
a->tv_nsec = ns;
}

kind regards
thomas

2015-04-26 18:32:13

by Richard Weinberger

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <[email protected]> wrote:
> Am Montag, den 20.10.2014, 11:56 +0200 schrieb Richard Weinberger:
>> Am 20.10.2014 um 11:51 schrieb Thomas Meyer:
>> >> Hmm, does this always happen?
>> >
>> > Yes, my single core system seems to trigger this every time after resume from ram.
>>
>> What is your host kernel?
>>
>> >> At least on my notebook it did not happen. I've started an UML yesterday
>> >> suspended it and after more than 12h it worked fine today.
>> >>
>> >> BTW: Do you see the issue also then freezing UML using the freezer cgroup?
>> >
>> > I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?
>>
>> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
>> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
>> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
>> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>>
>
> Sadly, this also happens with a cgroup freezer group :-(
>
> bt
> #0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:12
> 7
> #1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
> #2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
> #3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
> #4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
> #5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
> #6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
> #7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
> #8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Lin
> ux/skas/process.c:174
> #9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
> #10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
> #11 0x0000000000000000 in ?? ()
>
> It seems as only very few people running UML kernels and suspend their host systems...
>
> Any ideas?

Can you give the attached patch a try?
Let's see if it proves my theory.
Looks like UML's clocksource needs fixing.
--
Thanks,
//richard


Attachments:
uml_clock_fix.diff (966.00 B)

2015-04-26 20:21:55

by Richard Weinberger

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Am 26.04.2015 um 20:32 schrieb Richard Weinberger:
> On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <[email protected]> wrote:
>> Any ideas?
>
> Can you give the attached patch a try?
> Let's see if it proves my theory.
> Looks like UML's clocksource needs fixing.

Please give also this patch a try.
I should fix your issue in a sane way.

Thanks,
//richard


Attachments:
uml_mono.diff (1.30 kB)

2015-04-26 20:57:38

by Thomas Meyer

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Am Sonntag, den 26.04.2015, 20:32 +0200 schrieb Richard Weinberger:
> On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <[email protected]> wrote:
> > Am Montag, den 20.10.2014, 11:56 +0200 schrieb Richard Weinberger:
> >> Am 20.10.2014 um 11:51 schrieb Thomas Meyer:
> >> >> Hmm, does this always happen?
> >> >
> >> > Yes, my single core system seems to trigger this every time after resume from ram.
> >>
> >> What is your host kernel?
> >>
> >> >> At least on my notebook it did not happen. I've started an UML yesterday
> >> >> suspended it and after more than 12h it worked fine today.
> >> >>
> >> >> BTW: Do you see the issue also then freezing UML using the freezer cgroup?
> >> >
> >> > I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?
> >>
> >> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
> >> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
> >> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
> >> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
> >>
> >
> > Sadly, this also happens with a cgroup freezer group :-(
> >
> > bt
> > #0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:12
> > 7
> > #1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
> > #2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
> > #3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
> > #4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
> > #5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
> > #6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
> > #7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
> > #8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Lin
> > ux/skas/process.c:174
> > #9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
> > #10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
> > #11 0x0000000000000000 in ?? ()
> >
> > It seems as only very few people running UML kernels and suspend their host systems...
> >
> > Any ideas?
>
> Can you give the attached patch a try?
> Let's see if it proves my theory.
> Looks like UML's clocksource needs fixing.

Hi Richard,

I did run this for an hour and did 4 suspend/resume cycles and it seems
not to hang any more!

I'll test your other patch the next week, but AFAIU using clock_gettime
should solve this hangs in a sane way.

Thanks for your support.

kind regards
thomas

2015-04-26 21:00:37

by Richard Weinberger

[permalink] [raw]
Subject: Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram

Hi!

Am 26.04.2015 um 22:57 schrieb Thomas Meyer:
> Am Sonntag, den 26.04.2015, 20:32 +0200 schrieb Richard Weinberger:
>> On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <[email protected]> wrote:
>>> Am Montag, den 20.10.2014, 11:56 +0200 schrieb Richard Weinberger:
>>>> Am 20.10.2014 um 11:51 schrieb Thomas Meyer:
>>>>>> Hmm, does this always happen?
>>>>>
>>>>> Yes, my single core system seems to trigger this every time after resume from ram.
>>>>
>>>> What is your host kernel?
>>>>
>>>>>> At least on my notebook it did not happen. I've started an UML yesterday
>>>>>> suspended it and after more than 12h it worked fine today.
>>>>>>
>>>>>> BTW: Do you see the issue also then freezing UML using the freezer cgroup?
>>>>>
>>>>> I'm not sure what do you mean by this. Do I need to enable some special configs for this in the host or uml kernel?
>>>>
>>>> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
>>>> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
>>>> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
>>>> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>>>>
>>>
>>> Sadly, this also happens with a cgroup freezer group :-(
>>>
>>> bt
>>> #0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:12
>>> 7
>>> #1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
>>> #2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
>>> #3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
>>> #4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
>>> #5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
>>> #6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
>>> #7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
>>> #8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Lin
>>> ux/skas/process.c:174
>>> #9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
>>> #10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
>>> #11 0x0000000000000000 in ?? ()
>>>
>>> It seems as only very few people running UML kernels and suspend their host systems...
>>>
>>> Any ideas?
>>
>> Can you give the attached patch a try?
>> Let's see if it proves my theory.
>> Looks like UML's clocksource needs fixing.
>
> Hi Richard,
>
> I did run this for an hour and did 4 suspend/resume cycles and it seems
> not to hang any more!

Yay!
BTW: Changing the host's time should also work for testing...

> I'll test your other patch the next week, but AFAIU using clock_gettime
> should solve this hangs in a sane way.

Yep. I have no idea why UML is currently using gettimeofday() as clocksource,
this is completely bogus. ;-\

Thanks,
//richard