2011-02-24 00:08:30

by Nikola Ciprich

Subject: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

Hello Avi et al,
it seems I've hit a regression in 2.6.37:
32-bit SMP CentOS guests stopped booting; they just hang during the initrd phase (I haven't tried
other distros).
UP guests are OK.
When I (forcibly) compiled kvm-kmod-2.6.36.2 and used it on 2.6.37, even
the SMP guests booted fine.
Does somebody have a tip on where the problem could be, or should I bisect this?
I tried on 2 different machines; the host is x86_64, with qemu-kvm 0.13.0 and 0.14.0.
If I should provide more information (or bisect), please let me know.
cheers!
nik


--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799

http://www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: [email protected]
-------------------------------------


2011-02-24 10:17:48

by Avi Kivity

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/24/2011 01:42 AM, Nikola Ciprich wrote:
> Hello Avi et al,
> seems like I've hit regression in 2.6.37:
> 32bit SMP centos guest stopped booting, they just hang during initrd phase. (haven't tried
> different distros)
> UP guest are OK.
> when I (forcibly) compiled kvm-kmod-2.6.36.2 and used it in 2.6.37, even
> the SMP guests boot fine.
> does somebody have a tip on where the problem could be, or should I bisect this?
> I tried on 2 different machines, host is x86_64, qemu-kvm 0.13.0, 0.14.0.
> If I shall provide more information (or bisect), please let me know.

Bisect is of course great, if laborious. Meanwhile can you post 'info
registers' for all cpus? Is the guest consuming cpu? kvm_stat output?

--
error compiling committee.c: too many arguments to function

2011-02-24 10:49:10

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On Thu, Feb 24, 2011 at 12:17:40PM +0200, Avi Kivity wrote:
> On 02/24/2011 01:42 AM, Nikola Ciprich wrote:
>> Hello Avi et al,
>> seems like I've hit regression in 2.6.37:
>> 32bit SMP centos guest stopped booting, they just hang during initrd phase. (haven't tried
>> different distros)
>> UP guest are OK.
>> when I (forcibly) compiled kvm-kmod-2.6.36.2 and used it in 2.6.37, even
>> the SMP guests boot fine.
>> does somebody have a tip on where the problem could be, or should I bisect this?
>> I tried on 2 different machines, host is x86_64, qemu-kvm 0.13.0, 0.14.0.
>> If I shall provide more information (or bisect), please let me know.
>
> Bisect is of course great, if laborious. Meanwhile can you post 'info
> registers' for all cpus? Is the guest consuming cpu? kvm_stat output?
yes, it's eating 100% of one CPU core.

kvm_stat for a few seconds (the hung guest is the only one running on the host):

kvm_entry 29327 9091
kvm_exit 29357 9090
kvm_inj_virq 24588 7609
kvm_apic_accept_irq 17146 5310
kvm_emulate_insn 12682 3931
kvm_apic 12530 3879
kvm_mmio 12525 3879
kvm_exit(APIC_ACCESS) 12525 3879
kvm_exit(HLT) 11262 3466
kvm_ioapic_set_irq 6532 2024
kvm_set_irq 6538 2024
kvm_pic_set_irq 6536 2024
kvm_exit(EXTERNAL_INTERRUPT) 4255 1300
kvm_ack_irq 2442 756
kvm_exit(PENDING_INTERRUPT) 1030 335
kvm_exit(IO_INSTRUCTION) 313 104
kvm_pio 312 104
kvm_age_page 18 6
kvm_exit(EPT_VIOLATION) 14 4
kvm_page_fault 12 4
kvm_exit(INVALID_STATE) 4 0
kvm_exit(VMLAUNCH) 3 0
kvm_exit(CPUID) 3 0
kvm_exit(DR_ACCESS) 2 0
kvm_exit(MSR_READ) 2 0
kvm_exit(PAUSE_INSTRUCTION) 1 0

info registers:
EAX=00000000 EBX=6a000000 ECX=0000000a EDX=000f41a8
ESI=000f41a8 EDI=00000000 EBP=c0690320 ESP=c0769f58
EIP=c042d137 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0000 00000000 ffffffff 00000000
GS =0000 00000000 ffffffff 00000000
LDT=0088 c0747020 00000027 00008200 DPL=0 LDT
TR =0080 c300f380 00002073 00008b00 DPL=0 TSS32-busy
GDT= c302b000 000000ff
IDT= c06f7000 000007ff
CR0=8005003b CR2=ffc46000 CR3=00743000 CR4=000006d0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=800bf60000000000 4015 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

I'll hold off on the bisect for a bit to see whether you spot some obvious bug ;)
thanks for your time!

PS: I still owe you the kvm_stat comparison for the slow Windows chkdsk problem.
I'm aware of it, I just had to postpone it due to more urgent matters :(
but I'll get back to it sooner or later..


2011-02-24 10:53:02

by Avi Kivity

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/24/2011 12:48 PM, Nikola Ciprich wrote:
> On Thu, Feb 24, 2011 at 12:17:40PM +0200, Avi Kivity wrote:
> > On 02/24/2011 01:42 AM, Nikola Ciprich wrote:
> >> Hello Avi et al,
> >> seems like I've hit regression in 2.6.37:
> >> 32bit SMP centos guest stopped booting, they just hang during initrd phase. (haven't tried
> >> different distros)
> >> UP guest are OK.
> >> when I (forcibly) compiled kvm-kmod-2.6.36.2 and used it in 2.6.37, even
> >> the SMP guests boot fine.
> >> does somebody have a tip on where the problem could be, or should I bisect this?
> >> I tried on 2 different machines, host is x86_64, qemu-kvm 0.13.0, 0.14.0.
> >> If I shall provide more information (or bisect), please let me know.
> >
> > Bisect is of course great, if laborious. Meanwhile can you post 'info
> > registers' for all cpus? Is the guest consuming cpu? kvm_stat output?
> yes, it's eating 100% of one CPU core.
>
> kvm_stat for few seconds (hunged guest is the only one running on the host):
>
> kvm_entry 29327 9091
> kvm_exit 29357 9090
> kvm_inj_virq 24588 7609
> kvm_apic_accept_irq 17146 5310
> kvm_emulate_insn 12682 3931
> kvm_apic 12530 3879
> kvm_mmio 12525 3879
> kvm_exit(APIC_ACCESS) 12525 3879
> kvm_exit(HLT) 11262 3466
> kvm_ioapic_set_irq 6532 2024
> kvm_set_irq 6538 2024
> kvm_pic_set_irq 6536 2024
> kvm_exit(EXTERNAL_INTERRUPT) 4255 1300
> kvm_ack_irq 2442 756
> kvm_exit(PENDING_INTERRUPT) 1030 335
> kvm_exit(IO_INSTRUCTION) 313 104
> kvm_pio 312 104
> kvm_age_page 18 6
> kvm_exit(EPT_VIOLATION) 14 4
> kvm_page_fault 12 4
> kvm_exit(INVALID_STATE) 4 0
> kvm_exit(VMLAUNCH) 3 0
> kvm_exit(CPUID) 3 0
> kvm_exit(DR_ACCESS) 2 0
> kvm_exit(MSR_READ) 2 0
> kvm_exit(PAUSE_INSTRUCTION) 1 0
>

Guest is churning along.

> info registers:
> EAX=00000000 EBX=6a000000 ECX=0000000a EDX=000f41a8
> ESI=000f41a8 EDI=00000000 EBP=c0690320 ESP=c0769f58
> EIP=c042d137 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>

Not very useful when the guest is making progress, I'm afraid.

> I'll wait a bit with bisect whether You'll spot some obvious bug or not ;)
> thanks for Your time!

Can you try a little trace-cmd record -e kvm -b 20000?

> PS: I still owe You the kvm_stat comparison about this slow windows chkdsk problem,
> I'm aware of it, I just had to postpone this due to more urgent matters :(
> but I'll get back to it sooner or later..

Sure. Something similar that came up - sometimes Windows IDE drivers
fall back to PIO mode. Are you using IDE? If so, please check whether
it's using DMA or PIO.


2011-02-24 11:28:51

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> Not very useful when the guest is making progress, I'm afraid.
can perf report help here?

> Can you try a little trace-cmd record -e kvm -b 20000?
ugh, I'm afraid I have some dumb questions here :-[
You mean this: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git ?
and then re-execute qemu-kvm under it? or am I totally wrong?

> Sure. Something similar that came up - sometimes Windows IDE drivers
> fall back to PIO mode. Are you using IDE? If so, please check whether
> it's using DMA or PIO.
I'll check, but this problem occurs only during the fsck phase; once the guest boots, it runs pretty fast..
so maybe it falls back to PIO during boot, but I guess I won't have a chance
to find that out from the guest.. can I somehow check it from the host?

2011-02-24 12:26:55

by Avi Kivity

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/24/2011 01:27 PM, Nikola Ciprich wrote:
> > Not very useful when the guest is making progress, I'm afraid.
> can perf report help here?
>
> > Can you try a little trace-cmd record -e kvm -b 20000?
> ugh, I'm afraid I'll have some dumb questions here :-[
> You mean this: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git ?

Yes. If you have udis86 and udis86-devel installed when building it,
it's even better.

> and then re-execute qemu-kvm using it? or I'm totally wrong?

You don't have to execute qemu-kvm under it, if you have a running
instance you can run trace-cmd in parallel and it will record whatever's
happening.

> > Sure. Something similar that came up - sometimes Windows IDE drivers
> > fall back to PIO mode. Are you using IDE? If so, please check whether
> > it's using DMA or PIO.
> I'll check, but this problem occurs only during fsck phase, when to guest boots, then it runs pretty fast..
> so maybe during boot it might fall back to PIO, but from guest, I guess I won't have a chance
> to find out.. can I somehow check it from host?

The trace-cmd output will show. Please run trace-cmd report afterwards
and post the results somewhere.
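
A minimal sketch of that workflow, assuming trace-cmd is installed and run as root on the host while the hung guest is running. The helper name record_kvm_trace, the TRACE_CMD override, the 10-second duration, and the file names are illustrative assumptions, not something taken from this thread; `record` and `report` are the trace-cmd subcommands involved.

```shell
#!/bin/sh
# Allow the binary to be overridden (hypothetical convenience, e.g. for a dry run).
TRACE_CMD=${TRACE_CMD:-trace-cmd}

# Record all kvm trace events for a fixed time into an output file.
#   $1 = seconds to record, $2 = output file
record_kvm_trace() {
    # -e kvm   : enable the kvm event subsystem
    # -b 20000 : per-CPU kernel buffer size in KB
    # sleep N  : trace-cmd records for as long as this command runs
    "$TRACE_CMD" record -e kvm -b 20000 -o "$2" sleep "$1"
}

# Usage (as root, with the guest already running):
#   record_kvm_trace 10 kvm-trace.dat
#   trace-cmd report -i kvm-trace.dat > report.txt
```

Recording for a bounded time keeps the report small enough to compress and upload.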



2011-02-24 12:51:30

by Avi Kivity

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/24/2011 02:41 PM, Nikola Ciprich wrote:
> > Yes. If you have udis86 and udis86-devel installed when building it,
> > it's even better.
> yes, now I remember! I've done some tracing for You already..
>
> > You don't have to execute qemu-kvm under it, if you have a running
> > instance you can run trace-cmd in parallel and it will record whatever's
> > happening.
> I've uploaded the report for You here:
> nelide.cz/downloads/nik/report.txt.xz
>

The only activity I can see is the timer interrupt, so I'm afraid a
bisect is needed.

If you let git bisect just kvm, it'll be a bit faster:

$ git bisect start $BAD $GOOD -- virt/kvm arch/x86/kvm
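
Spelled out with git's `bisect start` subcommand, a path-limited bisect might look like the sketch below. The helper name start_kvm_bisect and the GIT override are assumptions for illustration only; the tags v2.6.37 and v2.6.36 stand in for $BAD and $GOOD.

```shell
#!/bin/sh
# Allow the git binary to be overridden (hypothetical convenience, e.g. for a dry run).
GIT=${GIT:-git}

# Start a bisect that only considers commits touching the KVM directories.
#   $1 = known-bad revision, $2 = known-good revision
start_kvm_bisect() {
    "$GIT" bisect start "$1" "$2" -- virt/kvm arch/x86/kvm
}

# Usage, from inside a kernel tree:
#   start_kvm_bisect v2.6.37 v2.6.36
# then build, boot the guest, and mark each step until git names the commit:
#   git bisect good     # guest boots
#   git bisect bad      # guest hangs
#   git bisect reset    # when finished
```

Limiting the bisect to these paths skips commits that cannot have changed KVM behaviour, at the cost of missing a culprit that lives elsewhere in the tree.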


2011-02-24 12:58:09

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> The only activity I can see is the timer interrupt, so I'm afraid a
> bisect is needed.
OK, never mind, it's easy to reproduce, so I'll just bisect it and report.
n.



2011-02-24 13:05:50

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> Yes. If you have udis86 and udis86-devel installed when building it,
> it's even better.
yes, now I remember! I've done some tracing for You already..

> You don't have to execute qemu-kvm under it, if you have a running
> instance you can run trace-cmd in parallel and it will record whatever's
> happening.
I've uploaded the report for You here:
nelide.cz/downloads/nik/report.txt.xz

> The trace-cmd output will show. Please run trace-cmd report afterwards
> and post the results somewhere.
OK, I'll prepare a new Windows test machine, try it, and report..


2011-02-25 10:49:42

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

(CC: Zachary)

Hello,
Zachary, in case You haven't noticed the thread, we're trying
to find out the reason why 32bit SMP guests stopped working
in 2.6.37.
bisect shows this as the culprit:

e48672fa25e879f7ae21785c7efd187738139593 is first bad commit
commit e48672fa25e879f7ae21785c7efd187738139593
Author: Zachary Amsden <[email protected]>
Date: Thu Aug 19 22:07:23 2010 -1000

KVM: x86: Unify TSC logic

Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops. Now all TSC decisions
can be done in one place.

Signed-off-by: Zachary Amsden <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>

Unfortunately I couldn't try 2.6.37 with just this one reverted, since other patches
certainly rely on it, but hopefully I haven't screwed something up while bisecting...

so what now?
n.


2011-02-25 14:45:18

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/25/2011 05:48 AM, Nikola Ciprich wrote:
> (CC: Zachary)
>
> Hello,
> Zachary, in case You haven't noticed the thread, we're trying
> to find out the reason why 32bit SMP guests stopped working
> in 2.6.37.
> bisect shows this as the culprit:
>

I was not aware of the thread. Please cc me directly, or add a keyword
I track - timekeeping, TSC..

> e48672fa25e879f7ae21785c7efd187738139593 is first bad commit
> commit e48672fa25e879f7ae21785c7efd187738139593
> Author: Zachary Amsden<[email protected]>
> Date: Thu Aug 19 22:07:23 2010 -1000
>
> KVM: x86: Unify TSC logic
>
> Move the TSC control logic from the vendor backends into x86.c
> by adding adjust_tsc_offset to x86 ops. Now all TSC decisions
> can be done in one place.
>
> Signed-off-by: Zachary Amsden<[email protected]>
> Signed-off-by: Marcelo Tosatti<[email protected]>
>

That change alone may not bisect well; without further fixes on top of
it, you may end up with a hang or stall, which is likely to manifest in
a vendor-specific way.

Basically there were a few differences in the platform code about how
TSC was dealt with on systems which did not have stable clocks. This
change brought the logic into one location, but it also slightly changed
the logic itself.

Note very carefully, the logic on SVM is gated by a condition before
this change:

        if (unlikely(cpu != vcpu->cpu)) {
-               u64 delta;
-
-               if (check_tsc_unstable()) {
-                       /*
-                        * Make sure that the guest sees a monotonically
-                        * increasing TSC.
-                        */
-                       delta = vcpu->arch.host_tsc - native_read_tsc();
-                       svm->vmcb->control.tsc_offset += delta;
-                       if (is_nested(svm))
-                               svm->nested.hsave->control.tsc_offset += delta;
-               }
-               vcpu->cpu = cpu;
-               kvm_migrate_timers(vcpu);


So this only happens with a system which reports TSC as unstable. After
the change, KVM itself may report the TSC as unstable:

+       if (unlikely(vcpu->cpu != cpu)) {
+               /* Make sure TSC doesn't go backwards */
+               s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
+                               native_read_tsc() - vcpu->arch.last_host_tsc;
+               if (tsc_delta < 0)
+                       mark_tsc_unstable("KVM discovered backwards TSC");
+               if (check_tsc_unstable())
+                       kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
+               kvm_migrate_timers(vcpu);
+               vcpu->cpu = cpu;
+       }

If the platform has very small TSC deltas across CPUs, but indicates the
TSC is stable, this could result in KVM marking the TSC unstable. If
that is the case, this compensation logic will kick in to avoid
backwards TSCs.

Note however, that the logic is not perfect; time which passes while not
running on any CPU will be erased, as the delta compensation removes not
just backwards, but any elapsed time from the TSC. In extreme cases,
this could result in time appearing to stand still.... with guests
failing to boot.
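
The effect can be seen in a toy model of the added hunk. This is only a sketch, not kernel code: it assumes the guest sees host_tsc + offset, it ignores the last_host_tsc == 0 special case, and the function name and the tick values are made up for illustration.

```shell
#!/bin/sh
# Toy model of the compensation applied when a vCPU resumes on another CPU.
# Prints how far the guest-visible TSC advanced across the migration.
#   $1 = 1 if the host TSC is already marked unstable, else 0
#   $2 = host TSC ticks between the vCPU's last run and its resume
#        (a negative value models a small backwards skew between CPUs)
guest_elapsed() {
    unstable=$1
    last_host_tsc=1000
    offset=0
    guest_before=$((last_host_tsc + offset))

    host_tsc=$((last_host_tsc + $2))
    tsc_delta=$((host_tsc - last_host_tsc))
    if [ "$tsc_delta" -lt 0 ]; then
        unstable=1                        # mark_tsc_unstable(...)
    fi
    if [ "$unstable" -eq 1 ]; then
        offset=$((offset - tsc_delta))    # adjust_tsc_offset(vcpu, -tsc_delta)
    fi
    echo $(( (host_tsc + offset) - guest_before ))
}

guest_elapsed 0 500    # stable TSC: prints 500
guest_elapsed 1 500    # unstable TSC: prints 0, the 500 ticks are erased
guest_elapsed 0 -3     # backwards skew: TSC gets marked unstable, prints 0
```

In the second case the guest TSC does not move at all across the migration, which is the "time appearing to stand still" behaviour described above.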

This was addressed with a later change, which catches up the missing time:

commit c285545f813d7b0ce989fd34e42ad1fe785dc65d
Author: Zachary Amsden <[email protected]>
Date: Sat Sep 18 14:38:15 2010 -1000

KVM: x86: TSC catchup mode

Negate the effects of AN TYM spell while kvm thread is preempted by tracking
conversion factor to the highest TSC rate and catching the TSC up when it has
fallen behind the kernel view of time. Note that once triggered, we don't
turn off catchup mode.

A slightly more clever version of this is possible, which only does catchup
when TSC rate drops, and which specifically targets only CPUs with broken
TSC, but since these all are considered unstable_tsc(), this patch covers
all necessary cases.

Signed-off-by: Zachary Amsden <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>

2011-02-27 17:21:09

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> I was not aware of the thread. Please cc me directly, or add a keyword
> I track - timekeeping, TSC..
Hello Zachary, thanks for Your time looking at this!
> That change alone may not bisect well; without further fixes on top of
> it, you may end up with a hang or stall, which is likely to manifest in
> a vendor-specific way.
I'm not sure I really understand you here, but this change is exactly
what I arrived at while bisecting. With this revision and later ones,
32-bit SMP guests don't boot; before it, they do..
>
> Basically there were a few differences in the platform code about how
> TSC was dealt with on systems which did not have stable clocks, this
> brought the logic into one location, but there was a slight change to
> the logic here.
>
> Note very carefully, the logic on SVM is gated by a condition before
> this change:
>
>         if (unlikely(cpu != vcpu->cpu)) {
> -               u64 delta;
> -
> -               if (check_tsc_unstable()) {
> -                       /*
> -                        * Make sure that the guest sees a monotonically
> -                        * increasing TSC.
> -                        */
> -                       delta = vcpu->arch.host_tsc - native_read_tsc();
> -                       svm->vmcb->control.tsc_offset += delta;
> -                       if (is_nested(svm))
> -                               svm->nested.hsave->control.tsc_offset += delta;
> -               }
> -               vcpu->cpu = cpu;
> -               kvm_migrate_timers(vcpu);
>
>
> So this only happens with a system which reports TSC as unstable. After
> the change, KVM itself may report the TSC as unstable:
>
> +       if (unlikely(vcpu->cpu != cpu)) {
> +               /* Make sure TSC doesn't go backwards */
> +               s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
> +                               native_read_tsc() - vcpu->arch.last_host_tsc;
> +               if (tsc_delta < 0)
> +                       mark_tsc_unstable("KVM discovered backwards TSC");
> +               if (check_tsc_unstable())
> +                       kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
> +               kvm_migrate_timers(vcpu);
> +               vcpu->cpu = cpu;
> +       }
>
> If the platform has very small TSC deltas across CPUs, but indicates the
> TSC is stable, this could result in KVM marking the TSC unstable. If
> that is the case, this compensation logic will kick in to avoid
> backwards TSCs.
>
> Note however, that the logic is not perfect; time which passes while not
> running on any CPU will be erased, as the delta compensation removes not
> just backwards, but any elapsed time from the TSC. In extreme cases,
> this could result in time appearing to stand still.... with guests
> failing to boot.
>
> This was addressed with a later change, which catches up the missing time:
>
> commit c285545f813d7b0ce989fd34e42ad1fe785dc65d
yes, but this change is already included in 2.6.37, so maybe some other fix is needed?
If you have some idea what could be changed, I'll gladly test whatever you recommend,
but I'm afraid that's all I can do, since this is a bit of rocket science for me, sorry :(
nik





2011-02-28 13:51:31

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/27/2011 12:20 PM, Nikola Ciprich wrote:
>> I was not aware of the thread. Please cc me directly, or add a keyword
>> I track - timekeeping, TSC..
>>
> Hello Zachary, thanks for Your time looking at this!
>
>> That change alone may not bisect well; without further fixes on top of
>> it, you may end up with a hang or stall, which is likely to manifest in
>> a vendor-specific way.
>>
> I'm not sure I really understand You here, but this change is exactly to
> what I got while bisecting. With later revisions, including this one,
> 32bit SMP guests don't boot, before it, they do..
>

Does the bug you are hitting manifest on both Intel and AMD platforms?

Further, do the systems you are hitting this on have stable or unstable
TSCs?

Thanks,

Zach

2011-02-28 14:34:30

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> Does the bug you are hitting manifest on both Intel and AMD platforms?
I don't have any AMD box here; I'll try this out on my home box.

>
> Further, do the systems you are hitting this on have stable or unstable
> TSCs?
how do I find this out? I don't see any warning about the TSC in the guest, but I've
just started it..
n.




2011-02-28 15:17:32

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 02/28/2011 09:32 AM, Nikola Ciprich wrote:
>> Does the bug you are hitting manifest on both Intel and AMD platforms?
>>
> I don't have any AMD box here, I'll try this out at my home box.
>
>
>> Further, do the systems you are hitting this on have stable or unstable
>> TSCs?
>>
> how do I find this out? I don't see any warning about TSC in guest, but I've
> just started it..
> n.

Before worrying about the guest, is the host TSC stable? What is the
host clocksource?

2011-02-28 15:30:06

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On Mon, Feb 28, 2011 at 10:17:24AM -0500, Zachary Amsden wrote:
> On 02/28/2011 09:32 AM, Nikola Ciprich wrote:
>>> Does the bug you are hitting manifest on both Intel and AMD platforms?
>>>
>> I don't have any AMD box here, I'll try this out at my home box.
>>
>>
>>> Further, do the systems you are hitting this on have stable or unstable
>>> TSCs?
>>>
>> how do I find this out? I don't see any warning about TSC in guest, but I've
>> just started it..
>> n.
>
> Before worrying about the guest, is the host TSC stable? What is the
> host clocksource?
not sure, I'm not setting anything specifically. Is this snippet of dmesg relevant?

[ 1.148829] HPET: 8 timers in total, 5 timers will be used for per-cpu timer
[ 1.148934] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 40, 41, 42, 43, 44, 0
[ 1.149331] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[ 1.151831] hpet: hpet2 irq 40 for MSI
[ 1.151962] hpet: hpet3 irq 41 for MSI
[ 1.155930] hpet: hpet4 irq 42 for MSI
[ 1.159937] hpet: hpet5 irq 43 for MSI
[ 1.163943] hpet: hpet6 irq 44 for MSI
[ 1.175955] Switching to clocksource tsc

so I guess I'm using hpet?
n.



2011-02-28 15:57:05

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

>
> On Mon, Feb 28, 2011 at 10:17:24AM -0500, Zachary Amsden wrote:
>
>> On 02/28/2011 09:32 AM, Nikola Ciprich wrote:
>>
>>>> Does the bug you are hitting manifest on both Intel and AMD platforms?
>>>>
>>>>
>>> I don't have any AMD box here, I'll try this out at my home box.
>>>
>>>
>>>
>>>> Further, do the systems you are hitting this on have stable or unstable
>>>> TSCs?
>>>>
>>>>
>>> how do I find this out? I don't see any warning about TSC in guest, but I've
>>> just started it..
>>> n.
>>>
>> Before worrying about the guest, is the host TSC stable? What is the
>> host clocksource?
>>
> not sure, I'm not setting anything specifically, is this snippet of dmesg relevant:
>
> [ 1.148829] HPET: 8 timers in total, 5 timers will be used for per-cpu timer
> [ 1.148934] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 40, 41, 42, 43, 44, 0
> [ 1.149331] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
> [ 1.151831] hpet: hpet2 irq 40 for MSI
> [ 1.151962] hpet: hpet3 irq 41 for MSI
> [ 1.155930] hpet: hpet4 irq 42 for MSI
> [ 1.159937] hpet: hpet5 irq 43 for MSI
> [ 1.163943] hpet: hpet6 irq 44 for MSI
> [ 1.175955] Switching to clocksource tsc
>
> so I guess I'm using hpet?
> n.
>
>
>
Looks like you are using tsc, based on the last line. Can you please
send us the output of

cat /proc/cpuinfo
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

and grep -i dmesg for these keywords: TSC, clock, hpet, stable, khz, kvm
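
Those checks could be collected in one go along the lines of the sketch below. It assumes a Linux host with the usual /proc and sysfs layout; the helper name tsc_flags is hypothetical, and available_clocksource is an extra sysfs file queried only for context. The constant_tsc and nonstop_tsc flags are a hint rather than proof that the TSCs stay in sync across CPUs.

```shell
#!/bin/sh
# Print the TSC-related CPU flags found in cpuinfo-style text on stdin
# (only the first "flags" line is examined).
tsc_flags() {
    sed -n '/^flags/{p;q;}' | tr -s ' \t:' '\n\n\n' |
        grep -E '^(tsc|rdtscp|constant_tsc|nonstop_tsc)$' | paste -s -d ' ' -
}

cs=/sys/devices/system/clocksource/clocksource0

if [ -r /proc/cpuinfo ]; then
    echo "TSC flags: $(tsc_flags < /proc/cpuinfo)"
fi

for f in current_clocksource available_clocksource; do
    if [ -r "$cs/$f" ]; then
        echo "$f: $(cat "$cs/$f")"
    fi
done

# Kernel log lines mentioning the keywords asked for (may need root).
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | grep -iE 'tsc|clock|hpet|stable|khz|kvm' || true
fi
```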

2011-02-28 17:16:07

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

(resend, sorry for the mess)
> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Xeon(R) CPU X3440 @ 2.53GHz
stepping : 5
cpu MHz : 2533.185
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5066.37
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
.
.
.
.
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Xeon(R) CPU X3440 @ 2.53GHz
stepping : 5
cpu MHz : 2533.185
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5066.35
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:


> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
[root@vbox5 ~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc


>
> and grep -i dmesg for these keywords: TSC, clock, hpet, stable, khz, kvm
[root@vbox5 ~]# dmesg | grep -i "tsc\|clock\|hpet\|stable\|stable\|khz\|kvm"
[ 0.000000] ACPI: HPET 00000000bf7aa5f0 00038 (v01 052710 OEMHPET 20100527 MSFT 00000097)
[ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[ 0.000000] hpet clockevent registered
[ 0.000000] Fast TSC calibration using PIT
[ 1.148829] HPET: 8 timers in total, 5 timers will be used for per-cpu timer
[ 1.148934] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 40, 41, 42, 43, 44, 0
[ 1.149331] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[ 1.151831] hpet: hpet2 irq 40 for MSI
[ 1.151962] hpet: hpet3 irq 41 for MSI
[ 1.155930] hpet: hpet4 irq 42 for MSI
[ 1.159937] hpet: hpet5 irq 43 for MSI
[ 1.163943] hpet: hpet6 irq 44 for MSI
[ 1.175955] Switching to clocksource tsc
[ 1.260015] CE: hpet3 increased min_delta_ns to 7500 nsec
[ 1.260117] CE: hpet3 increased min_delta_ns to 11250 nsec
[ 1.294150] Real Time Clock Driver v1.12b
[ 7.564355] CE: hpet4 increased min_delta_ns to 7500 nsec
[ 7.564367] CE: hpet4 increased min_delta_ns to 11250 nsec
[ 299.307242] CE: hpet2 increased min_delta_ns to 7500 nsec
[ 299.307251] CE: hpet2 increased min_delta_ns to 11250 nsec
[ 1414.616685] CE: hpet5 increased min_delta_ns to 7500 nsec
[ 1414.616694] CE: hpet5 increased min_delta_ns to 11250 nsec
[ 5241.474310] CE: hpet6 increased min_delta_ns to 7500 nsec
[ 5241.474321] CE: hpet6 increased min_delta_ns to 11250 nsec



2011-02-28 18:04:18

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Xeon(R) CPU X3440 @ 2.53GHz
stepping : 5
cpu MHz : 2533.185
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5066.37
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
.
.
.
.
.
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Xeon(R) CPU X3440 @ 2.53GHz
stepping : 5
cpu MHz : 2533.185
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5066.35
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:


> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
[root@vbox5 ~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc


>
> and grep -i dmesg for these keywords: TSC, clock, hpet, stable, khz, kvm
[root@vbox5 ~]# dmesg | grep -i "tsc\|clock\|hpet\|stable\|stable\|khz\|kvm"
[ 0.000000] ACPI: HPET 00000000bf7aa5f0 00038 (v01 052710 OEMHPET 20100527 MSFT 00000097)
[ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[ 0.000000] hpet clockevent registered
[ 0.000000] Fast TSC calibration using PIT
[ 1.148829] HPET: 8 timers in total, 5 timers will be used for per-cpu timer
[ 1.148934] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 40, 41, 42, 43, 44, 0
[ 1.149331] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[ 1.151831] hpet: hpet2 irq 40 for MSI
[ 1.151962] hpet: hpet3 irq 41 for MSI
[ 1.155930] hpet: hpet4 irq 42 for MSI
[ 1.159937] hpet: hpet5 irq 43 for MSI
[ 1.163943] hpet: hpet6 irq 44 for MSI
[ 1.175955] Switching to clocksource tsc
[ 1.260015] CE: hpet3 increased min_delta_ns to 7500 nsec
[ 1.260117] CE: hpet3 increased min_delta_ns to 11250 nsec
[ 1.294150] Real Time Clock Driver v1.12b
[ 7.564355] CE: hpet4 increased min_delta_ns to 7500 nsec
[ 7.564367] CE: hpet4 increased min_delta_ns to 11250 nsec
[ 299.307242] CE: hpet2 increased min_delta_ns to 7500 nsec
[ 299.307251] CE: hpet2 increased min_delta_ns to 11250 nsec
[ 1414.616685] CE: hpet5 increased min_delta_ns to 7500 nsec
[ 1414.616694] CE: hpet5 increased min_delta_ns to 11250 nsec
[ 5241.474310] CE: hpet6 increased min_delta_ns to 7500 nsec
[ 5241.474321] CE: hpet6 increased min_delta_ns to 11250 nsec



2011-03-03 01:56:10

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

>
> (resend, sorry for the mess)
>

No worries. What mess?

I have two things you can try:

first is running a single VCPU guest, if you have not done so already.

Second is adding the boot parameter "clocksource=acpi_pm" to your guest
kernel.

If either of those fixes the problem, it may very well have to do with this
change, and not with later dependent patches that you may be missing. This
change should be nearly a 1-1 transformation, and if it is not,
something is wrong.

What branch are you bisecting on, the kvm branch or the kernel tree
itself? It would be helpful to see the exact code in case any
surrounding logic changed.

Thanks,

Zach

2011-03-03 07:08:57

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> No worries. What mess?
sending the same mail twice, never mind :)

>
> I have two things you can try:
>
> first is running a single VCPU guest, if you have not done so already.
yup, UP guest is fine, just SMP doesn't work.

> Second is adding the boot parameter "clocksource=acpi_pm" to your guest
> kernel.
yes, this makes SMP work too! I just realized that when you asked about the current
clocksource, I only told you the host's, not the guest's. So I checked now,
and (at least for UP, I guess for SMP it's the same) the guest clocksource is
kvm-clock! So it seems it got broken with the TSC changes?
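
The host/guest mix-up is easy to make because the same sysfs file exists on both sides. A minimal sketch of reading it from C, meant to be run inside the guest; the sysfs path is the one quoted earlier in this thread, while the helper name read_first_line and the macro name are only illustrative:

```c
#include <stdio.h>
#include <string.h>

/* Path quoted in this thread; it exists on the host and in the guest alike. */
#define CURRENT_CLOCKSOURCE \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of a text file into buf, with the newline stripped.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 * The helper name is illustrative; it is not a kernel or libc API.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

Calling read_first_line(CURRENT_CLOCKSOURCE, buf, sizeof buf) in the guest would be expected to give kvm-clock in the failing setup described here, and acpi_pm once the boot parameter is in place.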


>
> If either of those fixes the problem, it may very well have to do with this
> change, and not with later dependent patches that you may be missing. This
> change should be nearly a 1-1 transformation, and if it is not,
> something is wrong.
>
> What branch are you bisecting on, the kvm branch or the kernel tree
> itself? It would be helpful to see the exact code in case any
> surrounding logic changed.
I was bisecting linus' linux-2.6.git main branch, between 2.6.36..2.6.37

>
> Thanks,
>
> Zach

2011-03-03 20:47:41

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/03/2011 02:06 AM, Nikola Ciprich wrote:
>> No worries. What mess?
>>
> sending the same mail twice, never mind :)
>
>
>> I have two things you can try:
>>
>> first is running a single VCPU guest, if you have not done so already.
>>
> yup, UP guest is fine, just SMP doesn't work.
>
>
>> Second is adding the boot parameter "clocksource=acpi_pm" to your guest
>> kernel.
>>
> yes, this makes SMP work too! I just realized that when you asked about the current
> clocksource, I only told you the host's, not the guest's. So I checked now,
> and (at least for UP, I guess for SMP it's the same) the guest clocksource is
> kvm-clock! So it seems it got broken with the TSC changes?
>

What is the exact kernel version you are using in the guest?

It appears that some earlier 32-bit versions of kvm-clock-enabled
kernels are still missing the atomic check for backwards-time
protection that is needed on SMP. This explains why 64-bit is
fine and 32-bit is not.
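
A minimal user-space sketch of the kind of backwards-time protection meant here, assuming C11 atomics in place of the kernel's own primitives. The names last_value and monotonic_clamp are illustrative and not taken from any particular guest kernel. The point is that the "largest time seen so far" is a 64-bit value shared by all CPUs, so it has to be read and updated atomically; on 32-bit x86 that needs an explicit 64-bit compare-and-exchange rather than plain loads and stores.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Largest timestamp handed out so far, shared by all (virtual) CPUs. */
_Atomic uint64_t last_value;

/*
 * Return a timestamp that never goes backwards across CPUs.
 * 'raw' is this CPU's own clock reading, which may lag a value
 * that another CPU has already returned.
 */
uint64_t monotonic_clamp(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    for (;;) {
        if (raw <= last)
            return last;   /* someone is already ahead: reuse their value */
        /* Publish the newer reading; on failure 'last' is reloaded. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }
}
```

Without the compare-and-exchange loop, two CPUs racing on last_value could each observe a torn or stale value, and one of them could return a time earlier than one already handed out.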

Why this change triggers that problem is still a slight mystery;
logically it should only affect the system if you have an unstable TSC.

Zach

2011-03-03 21:07:06

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> What is the exact kernel version you are using in the guest?
It's the latest CentOS kernel (2.6.18-194.32.1.el5), so I guess it carries a lot
of fixes, but it's possible that kvm-clock is broken in it.
I can't influence which kernel is used there (at least not on customers'
guests), but I guess asking them to add the clocksource kernel parameter is
not a problem.



2011-03-03 21:58:46

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/03/2011 04:06 PM, Nikola Ciprich wrote:
>> What is the exact kernel version you are using in the guest?
>>
> It's the latest CentOS kernel (2.6.18-194.32.1.el5), so I guess it carries a lot
> of fixes, but it's possible that kvm-clock is broken in it.
> I can't influence which kernel is used there (at least not on customers'
> guests), but I guess asking them to add the clocksource kernel parameter is
> not a problem.
>
>

That sounds like a kernel which will be vulnerable to broken KVM clock
on 32-bit. There's a kernel side fix that is needed, but why the server
side change triggers the problem needs more investigation.

Zach

2011-03-03 22:02:12

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> That sounds like a kernel which will be vulnerable to broken KVM clock
> on 32-bit. There's a kernel side fix that is needed, but why the server
> side change triggers the problem needs more investigation.
OK, it's important for me that I can fix this by kernel parameter,
but if I can help somehow with debugging, please let me know.
thanks for Your time!
nik

>
> Zach

2011-03-04 15:13:56

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/03/2011 05:01 PM, Nikola Ciprich wrote:
>> That sounds like a kernel which will be vulnerable to broken KVM clock
>> on 32-bit. There's a kernel side fix that is needed, but why the server
>> side change triggers the problem needs more investigation.
>>
> OK, it's important for me that I can fix this by kernel parameter,
> but if I can help somehow with debugging, please let me know.
> thanks for Your time!
> nik
>

You don't see any messages about TSC being unstable or switching
clocksource after loading the KVM module? And you are not suspending
the host or anything?

Can you try using "processor.max_cstate=1" on the host as a kernel
parameter and see if it makes a difference?

2011-03-04 18:27:50

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

Hello Zachary,

> You don't see any messages about TSC being unstable or switching
> clocksource after loading the KVM module? And you are not suspending
> the host or anything?
no messages, no suspending, nothing.


> Can you try using "processor.max_cstate=1" on the host as a kernel
> parameter and see if it makes a difference?
I tried it, no change..
n.



2011-03-04 19:09:36

by Glauber Costa

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On Fri, 2011-03-04 at 19:27 +0100, Nikola Ciprich wrote:
> Hello Zachary,
>
> > You don't see any messages about TSC being unstable or switching
> > clocksource after loading the KVM module? And you are not suspending
> > the host or anything?
> no messages, no suspending, nothing.
>
>
> > Can you try using "processor.max_cstate=1" on the host as a kernel
> > parameter and see if it makes a difference?
> I tried it, no change..
> n.

Zach,

I don't understand 100 % the logic behind all your tsc changes.
But kvm-clock-wise, most of the problems we had in the past were related
to the difference in resolution between the tsc and the host clocksource
(hpet, acpi_pm, etc.), which in his case is a non-issue.

It does seem to me like some compensation logic kicked in, dismantling
an otherwise good tsc. He does have nonstop_tsc, which means it can't
get any better.

One thing I noticed when reading the culprit patch in bisect, is that in
vcpu_load(), there was previously a call to

kvm_request_guest_time_update(vcpu)

that was removed without a counterpart addition. Any idea about why it
was done?

Nikola, does adding that line back alleviate the problem for you ?

2011-03-04 20:55:28

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> Zach,
>
> I don't understand 100 % the logic behind all your tsc changes.
> But kvm-clock-wise, most of the problems we had in the past were related
> to the difference in resolution between the tsc and the host clocksource
> (hpet, acpi_pm, etc.), which in his case is a non-issue.
>
> It does seem to me like some compensation logic kicked in, dismantling
> an otherwise good tsc. He does have nonstop_tsc, which means it can't
> get any better.
>
> One thing I noticed when reading the culprit patch in bisect, is that in
> vcpu_load(), there was previously a call to
>
> kvm_request_guest_time_update(vcpu)
>
> that was removed without a counterpart addition. Any idea about why it
> was done?
>
> Nikola, does adding that line back alleviate the problem for you ?
Hello Glauber,
kvm_request_guest_time_update seems to have been renamed and then
removed since then, but I've added
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
instead and now the guest boots!
So maybe missing clock update is really the culprit here?
What do You guys think?
n.



>

2011-03-04 21:41:59

by Glauber Costa

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On Fri, 2011-03-04 at 21:55 +0100, Nikola Ciprich wrote:
> > Zach,
> >
> > I don't understand 100 % the logic behind all your tsc changes.
> > But kvm-clock-wise, most of the problems we had in the past were related
> > to the difference in resolution between the tsc and the host clocksource
> > (hpet, acpi_pm, etc.), which in his case is a non-issue.
> >
> > It does seem to me like some compensation logic kicked in, dismantling
> > an otherwise good tsc. He does have nonstop_tsc, which means it can't
> > get any better.
> >
> > One thing I noticed when reading the culprit patch in bisect, is that in
> > vcpu_load(), there was previously a call to
> >
> > kvm_request_guest_time_update(vcpu)
> >
> > that was removed without a counterpart addition. Any idea about why it
> > was done?
> >
> > Nikola, does adding that line back alleviate the problem for you ?
> Hello Glauber,
> kvm_request_guest_time_update seems to have been renamed and then
> removed since then, but I've added
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> instead and now the guest boots!
> So maybe missing clock update is really the culprit here?
> What do You guys think?
> n.

I think although the long term plan is to just do this update once in
your case (stable tsc), this update is needed.
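
For context, a rough sketch of why a skipped update can matter, assuming the guest derives its time from a per-vCPU record that the host publishes (a TSC value, the matching time in nanoseconds, and a scale factor), in the pvclock style. The struct and field names below are assumptions made for illustration, and the real interface has additional fields (for example a version counter and a shift) that are left out here:

```c
#include <stdint.h>

/*
 * Illustrative per-vCPU time record: the host writes a (TSC, nanoseconds)
 * pair plus a scale factor, and the guest extrapolates from it.
 * All names here are hypothetical.
 */
struct time_record {
    uint64_t tsc_timestamp;   /* host TSC when the record was written */
    uint64_t system_time_ns;  /* guest time in ns at that TSC value */
    uint32_t tsc_to_ns_mul;   /* ns per TSC cycle, as a fraction of 2^32 */
};

/* Guest-side read: extrapolate from the last record the host published. */
uint64_t guest_time_ns(const struct time_record *rec, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - rec->tsc_timestamp;
    /* (delta * mul) >> 32, split so each partial product fits in 64 bits */
    uint64_t hi = (delta >> 32) * rec->tsc_to_ns_mul;
    uint64_t lo = ((delta & 0xffffffffu) * (uint64_t)rec->tsc_to_ns_mul) >> 32;

    return rec->system_time_ns + hi + lo;
}
```

In this model, a vCPU whose record is not refreshed when it is loaded keeps extrapolating from an older pair than its siblings, so any small disagreement between the records may become visible to an SMP guest as time differing between CPUs.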

Why don't you send a patch to re-include it ?

2011-03-04 22:36:56

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

>
> I think although the long term plan is to just do this update once in
> your case (stable tsc), this update is needed.
>
> Why don't you send a patch to re-include it ?
>
Yes, I'll gladly submit a patch. One question: is it OK
to just add a call to kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) before
the conditional (as I did in my test), or should it go into an else {..}
branch? It's called inside the conditional again, which will cause it
to be called twice in some cases. Is that OK?
n.
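
On the "called twice" concern, a small user-space model may help, assuming requests are kept as bits in a per-vCPU mask, so that raising the same request again just sets an already-set bit. Everything below (struct vcpu_model, make_request, check_request, vcpu_load_model, and the bit number) is a hypothetical stand-in and not the kernel's code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number chosen for illustration only. */
enum { REQ_CLOCK_UPDATE = 8 };

struct vcpu_model {
    _Atomic unsigned long requests;  /* one bit per pending request */
    int cpu;                         /* physical CPU last loaded on */
};

/* Raise a request: setting an already-set bit changes nothing. */
void make_request(struct vcpu_model *v, int req)
{
    atomic_fetch_or(&v->requests, 1UL << req);
}

/* Consume a request: returns true once, then the bit is clear again. */
bool check_request(struct vcpu_model *v, int req)
{
    unsigned long bit = 1UL << req;

    return (atomic_fetch_and(&v->requests, ~bit) & bit) != 0;
}

/* Model of the tested change: request a clock update on every load. */
void vcpu_load_model(struct vcpu_model *v, int cpu)
{
    make_request(v, REQ_CLOCK_UPDATE);
    if (v->cpu != cpu) {
        make_request(v, REQ_CLOCK_UPDATE);  /* second request, same bit */
        v->cpu = cpu;
    }
}
```

Under that assumption the duplicate call costs one redundant atomic operation and the update is still processed only once per load, whether or not the vCPU migrated.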


2011-03-04 22:57:13

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/04/2011 02:09 PM, Glauber Costa wrote:
> On Fri, 2011-03-04 at 19:27 +0100, Nikola Ciprich wrote:
>
>> Hello Zachary,
>>
>>
>>> You don't see any messages about TSC being unstable or switching
>>> clocksource after loading the KVM module? And you are not suspending
>>> the host or anything?
>>>
>> no messages, no suspending, nothing.
>>
>>
>>
>>> Can you try using "processor.max_cstate=1" on the host as a kernel
>>> parameter and see if it makes a difference?
>>>
>> I tried it, no change..
>> n.
>>
> Zach,
>
> I don't understand 100 % the logic behind all your tsc changes.
> But kvm-clock-wise, most of the problems we had in the past were related
> to the difference in resolution between the tsc and the host clocksource
> (hpet, acpi_pm, etc.), which in his case is a non-issue.
>
> It does seem to me like some compensation logic kicked in, dismantling
> an otherwise good tsc. He does have nonstop_tsc, which means it can't
> get any better.
>
> One thing I noticed when reading the culprit patch in bisect, is that in
> vcpu_load(), there was previously a call to
>
> kvm_request_guest_time_update(vcpu)
>
> that was removed without a counterpart addition. Any idea about why it
> was done?
>

That's probably the source of the bug... I've been looking for that
exact line, though, and I can't find it missing.

2011-03-04 22:59:39

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/04/2011 05:36 PM, Nikola Ciprich wrote:
>> I think although the long term plan is to just do this update once in
>> your case (stable tsc), this update is needed.
>>
>> Why don't you send a patch to re-include it ?
>>
>>
> Yes, I'll gladly submit a patch. One question: is it OK
> to just add a call to kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) before
> the conditional (as I did in my test), or should it go into an else {..}
> branch? It's called inside the conditional again, which will cause it
> to be called twice in some cases. Is that OK?
> n.
>

Let me write a patch to fix this..

2011-03-05 01:17:40

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/04/2011 05:36 PM, Nikola Ciprich wrote:
>> I think although the long term plan is to just do this update once in
>> your case (stable tsc), this update is needed.
>>
>> Why don't you send a patch to re-include it ?
>>
>>
> Yes, I'll gladly submit a patch. One question: is it OK
> to just add a call to kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) before
> the conditional (as I did in my test), or should it go into an else {..}
> branch? It's called inside the conditional again, which will cause it
> to be called twice in some cases. Is that OK?
> n.
>
>

Can you try this patch to see if it fixes the problem?

Thanks,

Zach


Attachments:
guest-time-test.patch (424.00 B)

2011-03-05 07:21:32

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot


>
> Can you try this patch to see if it fixes the problem?
You haven't read my replies, have you? ;-)
kvm_request_guest_time_update seems to have been
removed, and kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu)
seems to be used instead, adding it fixes the problem.
That's what I was going to use in the patch... :)

>
> Thanks,
>
> Zach

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 468fafa..ba05303 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1866,6 +1866,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> }
>
> kvm_x86_ops->vcpu_load(vcpu, cpu);
> + kvm_request_guest_time_update(vcpu);
> if (unlikely(vcpu->cpu != cpu)) {
> /* Make sure TSC doesn't go backwards */
> s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :



2011-03-06 14:53:44

by Zachary Amsden

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

On 03/05/2011 02:21 AM, Nikola Ciprich wrote:
>
>> Can you try this patch to see if it fixes the problem?
>>
> You haven't read my replies, have you? ;-)
> kvm_request_guest_time_update seems to have been
> removed, and kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu)
> seems to be used instead, adding it fixes the problem.
> That's what I was going to use in the patch... :)
>

I did read your mail, but I was working on an old tree... because of
that transformation, this fix will unfortunately have to be back and
forward ported by hand.

Did you try just that change applied right on top of the patch
(e48672fa25e879f7ae21785c7efd187738139593) implicated by bisect?

It would be great to know if that change alone fixes the problem; if so,
the fix you propose is probably the right one for upstream.

Thanks,

Zach

2011-03-06 16:03:45

by Nikola Ciprich

Subject: Re: regression - 2.6.36 -> 2.6.37 - kvm - 32bit SMP guests don't boot

> I did read your mail, but I was working on an old tree... because of
> that transformation, this fix will unfortunately have to be back and
> forward ported by hand.
OK, sorry, I didn't mean to be adverse...
>

> Did you try just that change applied right on top of the patch
> (e48672fa25e879f7ae21785c7efd187738139593) implicated by bisect?
yes, with host running e48672fa25e879f7ae21785c7efd187738139593,
32bit SMP guest doesn't boot, when I add kvm_request_guest_time_update(vcpu),
it helps.

>
> It would be great to know if that change alone fixes the problem; if so,
> the fix you propose is probably the right one for upstream.
OK, so shall I submit a patch adding kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu)?
This fixes things for me on 2.6.37.

>
> Thanks,
>
> Zach