2014-12-26 01:57:40

by Qu Wenruo

Subject: [REGRESSION][x86] Commit f5b2831d65 cause boot failure in VMware ESXi 5.1 guest

Hi all,

While trying to test the v3.19-rc1 kernel, I found that the kernel
fails to boot in a VMware ESXi 5.1 guest.
The boot failure is easy to describe; only one line is output:
"Probing EDD (edd=off to disable)...ok"

There is no other output (no warning, BUG_ON, or backtrace) and the
guest just hangs.
v3.18 boots fine, so this is a regression.

Bisect points to the following commit:
commit f5b2831d654167d77da8afbef4d2584897b12d0c
Author: Juergen Gross <[email protected]>
Date: Mon Nov 3 14:02:02 2014 +0100

x86: Respect PAT bit when copying pte values between large and
normal pages

The PAT bit in the ptes is not moved to the correct position when
copying page protection attributes between entries of different sized
pages. Translate the ptes according to their page size.


I have also created the kernel BZ report:
https://bugzilla.kernel.org/show_bug.cgi?id=90321

I hope this can be resolved in the next rc.

Thanks,
Qu
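
For readers unfamiliar with the commit named above: on x86 the PAT bit sits at bit 7 in a 4 KiB page-table entry, but at bit 12 in a 2 MiB or 1 GiB entry, where bit 7 is the page-size flag instead. A minimal sketch of the translation the commit message describes might look like the following. The helper names and the use of plain uint64_t flag words (protection bits only, no address or page-size bits) are assumptions made for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions: bit 7 is PAT in a 4 KiB PTE; in a large
 * (2 MiB / 1 GiB) entry bit 7 is the page-size flag and PAT is bit 12. */
#define PAGE_BIT_PAT        7
#define PAGE_BIT_PAT_LARGE 12
#define PAGE_PAT       (UINT64_C(1) << PAGE_BIT_PAT)
#define PAGE_PAT_LARGE (UINT64_C(1) << PAGE_BIT_PAT_LARGE)

/* Hypothetical helper: convert 4 KiB protection flags to the large-page
 * layout by moving the PAT bit from bit 7 to bit 12. */
static uint64_t prot_4k_to_large(uint64_t prot)
{
    uint64_t out = prot & ~(PAGE_PAT | PAGE_PAT_LARGE);

    if (prot & PAGE_PAT)
        out |= PAGE_PAT_LARGE;
    return out;
}

/* Hypothetical helper: the inverse, moving PAT from bit 12 back to bit 7. */
static uint64_t prot_large_to_4k(uint64_t prot)
{
    uint64_t out = prot & ~(PAGE_PAT | PAGE_PAT_LARGE);

    if (prot & PAGE_PAT_LARGE)
        out |= PAGE_PAT;
    return out;
}

/* Sanity check for 4 KiB flags that do not have bit 12 set: translating
 * to the large layout and back must leave them unchanged. */
static void check_round_trip(uint64_t prot_4k)
{
    assert(prot_large_to_4k(prot_4k_to_large(prot_4k)) == prot_4k);
}
```

Copying the flags without this step would leave bit 7 set in a large-page entry's attributes, where the hardware does not interpret it as PAT.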


2014-12-26 03:21:14

by Peter Hurley

Subject: Re: [REGRESSION][x86] Commit f5b2831d65 cause boot failure in VMware ESXi 5.1 guest

[ +to Thomas Hellstrom ]

On 12/25/2014 08:57 PM, Qu Wenruo wrote:
> Hi all,
>
> When testing v3.19-rc1 kernel(in fact, try to test), the kernel itself fail to boot on VMware ESXi 5.1 guest.

Maybe this problem is related to the other VMware PAT issue?

> The boot failure is quite easy to describe, only one line is output:
> "Probing EDD (edd=off to disable)...ok"
>
> No other output(including warning/bug_on/backtrace or whatever) and the guest just hangs.
> It's OK on v3.18, so it's a regression.
>
> Bisect points to the following commit:
> commit f5b2831d654167d77da8afbef4d2584897b12d0c
> Author: Juergen Gross <[email protected]>
> Date: Mon Nov 3 14:02:02 2014 +0100
>
> x86: Respect PAT bit when copying pte values between large and normal pages
>
> The PAT bit in the ptes is not moved to the correct position when
> copying page protection attributes between entries of different sized
> pages. Translate the ptes according to their page size.
>
>
> I have also created the kernel BZ report:
> https://bugzilla.kernel.org/show_bug.cgi?id=90321
>
> Hopes this can be resolved in next rc.
>
> Thanks,
> Qu

2014-12-27 13:51:30

by Juergen Gross

Subject: Re: [REGRESSION][x86] Commit f5b2831d65 cause boot failure in VMware ESXi 5.1 guest

On 12/26/2014 02:57 AM, Qu Wenruo wrote:
> Hi all,
>
> When testing v3.19-rc1 kernel(in fact, try to test), the kernel itself
> fail to boot on VMware ESXi 5.1 guest.
> The boot failure is quite easy to describe, only one line is output:
> "Probing EDD (edd=off to disable)...ok"
>
> No other output(including warning/bug_on/backtrace or whatever) and the
> guest just hangs.
> It's OK on v3.18, so it's a regression.
>
> Bisect points to the following commit:
> commit f5b2831d654167d77da8afbef4d2584897b12d0c
> Author: Juergen Gross <[email protected]>
> Date: Mon Nov 3 14:02:02 2014 +0100
>
> x86: Respect PAT bit when copying pte values between large and
> normal pages
>
> The PAT bit in the ptes is not moved to the correct position when
> copying page protection attributes between entries of different sized
> pages. Translate the ptes according to their page size.
>
>
> I have also created the kernel BZ report:
> https://bugzilla.kernel.org/show_bug.cgi?id=90321
>
> Hopes this can be resolved in next rc.

The same issue has been reported with VMware Workstation, where it was
caused by an error in VMware's PAT MSR emulation, so I guess this is
the same problem. I've already sent a patch.

You should be able to boot with the "nopat" kernel option.


Juergen
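
The "nopat" workaround mentioned above is passed on the kernel command line, which a running Linux system exposes in /proc/cmdline. As a rough sketch, a guest could confirm that it was booted with the option as follows. The function names are hypothetical, and matching on whitespace-separated tokens is an assumption about what counts as "the option is present".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return true if `param` appears as a whole
 * whitespace-separated token in `cmdline` (so "nopatch" does not
 * match "nopat"). */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
    assert(cmdline != NULL && param != NULL);

    size_t plen = strlen(param);
    if (plen == 0)
        return false;

    const char *p = cmdline;
    while (*p != '\0') {
        p += strspn(p, " \t\n");              /* skip separators */
        size_t tlen = strcspn(p, " \t\n");    /* length of this token */
        if (tlen == plen && strncmp(p, param, plen) == 0)
            return true;
        p += tlen;
    }
    return false;
}

/* Hypothetical helper: check the running kernel's command line.
 * Returns false if /proc/cmdline cannot be read. */
static bool booted_with_nopat(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return false;

    bool found = fgets(buf, sizeof buf, f) != NULL &&
                 cmdline_has_param(buf, "nopat");
    fclose(f);
    return found;
}
```

Calling booted_with_nopat() from a small test program in the guest would show whether the workaround is in effect for the current boot.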

2014-12-29 01:18:49

by Qu Wenruo

Subject: Re: [REGRESSION][x86] Commit f5b2831d65 cause boot failure in VMware ESXi 5.1 guest


-------- Original Message --------
Subject: Re: [REGRESSION][x86] Commit f5b2831d65 cause boot failure in
VMware ESXi 5.1 guest
From: Juergen Gross <[email protected]>
To: Qu Wenruo <[email protected]>
Date: 2014-12-27 21:51
> On 12/26/2014 02:57 AM, Qu Wenruo wrote:
>> Hi all,
>>
>> When testing v3.19-rc1 kernel(in fact, try to test), the kernel itself
>> fail to boot on VMware ESXi 5.1 guest.
>> The boot failure is quite easy to describe, only one line is output:
>> "Probing EDD (edd=off to disable)...ok"
>>
>> No other output(including warning/bug_on/backtrace or whatever) and the
>> guest just hangs.
>> It's OK on v3.18, so it's a regression.
>>
>> Bisect points to the following commit:
>> commit f5b2831d654167d77da8afbef4d2584897b12d0c
>> Author: Juergen Gross <[email protected]>
>> Date: Mon Nov 3 14:02:02 2014 +0100
>>
>> x86: Respect PAT bit when copying pte values between large and
>> normal pages
>>
>> The PAT bit in the ptes is not moved to the correct position when
>> copying page protection attributes between entries of different sized
>> pages. Translate the ptes according to their page size.
>>
>>
>> I have also created the kernel BZ report:
>> https://bugzilla.kernel.org/show_bug.cgi?id=90321
>>
>> Hopes this can be resolved in next rc.
>
> As the same issue has been reported with VMWare workstation which was
> related to an error in the PAT MSR emulation of VMWare, I guess this
> will be the same problem. I've already sent a patch.
>
> You should be able to boot with the "nopat" kernel option.
>
>
> Juergen
Thanks for the explanation. Once again, the closed-source blob is to blame.

Anyway, the nopat option works.

Many thanks,
Qu