2014-11-11 23:20:05

by Dave Hansen

[permalink] [raw]
Subject: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

I'm seeing a BUG() at boot in __phys_addr when it has DEBUG_VIRTUAL enabled:

>> [ 1.193264] ------------[ cut here ]------------
>> [ 1.198502] kernel BUG at /home/davehans/linux.git/arch/x86/mm/physaddr.c:36!
> ...
>> [ 1.368810] Call Trace:
>> [ 1.371590] [<ffffffff8105824c>] __change_page_attr_set_clr+0x42c/0xff0
>> [ 1.379197] [<ffffffff81059e42>] kernel_map_pages_in_pgd+0x72/0x110
>> [ 1.386410] [<ffffffff81fe2be2>] __map_region+0x45/0x63
>> [ 1.392437] [<ffffffff81fe2e13>] efi_map_region+0x32/0xce
>> [ 1.398663] [<ffffffff81fe2936>] efi_enter_virtual_mode+0x18c/0x3a4
>> [ 1.405876] [<ffffffff81fcb0b6>] start_kernel+0x421/0x4a1
>> [ 1.412101] [<ffffffff81fcaa85>] ? set_init_arg+0x55/0x55
>> [ 1.418327] [<ffffffff81fca120>] ? early_idt_handlers+0x120/0x120
>> [ 1.425342] [<ffffffff81fca5f2>] x86_64_start_reservations+0x2a/0x2c
>> [ 1.432652] [<ffffffff81fca746>] x86_64_start_kernel+0x152/0x161
>> [ 1.439565] Code: 0f 94 c2 31 c0 e8 a6 47 83 00 48 c7 c7 41 49 cc 81 31 c0 e8 98 47 83 00 31 d2 be 01 00 00 00 48 c7 c7 a0 49 f2 81 e8 ab 4a 0e 00 <0f> 0b 0f 0b 4c 89 e2 48 c7 c6 b3 e5 a0 81 48 c7 c7 5c 7a ca 81
>> [ 1.461866] RIP [<ffffffff8105c055>] __phys_addr+0x185/0x260
>> [ 1.468400] RSP <ffffffff81e03cf8>
>> [ 1.472396] ---[ end trace b59b0f17341a4bc4 ]---
>> [ 1.477663] Kernel panic - not syncing: Attempted to kill the idle task!
>> [ 1.485270] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

But I've noticed something odd. kernel_map_pages_in_pgd() takes a pfn:

extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long
address, unsigned numpages, unsigned long
page_flags);

But the code in arch/x86/platform/efi/efi_64.c seems a bit confused
about that. Two users pass a physical address while a third passes in a
pfn:

> if (kernel_map_pages_in_pgd(pgd, text >> PAGE_SHIFT, text, npages, 0)) {
> if (kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
> if (kernel_map_pages_in_pgd(pgd, pa_memmap, pa_memmap, num_pages, _PAGE_NX)) {

kernel_map_pages_in_pgd() also sticks that value in to 'struct
cpa_data'->pfn. But, then the "PFN" seems to get used like a physical
address. For instance:

set_pmd(pmd, __pmd(cpa->pfn | _PAGE_PSE | ...

How could this possibly work?


2014-11-11 23:47:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Tue, Nov 11, 2014 at 03:19:54PM -0800, Dave Hansen wrote:
> I'm seeing a BUG() at boot in __phys_addr when it has DEBUG_VIRTUAL enabled:
>
> >> [ 1.193264] ------------[ cut here ]------------
> >> [ 1.198502] kernel BUG at /home/davehans/linux.git/arch/x86/mm/physaddr.c:36!
> > ...
> >> [ 1.368810] Call Trace:
> >> [ 1.371590] [<ffffffff8105824c>] __change_page_attr_set_clr+0x42c/0xff0
> >> [ 1.379197] [<ffffffff81059e42>] kernel_map_pages_in_pgd+0x72/0x110
> >> [ 1.386410] [<ffffffff81fe2be2>] __map_region+0x45/0x63
> >> [ 1.392437] [<ffffffff81fe2e13>] efi_map_region+0x32/0xce
> >> [ 1.398663] [<ffffffff81fe2936>] efi_enter_virtual_mode+0x18c/0x3a4
> >> [ 1.405876] [<ffffffff81fcb0b6>] start_kernel+0x421/0x4a1
> >> [ 1.412101] [<ffffffff81fcaa85>] ? set_init_arg+0x55/0x55
> >> [ 1.418327] [<ffffffff81fca120>] ? early_idt_handlers+0x120/0x120
> >> [ 1.425342] [<ffffffff81fca5f2>] x86_64_start_reservations+0x2a/0x2c
> >> [ 1.432652] [<ffffffff81fca746>] x86_64_start_kernel+0x152/0x161
> >> [ 1.439565] Code: 0f 94 c2 31 c0 e8 a6 47 83 00 48 c7 c7 41 49 cc 81 31 c0 e8 98 47 83 00 31 d2 be 01 00 00 00 48 c7 c7 a0 49 f2 81 e8 ab 4a 0e 00 <0f> 0b 0f 0b 4c 89 e2 48 c7 c6 b3 e5 a0 81 48 c7 c7 5c 7a ca 81
> >> [ 1.461866] RIP [<ffffffff8105c055>] __phys_addr+0x185/0x260
> >> [ 1.468400] RSP <ffffffff81e03cf8>
> >> [ 1.472396] ---[ end trace b59b0f17341a4bc4 ]---
> >> [ 1.477663] Kernel panic - not syncing: Attempted to kill the idle task!
> >> [ 1.485270] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
>
> But I've noticed something odd. kernel_map_pages_in_pgd() takes a pfn:
>
> extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long
> address, unsigned numpages, unsigned long
> page_flags);
>
> But the code in arch/x86/platform/efi/efi_64.c seems a bit confused
> about that. Two users pass a physical address while a third passes in a
> pfn:
>
> > if (kernel_map_pages_in_pgd(pgd, text >> PAGE_SHIFT, text, npages, 0)) {

Does it work if you drop the ">> PAGE_SHIFT" ?

> > if (kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
> > if (kernel_map_pages_in_pgd(pgd, pa_memmap, pa_memmap, num_pages, _PAGE_NX)) {
>
> kernel_map_pages_in_pgd() also sticks that value in to 'struct
> cpa_data'->pfn. But, then the "PFN" seems to get used like a physical
> address. For instance:

Yeah, I called it pfn because struct cpa_data has a pfn member and at
the time I wanted to reuse it for the physical address.

I guess I should change that by adding a ->paddr instead of misusing
pfn.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-12 09:25:06

by Matt Fleming

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Tue, 2014-11-11 at 15:19 -0800, Dave Hansen wrote:
> I'm seeing a BUG() at boot in __phys_addr when it has DEBUG_VIRTUAL enabled:
>
> >> [ 1.193264] ------------[ cut here ]------------
> >> [ 1.198502] kernel BUG at /home/davehans/linux.git/arch/x86/mm/physaddr.c:36!
> > ...
> >> [ 1.368810] Call Trace:
> >> [ 1.371590] [<ffffffff8105824c>] __change_page_attr_set_clr+0x42c/0xff0
> >> [ 1.379197] [<ffffffff81059e42>] kernel_map_pages_in_pgd+0x72/0x110
> >> [ 1.386410] [<ffffffff81fe2be2>] __map_region+0x45/0x63
> >> [ 1.392437] [<ffffffff81fe2e13>] efi_map_region+0x32/0xce
> >> [ 1.398663] [<ffffffff81fe2936>] efi_enter_virtual_mode+0x18c/0x3a4
> >> [ 1.405876] [<ffffffff81fcb0b6>] start_kernel+0x421/0x4a1
> >> [ 1.412101] [<ffffffff81fcaa85>] ? set_init_arg+0x55/0x55
> >> [ 1.418327] [<ffffffff81fca120>] ? early_idt_handlers+0x120/0x120
> >> [ 1.425342] [<ffffffff81fca5f2>] x86_64_start_reservations+0x2a/0x2c
> >> [ 1.432652] [<ffffffff81fca746>] x86_64_start_kernel+0x152/0x161
> >> [ 1.439565] Code: 0f 94 c2 31 c0 e8 a6 47 83 00 48 c7 c7 41 49 cc 81 31 c0 e8 98 47 83 00 31 d2 be 01 00 00 00 48 c7 c7 a0 49 f2 81 e8 ab 4a 0e 00 <0f> 0b 0f 0b 4c 89 e2 48 c7 c6 b3 e5 a0 81 48 c7 c7 5c 7a ca 81
> >> [ 1.461866] RIP [<ffffffff8105c055>] __phys_addr+0x185/0x260
> >> [ 1.468400] RSP <ffffffff81e03cf8>
> >> [ 1.472396] ---[ end trace b59b0f17341a4bc4 ]---

I'm having trouble linking that BUG_ON() with a line in Linus' tree.
What's the failing expression?

2014-11-12 14:58:07

by Dave Hansen

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On 11/12/2014 01:24 AM, Matt Fleming wrote:
> On Tue, 2014-11-11 at 15:19 -0800, Dave Hansen wrote:
>> I'm seeing a BUG() at boot in __phys_addr when it has DEBUG_VIRTUAL enabled:
>>
>>>> [ 1.193264] ------------[ cut here ]------------
>>>> [ 1.198502] kernel BUG at /home/davehans/linux.git/arch/x86/mm/physaddr.c:36!
>>> ...
>>>> [ 1.368810] Call Trace:
>>>> [ 1.371590] [<ffffffff8105824c>] __change_page_attr_set_clr+0x42c/0xff0
>>>> [ 1.379197] [<ffffffff81059e42>] kernel_map_pages_in_pgd+0x72/0x110
>>>> [ 1.386410] [<ffffffff81fe2be2>] __map_region+0x45/0x63
>>>> [ 1.392437] [<ffffffff81fe2e13>] efi_map_region+0x32/0xce
>>>> [ 1.398663] [<ffffffff81fe2936>] efi_enter_virtual_mode+0x18c/0x3a4
>>>> [ 1.405876] [<ffffffff81fcb0b6>] start_kernel+0x421/0x4a1
>>>> [ 1.412101] [<ffffffff81fcaa85>] ? set_init_arg+0x55/0x55
>>>> [ 1.418327] [<ffffffff81fca120>] ? early_idt_handlers+0x120/0x120
>>>> [ 1.425342] [<ffffffff81fca5f2>] x86_64_start_reservations+0x2a/0x2c
>>>> [ 1.432652] [<ffffffff81fca746>] x86_64_start_kernel+0x152/0x161
>>>> [ 1.439565] Code: 0f 94 c2 31 c0 e8 a6 47 83 00 48 c7 c7 41 49 cc 81 31 c0 e8 98 47 83 00 31 d2 be 01 00 00 00 48 c7 c7 a0 49 f2 81 e8 ab 4a 0e 00 <0f> 0b 0f 0b 4c 89 e2 48 c7 c6 b3 e5 a0 81 48 c7 c7 5c 7a ca 81
>>>> [ 1.461866] RIP [<ffffffff8105c055>] __phys_addr+0x185/0x260
>>>> [ 1.468400] RSP <ffffffff81e03cf8>
>>>> [ 1.472396] ---[ end trace b59b0f17341a4bc4 ]---
>
> I'm having trouble linking that BUG_ON() with a line in Linus' tree.
> What's the failing expression?

It's actually tripping over this in __phys_addr:

VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));

I dumped out a few of the variables too:

[ 1.161406] __phys_addr() x: 0x000078009d3c6000
[ 1.166567] __phys_addr() origx: 0x000000009d3c6000
[ 1.171832] __phys_addr() y: 0x000000011d3c6000
[ 1.176999] __phys_addr() __START_KERNEL_map: 0xffffffff80000000
[ 1.183841] __phys_addr() PAGE_OFFSET: 0xffff880000000000
[ 1.189993] __phys_addr() x valid: 0

So it looks like the root cause is a physical address getting pass in in
the first place instead of a virtual.

2014-11-12 15:11:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Wed, Nov 12, 2014 at 06:57:58AM -0800, Dave Hansen wrote:

> It's actually tripping over this in __phys_addr:
>
> VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
>
> I dumped out a few of the variables too:
>
> [ 1.161406] __phys_addr() x: 0x000078009d3c6000
> [ 1.166567] __phys_addr() origx: 0x000000009d3c6000
> [ 1.171832] __phys_addr() y: 0x000000011d3c6000
> [ 1.176999] __phys_addr() __START_KERNEL_map: 0xffffffff80000000
> [ 1.183841] __phys_addr() PAGE_OFFSET: 0xffff880000000000
> [ 1.189993] __phys_addr() x valid: 0
>
> So it looks like the root cause is a physical address getting pass in in
> the first place instead of a virtual.

I'd say that's on purpose as we're mapping kernel text 1:1 in the EFI
page table.

Maybe we want this:

---
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 35aecb6042fb..8262d534080d 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -182,7 +182,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */

npages = (_end - _text) >> PAGE_SHIFT;
- text = __pa(_text);
+ text = __pa_nodebug(_text);

if (kernel_map_pages_in_pgd(pgd, text >> PAGE_SHIFT, text, npages, 0)) {
pr_err("Failed to map kernel text 1:1\n");


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-12 15:22:25

by Dave Hansen

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On 11/12/2014 07:11 AM, Borislav Petkov wrote:
>> > [ 1.161406] __phys_addr() x: 0x000078009d3c6000
>> > [ 1.166567] __phys_addr() origx: 0x000000009d3c6000
>> > [ 1.171832] __phys_addr() y: 0x000000011d3c6000
>> > [ 1.176999] __phys_addr() __START_KERNEL_map: 0xffffffff80000000
>> > [ 1.183841] __phys_addr() PAGE_OFFSET: 0xffff880000000000
>> > [ 1.189993] __phys_addr() x valid: 0
>> >
>> > So it looks like the root cause is a physical address getting pass in in
>> > the first place instead of a virtual.
> I'd say that's on purpose as we're mapping kernel text 1:1 in the EFI
> page table.

__pa and friends do their calculations *against* PAGE_OFFSET to do the
virt<->phys translation. If that's not what a caller wants, then
they're calling the wrong function.

The path that we're actually hitting this from is:

> __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long
...
> if (pfn_range_is_mapped(PFN_DOWN(__pa(address)),
> PFN_DOWN(__pa(address)) + 1))
> split_page_count(level);

So perhaps efi_map_region() is handing an address from the EFI identity
map down in here. __pa() gets called on it, but that fails since __pa()
only works on the *KERNEL* identity map.

I think we might actually need to walk the page tables in there. Even
the pfn_range_is_mapped() code looks to be dealing with the kernel
identity map.

2014-11-12 17:25:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Wed, Nov 12, 2014 at 07:20:53AM -0800, Dave Hansen wrote:
> So perhaps efi_map_region() is handing an address from the EFI identity
> map down in here. __pa() gets called on it, but that fails since __pa()
> only works on the *KERNEL* identity map.

Grrr, yeah, so pageattr.c assumes the kernel pgd in some places still.

> I think we might actually need to walk the page tables in there. Even
> the pfn_range_is_mapped() code looks to be dealing with the kernel
> identity map.

Yep, pageattr.c should be taught to pay attention to cpa->pgd. Btw, we
do have lookup_address_in_pgd() there which should do the walking.

But before we do that: Matt, why do we need the kernel text mapped 1:1
in the EFI page table at all?

You've added this with

4f9dbcfc4029 ("x86/efi: Add mixed runtime services support")

AFAICT.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-13 10:37:13

by Matt Fleming

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Wed, 2014-11-12 at 18:25 +0100, Borislav Petkov wrote:
>
> But before we do that: Matt, why do we need the kernel text mapped 1:1
> in the EFI page table at all?
>
> You've added this with
>
> 4f9dbcfc4029 ("x86/efi: Add mixed runtime services support")

So that when we switch into 32-bit mode we can still access the kernel
text via the 1:1 mapping in protected mode.

2014-11-13 13:10:06

by Matt Fleming

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Wed, 2014-11-12 at 07:20 -0800, Dave Hansen wrote:
>
> The path that we're actually hitting this from is:
>
> > __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long
> ...
> > if (pfn_range_is_mapped(PFN_DOWN(__pa(address)),
> > PFN_DOWN(__pa(address)) + 1))
> > split_page_count(level);

I'm sure I'm just being dumb, but I can't for the life of me work out
why we need to calculate the physical address from the virtual address
at this point - we should already have that information.

> So perhaps efi_map_region() is handing an address from the EFI identity
> map down in here. __pa() gets called on it, but that fails since __pa()
> only works on the *KERNEL* identity map.

Yep, sounds right.

2014-12-09 10:35:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: BUG() at boot in __phys_addr with DEBUG_VIRTUAL

On Thu, Nov 13, 2014 at 10:36:33AM +0000, Matt Fleming wrote:
> So that when we switch into 32-bit mode we can still access the kernel
> text via the 1:1 mapping in protected mode.

Ok, I am able to trigger the same splat with latest EDKII here. Applying
the hunk below fixes it and the guest boots fine.

And this is the thing we could now do short of teaching pageattr.c to
distinguish PGDs. This way we're sending any mapping requests which have
a pgd different than the implicit kernel one into populate_pgd().

What that does is overwrite any mappings there are in the pagetable
prior to that because the userspace part is reserved for UEFI RT 1:1
mappings and since those should be static and not change, we can allow
ourselves the more clumsy method of simply overwriting the page table
with the regions.

The VA mappings region is also reserved for UEFI so not an issue there
right now.

What I'm having problems with is mapping the kernel text 1:1 for
EFI_MIXED in the presence of KASLR and then, as getting "lucky" and
having the kernel addresses overlap with EFI regions. I think if that
happens, then we have a boom but I could very well be missing something.

Matt?

---
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a3a5d46605d2..226ecb319913 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1120,7 +1140,7 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
address = *cpa->vaddr;
repeat:
kpte = _lookup_address_cpa(cpa, address, &level);
- if (!kpte)
+ if (!kpte || cpa->pgd)
return __cpa_process_fault(cpa, address, primary);

old_pte = *kpte;



--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--