Folks, the following patch fixes a bug that is triggered with the new
EFI page table code sitting in tip/efi/core and queued up for the
merge window.
It turns out that we were relying on the kernel's mappings for those
regions that are marked E820_RESERVED, etc., in the e820 map. The
specific example in the bug report was page zero, which was always
"special" on legacy BIOS.
On EFI, legitimate code/data can be placed there and so if it's part
of the EFI memory map, we must make sure we map it into the EFI page
tables. EFI runtime drivers may need to access it.
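As a rough sketch of the rule (this mirrors the loop shape in the
patch below; it is not the fix itself): anything the EFI memory map
lists as boot services code/data must be considered for mapping, no
matter how e820 classifies it:

	void *p;

	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
		efi_memory_desc_t *md = p;

		if (md->type != EFI_BOOT_SERVICES_CODE &&
		    md->type != EFI_BOOT_SERVICES_DATA)
			continue;

		/* must stay intact and mapped until SetVirtualAddressMap() */
	}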
The following changes since commit 2ad510dc372c2caac9aada9ff6dd10e787616e1d:
x86/efi: Only map kernel text for EFI mixed mode (2016-02-22 08:26:28 +0100)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-urgent
for you to fetch changes up to c7af05d993ef60102eea6e0056de7eb5f6d4e52d:
x86/efi: Always map boot service regions into new EFI page tables (2016-03-11 11:08:36 +0000)
----------------------------------------------------------------
* Fix a bug triggered with the new separate EFI page table code where
EFI boot services regions that are marked E820_RESERVED are not
mapped into the page tables, leading to an oops during
SetVirtualAddressMap() - Matt Fleming
----------------------------------------------------------------
Matt Fleming (1):
x86/efi: Always map boot service regions into new EFI page tables
arch/x86/platform/efi/quirks.c | 79 +++++++++++++++++++++++++++++++++---------
1 file changed, 62 insertions(+), 17 deletions(-)
Some machines have EFI regions in page zero (physical address
0x00000000) and historically that region has been added to the e820
map via trim_bios_range(), and ultimately mapped into the kernel page
tables. It was not mapped via efi_map_regions() as one would expect.
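For reference, the page zero part of trim_bios_range() looks roughly
like this (paraphrased, not an exact quote of arch/x86/kernel/setup.c):

	/*
	 * The first 4KB of memory is BIOS-owned, but it is generally
	 * not listed as such in the E820 table.
	 */
	e820_update_range(0, PAGE_SIZE, E820_RAM, E820_RESERVED);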
Alexis reports that with the new separate EFI page tables some boot
services regions, such as page zero, are not mapped. This triggers an
oops during the SetVirtualAddressMap() runtime call.
For the EFI boot services quirk on x86 we need to memblock_reserve()
boot services regions until after SetVirtualAddressMap(). Doing that
while respecting the ownership of regions that may have already been
reserved by the kernel was the motivation behind commit 7d68dc3f1003
("x86, efi: Do not reserve boot services regions within reserved
areas").
That patch was merged at a time when the EFI runtime virtual mappings
were inserted into the kernel page tables as described above, and the
trick of setting ->num_pages (and hence the region size) to zero to
track regions that should not be freed in efi_free_boot_services()
meant that we never mapped those regions in efi_map_regions(). Instead
we were relying solely on the existing kernel mappings.
Now that we have separate page tables we need to make sure the EFI
boot services regions are mapped correctly, even if someone else has
already called memblock_reserve(). Instead of stashing a tag in
->num_pages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
generally makes no sense to mark a boot services region as required at
runtime, it's pretty much guaranteed the firmware will not have
already set this bit.
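Schematically, the reserve path becomes (a condensed sketch of the
hunk below):

	if (!memblock_is_region_reserved(start, size)) {
		memblock_reserve(start, size);

		if (can_free_region(start, size))
			continue;	/* we own it, free it later */
	}

	/* somebody else owns it, tag it so we never free it */
	md->attribute |= EFI_MEMORY_RUNTIME;

and efi_free_boot_services() skips any region with that bit set.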
For the record, the specific circumstances under which Alexis
triggered this bug were that an EFI runtime driver on his machine was
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
SetVirtualAddressMap().
The event handler for this driver looks like this,
sub rsp,0x28
lea rdx,[rip+0x2445] # 0xaa948720
mov ecx,0x4
call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
xor eax,eax
mov BYTE PTR [r11+0x1],0x1
add rsp,0x28
ret
Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
handler. The "mov r11, QWORD PTR [rip+0x2434]" was the faulting
instruction because ConvertPointer() was being called to convert the
address 0x0000000000000000, which when converted is left unchanged and
remains 0x0000000000000000.
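In C, the handler corresponds to something like the following. This is
a hypothetical EDK2-style reconstruction; every name in it is invented
and only the shape matches the disassembly above:

	STATIC VOID  *mPointer;	// the global at 0xaa948720; holds 0x0 here

	VOID
	EFIAPI
	AddressChangeEvent (
	  IN EFI_EVENT  Event,
	  IN VOID       *Context
	  )
	{
	  gRT->ConvertPointer (4, &mPointer);	// ecx = 0x4 above
	  ((UINT8 *)mPointer)[1] = 1;		// the store at [r11+0x1]
	}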
The output of the oops trace gave the impression of a standard NULL
pointer dereference bug, but because we're accessing physical
addresses during ConvertPointer(), it wasn't. EFI boot services code
is stored at that address on Alexis' machine.
Reported-by: Alexis Murzeau <[email protected]>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
Cc: Maarten Lankhorst <[email protected]>
Cc: Matthew Garrett <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Raphael Hertzog <[email protected]>
Cc: Roger Shimizu <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>
---
arch/x86/platform/efi/quirks.c | 79 +++++++++++++++++++++++++++++++++---------
1 file changed, 62 insertions(+), 17 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 2326bf51978f..da35f957d4ed 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -164,6 +164,27 @@ efi_status_t efi_query_variable_store(u32 attributes, unsigned long size,
EXPORT_SYMBOL_GPL(efi_query_variable_store);
/*
+ * Helper function for efi_reserve_boot_services() to figure out if we
+ * can free regions in efi_free_boot_services().
+ *
+ * Use this function to ensure we do not free regions owned by somebody
+ * else. We must only reserve (and then free) regions:
+ *
+ * - Not within any part of the kernel
+ * - Not the bios reserved area (E820_RESERVED, E820_NVS, etc)
+ */
+static bool can_free_region(u64 start, u64 size)
+{
+ if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
+ return false;
+
+ if (!e820_all_mapped(start, start+size, E820_RAM))
+ return false;
+
+ return true;
+}
+
+/*
* The UEFI specification makes it clear that the operating system is free to do
* whatever it wants with boot services code after ExitBootServices() has been
* called. Ignoring this recommendation a significant bunch of EFI implementations
@@ -180,26 +201,50 @@ void __init efi_reserve_boot_services(void)
efi_memory_desc_t *md = p;
u64 start = md->phys_addr;
u64 size = md->num_pages << EFI_PAGE_SHIFT;
+ bool already_reserved;
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA)
continue;
- /* Only reserve where possible:
- * - Not within any already allocated areas
- * - Not over any memory area (really needed, if above?)
- * - Not within any part of the kernel
- * - Not the bios reserved area
- */
- if ((start + size > __pa_symbol(_text)
- && start <= __pa_symbol(_end)) ||
- !e820_all_mapped(start, start+size, E820_RAM) ||
- memblock_is_region_reserved(start, size)) {
- /* Could not reserve, skip it */
- md->num_pages = 0;
- memblock_dbg("Could not reserve boot range [0x%010llx-0x%010llx]\n",
- start, start+size-1);
- } else
+
+ already_reserved = memblock_is_region_reserved(start, size);
+
+ /*
+ * Because the following memblock_reserve() is paired
+ * with free_bootmem_late() for this region in
+ * efi_free_boot_services(), we must be extremely
+ * careful not to reserve, and subsequently free,
+ * critical regions of memory (like the kernel image) or
+ * those regions that somebody else has already
+ * reserved.
+ *
+ * A good example of a critical region that must not be
+ * freed is page zero (first 4Kb of memory), which may
+ * contain boot services code/data but is marked
+ * E820_RESERVED by trim_bios_range().
+ */
+ if (!already_reserved) {
memblock_reserve(start, size);
+
+ /*
+ * If we are the first to reserve the region, no
+ * one else cares about it. We own it and can
+ * free it later.
+ */
+ if (can_free_region(start, size))
+ continue;
+ }
+
+ /*
+ * We don't own the region. We must not free it.
+ *
+ * Setting this bit for a boot services region really
+ * doesn't make sense as far as the firmware is
+ * concerned, but it does provide us with a way to tag
+ * those regions that must not be paired with
+ * free_bootmem_late().
+ */
+ md->attribute |= EFI_MEMORY_RUNTIME;
}
}
@@ -216,8 +261,8 @@ void __init efi_free_boot_services(void)
md->type != EFI_BOOT_SERVICES_DATA)
continue;
- /* Could not reserve boot area */
- if (!size)
+ /* Do not free, someone else owns it */
+ if (md->attribute & EFI_MEMORY_RUNTIME)
continue;
free_bootmem_late(start, size);
--
2.6.2
Commit-ID: 452308de61056a539352a9306c46716d7af8a1f1
Gitweb: http://git.kernel.org/tip/452308de61056a539352a9306c46716d7af8a1f1
Author: Matt Fleming <[email protected]>
AuthorDate: Fri, 11 Mar 2016 11:19:23 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 12 Mar 2016 16:57:45 +0100
x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables
Some machines have EFI regions in page zero (physical address
0x00000000) and historically that region has been added to the e820
map via trim_bios_range(), and ultimately mapped into the kernel page
tables. It was not mapped via efi_map_regions() as one would expect.
Alexis reports that with the new separate EFI page tables some boot
services regions, such as page zero, are not mapped. This triggers an
oops during the SetVirtualAddressMap() runtime call.
For the EFI boot services quirk on x86 we need to memblock_reserve()
boot services regions until after SetVirtualAddressMap(). Doing that
while respecting the ownership of regions that may have already been
reserved by the kernel was the motivation behind this commit:
7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas")
That patch was merged at a time when the EFI runtime virtual mappings
were inserted into the kernel page tables as described above, and the
trick of setting ->num_pages (and hence the region size) to zero to
track regions that should not be freed in efi_free_boot_services()
meant that we never mapped those regions in efi_map_regions(). Instead
we were relying solely on the existing kernel mappings.
Now that we have separate page tables we need to make sure the EFI
boot services regions are mapped correctly, even if someone else has
already called memblock_reserve(). Instead of stashing a tag in
->num_pages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
generally makes no sense to mark a boot services region as required at
runtime, it's pretty much guaranteed the firmware will not have
already set this bit.
For the record, the specific circumstances under which Alexis
triggered this bug were that an EFI runtime driver on his machine was
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
SetVirtualAddressMap().
The event handler for this driver looks like this,
sub rsp,0x28
lea rdx,[rip+0x2445] # 0xaa948720
mov ecx,0x4
call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
xor eax,eax
mov BYTE PTR [r11+0x1],0x1
add rsp,0x28
ret
Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
handler. The "mov r11, QWORD PTR [rip+0x2434]" was the faulting
instruction because ConvertPointer() was being called to convert the
address 0x0000000000000000, which when converted is left unchanged and
remains 0x0000000000000000.
The output of the oops trace gave the impression of a standard NULL
pointer dereference bug, but because we're accessing physical
addresses during ConvertPointer(), it wasn't. EFI boot services code
is stored at that address on Alexis' machine.
Reported-by: Alexis Murzeau <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Matthew Garrett <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Raphael Hertzog <[email protected]>
Cc: Roger Shimizu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/platform/efi/quirks.c | 79 +++++++++++++++++++++++++++++++++---------
1 file changed, 62 insertions(+), 17 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 2d66db8..ed30e79 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -131,6 +131,27 @@ efi_status_t efi_query_variable_store(u32 attributes, unsigned long size)
EXPORT_SYMBOL_GPL(efi_query_variable_store);
/*
+ * Helper function for efi_reserve_boot_services() to figure out if we
+ * can free regions in efi_free_boot_services().
+ *
+ * Use this function to ensure we do not free regions owned by somebody
+ * else. We must only reserve (and then free) regions:
+ *
+ * - Not within any part of the kernel
+ * - Not the BIOS reserved area (E820_RESERVED, E820_NVS, etc)
+ */
+static bool can_free_region(u64 start, u64 size)
+{
+ if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
+ return false;
+
+ if (!e820_all_mapped(start, start+size, E820_RAM))
+ return false;
+
+ return true;
+}
+
+/*
* The UEFI specification makes it clear that the operating system is free to do
* whatever it wants with boot services code after ExitBootServices() has been
* called. Ignoring this recommendation a significant bunch of EFI implementations
@@ -147,26 +168,50 @@ void __init efi_reserve_boot_services(void)
efi_memory_desc_t *md = p;
u64 start = md->phys_addr;
u64 size = md->num_pages << EFI_PAGE_SHIFT;
+ bool already_reserved;
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA)
continue;
- /* Only reserve where possible:
- * - Not within any already allocated areas
- * - Not over any memory area (really needed, if above?)
- * - Not within any part of the kernel
- * - Not the bios reserved area
- */
- if ((start + size > __pa_symbol(_text)
- && start <= __pa_symbol(_end)) ||
- !e820_all_mapped(start, start+size, E820_RAM) ||
- memblock_is_region_reserved(start, size)) {
- /* Could not reserve, skip it */
- md->num_pages = 0;
- memblock_dbg("Could not reserve boot range [0x%010llx-0x%010llx]\n",
- start, start+size-1);
- } else
+
+ already_reserved = memblock_is_region_reserved(start, size);
+
+ /*
+ * Because the following memblock_reserve() is paired
+ * with free_bootmem_late() for this region in
+ * efi_free_boot_services(), we must be extremely
+ * careful not to reserve, and subsequently free,
+ * critical regions of memory (like the kernel image) or
+ * those regions that somebody else has already
+ * reserved.
+ *
+ * A good example of a critical region that must not be
+ * freed is page zero (first 4Kb of memory), which may
+ * contain boot services code/data but is marked
+ * E820_RESERVED by trim_bios_range().
+ */
+ if (!already_reserved) {
memblock_reserve(start, size);
+
+ /*
+ * If we are the first to reserve the region, no
+ * one else cares about it. We own it and can
+ * free it later.
+ */
+ if (can_free_region(start, size))
+ continue;
+ }
+
+ /*
+ * We don't own the region. We must not free it.
+ *
+ * Setting this bit for a boot services region really
+ * doesn't make sense as far as the firmware is
+ * concerned, but it does provide us with a way to tag
+ * those regions that must not be paired with
+ * free_bootmem_late().
+ */
+ md->attribute |= EFI_MEMORY_RUNTIME;
}
}
@@ -183,8 +228,8 @@ void __init efi_free_boot_services(void)
md->type != EFI_BOOT_SERVICES_DATA)
continue;
- /* Could not reserve boot area */
- if (!size)
+ /* Do not free, someone else owns it: */
+ if (md->attribute & EFI_MEMORY_RUNTIME)
continue;
free_bootmem_late(start, size);
On Sat, 12 Mar, at 10:57:39AM, tip-bot for Matt Fleming wrote:
> Commit-ID: 452308de61056a539352a9306c46716d7af8a1f1
> Gitweb: http://git.kernel.org/tip/452308de61056a539352a9306c46716d7af8a1f1
> Author: Matt Fleming <[email protected]>
> AuthorDate: Fri, 11 Mar 2016 11:19:23 +0000
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Sat, 12 Mar 2016 16:57:45 +0100
>
> x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables
Ingo, I see you picked this up for x86/urgent, but note that the bug
isn't actually triggerable until the stuff in tip/efi/core gets
merged. I'd suggest sticking this patch in tip/efi/core also.
It should be harmless to merge this patch before that, but the
references to separate EFI page tables don't make sense.
On Fri, 2016-03-11 at 11:19 +0000, Matt Fleming wrote:
> Some machines have EFI regions in page zero (physical address
> 0x00000000) and historically that region has been added to the e820
> map via trim_bios_range(), and ultimately mapped into the kernel page
> tables. It was not mapped via efi_map_regions() as one would expect.
>
> Alexis reports that with the new separate EFI page tables some boot
> services regions, such as page zero, are not mapped. This triggers an
> oops during the SetVirtualAddressMap() runtime call.
I'm still seeing a failure to boot even with this patch.
http://www.qzxyz.com/IMG_20160313_164601.jpg
Sorry for the dodgy photo but the screen has almost a mirror finish.
Attached is the dmesg from 4.4 with efi=debug memblock=debug
Cheers,
Scott
* Matt Fleming <[email protected]> wrote:
> On Sat, 12 Mar, at 10:57:39AM, tip-bot for Matt Fleming wrote:
> > Commit-ID: 452308de61056a539352a9306c46716d7af8a1f1
> > Gitweb: http://git.kernel.org/tip/452308de61056a539352a9306c46716d7af8a1f1
> > Author: Matt Fleming <[email protected]>
> > AuthorDate: Fri, 11 Mar 2016 11:19:23 +0000
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Sat, 12 Mar 2016 16:57:45 +0100
> >
> > x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables
>
> Ingo, I see you picked this up for x86/urgent, but note that the bug
> isn't actually triggerable until the stuff in tip/efi/core gets
> merged. I'd suggest sticking this patch in tip/efi/core also.
Oh well ... too late for that, it's now upstream.
> It should be harmless to merge this patch before that, but the
> references to separate EFI page tables don't make sense.
Yeah, indeed. It's my bad: we merged the EFI-pagetables code back in
November and I forgot that we skipped its upstream merge twice ...
Thanks,
Ingo
(Fixing Maarten's email address which I botched originally)
On Sun, 13 Mar, at 05:09:35PM, Scott Ashcroft wrote:
> On Fri, 2016-03-11 at 11:19 +0000, Matt Fleming wrote:
> > Some machines have EFI regions in page zero (physical address
> > 0x00000000) and historically that region has been added to the e820
> > map via trim_bios_range(), and ultimately mapped into the kernel page
> > tables. It was not mapped via efi_map_regions() as one would expect.
> >
> > Alexis reports that with the new separate EFI page tables some boot
> > services regions, such as page zero, are not mapped. This triggers an
> > oops during the SetVirtualAddressMap() runtime call.
>
>
> I'm still seeing a failure to boot even with this patch.
>
> http://www.qzxyz.com/IMG_20160313_164601.jpg
> Sorry for the dodgy photo but the screen has almost a mirror finish.
>
> Attached is the dmesg from 4.4 with efi=debug memblock=debug
Well, crap. Can you enable CONFIG_EFI_PGT_DUMP and send the dmesg
because it would be good to know how things are mapped before this
patch.
I'd be surprised if the issue you're seeing is related to the one that
Alexis reported. Having corrupt page table structures is a whole new
bag of scary.
Does $(grep pdpe1gb /proc/cpuinfo) show any output on your machine?
On Sun, 13 Mar, at 09:58:47PM, Matt Fleming wrote:
> (Fixing Maarten's email address which I botched originally)
>
> On Sun, 13 Mar, at 05:09:35PM, Scott Ashcroft wrote:
> > On Fri, 2016-03-11 at 11:19 +0000, Matt Fleming wrote:
> > > Some machines have EFI regions in page zero (physical address
> > > 0x00000000) and historically that region has been added to the e820
> > > map via trim_bios_range(), and ultimately mapped into the kernel page
> > > tables. It was not mapped via efi_map_regions() as one would expect.
> > >
> > > Alexis reports that with the new separate EFI page tables some boot
> > > services regions, such as page zero, are not mapped. This triggers an
> > > oops during the SetVirtualAddressMap() runtime call.
> >
> >
> > I'm still seeing a failure to boot even with this patch.
> >
> > http://www.qzxyz.com/IMG_20160313_164601.jpg
> > Sorry for the dodgy photo but the screen has almost a mirror finish.
> >
> > Attached is the dmesg from 4.4 with efi=debug memblock=debug
>
> Well, crap. Can you enable CONFIG_EFI_PGT_DUMP and send the dmesg
> because it would be good to know how things are mapped before this
> patch.
>
> I'd be surprised if the issue you're seeing is related to the one that
> Alexis reported. Having corrupt page table structures is a whole new
> bag of scary.
>
> Does $(grep pdpe1gb /proc/cpuinfo) show any output on your machine?
Assuming the answer to this question is "no", can you try out this
patch?
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8fee5b6f8f66..af74849e8c0f 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1055,7 +1055,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
/*
* Map everything starting from the Gb boundary, possibly with 1G pages
*/
- while (end - start >= PUD_SIZE) {
+ while (cpu_has_gbpages && end - start >= PUD_SIZE) {
set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
massage_pgprot(pud_pgprot)));
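(The idea being: without pdpe1gb, a _PAGE_PSE entry at the PUD level
is invalid, so we must never try to build 1G mappings there. That
would also explain page table dumps that look corrupt.)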
On Sun, 2016-03-13 at 23:07 +0000, Matt Fleming wrote:
> On Sun, 13 Mar, at 09:58:47PM, Matt Fleming wrote:
> > Does $(grep pdpe1gb /proc/cpuinfo) show any output on your machine?
> Assuming the answer to this question is "no", can you try out this
> patch?
The answer is no. Attached is the dmesg with CONFIG_EFI_PGT_DUMP.
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 8fee5b6f8f66..af74849e8c0f 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1055,7 +1055,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
> /*
> * Map everything starting from the Gb boundary, possibly with 1G pages
> */
> - while (end - start >= PUD_SIZE) {
> + while (cpu_has_gbpages && end - start >= PUD_SIZE) {
> set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> massage_pgprot(pud_pgprot)));
OK. I'll give that a go.
Cheers,
Scott
On Sun, 2016-03-13 at 23:07 +0000, Matt Fleming wrote:
> Assuming the answer to this question is "no", can you try out this
> patch?
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 8fee5b6f8f66..af74849e8c0f 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1055,7 +1055,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
> /*
> * Map everything starting from the Gb boundary, possibly with 1G pages
> */
> - while (end - start >= PUD_SIZE) {
> + while (cpu_has_gbpages && end - start >= PUD_SIZE) {
> set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> massage_pgprot(pud_pgprot)));
>
Yes, that does fix it.
Cheers,
Scott
* Matt Fleming <[email protected]> wrote:
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 8fee5b6f8f66..af74849e8c0f 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1055,7 +1055,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
> /*
> * Map everything starting from the Gb boundary, possibly with 1G pages
> */
> - while (end - start >= PUD_SIZE) {
> + while (cpu_has_gbpages && end - start >= PUD_SIZE) {
> set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> massage_pgprot(pud_pgprot)));
Btw., can 'cpa->pfn << PAGE_SHIFT' possibly work on 32-bit systems?
cpa->pfn is unsigned long, so the result gets truncated to 32 bits ...
cpa->pfn should be u64.
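E.g. a minimal illustration (not kernel code; assume a 32-bit build
where PAGE_SHIFT == 12):

	unsigned long pfn = 0x100000;		/* the frame at 4 GiB       */
	u64 wrong = pfn << PAGE_SHIFT;		/* shifted in 32 bits => 0  */
	u64 right = (u64)pfn << PAGE_SHIFT;	/* 0x100000000 as intended  */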
Thanks,
Ingo
On Mon, 14 Mar, at 11:30:19AM, Ingo Molnar wrote:
>
> * Matt Fleming <[email protected]> wrote:
>
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 8fee5b6f8f66..af74849e8c0f 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -1055,7 +1055,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
> > /*
> > * Map everything starting from the Gb boundary, possibly with 1G pages
> > */
> > - while (end - start >= PUD_SIZE) {
> > + while (cpu_has_gbpages && end - start >= PUD_SIZE) {
> > set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> > massage_pgprot(pud_pgprot)));
>
> Btw., can 'cpa->pfn << PAGE_SHIFT' possibly work on 32-bit systems?
>
> cpa->pfn is unsigned long, so the result gets truncated to 32 bits ...
>
> cpa->pfn should be u64.
That is a nice catch.
Note that we never run this code on 32-bit right now. Moving 32-bit to
this code and away from the old_map scheme is on my TODO list.
* Matt Fleming <[email protected]> wrote:
> On Mon, 14 Mar, at 11:30:19AM, Ingo Molnar wrote:
> >
> > * Matt Fleming <[email protected]> wrote:
> >
> > > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > > index 8fee5b6f8f66..af74849e8c0f 100644
> > > --- a/arch/x86/mm/pageattr.c
> > > +++ b/arch/x86/mm/pageattr.c
> > > @@ -1055,7 +1055,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
> > > /*
> > > * Map everything starting from the Gb boundary, possibly with 1G pages
> > > */
> > > - while (end - start >= PUD_SIZE) {
> > > + while (cpu_has_gbpages && end - start >= PUD_SIZE) {
> > > set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> > > massage_pgprot(pud_pgprot)));
> >
> > Btw., can 'cpa->pfn << PAGE_SHIFT' possibly work on 32-bit systems?
> >
> > cpa->pfn is unsigned long, so the result gets truncated to 32 bits ...
> >
> > cpa->pfn should be u64.
>
> That is a nice catch.
>
> Note that we never run this code on 32-bit right now. Moving 32-bit to
> this code and away from the old_map scheme is on my TODO list.
There's a number of such occurrences that look suspicious:
triton:~/tip> git grep 'cpa->pfn.*<<.*PAGE_SHIFT' arch/x86/
arch/x86/mm/pageattr.c: set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
arch/x86/mm/pageattr.c: set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
arch/x86/mm/pageattr.c: unsigned long laddr = (unsigned long)__va(cpa->pfn << PAGE_SHIFT);
arch/x86/mm/pageattr.c: unsigned long temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) +
are you sure none of the code runs on 32-bit?
All this got introduced with:
| commit edc3b9129cecd0f0857112136f5b8b1bc1d45918
| Author: Matt Fleming <[email protected]>
| Date: Fri Nov 27 21:09:31 2015 +0000
|
| x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
AFAICS.
Even if none of this is run on 32-bit, we should really fix it to be u64, because
the code is really bogus and the fix is easy ...
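E.g. an untested sketch of the type change (plus fixing up any sites
that assign or print the field):

	-	unsigned long	pfn;	/* in struct cpa_data */
	+	u64		pfn;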
Thanks,
Ingo
On Mon, 14 Mar, at 01:05:02PM, Ingo Molnar wrote:
>
> There's a number of such occurrences that look suspicious:
>
> triton:~/tip> git grep 'cpa->pfn.*<<.*PAGE_SHIFT' arch/x86/
> arch/x86/mm/pageattr.c: set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> arch/x86/mm/pageattr.c: set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> arch/x86/mm/pageattr.c: unsigned long laddr = (unsigned long)__va(cpa->pfn << PAGE_SHIFT);
> arch/x86/mm/pageattr.c: unsigned long temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) +
>
> are you sure none of the code runs on 32-bit?
The following occurrences do not run on 32-bit,
> arch/x86/mm/pageattr.c: set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
> arch/x86/mm/pageattr.c: set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
but these might,
> arch/x86/mm/pageattr.c: unsigned long laddr = (unsigned long)__va(cpa->pfn << PAGE_SHIFT);
> arch/x86/mm/pageattr.c: unsigned long temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) +
> All this got introduced with:
>
> | commit edc3b9129cecd0f0857112136f5b8b1bc1d45918
> | Author: Matt Fleming <[email protected]>
> | Date: Fri Nov 27 21:09:31 2015 +0000
> |
> | x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
>
> AFAICS.
Kinda. The bugs that do not run on 32-bit were introduced with this
commit. The ones that will run on 32-bit existed before this.
Running the attached semantic patch across arch/x86/mm yields a few
more places where we get the data type wrong for PAE,
* file: arch/x86/mm/mmap.c:43 shifting int '( ( - 1UL ) & STACK_RND_MASK )' by PAGE_SHIFT is truncated to 32-bits
* file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/gup.c:422 shifting int 'nr' by PAGE_SHIFT is truncated to 32-bits
* file: arch/x86/mm/gup.c:303 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/gup.c:370 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pat.c:751 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:947 shifting unsigned 'num_pages' by PAGE_SHIFT is truncated to 32-bits
* file: arch/x86/mm/pageattr.c:1995 shifting unsigned 'numpages' by PAGE_SHIFT is truncated to 32-bits
* file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:1117 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:1017 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:1277 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:1318 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:986 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/pageattr.c:1059 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:197 shifting unsigned long 'end_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:100 shifting unsigned long 'min_pfn_mapped' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:641 shifting unsigned long 'pagenr' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:111 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:121 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:111 shifting unsigned long __initdata 'pgt_buf_end' by PAGE_SHIFT is truncated to 32-bits
* file: arch/x86/mm/init.c:196 shifting unsigned long 'start_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:91 shifting unsigned long '( unsigned long ) num' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init.c:117 shifting unsigned long '( pfn + i )' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init_32.c:293 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init_32.c:301 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init_32.c:344 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init_32.c:361 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
* file: arch/x86/mm/init_32.c:471 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
The coccinelle script isn't perfect, and there are a number of false
positives. For example, the first hit is bogus and looks like a
coccinelle bug, but the results do show some things that need to be
investigated.
Running it over arch/x86/ turns up 66 potential issues.
* Matt Fleming <[email protected]> wrote:
> Running the attached semantic patch across arch/x86/mm yields a few
> more places where we get the data type wrong for PAE,
Very nice!
> * file: arch/x86/mm/mmap.c:43 shifting int '( ( - 1UL ) & STACK_RND_MASK )' by PAGE_SHIFT is truncated to 32-bits
> * file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/gup.c:422 shifting int 'nr' by PAGE_SHIFT is truncated to 32-bits
> * file: arch/x86/mm/gup.c:303 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/gup.c:370 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pat.c:751 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:947 shifting unsigned 'num_pages' by PAGE_SHIFT is truncated to 32-bits
> * file: arch/x86/mm/pageattr.c:1995 shifting unsigned 'numpages' by PAGE_SHIFT is truncated to 32-bits
> * file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:1117 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:1017 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:1277 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:1318 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:986 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/pageattr.c:1059 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:197 shifting unsigned long 'end_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:100 shifting unsigned long 'min_pfn_mapped' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:641 shifting unsigned long 'pagenr' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:111 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:121 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:111 shifting unsigned long __initdata 'pgt_buf_end' by PAGE_SHIFT is truncated to 32-bits
> * file: arch/x86/mm/init.c:196 shifting unsigned long 'start_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:91 shifting unsigned long '( unsigned long ) num' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init.c:117 shifting unsigned long '( pfn + i )' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init_32.c:293 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init_32.c:301 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init_32.c:344 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init_32.c:361 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> * file: arch/x86/mm/init_32.c:471 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
>
> The coccinelle script isn't perfect, and there are a number of false
> positives. For example, the first hit is bogus and looks like a
> coccinelle bug, but the results do show some things that need to be
> investigated.
So I checked a few random examples in your list, and the false positive rate looks
rather low.
The current Kbuild integration of Cocci scripts is pretty user-hostile. I'd love
to make this Cocci check part of the regular build process in some fashion (if a
Kconfig option is enabled), similarly to how we run objtool for example. We could
emit the Cocci warnings as a regular compiler 'warning: ' message, so people will
notice them as part of the build?
The false positive(s) could either be worked around or annotated away.
Obviously we'd only use Cocci scripts that are known to be reliable.
Thanks,
Ingo
On Mon, 14 Mar, at 05:47:00PM, Ingo Molnar wrote:
>
> * Matt Fleming <[email protected]> wrote:
>
> > Running the attached semantic patch across arch/x86/mm yields a few
> > more places where we get the data type wrong for PAE,
>
> Very nice!
>
> > * file: arch/x86/mm/mmap.c:43 shifting int '( ( - 1UL ) & STACK_RND_MASK )' by PAGE_SHIFT is truncated to 32-bits
> > * file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/gup.c:422 shifting int 'nr' by PAGE_SHIFT is truncated to 32-bits
> > * file: arch/x86/mm/gup.c:303 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/gup.c:370 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pat.c:751 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:947 shifting unsigned 'num_pages' by PAGE_SHIFT is truncated to 32-bits
> > * file: arch/x86/mm/pageattr.c:1995 shifting unsigned 'numpages' by PAGE_SHIFT is truncated to 32-bits
> > * file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:1117 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:1017 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:1277 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:1318 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:986 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/pageattr.c:1059 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:197 shifting unsigned long 'end_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:100 shifting unsigned long 'min_pfn_mapped' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:641 shifting unsigned long 'pagenr' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:111 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:121 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:111 shifting unsigned long __initdata 'pgt_buf_end' by PAGE_SHIFT is truncated to 32-bits
> > * file: arch/x86/mm/init.c:196 shifting unsigned long 'start_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:91 shifting unsigned long '( unsigned long ) num' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init.c:117 shifting unsigned long '( pfn + i )' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init_32.c:293 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init_32.c:301 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init_32.c:344 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init_32.c:361 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > * file: arch/x86/mm/init_32.c:471 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> >
> > The coccinelle script isn't perfect, and there are a number of false
> > positives. For example, the first hit is bogus and looks like a
> > coccinelle bug, but the results do show some things that need to be
> > investigated.
>
> So I checked a few random examples in your list, and the false positive rate looks
> rather low.
>
> The current Kbuild integration of Cocci scripts is pretty user-hostile. I'd love
> to make this Cocci check part of the regular build process in some fashion (if a
> Kconfig option is enabled), similarly to how we run objtool for example. We could
> emit the Cocci warnings as a regular compiler 'warning: ' message, so people will
> notice them as part of the build?
For this type of cocci script where architecture knowledge of bug
idioms is required (you need to know shifting by "PAGE_SHIFT" is
usually done to build an address or size of some kind), I think the
first step would be to automatically look up scripts to be run on a
per-directory basis.
For example, running make(1) in arch/x86/kernel should map to
scripts/coccinelle/arch/x86/kernel, etc.
And yes, I agree, turning this on via a CONFIG_* symbol would be nice.
> The false positive(s) could either be worked around or annotated away.
>
> Obviously we'd only use Cocci scripts that are known to be reliable.
Fengguang, do you run coccinelle scripts currently as part of the
0-day lkp build machinery?
On Tue, Mar 15, 2016 at 03:54:18PM +0000, Matt Fleming wrote:
> Fengguang, do you run coccinelle scripts currently as part of the
> 0-day lkp build machinery?
Not only that - it generates patches too. For example:
https://lkml.kernel.org/r/20160303015519.GA43500@lkp-nex06
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Tue, 15 Mar 2016, Matt Fleming wrote:
> On Mon, 14 Mar, at 05:47:00PM, Ingo Molnar wrote:
> >
> > * Matt Fleming <[email protected]> wrote:
> >
> > > Running the attached semantic patch across arch/x86/mm yields a few
> > > more places where we get the data type wrong for PAE,
> >
> > Very nice!
> >
> > > * file: arch/x86/mm/mmap.c:43 shifting int '( ( - 1UL ) & STACK_RND_MASK )' by PAGE_SHIFT is truncated to 32-bits
> > > * file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/gup.c:422 shifting int 'nr' by PAGE_SHIFT is truncated to 32-bits
> > > * file: arch/x86/mm/gup.c:303 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/gup.c:370 shifting unsigned long '( unsigned long ) nr_pages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pat.c:751 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr-test.c:57 shifting long 'i' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:947 shifting unsigned 'num_pages' by PAGE_SHIFT is truncated to 32-bits
> > > * file: arch/x86/mm/pageattr.c:1995 shifting unsigned 'numpages' by PAGE_SHIFT is truncated to 32-bits
> > > * file: arch/x86/mm/pageattr-test.c:138 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:1117 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:1017 shifting unsigned long 'cpa -> numpages' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:1277 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:1318 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:986 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/pageattr.c:1059 shifting unsigned long 'cpa -> pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:197 shifting unsigned long 'end_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:100 shifting unsigned long 'min_pfn_mapped' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:641 shifting unsigned long 'pagenr' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:111 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:121 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:111 shifting unsigned long __initdata 'pgt_buf_end' by PAGE_SHIFT is truncated to 32-bits
> > > * file: arch/x86/mm/init.c:196 shifting unsigned long 'start_pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:91 shifting unsigned long '( unsigned long ) num' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init.c:117 shifting unsigned long '( pfn + i )' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init_32.c:293 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init_32.c:301 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init_32.c:344 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init_32.c:361 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > > * file: arch/x86/mm/init_32.c:471 shifting unsigned long 'pfn' by PAGE_SHIFT is truncated to 32-bits for PAE
> > >
> > > The coccinelle script isn't perfect, and there are a number of false
> > > positives. For example, the first hit is bogus and looks like a
> > > coccinelle bug, but the results do show some things that need to be
> > > investigated.
> >
> > So I checked a few random examples in your list, and the false positive rate looks
> > rather low.
> >
> > The current Kbuild integration of Cocci scripts is pretty user-hostile. I'd love
> > to make this Cocci check part of the regular build process in some fashion (if a
> > Kconfig option is enabled), similarly to how we run objtool for example. We could
> > emit the Cocci warnings as a regular compiler 'warning: ' message, so people will
> > notice them as part of the build?
>
> For this type of cocci script where architecture knowledge of bug
> idioms is required (you need to know shifting by "PAGE_SHIFT" is
> usually done to build an address or size of some kind), I think the
> first step would be to automatically look up scripts to be run on a
> per-directory basis.
>
> For example, running make(1) in arch/x86/kernel should map to
> scripts/coccinelle/arch/x86/kernel, etc.
Considering the current organization of make coccicheck, I think it would
be easier for Coccinelle to just abort on files not in the desired
directories. This is possible to encode, although perhaps a little bit
awkward.
julia
> And yes, I agree, turning this on via a CONFIG_* symbol would be nice.
>
> > The false positive(s) could either be worked around or annotated away.
> >
> > Obviously we'd only use Cocci scripts that are known to be reliable.
>
> Fengguang, do you run coccinelle scripts currently as part of the
> 0-day lkp build machinery?
>