2016-10-19 18:30:52

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 0/2] 4.4.26-stable review

This is the start of the stable review cycle for the 4.4.26 release.
There are 2 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Oct 21 18:27:53 UTC 2016.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.26-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.26-rc1

Linus Torvalds <[email protected]>
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

H.J. Lu <[email protected]>
x86/build: Build compressed x86 kernels as PIE


-------------

Diffstat:

Makefile | 4 ++--
arch/x86/boot/compressed/Makefile | 14 +++++++++++++-
arch/x86/boot/compressed/head_32.S | 28 ++++++++++++++++++++++++++++
arch/x86/boot/compressed/head_64.S | 8 ++++++++
include/linux/mm.h | 1 +
mm/gup.c | 14 ++++++++++++--
6 files changed, 64 insertions(+), 5 deletions(-)



2016-10-19 18:31:00

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 1/2] x86/build: Build compressed x86 kernels as PIE

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: H.J. Lu <[email protected]>

commit 6d92bc9d483aa1751755a66fee8fb39dffb088c0 upstream.

The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC. When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses. However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:

Failed to allocate space for phdrs

during the decompression stage.

If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE, instead of -fPIC, if possible and should be built as
Position Independent Executable (PIE) so that linker won't optimize
R_386_GOT32X relocation to its fixed symbol address.

Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less
optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel. To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.

To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:

commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
Author: H.J. Lu <[email protected]>
Date: Tue Mar 15 11:07:06 2016 -0700

Add -z noreloc-overflow option to x86-64 ld

Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
disable relocation overflow check. This can be used to avoid relocation
overflow check if there will be no dynamic relocation overflow at
run-time.

The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow. So far 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.

Signed-off-by: H.J. Lu <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Paul Bolle <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/boot/compressed/Makefile | 14 +++++++++++++-
arch/x86/boot/compressed/head_32.S | 28 ++++++++++++++++++++++++++++
arch/x86/boot/compressed/head_64.S | 8 ++++++++
3 files changed, 49 insertions(+), 1 deletion(-)

--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -22,7 +22,7 @@ targets := vmlinux vmlinux.bin vmlinux.b
vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4

KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2
-KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
+KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
cflags-$(CONFIG_X86_32) := -march=i386
cflags-$(CONFIG_X86_64) := -mcmodel=small
@@ -35,6 +35,18 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__A
GCOV_PROFILE := n

LDFLAGS := -m elf_$(UTS_MACHINE)
+ifeq ($(CONFIG_RELOCATABLE),y)
+# If kernel is relocatable, build compressed kernel as PIE.
+ifeq ($(CONFIG_X86_32),y)
+LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
+else
+# To build 64-bit compressed kernel as PIE, we disable relocation
+# overflow check to avoid relocation overflow error with a new linker
+# command-line option, -z noreloc-overflow.
+LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
+ && echo "-z noreloc-overflow -pie --no-dynamic-linker")
+endif
+endif
LDFLAGS_vmlinux := -T

hostprogs-y := mkpiggy
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -31,6 +31,34 @@
#include <asm/asm-offsets.h>
#include <asm/bootparam.h>

+/*
+ * The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
+ * relocation to get the symbol address in PIC. When the compressed x86
+ * kernel isn't built as PIC, the linker optimizes R_386_GOT32X
+ * relocations to their fixed symbol addresses. However, when the
+ * compressed x86 kernel is loaded at a different address, it leads
+ * to the following load failure:
+ *
+ * Failed to allocate space for phdrs
+ *
+ * during the decompression stage.
+ *
+ * If the compressed x86 kernel is relocatable at run-time, it should be
+ * compiled with -fPIE, instead of -fPIC, if possible and should be built as
+ * Position Independent Executable (PIE) so that linker won't optimize
+ * R_386_GOT32X relocation to its fixed symbol address. Older
+ * linkers generate R_386_32 relocations against locally defined symbols,
+ * _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less
+ * optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle
+ * R_386_32 relocations when relocating the kernel. To generate
+ * R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
+ * hidden:
+ */
+ .hidden _bss
+ .hidden _ebss
+ .hidden _got
+ .hidden _egot
+
__HEAD
ENTRY(startup_32)
#ifdef CONFIG_EFI_STUB
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -33,6 +33,14 @@
#include <asm/asm-offsets.h>
#include <asm/bootparam.h>

+/*
+ * Locally defined symbols should be marked hidden:
+ */
+ .hidden _bss
+ .hidden _ebss
+ .hidden _got
+ .hidden _egot
+
__HEAD
.code32
ENTRY(startup_32)


2016-10-19 18:31:03

by Greg KH

[permalink] [raw]
Subject: [PATCH 4.4 2/2] mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Greg Thelen <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/mm.h | 1 +
mm/gup.c | 14 ++++++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2112,6 +2112,7 @@ static inline struct page *follow_page(s
#define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */
#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */
#define FOLL_MLOCK 0x1000 /* lock present pages */
+#define FOLL_COW 0x4000 /* internal GUP flag */

typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -58,6 +58,16 @@ static int follow_pfn_pte(struct vm_area
return -EEXIST;
}

+/*
+ * FOLL_FORCE can write to even unwritable pte's, but only
+ * after we've gone through a COW cycle and they are dirty.
+ */
+static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+{
+ return pte_write(pte) ||
+ ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+}
+
static struct page *follow_page_pte(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, unsigned int flags)
{
@@ -92,7 +102,7 @@ retry:
}
if ((flags & FOLL_NUMA) && pte_protnone(pte))
goto no_page;
- if ((flags & FOLL_WRITE) && !pte_write(pte)) {
+ if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
pte_unmap_unlock(ptep, ptl);
return NULL;
}
@@ -352,7 +362,7 @@ static int faultin_page(struct task_stru
* reCOWed by userspace write).
*/
if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
- *flags &= ~FOLL_WRITE;
+ *flags |= FOLL_COW;
return 0;
}



2016-10-19 18:53:03

by Paul Bolle

[permalink] [raw]
Subject: Re: [PATCH 4.4 0/2] 4.4.26-stable review

On Wed, 2016-10-19 at 20:30 +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.26 release.
> There are 2 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.

Did I botch my attempt at a backport of "lightnvm: ensure that
nvm_dev_ops can be used without CONFIG_NVM" to v4.4.y (see
https://lkml.kernel.org/r/<1476477349-28155-1-git-send-email-pebolle@ti
scali.nl> ) sufficiently for it to be dropped?

Thanks,


Paul Bolle

2016-10-19 19:34:07

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 4.4 0/2] 4.4.26-stable review

On Wed, Oct 19, 2016 at 08:52:55PM +0200, Paul Bolle wrote:
> On Wed, 2016-10-19 at 20:30 +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.26 release.
> > There are 2 patches in this series, all will be posted as a response
> > to this one.??If anyone has any issues with these being applied, please
> > let me know.
>
> Did I botch my attempt at a backport of "lightnvm: ensure that
> nvm_dev_ops can be used without CONFIG_NVM" to v4.4.y (see
> https://lkml.kernel.org/r/<1476477349-28155-1-git-send-email-pebolle@ti
> scali.nl>?) sufficiently for it to be dropped?

It's in good company, sitting along with 250+ other patches I have yet
to work through to apply to the stable kernels. For various reasons I
needed to get a round of stable kernels out sooner, which is why it
isn't in there. Don't worry, it's not lost, it will get handled
eventually...

thanks,

greg k-h

2016-10-19 19:41:50

by Paul Bolle

[permalink] [raw]
Subject: Re: [PATCH 4.4 0/2] 4.4.26-stable review

On Wed, 2016-10-19 at 21:34 +0200, Greg Kroah-Hartman wrote:
> It's in good company, sitting along with 250+ other patches I have yet
> to work through to apply to the stable kernels.  For various reasons I
> needed to get a round of stable kernels out sooner, which is why it
> isn't in there.  Don't worry, it's not lost, it will get handled
> eventually...

Great. I'll be patient from now on. Sorry for the noise.


Paul Bolle

2016-10-19 22:28:51

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.4 0/2] 4.4.26-stable review

On 10/19/2016 12:30 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.26 release.
> There are 2 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Oct 21 18:27:53 UTC 2016.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.26-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America(Silicon Valley)
[email protected]

2016-10-20 01:41:58

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 0/2] 4.4.26-stable review

On 10/19/2016 11:30 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.26 release.
> There are 2 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Oct 21 18:27:53 UTC 2016.
> Anything received after that time might be too late.
>
Build results:
total: 149 pass: 149 fail: 0
Qemu test results:
total: 103 pass: 103 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter