2018-08-14 17:49:54

by Greg Kroah-Hartman

Subject: [PATCH 4.4 00/43] 4.4.148-stable review

This is the start of the stable review cycle for the 4.4.148 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Aug 16 17:14:59 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.148-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.148-rc1

Guenter Roeck <[email protected]>
x86/speculation/l1tf: Fix up CPU feature flags

Andi Kleen <[email protected]>
x86/mm/kmmio: Make the tracer robust against L1TF

Andi Kleen <[email protected]>
x86/mm/pat: Make set_memory_np() L1TF safe

Andi Kleen <[email protected]>
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert

Andi Kleen <[email protected]>
x86/speculation/l1tf: Invert all not present mappings

Michal Hocko <[email protected]>
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE

Vlastimil Babka <[email protected]>
x86/speculation/l1tf: Protect PAE swap entries against L1TF

Konrad Rzeszutek Wilk <[email protected]>
x86/cpufeatures: Add detection of L1D cache flush support.

Vlastimil Babka <[email protected]>
x86/speculation/l1tf: Extend 64bit swap file size limit

Konrad Rzeszutek Wilk <[email protected]>
x86/bugs: Move the l1tf function and define pr_fmt properly

Andi Kleen <[email protected]>
x86/speculation/l1tf: Limit swap file size to MAX_PA/2

Andi Kleen <[email protected]>
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings

Dan Williams <[email protected]>
mm: fix cache mode tracking in vm_insert_mixed()

Andy Lutomirski <[email protected]>
mm: Add vm_insert_pfn_prot()

Andi Kleen <[email protected]>
x86/speculation/l1tf: Add sysfs reporting for l1tf

Andi Kleen <[email protected]>
x86/speculation/l1tf: Make sure the first page is always reserved

Andi Kleen <[email protected]>
x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

Linus Torvalds <[email protected]>
x86/speculation/l1tf: Protect swap entries against L1TF

Linus Torvalds <[email protected]>
x86/speculation/l1tf: Change order of offset/type in swap entry

Naoya Horiguchi <[email protected]>
mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1

Dave Hansen <[email protected]>
x86/mm: Fix swap entry comment and macro

Dave Hansen <[email protected]>
x86/mm: Move swap offset/type up in PTE to work around erratum

Andi Kleen <[email protected]>
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT

Nick Desaulniers <[email protected]>
x86/irqflags: Provide a declaration for native_save_fl

Masami Hiramatsu <[email protected]>
kprobes/x86: Fix %p uses in error messages

Jiri Kosina <[email protected]>
x86/speculation: Protect against userspace-userspace spectreRSB

Peter Zijlstra <[email protected]>
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests

Oleksij Rempel <[email protected]>
ARM: dts: imx6sx: fix irq for pcie bridge

Michael Mera <[email protected]>
IB/ocrdma: fix out of bounds access to local buffer

Jack Morgenstein <[email protected]>
IB/mlx4: Mark user MR as writable if actual virtual memory is writable

Jack Morgenstein <[email protected]>
IB/core: Make testing MR flags for writability a static inline function

Al Viro <[email protected]>
fix __legitimize_mnt()/mntput() race

Al Viro <[email protected]>
fix mntput/mntput race

Al Viro <[email protected]>
root dentries need RCU-delayed freeing

Bart Van Assche <[email protected]>
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled

Hans de Goede <[email protected]>
ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices

Juergen Gross <[email protected]>
xen/netfront: don't cache skb_shinfo()

John David Anglin <[email protected]>
parisc: Define mb() and add memory barriers to assembler unlock sequences

Helge Deller <[email protected]>
parisc: Enable CONFIG_MLONGCALLS by default

Kees Cook <[email protected]>
fork: unconditionally clear stack on fork

Thomas Egerer <[email protected]>
ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV

Tadeusz Struk <[email protected]>
tpm: fix race condition in tpm_common_write()

Theodore Ts'o <[email protected]>
ext4: fix check to prevent initializing reserved inodes


-------------

Diffstat:

Makefile | 4 +-
arch/arm/boot/dts/imx6sx.dtsi | 2 +-
arch/parisc/Kconfig | 2 +-
arch/parisc/include/asm/barrier.h | 32 +++++++++++
arch/parisc/kernel/entry.S | 2 +
arch/parisc/kernel/pacache.S | 1 +
arch/parisc/kernel/syscall.S | 4 ++
arch/x86/include/asm/cpufeatures.h | 10 ++--
arch/x86/include/asm/irqflags.h | 2 +
arch/x86/include/asm/page_32_types.h | 9 +++-
arch/x86/include/asm/pgtable-2level.h | 17 ++++++
arch/x86/include/asm/pgtable-3level.h | 37 ++++++++++++-
arch/x86/include/asm/pgtable-invert.h | 32 +++++++++++
arch/x86/include/asm/pgtable.h | 84 +++++++++++++++++++++++------
arch/x86/include/asm/pgtable_64.h | 54 +++++++++++++++----
arch/x86/include/asm/pgtable_types.h | 10 ++--
arch/x86/include/asm/processor.h | 5 ++
arch/x86/kernel/cpu/bugs.c | 81 +++++++++++++++++-----------
arch/x86/kernel/cpu/common.c | 20 +++++++
arch/x86/kernel/kprobes/core.c | 4 +-
arch/x86/kernel/paravirt.c | 14 +++--
arch/x86/kernel/setup.c | 6 +++
arch/x86/mm/init.c | 23 ++++++++
arch/x86/mm/kmmio.c | 25 +++++----
arch/x86/mm/mmap.c | 21 ++++++++
arch/x86/mm/pageattr.c | 8 +--
drivers/acpi/acpi_lpss.c | 2 +
drivers/base/cpu.c | 8 +++
drivers/char/tpm/tpm-dev.c | 43 +++++++--------
drivers/infiniband/core/umem.c | 11 +---
drivers/infiniband/hw/mlx4/mr.c | 50 ++++++++++++++---
drivers/infiniband/hw/ocrdma/ocrdma_stats.c | 2 +-
drivers/net/xen-netfront.c | 8 +--
drivers/scsi/sr.c | 29 +++++++---
fs/dcache.c | 6 ++-
fs/ext4/ialloc.c | 5 +-
fs/ext4/super.c | 8 +--
fs/namespace.c | 28 +++++++++-
include/asm-generic/pgtable.h | 12 +++++
include/linux/cpu.h | 2 +
include/linux/mm.h | 2 +
include/linux/swapfile.h | 2 +
include/linux/thread_info.h | 6 +--
include/rdma/ib_verbs.h | 14 +++++
mm/memory.c | 62 +++++++++++++++++----
mm/mprotect.c | 49 +++++++++++++++++
mm/swapfile.c | 46 ++++++++++------
net/ipv4/Kconfig | 1 +
net/ipv6/Kconfig | 1 +
49 files changed, 714 insertions(+), 192 deletions(-)




2018-08-14 17:47:31

by Greg Kroah-Hartman

Subject: [PATCH 4.4 12/43] fix __legitimize_mnt()/mntput() race

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntput() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
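
As a rough illustration only, here is a standalone C11 sketch of that barrier
pairing (not the VFS code; the two atomics merely stand in for the mount
refcount and for mount_lock's sequence count):

/* Each side writes, issues a full barrier, then reads what the other side
 * writes; with this pairing it is impossible for both sides to miss each
 * other's write, which is what the two smp_mb() calls in the patch provide. */
#include <stdatomic.h>

static atomic_int refcount;   /* stands in for the mount refcount          */
static atomic_int seqcount;   /* stands in for mount_lock's sequence count */

static int legitimize_side(int seq_seen_earlier)
{
        atomic_fetch_add_explicit(&refcount, 1, memory_order_relaxed); /* mnt_add_count(mnt, 1) */
        atomic_thread_fence(memory_order_seq_cst);                     /* smp_mb()              */
        /* read_seqretry(): did a final mntput() take mount_lock meanwhile? */
        return atomic_load_explicit(&seqcount, memory_order_relaxed) != seq_seen_earlier;
}

static int mntput_side(void)
{
        atomic_fetch_add_explicit(&seqcount, 1, memory_order_relaxed); /* lock_mount_hash()     */
        atomic_thread_fence(memory_order_seq_cst);                     /* smp_mb()              */
        /* mnt_get_count(): did __legitimize_mnt() bump the count meanwhile? */
        return atomic_load_explicit(&refcount, memory_order_relaxed);
}

int main(void)
{
        /* single-threaded smoke test; the interesting callers run concurrently */
        int seq = atomic_load(&seqcount);
        mntput_side();
        return legitimize_side(seq) ? 0 : 1;
}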

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <[email protected]>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/namespace.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -603,12 +603,21 @@ int __legitimize_mnt(struct vfsmount *ba
return 0;
mnt = real_mount(bastard);
mnt_add_count(mnt, 1);
+ smp_mb(); // see mntput_no_expire()
if (likely(!read_seqretry(&mount_lock, seq)))
return 0;
if (bastard->mnt_flags & MNT_SYNC_UMOUNT) {
mnt_add_count(mnt, -1);
return 1;
}
+ lock_mount_hash();
+ if (unlikely(bastard->mnt_flags & MNT_DOOMED)) {
+ mnt_add_count(mnt, -1);
+ unlock_mount_hash();
+ return 1;
+ }
+ unlock_mount_hash();
+ /* caller will mntput() */
return -1;
}

@@ -1139,6 +1148,11 @@ static void mntput_no_expire(struct moun
return;
}
lock_mount_hash();
+ /*
+ * make sure that if __legitimize_mnt() has not seen us grab
+ * mount_lock, we'll see their refcount increment here.
+ */
+ smp_mb();
mnt_add_count(mnt, -1);
if (mnt_get_count(mnt)) {
rcu_read_unlock();



2018-08-14 17:47:34

by Greg Kroah-Hartman

Subject: [PATCH 4.4 13/43] IB/core: Make testing MR flags for writability a static inline function

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jack Morgenstein <[email protected]>

commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream.

Make the MR writability flags check, which is performed in umem.c,
a static inline function in file ib_verbs.h

This allows the function to be used by low-level infiniband drivers.

Cc: <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/infiniband/core/umem.c | 11 +----------
include/rdma/ib_verbs.h | 14 ++++++++++++++
2 files changed, 15 insertions(+), 10 deletions(-)

--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -122,16 +122,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
umem->address = addr;
umem->page_size = PAGE_SIZE;
umem->pid = get_task_pid(current, PIDTYPE_PID);
- /*
- * We ask for writable memory if any of the following
- * access flags are set. "Local write" and "remote write"
- * obviously require write access. "Remote atomic" can do
- * things like fetch and add, which will modify memory, and
- * "MW bind" can change permissions by binding a window.
- */
- umem->writable = !!(access &
- (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
- IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
+ umem->writable = ib_access_writable(access);

if (access & IB_ACCESS_ON_DEMAND) {
put_pid(umem->pid);
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3007,6 +3007,20 @@ static inline int ib_check_mr_access(int
return 0;
}

+static inline bool ib_access_writable(int access_flags)
+{
+ /*
+ * We have writable memory backing the MR if any of the following
+ * access flags are set. "Local write" and "remote write" obviously
+ * require write access. "Remote atomic" can do things like fetch and
+ * add, which will modify memory, and "MW bind" can change permissions
+ * by binding a window.
+ */
+ return access_flags &
+ (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND);
+}
+
/**
* ib_check_mr_status: lightweight check of MR status.
* This routine may provide status checks on a selected



2018-08-14 17:47:42

by Greg Kroah-Hartman

Subject: [PATCH 4.4 15/43] IB/ocrdma: fix out of bounds access to local buffer

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Mera <[email protected]>

commit 062d0f22a30c39840ea49b72cfcfc1aa4cc538fa upstream.

In a write to the debugfs file 'resource_stats', the local buffer 'tmp_str'
is written at index 'count-1', where 'count' is the size of the write, so
potentially 0.

This patch filters odd values for the write size/position to avoid this
type of problem.

Signed-off-by: Michael Mera <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Amit Pundir <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/ocrdma/ocrdma_stats.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/infiniband/hw/ocrdma/ocrdma_stats.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_stats.c
@@ -643,7 +643,7 @@ static ssize_t ocrdma_dbgfs_ops_write(st
struct ocrdma_stats *pstats = filp->private_data;
struct ocrdma_dev *dev = pstats->dev;

- if (count > 32)
+ if (*ppos != 0 || count == 0 || count > sizeof(tmp_str))
goto err;

if (copy_from_user(tmp_str, buffer, count))



2018-08-14 17:47:46

by Greg Kroah-Hartman

Subject: [PATCH 4.4 17/43] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream.

Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.

The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, so there is no possible way the clobbers can be different
because of this.

Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.

Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.

PRE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

POST:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
0xffffffff817be9a5 <+53>: xchg %ax,%ax
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

NATIVE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: movb $0x0,(%rdi)
0xffffffff817be9a3 <+51>: nopl 0x0(%rax)
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.


Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/paravirt.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -97,10 +97,12 @@ unsigned paravirt_patch_call(void *insnb
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);

- if (tgt_clobbers & ~site_clobbers)
- return len; /* target would clobber too much for this site */
- if (len < 5)
+ if (len < 5) {
+#ifdef CONFIG_RETPOLINE
+ WARN_ONCE("Failing to patch indirect CALL in %ps\n", (void *)addr);
+#endif
return len; /* call too long for patch site */
+ }

b->opcode = 0xe8; /* call */
b->delta = delta;
@@ -115,8 +117,12 @@ unsigned paravirt_patch_jmp(void *insnbu
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);

- if (len < 5)
+ if (len < 5) {
+#ifdef CONFIG_RETPOLINE
+ WARN_ONCE("Failing to patch indirect JMP in %ps\n", (void *)addr);
+#endif
return len; /* call too long for patch site */
+ }

b->opcode = 0xe9; /* jmp */
b->delta = delta;



2018-08-14 17:47:51

by Greg Kroah-Hartman

Subject: [PATCH 4.4 01/43] ext4: fix check to prevent initializing reserved inodes

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <[email protected]>

commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream.

Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:

mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc

Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.

This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the nojournal test case.
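
As a standalone illustration of the new test (the numbers are assumed
examples: 8192 inodes per group and EXT4_FIRST_INO == 11; the real check is
in the ext4_init_inode_table() hunk below):

#include <stdio.h>
#include <stdbool.h>

/* Initializing block group 0 is only safe if its "used" inodes still cover
 * every reserved inode; an itable_unused count that is too large would let
 * the reserved inodes be zeroed, so it has to be treated as corruption. */
static bool bg0_itable_unused_sane(unsigned int inodes_per_group,
                                   unsigned int first_ino,
                                   unsigned int itable_unused)
{
        return inodes_per_group - itable_unused >= first_ino;
}

int main(void)
{
        printf("itable_unused=8000: %s\n",
               bg0_itable_unused_sane(8192, 11, 8000) ? "ok" : "reject");
        printf("itable_unused=8185: %s\n",
               bg0_itable_unused_sane(8192, 11, 8185) ? "ok" : "reject");
        return 0;
}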

Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/ialloc.c | 5 ++++-
fs/ext4/super.c | 8 +-------
2 files changed, 5 insertions(+), 8 deletions(-)

--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1308,7 +1308,10 @@ int ext4_init_inode_table(struct super_b
ext4_itable_unused_count(sb, gdp)),
sbi->s_inodes_per_block);

- if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group)) {
+ if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group) ||
+ ((group == 0) && ((EXT4_INODES_PER_GROUP(sb) -
+ ext4_itable_unused_count(sb, gdp)) <
+ EXT4_FIRST_INO(sb)))) {
ext4_error(sb, "Something is wrong with group %u: "
"used itable blocks: %d; "
"itable unused count: %u",
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2875,14 +2875,8 @@ static ext4_group_t ext4_has_uninit_itab
if (!gdp)
continue;

- if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
- continue;
- if (group != 0)
+ if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)))
break;
- ext4_error(sb, "Inode table for bg 0 marked as "
- "needing zeroing");
- if (sb->s_flags & MS_RDONLY)
- return ngroups;
}

return group;



2018-08-14 17:48:00

by Greg Kroah-Hartman

Subject: [PATCH 4.4 19/43] kprobes/x86: Fix %p uses in error messages

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masami Hiramatsu <[email protected]>

commit 0ea063306eecf300fcf06d2f5917474b580f666f upstream.

Remove all %p uses in error messages in kprobes/x86.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Howells <[email protected]>
Cc: David S . Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jon Medhurst <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Tobin C . Harding <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/kprobes/core.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -393,7 +393,6 @@ int __copy_instruction(u8 *dest, u8 *src
newdisp = (u8 *) src + (s64) insn.displacement.value - (u8 *) dest;
if ((s64) (s32) newdisp != newdisp) {
pr_err("Kprobes error: new displacement does not fit into s32 (%llx)\n", newdisp);
- pr_err("\tSrc: %p, Dest: %p, old disp: %x\n", src, dest, insn.displacement.value);
return 0;
}
disp = (u8 *) dest + insn_offset_displacement(&insn);
@@ -609,8 +608,7 @@ static int reenter_kprobe(struct kprobe
* Raise a BUG or we'll continue in an endless reentering loop
* and eventually a stack overflow.
*/
- printk(KERN_WARNING "Unrecoverable kprobe detected at %p.\n",
- p->addr);
+ pr_err("Unrecoverable kprobe detected.\n");
dump_kprobe(p);
BUG();
default:



2018-08-14 17:48:01

by Greg Kroah-Hartman

Subject: [PATCH 4.4 11/43] fix mntput/mntput race

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.

mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component #42.
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
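
For concreteness, a deterministic single-threaded simulation of that scan (a
standalone sketch, not the VFS code; the two variables stand in for per-CPU
components #0 and #69, all other components assumed zero):

#include <stdio.h>

int main(void)
{
        /* after the first mntput() on CPU 42 the real total is 1; assume it
         * is distributed as component #0 == 0 and component #69 == 1 */
        int comp0 = 0, comp69 = 1;

        int seen0 = comp0;      /* CPU 42's scan reads component #0  */
        comp0 += 1;             /* mntget() on CPU 0                 */
        comp69 -= 1;            /* mntput() on CPU 69                */
        int seen69 = comp69;    /* CPU 42's scan reads component #69 */

        /* prints "scanned total = 0, real total = 1" */
        printf("scanned total = %d, real total = %d\n",
               seen0 + seen69, comp0 + comp69);
        return 0;
}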

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.

Reported-by: Jann Horn <[email protected]>
Tested-by: Jann Horn <[email protected]>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/namespace.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1124,12 +1124,22 @@ static DECLARE_DELAYED_WORK(delayed_mntp
static void mntput_no_expire(struct mount *mnt)
{
rcu_read_lock();
- mnt_add_count(mnt, -1);
- if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
+ if (likely(READ_ONCE(mnt->mnt_ns))) {
+ /*
+ * Since we don't do lock_mount_hash() here,
+ * ->mnt_ns can change under us. However, if it's
+ * non-NULL, then there's a reference that won't
+ * be dropped until after an RCU delay done after
+ * turning ->mnt_ns NULL. So if we observe it
+ * non-NULL under rcu_read_lock(), the reference
+ * we are dropping is not the final one.
+ */
+ mnt_add_count(mnt, -1);
rcu_read_unlock();
return;
}
lock_mount_hash();
+ mnt_add_count(mnt, -1);
if (mnt_get_count(mnt)) {
rcu_read_unlock();
unlock_mount_hash();



2018-08-14 17:48:02

by Greg Kroah-Hartman

Subject: [PATCH 4.4 02/43] tpm: fix race condition in tpm_common_write()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tadeusz Struk <[email protected]>

commit 3ab2011ea368ec3433ad49e1b9e1c7b70d2e65df upstream.

There is a race condition in the tpm_common_write() function that allows
two threads on the same /dev/tpm<N>, or two different applications on the
same /dev/tpmrm<N>, to overwrite each other's commands/responses. Fix this
by taking priv->buffer_mutex early in the function.

Also converted the priv->data_pending from atomic to a regular size_t
type. There is no need for it to be atomic since it is only touched
under the protection of the priv->buffer_mutex.
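
For illustration, a standalone sketch of the lost update that the old
lockless data_pending check permitted (not the driver code; buffer size and
command strings are placeholders):

#include <stdio.h>
#include <string.h>

static size_t data_pending;
static char   data_buffer[32];

int main(void)
{
        /* both writers sample data_pending before either has updated it,
         * i.e. both pass the old "if (data_pending != 0) return -EBUSY;" */
        int a_sees_busy = (data_pending != 0);
        int b_sees_busy = (data_pending != 0);

        if (!a_sees_busy) { strcpy(data_buffer, "command A"); data_pending = strlen(data_buffer); }
        if (!b_sees_busy) { strcpy(data_buffer, "command B"); data_pending = strlen(data_buffer); }

        printf("buffer now holds \"%s\" - command A was silently overwritten\n",
               data_buffer);
        return 0;
}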

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: [email protected]
Signed-off-by: Tadeusz Struk <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/char/tpm/tpm-dev.c | 43 ++++++++++++++++++++-----------------------
1 file changed, 20 insertions(+), 23 deletions(-)

--- a/drivers/char/tpm/tpm-dev.c
+++ b/drivers/char/tpm/tpm-dev.c
@@ -25,7 +25,7 @@ struct file_priv {
struct tpm_chip *chip;

/* Data passed to and from the tpm via the read/write calls */
- atomic_t data_pending;
+ size_t data_pending;
struct mutex buffer_mutex;

struct timer_list user_read_timer; /* user needs to claim result */
@@ -46,7 +46,7 @@ static void timeout_work(struct work_str
struct file_priv *priv = container_of(work, struct file_priv, work);

mutex_lock(&priv->buffer_mutex);
- atomic_set(&priv->data_pending, 0);
+ priv->data_pending = 0;
memset(priv->data_buffer, 0, sizeof(priv->data_buffer));
mutex_unlock(&priv->buffer_mutex);
}
@@ -72,7 +72,6 @@ static int tpm_open(struct inode *inode,
}

priv->chip = chip;
- atomic_set(&priv->data_pending, 0);
mutex_init(&priv->buffer_mutex);
setup_timer(&priv->user_read_timer, user_reader_timeout,
(unsigned long)priv);
@@ -86,28 +85,24 @@ static ssize_t tpm_read(struct file *fil
size_t size, loff_t *off)
{
struct file_priv *priv = file->private_data;
- ssize_t ret_size;
+ ssize_t ret_size = 0;
int rc;

del_singleshot_timer_sync(&priv->user_read_timer);
flush_work(&priv->work);
- ret_size = atomic_read(&priv->data_pending);
- if (ret_size > 0) { /* relay data */
- ssize_t orig_ret_size = ret_size;
- if (size < ret_size)
- ret_size = size;
+ mutex_lock(&priv->buffer_mutex);

- mutex_lock(&priv->buffer_mutex);
+ if (priv->data_pending) {
+ ret_size = min_t(ssize_t, size, priv->data_pending);
rc = copy_to_user(buf, priv->data_buffer, ret_size);
- memset(priv->data_buffer, 0, orig_ret_size);
+ memset(priv->data_buffer, 0, priv->data_pending);
if (rc)
ret_size = -EFAULT;

- mutex_unlock(&priv->buffer_mutex);
+ priv->data_pending = 0;
}

- atomic_set(&priv->data_pending, 0);
-
+ mutex_unlock(&priv->buffer_mutex);
return ret_size;
}

@@ -118,18 +113,20 @@ static ssize_t tpm_write(struct file *fi
size_t in_size = size;
ssize_t out_size;

- /* cannot perform a write until the read has cleared
- either via tpm_read or a user_read_timer timeout.
- This also prevents splitted buffered writes from blocking here.
- */
- if (atomic_read(&priv->data_pending) != 0)
- return -EBUSY;
-
if (in_size > TPM_BUFSIZE)
return -E2BIG;

mutex_lock(&priv->buffer_mutex);

+ /* Cannot perform a write until the read has cleared either via
+ * tpm_read or a user_read_timer timeout. This also prevents split
+ * buffered writes from blocking here.
+ */
+ if (priv->data_pending != 0) {
+ mutex_unlock(&priv->buffer_mutex);
+ return -EBUSY;
+ }
+
if (copy_from_user
(priv->data_buffer, (void __user *) buf, in_size)) {
mutex_unlock(&priv->buffer_mutex);
@@ -153,7 +150,7 @@ static ssize_t tpm_write(struct file *fi
return out_size;
}

- atomic_set(&priv->data_pending, out_size);
+ priv->data_pending = out_size;
mutex_unlock(&priv->buffer_mutex);

/* Set a timeout by which the reader must come claim the result */
@@ -172,7 +169,7 @@ static int tpm_release(struct inode *ino
del_singleshot_timer_sync(&priv->user_read_timer);
flush_work(&priv->work);
file->private_data = NULL;
- atomic_set(&priv->data_pending, 0);
+ priv->data_pending = 0;
clear_bit(0, &priv->chip->is_open);
kfree(priv);
return 0;



2018-08-14 17:48:12

by Greg Kroah-Hartman

Subject: [PATCH 4.4 14/43] IB/mlx4: Mark user MR as writable if actual virtual memory is writable

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jack Morgenstein <[email protected]>

commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream.

To allow rereg_user_mr to modify the MR from read-only to writable without
using get_user_pages again, we needed to define the initial MR as writable.
However, this was originally done unconditionally, without taking into
account the writability of the underlying virtual memory.

As a result, any attempt to register a read-only MR over read-only
virtual memory failed.

To fix this, do not add the writable flag bit when the user virtual memory
is not writable (e.g. const memory).

However, when the underlying memory is NOT writable (and we therefore
do not define the initial MR as writable), the IB core adds a
"force writable" flag to its user-pages request. If this succeeds,
the reg_user_mr caller gets a writable copy of the original pages.

If the user-space caller then does a rereg_user_mr operation to enable
writability, this will succeed. This should not be allowed, since
the original virtual memory was not writable.

Cc: <[email protected]>
Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/mlx4/mr.c | 50 +++++++++++++++++++++++++++++++++-------
1 file changed, 42 insertions(+), 8 deletions(-)

--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -130,6 +130,40 @@ out:
return err;
}

+static struct ib_umem *mlx4_get_umem_mr(struct ib_ucontext *context, u64 start,
+ u64 length, u64 virt_addr,
+ int access_flags)
+{
+ /*
+ * Force registering the memory as writable if the underlying pages
+ * are writable. This is so rereg can change the access permissions
+ * from readable to writable without having to run through ib_umem_get
+ * again
+ */
+ if (!ib_access_writable(access_flags)) {
+ struct vm_area_struct *vma;
+
+ down_read(&current->mm->mmap_sem);
+ /*
+ * FIXME: Ideally this would iterate over all the vmas that
+ * cover the memory, but for now it requires a single vma to
+ * entirely cover the MR to support RO mappings.
+ */
+ vma = find_vma(current->mm, start);
+ if (vma && vma->vm_end >= start + length &&
+ vma->vm_start <= start) {
+ if (vma->vm_flags & VM_WRITE)
+ access_flags |= IB_ACCESS_LOCAL_WRITE;
+ } else {
+ access_flags |= IB_ACCESS_LOCAL_WRITE;
+ }
+
+ up_read(&current->mm->mmap_sem);
+ }
+
+ return ib_umem_get(context, start, length, access_flags, 0);
+}
+
struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
u64 virt_addr, int access_flags,
struct ib_udata *udata)
@@ -144,10 +178,8 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct
if (!mr)
return ERR_PTR(-ENOMEM);

- /* Force registering the memory as writable. */
- /* Used for memory re-registeration. HCA protects the access */
- mr->umem = ib_umem_get(pd->uobject->context, start, length,
- access_flags | IB_ACCESS_LOCAL_WRITE, 0);
+ mr->umem = mlx4_get_umem_mr(pd->uobject->context, start, length,
+ virt_addr, access_flags);
if (IS_ERR(mr->umem)) {
err = PTR_ERR(mr->umem);
goto err_free;
@@ -214,6 +246,9 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *
}

if (flags & IB_MR_REREG_ACCESS) {
+ if (ib_access_writable(mr_access_flags) && !mmr->umem->writable)
+ return -EPERM;
+
err = mlx4_mr_hw_change_access(dev->dev, *pmpt_entry,
convert_access(mr_access_flags));

@@ -227,10 +262,9 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *

mlx4_mr_rereg_mem_cleanup(dev->dev, &mmr->mmr);
ib_umem_release(mmr->umem);
- mmr->umem = ib_umem_get(mr->uobject->context, start, length,
- mr_access_flags |
- IB_ACCESS_LOCAL_WRITE,
- 0);
+ mmr->umem =
+ mlx4_get_umem_mr(mr->uobject->context, start, length,
+ virt_addr, mr_access_flags);
if (IS_ERR(mmr->umem)) {
err = PTR_ERR(mmr->umem);
/* Prevent mlx4_ib_dereg_mr from free'ing invalid pointer */



2018-08-14 17:48:12

by Greg Kroah-Hartman

Subject: [PATCH 4.4 24/43] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Naoya Horiguchi <[email protected]>

commit eee4818baac0f2b37848fdf90e4b16430dc536ac upstream

_PAGE_PSE is used to distinguish between a truly non-present
(_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
should be treated as present.

But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
cause confusion between one of those PMDs undergoing a THP split, and a
soft-dirty PMD. Dropping the _PAGE_PSE check in pmd_present() does not work
well, because it can hurt TLB-handling optimizations in the THP split path.

Thus, we need to move the bit.

In the current kernel, bits 1-4 are not used in non-present format since
commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work
around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7
is used as reserved (always clear), so please don't use it for other
purposes.

[dwmw2: Pulled in to 4.9 backport to support L1TF changes]

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Zi Yan <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: David Nellans <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 12 +++++++++---
arch/x86/include/asm/pgtable_types.h | 10 +++++-----
2 files changed, 14 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -166,15 +166,21 @@ static inline int pgd_large(pgd_t pgd) {
/*
* Encode and de-code a swap entry
*
- * | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0| <- bit number
- * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
- * | OFFSET (14->63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0| <- swp entry
+ * | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number
+ * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
+ * | OFFSET (14->63) | TYPE (9-13) |0|0|X|X| X| X|X|SD|0| <- swp entry
*
* G (8) is aliased and used as a PROT_NONE indicator for
* !present ptes. We need to start storing swap entries above
* there. We also need to avoid using A and D because of an
* erratum where they can be incorrectly set by hardware on
* non-present PTEs.
+ *
+ * SD (1) in swp entry is used to store soft dirty bit, which helps us
+ * remember soft dirty over page migration
+ *
+ * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
+ * but also L and G.
*/
#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
#define SWP_TYPE_BITS 5
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -70,15 +70,15 @@
/*
* Tracking soft dirty bit when a page goes to a swap is tricky.
* We need a bit which can be stored in pte _and_ not conflict
- * with swap entry format. On x86 bits 6 and 7 are *not* involved
- * into swap entry computation, but bit 6 is used for nonlinear
- * file mapping, so we borrow bit 7 for soft dirty tracking.
+ * with swap entry format. On x86 bits 1-4 are *not* involved
+ * into swap entry computation, but bit 7 is used for thp migration,
+ * so we borrow bit 1 for soft dirty tracking.
*
* Please note that this bit must be treated as swap dirty page
- * mark if and only if the PTE has present bit clear!
+ * mark if and only if the PTE/PMD has present bit clear!
*/
#ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
+#define _PAGE_SWP_SOFT_DIRTY _PAGE_RW
#else
#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
#endif



2018-08-14 17:48:22

by Greg Kroah-Hartman

Subject: [PATCH 4.4 26/43] x86/speculation/l1tf: Protect swap entries against L1TF

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream

With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting
side effects allow reading the memory the PTE is pointing to, if its
contents are still in the L1 cache.

For swapped out pages Linux uses unmapped PTEs and stores a swap entry into
them.

To protect against L1TF it must be ensured that the swap entry is not
pointing to valid memory, which requires setting higher bits (between bit
36 and bit 45) that are inside the CPUs physical address space, but outside
any real memory.

To do this invert the offset to make sure the higher bits are always set,
as long as the swap file is not too big.

Note there is no workaround for 32bit !PAE, or on systems which have more
than MAX_PA/2 worth of memory. The latter case is very unlikely to happen on
real systems.
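
A standalone user-space sketch of the inverted encoding (the shift constants
are assumptions mirroring the layout from the previous two patches - type in
bits 59-63, offset in bits 9-58; the real macros are in the diff below):

#include <stdio.h>
#include <stdint.h>

#define SWP_TYPE_BITS    5
#define SWP_OFFSET_SHIFT 14   /* SWP_OFFSET_FIRST_BIT (9) + SWP_TYPE_BITS (5) */

/* encode: inverted offset in bits 9-58, type in bits 59-63 */
static uint64_t swp_entry(uint64_t type, uint64_t offset)
{
        return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
               (type << (64 - SWP_TYPE_BITS));
}

/* decode: undo the inversion to recover the original offset */
static uint64_t swp_offset(uint64_t entry)
{
        return ~entry << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;
}

int main(void)
{
        uint64_t e = swp_entry(1, 0x1000);

        /* the address bits of the entry are mostly ones, i.e. they point far
         * above any populated memory, yet the offset decodes back intact */
        printf("entry = %#018llx, offset back = %#llx\n",
               (unsigned long long)e, (unsigned long long)swp_offset(e));
        return 0;
}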

[AK: updated description and minor tweaks. Split out from the original
patch ]

Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Andi Kleen <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -168,7 +168,7 @@ static inline int pgd_large(pgd_t pgd) {
*
* | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number
* | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | TYPE (59-63) | OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry
*
* G (8) is aliased and used as a PROT_NONE indicator for
* !present ptes. We need to start storing swap entries above
@@ -181,6 +181,9 @@ static inline int pgd_large(pgd_t pgd) {
*
* Bit 7 in swp entry should be 0 because pmd_present checks not only P,
* but also L and G.
+ *
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
*/
#define SWP_TYPE_BITS 5

@@ -195,13 +198,15 @@ static inline int pgd_large(pgd_t pgd) {
#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))

/* Shift up (to get rid of type), then down to get value */
-#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)

/*
* Shift the offset up "too far" by TYPE bits, then down again
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
*/
#define __swp_entry(type, offset) ((swp_entry_t) { \
- ((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+ (~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })

#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })



2018-08-14 17:48:28

by Greg Kroah-Hartman

Subject: [PATCH 4.4 18/43] x86/speculation: Protect against userspace-userspace spectreRSB

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <[email protected]>

commit fdf82a7856b32d905c39afc85e34364491e46346 upstream.

The article "Spectre Returns! Speculation Attacks using the Return Stack
Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks,
making use solely of the RSB contents even on CPUs that don't fallback to
BTB on RSB underflow (Skylake+).

Mitigate userspace-userspace attacks by always unconditionally filling RSB on
context switch when the generic spectrev2 mitigation has been enabled.

[1] https://arxiv.org/pdf/1807.07940.pdf

Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Tim Chen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/cpu/bugs.c | 38 +++++++-------------------------------
1 file changed, 7 insertions(+), 31 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -309,23 +309,6 @@ static enum spectre_v2_mitigation_cmd __
return cmd;
}

-/* Check for Skylake-like CPUs (for RSB handling) */
-static bool __init is_skylake_era(void)
-{
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
- boot_cpu_data.x86 == 6) {
- switch (boot_cpu_data.x86_model) {
- case INTEL_FAM6_SKYLAKE_MOBILE:
- case INTEL_FAM6_SKYLAKE_DESKTOP:
- case INTEL_FAM6_SKYLAKE_X:
- case INTEL_FAM6_KABYLAKE_MOBILE:
- case INTEL_FAM6_KABYLAKE_DESKTOP:
- return true;
- }
- }
- return false;
-}
-
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -386,22 +369,15 @@ retpoline_auto:
pr_info("%s\n", spectre_v2_strings[mode]);

/*
- * If neither SMEP nor PTI are available, there is a risk of
- * hitting userspace addresses in the RSB after a context switch
- * from a shallow call stack to a deeper one. To prevent this fill
- * the entire RSB, even when using IBRS.
+ * If spectre v2 protection has been enabled, unconditionally fill
+ * RSB during a context switch; this protects against two independent
+ * issues:
*
- * Skylake era CPUs have a separate issue with *underflow* of the
- * RSB, when they will predict 'ret' targets from the generic BTB.
- * The proper mitigation for this is IBRS. If IBRS is not supported
- * or deactivated in favour of retpolines the RSB fill on context
- * switch is required.
+ * - RSB underflow (and switch to BTB) on Skylake+
+ * - SpectreRSB variant of spectre v2 on X86_BUG_SPECTRE_V2 CPUs
*/
- if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
- !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
- setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
- pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
- }
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+ pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");

/* Initialize Indirect Branch Prediction Barrier if supported */
if (boot_cpu_has(X86_FEATURE_IBPB)) {



2018-08-14 17:48:30

by Greg Kroah-Hartman

Subject: [PATCH 4.4 10/43] root dentries need RCU-delayed freeing

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.

Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.

It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock(). If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.

Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/dcache.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1954,10 +1954,12 @@ struct dentry *d_make_root(struct inode
static const struct qstr name = QSTR_INIT("/", 1);

res = __d_alloc(root_inode->i_sb, &name);
- if (res)
+ if (res) {
+ res->d_flags |= DCACHE_RCUACCESS;
d_instantiate(res, root_inode);
- else
+ } else {
iput(root_inode);
+ }
}
return res;
}



2018-08-14 17:48:37

by Greg Kroah-Hartman

Subject: [PATCH 4.4 20/43] x86/irqflags: Provide a declaration for native_save_fl

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nick Desaulniers <[email protected]>

commit 208cbb32558907f68b3b2a081ca2337ac3744794 upstream.

It was reported that the commit d0a8d9378d16 is causing users of gcc < 4.9
to observe -Werror=missing-prototypes errors.

Indeed, it seems that:
extern inline unsigned long native_save_fl(void) { return 0; }

compiled with -Werror=missing-prototypes produces this warning in gcc <
4.9, but not gcc >= 4.9.

Fixes: d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline").
Reported-by: David Laight <[email protected]>
Reported-by: Jean Delvare <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/irqflags.h | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -8,6 +8,8 @@
* Interrupt control:
*/

+/* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
+extern inline unsigned long native_save_fl(void);
extern inline unsigned long native_save_fl(void)
{
unsigned long flags;



2018-08-14 17:48:38

by Greg Kroah-Hartman

Subject: [PATCH 4.4 21/43] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 50896e180c6aa3a9c61a26ced99e15d602666a4c upstream

L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
speculates on PTE entries which do not have the PRESENT bit set, if the
content of the resulting physical address is available in the L1D cache.

The OS side mitigation makes sure that a !PRESENT PTE entry points to a
physical address outside the actually existing and cachable memory
space. This is achieved by inverting the upper bits of the PTE. Due to the
address space limitations this only works for 64bit and 32bit PAE kernels,
but not for 32bit non PAE.

This mitigation applies to both host and guest kernels, but in case of a
64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
the PAE address space (44bit) is not enough if the host has more than 43
bits of populated memory address space, because the speculation treats the
PTE content as a physical host address bypassing EPT.

The host (hypervisor) protects itself against the guest by flushing L1D as
needed, but pages inside the guest are not protected against attacks from
other processes inside the same guest.

For the guest the inverted PTE mask has to match the host to provide the
full protection for all pages the host could possibly map into the
guest. The host's populated address space is not known to the guest, so the
mask must cover the possible maximal host address space, i.e. 52 bit.

On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
the mask to be below what the host could possibly use for physical pages.

The L1TF PROT_NONE protection code uses the PTE masks to determine which
bits to invert to make sure the higher bits are set for unmapped entries to
prevent L1TF speculation attacks against EPT inside guests.

In order to invert all bits that could be used by the host, increase
__PHYSICAL_PAGE_SHIFT to 52 to match 64bit.

The real limit for a 32bit PAE kernel is still 44 bits because all Linux
PTEs are created from unsigned long PFNs, so they cannot be higher than 44
bits on a 32bit kernel. So these extra PFN bits should never be set. The
only users of this macro are using it to look at PTEs, so it's safe.
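
For illustration, a standalone sketch with assumed example numbers (not the
kernel code): with only a 44-bit mask the inverted PFN of an unmapped entry
can still land inside a large host's memory, while the 52-bit mask pushes it
far outside any possible host RAM:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t pfn   = 0x1234;                   /* some guest pfn                */
        uint64_t pfn44 = (1ULL << (44 - 12)) - 1;  /* pfn mask for 44-bit addresses */
        uint64_t pfn52 = (1ULL << (52 - 12)) - 1;  /* pfn mask for 52-bit addresses */

        printf("inverted address, 44-bit mask: %#llx\n",      /* just below 16 TiB */
               (unsigned long long)((~pfn & pfn44) << 12));
        printf("inverted address, 52-bit mask: %#llx\n",      /* just below 4 PiB  */
               (unsigned long long)((~pfn & pfn52) << 12));
        return 0;
}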

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/page_32_types.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -27,8 +27,13 @@
#define N_EXCEPTION_STACKS 1

#ifdef CONFIG_X86_PAE
-/* 44=32+12, the limit we can fit into an unsigned long pfn */
-#define __PHYSICAL_MASK_SHIFT 44
+/*
+ * This is beyond the 44 bit limit imposed by the 32bit long pfns,
+ * but we need the full mask to make sure inverted PROT_NONE
+ * entries have all the host bits set in a guest.
+ * The real limit is still 44 bits.
+ */
+#define __PHYSICAL_MASK_SHIFT 52
#define __VIRTUAL_MASK_SHIFT 32

#else /* !CONFIG_X86_PAE */



2018-08-14 17:48:42

by Greg Kroah-Hartman

Subject: [PATCH 4.4 23/43] x86/mm: Fix swap entry comment and macro

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit ace7fab7a6cdd363a615ec537f2aa94dbc761ee2 upstream

A recent patch changed the format of a swap PTE.

The comment explaining the format of the swap PTE is wrong about
the bits used for the swap type field. Amusingly, the ASCII art
and the patch description are correct, but the comment itself
is wrong.

As I was looking at this, I also noticed that the
SWP_OFFSET_FIRST_BIT has an off-by-one error. This does not
really hurt anything. It just wasted a bit of space in the PTE,
giving us 2^59 bytes of addressable space in our swapfiles
instead of 2^60. But, it doesn't match with the comments, and it
wastes a bit of space, so fix it.

Signed-off-by: Dave Hansen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Fixes: 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -168,7 +168,7 @@ static inline int pgd_large(pgd_t pgd) {
*
* | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0| <- bit number
* | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
- * | OFFSET (14->63) | TYPE (10-13) |0|X|X|X| X| X|X|X|0| <- swp entry
+ * | OFFSET (14->63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0| <- swp entry
*
* G (8) is aliased and used as a PROT_NONE indicator for
* !present ptes. We need to start storing swap entries above
@@ -179,7 +179,7 @@ static inline int pgd_large(pgd_t pgd) {
#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
#define SWP_TYPE_BITS 5
/* Place the offset above the type: */
-#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS + 1)
+#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)

#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)




2018-08-14 17:48:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 25/43] x86/speculation/l1tf: Change order of offset/type in swap entry

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream

If pages are swapped out, the swap entry is stored in the corresponding
PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate
on PTE entries which have the present bit set and would treat the swap
entry as a physical address (PFN). To mitigate that, the upper bits of the
PTE must be set so the PTE points to non-existent memory.

The swap entry stores the type and the offset of a swapped out page in the
PTE. The type is stored in bits 9-13 and the offset in bits 14-63. The
hardware ignores the bits beyond the physical address space limit, so to
make the mitigation effective it's required to start 'offset' at the lowest
possible bit so that even large swap offsets do not reach into the physical
address space limit bits.

Move the offset to bits 9-58 and the type to bits 59-63, which are the
bits that the hardware generally doesn't care about.

That, in turn, means that on a desktop chip with only 40 bits of physical
addressing, now that the offset starts at bit 9, there need to be 30 bits
of offset actually *in use* until bit 39 ends up being set, which means
that when inverted it will again point into existing memory.

So that's 4 terabytes of swap space (because the offset is counted in
pages, so 30 bits of offset is 42 bits of actual coverage). With bigger
physical
addressing, that obviously grows further, until the limit of the offset is
hit (at 50 bits of offset - 62 bits of actual swap file coverage).
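
To make the arithmetic above concrete, here is a minimal standalone sketch
(plain userspace C, not kernel code) that reproduces the numbers quoted in
this changelog; the 40-bit physical address example and the 4 KiB page size
are assumptions taken from the text above.

#include <stdio.h>

#define PAGE_SHIFT 12	/* swap offsets are counted in 4 KiB pages */

int main(void)
{
	/* With the offset starting at bit 9 on a chip with 40 bits of
	 * physical addressing, bits 9..38 can be used before bit 39 is
	 * set, i.e. 30 bits of offset. */
	unsigned int offset_bits = 39 - 9;
	unsigned long long coverage = 1ULL << (offset_bits + PAGE_SHIFT);

	printf("offset bits in use: %u\n", offset_bits);	/* 30 */
	printf("swap coverage: %llu TiB\n", coverage >> 40);	/* 4 TiB */

	/* Upper bound: 50 bits of offset -> 62 bits of coverage (4 EiB). */
	printf("max coverage bits: %d\n", 50 + PAGE_SHIFT);	/* 62 */
	return 0;
}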

This is a preparatory change for the actual swap entry inversion to protect
against L1TF.

[ AK: Updated description and minor tweaks. Split into two parts ]
[ tglx: Massaged changelog ]

Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Andi Kleen <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -168,7 +168,7 @@ static inline int pgd_large(pgd_t pgd) {
*
* | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number
* | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | OFFSET (14->63) | TYPE (9-13) |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) | OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry
*
* G (8) is aliased and used as a PROT_NONE indicator for
* !present ptes. We need to start storing swap entries above
@@ -182,19 +182,28 @@ static inline int pgd_large(pgd_t pgd) {
* Bit 7 in swp entry should be 0 because pmd_present checks not only P,
* but also L and G.
*/
-#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
-#define SWP_TYPE_BITS 5
-/* Place the offset above the type: */
-#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)
+#define SWP_TYPE_BITS 5
+
+#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)

#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)

-#define __swp_type(x) (((x).val >> (SWP_TYPE_FIRST_BIT)) \
- & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x) ((x).val >> SWP_OFFSET_FIRST_BIT)
-#define __swp_entry(type, offset) ((swp_entry_t) { \
- ((type) << (SWP_TYPE_FIRST_BIT)) \
- | ((offset) << SWP_OFFSET_FIRST_BIT) })
+/* Extract the high bits for type */
+#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
+
+/* Shift up (to get rid of type), then down to get value */
+#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+
+/*
+ * Shift the offset up "too far" by TYPE bits, then down again
+ */
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+ ((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+ | ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })
+
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
#define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })




2018-08-14 17:48:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 31/43] mm: fix cache mode tracking in vm_insert_mixed()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Williams <[email protected]>

commit 87744ab3832b83ba71b931f86f9cfdb000d07da5 upstream

vm_insert_mixed(), unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
fails to check the pgprot_t it uses for the mapping against the one
recorded in the memtype tracking tree. Add the missing call to
track_pfn_insert() to preclude cases where incompatible aliased mappings
are established for a given physical address range.

[groeck: Backport to v4.4.y]

Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/memory.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1654,10 +1654,14 @@ EXPORT_SYMBOL(vm_insert_pfn_prot);
int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
+ pgprot_t pgprot = vma->vm_page_prot;
+
BUG_ON(!(vma->vm_flags & VM_MIXEDMAP));

if (addr < vma->vm_start || addr >= vma->vm_end)
return -EFAULT;
+ if (track_pfn_insert(vma, &pgprot, pfn))
+ return -EINVAL;

/*
* If we don't have pte special, then we have to use the pfn_valid()
@@ -1670,9 +1674,9 @@ int vm_insert_mixed(struct vm_area_struc
struct page *page;

page = pfn_to_page(pfn);
- return insert_page(vma, addr, page, vma->vm_page_prot);
+ return insert_page(vma, addr, page, pgprot);
}
- return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+ return insert_pfn(vma, addr, pfn, pgprot);
}
EXPORT_SYMBOL(vm_insert_mixed);




2018-08-14 17:48:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 16/43] ARM: dts: imx6sx: fix irq for pcie bridge

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Oleksij Rempel <[email protected]>

commit 1bcfe0564044be578841744faea1c2f46adc8178 upstream.

Use the correct IRQ line for the MSI controller in the PCIe host
controller. Apparently a different IRQ line is used compared to other
i.MX6 variants. Without this change MSI IRQs aren't properly propagated
to the upstream interrupt controller.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Fixes: b1d17f68e5c5 ("ARM: dts: imx: add initial imx6sx device tree source")
Signed-off-by: Shawn Guo <[email protected]>
Signed-off-by: Amit Pundir <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/boot/dts/imx6sx.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/boot/dts/imx6sx.dtsi
+++ b/arch/arm/boot/dts/imx6sx.dtsi
@@ -1250,7 +1250,7 @@
/* non-prefetchable memory */
0x82000000 0 0x08000000 0x08000000 0 0x00f00000>;
num-lanes = <1>;
- interrupts = <GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <GIC_SPI 120 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clks IMX6SX_CLK_PCIE_REF_125M>,
<&clks IMX6SX_CLK_PCIE_AXI>,
<&clks IMX6SX_CLK_LVDS1_OUT>,



2018-08-14 17:49:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 28/43] x86/speculation/l1tf: Make sure the first page is always reserved

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 10a70416e1f067f6c4efda6ffd8ea96002ac4223 upstream

The L1TF workaround doesn't make any attempt to mitigate speculative
accesses to the first physical page for zeroed PTEs. Normally it only
contains some data from the early real mode BIOS.

It's not entirely clear that the first page is reserved in all
configurations, so add an extra reservation call to make sure it is really
reserved. In most configurations (e.g. with the standard reservations)
it's likely a nop.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/setup.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -851,6 +851,12 @@ void __init setup_arch(char **cmdline_p)
memblock_reserve(__pa_symbol(_text),
(unsigned long)__bss_stop - (unsigned long)_text);

+ /*
+ * Make sure page 0 is always reserved because on systems with
+ * L1TF its contents can be leaked to user processes.
+ */
+ memblock_reserve(0, PAGE_SIZE);
+
early_reserve_initrd();

/*



2018-08-14 17:49:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 33/43] x86/speculation/l1tf: Limit swap file size to MAX_PA/2

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream

For the L1TF workaround it's necessary to limit the swap file size to below
MAX_PA/2, so that the inverted higher bits of the swap offset never point
to valid memory.

Add a mechanism for the architecture to override the swap file size check
in swapfile.c and add an x86-specific max swapfile check function that
enforces that limit.

The check is only enabled if the CPU is vulnerable to L1TF.

In VMs with 42-bit MAX_PA the typical limit is now 2TB; on a native system
with 46-bit PA it is 32TB. The limit is only per individual swap file, so
it's always possible to exceed these limits with multiple swap files or
partitions.
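
As a hedged illustration of the 2TB and 32TB figures, the following
standalone userspace sketch mirrors the arithmetic of l1tf_pfn_limit()
(MAX_PA/2 expressed in page frames) with an assumed 4 KiB page size; it is
not kernel code and the helper name is reused purely for readability.

#include <stdio.h>

#define PAGE_SHIFT 12

/* Mirrors l1tf_pfn_limit(): the last safe pfn sits just below MAX_PA/2. */
static unsigned long long l1tf_pfn_limit(unsigned int phys_bits)
{
	return (1ULL << (phys_bits - 1 - PAGE_SHIFT)) - 1;
}

int main(void)
{
	unsigned int bits[] = { 42, 46 };
	int i;

	for (i = 0; i < 2; i++) {
		unsigned long long pages = l1tf_pfn_limit(bits[i]) + 1;
		unsigned long long bytes = pages << PAGE_SHIFT;

		/* 42-bit MAX_PA -> 2 TiB, 46-bit MAX_PA -> 32 TiB */
		printf("MAX_PA %u bits: swap limit %llu TiB\n",
		       bits[i], bytes >> 40);
	}
	return 0;
}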

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/mm/init.c | 15 +++++++++++++++
include/linux/swapfile.h | 2 ++
mm/swapfile.c | 46 ++++++++++++++++++++++++++++++----------------
3 files changed, 47 insertions(+), 16 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -4,6 +4,8 @@
#include <linux/swap.h>
#include <linux/memblock.h>
#include <linux/bootmem.h> /* for max_low_pfn */
+#include <linux/swapfile.h>
+#include <linux/swapops.h>

#include <asm/cacheflush.h>
#include <asm/e820.h>
@@ -767,3 +769,16 @@ void update_cache_mode_entry(unsigned en
__cachemode2pte_tbl[cache] = __cm_idx2pte(entry);
__pte2cachemode_tbl[entry] = cache;
}
+
+unsigned long max_swapfile_size(void)
+{
+ unsigned long pages;
+
+ pages = generic_max_swapfile_size();
+
+ if (boot_cpu_has_bug(X86_BUG_L1TF)) {
+ /* Limit the swap file size to MAX_PA/2 for L1TF workaround */
+ pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages);
+ }
+ return pages;
+}
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -9,5 +9,7 @@ extern spinlock_t swap_lock;
extern struct plist_head swap_active_head;
extern struct swap_info_struct *swap_info[];
extern int try_to_unuse(unsigned int, bool, unsigned long);
+extern unsigned long generic_max_swapfile_size(void);
+extern unsigned long max_swapfile_size(void);

#endif /* _LINUX_SWAPFILE_H */
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2206,6 +2206,35 @@ static int claim_swapfile(struct swap_in
return 0;
}

+
+/*
+ * Find out how many pages are allowed for a single swap device. There
+ * are two limiting factors:
+ * 1) the number of bits for the swap offset in the swp_entry_t type, and
+ * 2) the number of bits in the swap pte, as defined by the different
+ * architectures.
+ *
+ * In order to find the largest possible bit mask, a swap entry with
+ * swap type 0 and swap offset ~0UL is created, encoded to a swap pte,
+ * decoded to a swp_entry_t again, and finally the swap offset is
+ * extracted.
+ *
+ * This will mask all the bits from the initial ~0UL mask that can't
+ * be encoded in either the swp_entry_t or the architecture definition
+ * of a swap pte.
+ */
+unsigned long generic_max_swapfile_size(void)
+{
+ return swp_offset(pte_to_swp_entry(
+ swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+}
+
+/* Can be overridden by an architecture for additional checks. */
+__weak unsigned long max_swapfile_size(void)
+{
+ return generic_max_swapfile_size();
+}
+
static unsigned long read_swap_header(struct swap_info_struct *p,
union swap_header *swap_header,
struct inode *inode)
@@ -2241,22 +2270,7 @@ static unsigned long read_swap_header(st
p->cluster_next = 1;
p->cluster_nr = 0;

- /*
- * Find out how many pages are allowed for a single swap
- * device. There are two limiting factors: 1) the number
- * of bits for the swap offset in the swp_entry_t type, and
- * 2) the number of bits in the swap pte as defined by the
- * different architectures. In order to find the
- * largest possible bit mask, a swap entry with swap type 0
- * and swap offset ~0UL is created, encoded to a swap pte,
- * decoded to a swp_entry_t again, and finally the swap
- * offset is extracted. This will mask all the bits from
- * the initial ~0UL mask that can't be encoded in either
- * the swp_entry_t or the architecture definition of a
- * swap pte.
- */
- maxpages = swp_offset(pte_to_swp_entry(
- swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+ maxpages = max_swapfile_size();
last_page = swap_header->info.last_page;
if (!last_page) {
pr_warn("Empty swap-file\n");



2018-08-14 17:49:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 34/43] x86/bugs: Move the l1tf function and define pr_fmt properly

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <[email protected]>

commit 56563f53d3066afa9e63d6c997bf67e76a8b05c0 upstream

The pr_warn() in l1tf_select_mitigation() would have used the prior pr_fmt,
which was defined as "Spectre V2 : ".

Move the function to be past SSBD and also define the pr_fmt.
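
For readers unfamiliar with the pr_fmt convention, a small standalone model
of it (userspace C, with printf standing in for the kernel's pr_warn) shows
the effect being fixed: the prefix is picked up at the point where pr_warn()
is expanded, so a warning issued before the new define would have carried
the Spectre V2 prefix. The message text is taken from the changelog; the
rest is illustrative only.

#include <stdio.h>

/* Minimal userspace stand-in for the kernel's pr_fmt/pr_warn convention. */
#define pr_warn(fmt, ...) printf(pr_fmt(fmt), ##__VA_ARGS__)

#define pr_fmt(fmt) "Spectre V2 : " fmt
static void before_the_move(void)
{
	/* Old placement: the L1TF warning picked up the wrong prefix. */
	pr_warn("System has more than MAX_PA/2 memory.\n");
}

#undef pr_fmt
#define pr_fmt(fmt) "L1TF: " fmt
static void after_the_move(void)
{
	/* New placement and define: the prefix is now correct. */
	pr_warn("System has more than MAX_PA/2 memory.\n");
}

int main(void)
{
	before_the_move();	/* "Spectre V2 : System has more ..." */
	after_the_move();	/* "L1TF: System has more ..." */
	return 0;
}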

Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 55 +++++++++++++++++++++++----------------------
1 file changed, 29 insertions(+), 26 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -207,32 +207,6 @@ static void x86_amd_ssb_disable(void)
wrmsrl(MSR_AMD64_LS_CFG, msrval);
}

-static void __init l1tf_select_mitigation(void)
-{
- u64 half_pa;
-
- if (!boot_cpu_has_bug(X86_BUG_L1TF))
- return;
-
-#if CONFIG_PGTABLE_LEVELS == 2
- pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
- return;
-#endif
-
- /*
- * This is extremely unlikely to happen because almost all
- * systems have far more MAX_PA/2 than RAM can be fit into
- * DIMM slots.
- */
- half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
- if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) {
- pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
- return;
- }
-
- setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
-}
-
#ifdef RETPOLINE
static bool spectre_v2_bad_module;

@@ -658,6 +632,35 @@ void x86_spec_ctrl_setup_ap(void)
x86_amd_ssb_disable();
}

+#undef pr_fmt
+#define pr_fmt(fmt) "L1TF: " fmt
+static void __init l1tf_select_mitigation(void)
+{
+ u64 half_pa;
+
+ if (!boot_cpu_has_bug(X86_BUG_L1TF))
+ return;
+
+#if CONFIG_PGTABLE_LEVELS == 2
+ pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
+ return;
+#endif
+
+ /*
+ * This is extremely unlikely to happen because almost all
+ * systems have far more MAX_PA/2 than RAM can be fit into
+ * DIMM slots.
+ */
+ half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
+ if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) {
+ pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
+ return;
+ }
+
+ setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
+}
+#undef pr_fmt
+
#ifdef CONFIG_SYSFS

static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,



2018-08-14 17:49:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 29/43] x86/speculation/l1tf: Add sysfs reporting for l1tf

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream

L1TF core kernel workarounds are cheap and normally always enabled. However,
they should still be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.

- Extend the existing checks for Meltdown to determine if the system is
  vulnerable. All CPUs which are not vulnerable to Meltdown are also not
  vulnerable to L1TF.

- Check for 32-bit non-PAE and emit a warning, as there is no practical way
  to mitigate due to the limited physical address bits.

- If the system has more than MAX_PA/2 physical memory, the page inversion
  workarounds don't protect the system against the L1TF attack anymore,
  because an inverted physical address will also point to valid memory.
  Print a warning in this case and report that the system is vulnerable.

Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.
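
As a usage note, the new attribute can be read back from userspace once the
patch is applied; a minimal sketch, assuming the usual sysfs mount point and
the vulnerabilities directory used by the existing attributes:

#include <stdio.h>

int main(void)
{
	char buf[128];
	FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/l1tf", "r");

	if (!f) {
		perror("l1tf attribute not present");
		return 1;
	}
	if (fgets(buf, sizeof(buf), f))
		/* e.g. "Mitigation: Page Table Inversion" or "Not affected" */
		printf("l1tf: %s", buf);
	fclose(f);
	return 0;
}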

[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
[ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 3 +-
arch/x86/include/asm/processor.h | 5 ++++
arch/x86/kernel/cpu/bugs.c | 40 +++++++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 20 ++++++++++++++++++
drivers/base/cpu.c | 8 +++++++
include/linux/cpu.h | 2 +
6 files changed, 77 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -214,7 +214,7 @@
#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
-
+#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -331,5 +331,6 @@
#define X86_BUG_SPECTRE_V1 X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */
#define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
#define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */
+#define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */

#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -172,6 +172,11 @@ extern const struct seq_operations cpuin

extern void cpu_detect(struct cpuinfo_x86 *c);

+static inline unsigned long l1tf_pfn_limit(void)
+{
+ return BIT(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1;
+}
+
extern void early_cpu_init(void);
extern void identify_boot_cpu(void);
extern void identify_secondary_cpu(struct cpuinfo_x86 *);
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -26,9 +26,11 @@
#include <asm/pgtable.h>
#include <asm/cacheflush.h>
#include <asm/intel-family.h>
+#include <asm/e820.h>

static void __init spectre_v2_select_mitigation(void);
static void __init ssb_select_mitigation(void);
+static void __init l1tf_select_mitigation(void);

/*
* Our boot-time value of the SPEC_CTRL MSR. We read it once so that any
@@ -80,6 +82,8 @@ void __init check_bugs(void)
*/
ssb_select_mitigation();

+ l1tf_select_mitigation();
+
#ifdef CONFIG_X86_32
/*
* Check whether we are able to run this kernel safely on SMP.
@@ -203,6 +207,32 @@ static void x86_amd_ssb_disable(void)
wrmsrl(MSR_AMD64_LS_CFG, msrval);
}

+static void __init l1tf_select_mitigation(void)
+{
+ u64 half_pa;
+
+ if (!boot_cpu_has_bug(X86_BUG_L1TF))
+ return;
+
+#if CONFIG_PGTABLE_LEVELS == 2
+ pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
+ return;
+#endif
+
+ /*
+ * This is extremely unlikely to happen because almost all
+ * systems have far more MAX_PA/2 than RAM can be fit into
+ * DIMM slots.
+ */
+ half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
+ if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) {
+ pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
+ return;
+ }
+
+ setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
+}
+
#ifdef RETPOLINE
static bool spectre_v2_bad_module;

@@ -655,6 +685,11 @@ static ssize_t cpu_show_common(struct de
case X86_BUG_SPEC_STORE_BYPASS:
return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);

+ case X86_BUG_L1TF:
+ if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
+ return sprintf(buf, "Mitigation: Page Table Inversion\n");
+ break;
+
default:
break;
}
@@ -681,4 +716,9 @@ ssize_t cpu_show_spec_store_bypass(struc
{
return cpu_show_common(dev, attr, buf, X86_BUG_SPEC_STORE_BYPASS);
}
+
+ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return cpu_show_common(dev, attr, buf, X86_BUG_L1TF);
+}
#endif
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -880,6 +880,21 @@ static const __initconst struct x86_cpu_
{}
};

+static const __initconst struct x86_cpu_id cpu_no_l1tf[] = {
+ /* in addition to cpu_no_speculation */
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MERRIFIELD },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MOOREFIELD },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_DENVERTON },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GEMINI_LAKE },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM },
+ {}
+};
+
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
u64 ia32_cap = 0;
@@ -905,6 +920,11 @@ static void __init cpu_set_bug_bits(stru
return;

setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
+
+ if (x86_match_cpu(cpu_no_l1tf))
+ return;
+
+ setup_force_cpu_bug(X86_BUG_L1TF);
}

/*
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -524,16 +524,24 @@ ssize_t __weak cpu_show_spec_store_bypas
return sprintf(buf, "Not affected\n");
}

+ssize_t __weak cpu_show_l1tf(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "Not affected\n");
+}
+
static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
+static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);

static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_meltdown.attr,
&dev_attr_spectre_v1.attr,
&dev_attr_spectre_v2.attr,
&dev_attr_spec_store_bypass.attr,
+ &dev_attr_l1tf.attr,
NULL
};

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -48,6 +48,8 @@ extern ssize_t cpu_show_spectre_v2(struc
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_spec_store_bypass(struct device *dev,
struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_l1tf(struct device *dev,
+ struct device_attribute *attr, char *buf);

extern __printf(4, 5)
struct device *cpu_device_create(struct device *parent, void *drvdata,



2018-08-14 17:49:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 03/43] ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Egerer <[email protected]>

commit 32b6170ca59ccf07d0e394561e54b2cd9726038c upstream.

The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have
to select CRYPTO_ECHAINIV in order to work properly. This solves the
issues caused by a misconfiguration as described in [1].
The original approach, patching crypto/Kconfig was turned down by
Herbert Xu [2].

[1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html
[2] http://marc.info/?l=linux-crypto-vger&m=145224655809562&w=2

Signed-off-by: Thomas Egerer <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Cc: Yongqin Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ipv4/Kconfig | 1 +
net/ipv6/Kconfig | 1 +
2 files changed, 2 insertions(+)

--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -354,6 +354,7 @@ config INET_ESP
select CRYPTO_CBC
select CRYPTO_SHA1
select CRYPTO_DES
+ select CRYPTO_ECHAINIV
---help---
Support for IPsec ESP.

--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -69,6 +69,7 @@ config INET6_ESP
select CRYPTO_CBC
select CRYPTO_SHA1
select CRYPTO_DES
+ select CRYPTO_ECHAINIV
---help---
Support for IPsec ESP.




2018-08-14 17:49:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 07/43] xen/netfront: dont cache skb_shinfo()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Juergen Gross <[email protected]>

commit d472b3a6cf63cd31cae1ed61930f07e6cd6671b5 upstream.

skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.

Cc: [email protected]
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/xen-netfront.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -879,7 +879,6 @@ static RING_IDX xennet_fill_frags(struct
struct sk_buff *skb,
struct sk_buff_head *list)
{
- struct skb_shared_info *shinfo = skb_shinfo(skb);
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *nskb;

@@ -888,15 +887,16 @@ static RING_IDX xennet_fill_frags(struct
RING_GET_RESPONSE(&queue->rx, ++cons);
skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];

- if (shinfo->nr_frags == MAX_SKB_FRAGS) {
+ if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;

BUG_ON(pull_to <= skb_headlen(skb));
__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
}
- BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
+ BUG_ON(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS);

- skb_add_rx_frag(skb, shinfo->nr_frags, skb_frag_page(nfrag),
+ skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
+ skb_frag_page(nfrag),
rx->offset, rx->status, PAGE_SIZE);

skb_shinfo(nskb)->nr_frags = 0;



2018-08-14 17:49:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 30/43] mm: Add vm_insert_pfn_prot()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 1745cbc5d0dee0749a6bc0ea8e872c5db0074061 upstream

The x86 vvar vma contains pages with differing cacheability
flags. x86 currently implements this by manually inserting all
the ptes using (io_)remap_pfn_range when the vma is set up.

x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up
the mappings as needed. The correct API to use to insert a pfn
in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the
vma's cache mode, and the HPET page in particular needs to be
uncached despite the fact that the rest of the VMA is cached.

Add vm_insert_pfn_prot() to support varying cacheability within
the same non-COW VMA in a more sane manner.

x86 could alternatively use multiple VMAs, but that's messy,
would break CRIU, and would create unnecessary VMAs that would
waste memory.
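
As an illustration only, a hedged sketch of the kind of .fault handler the
changelog describes, written against the 4.4 fault signature: one special
page (called the HPET page here purely as an example) is inserted uncached
while the rest of the VMA keeps its normal protection. The names
my_vvar_fault and hpet_pfn are hypothetical; the only interface taken from
this patch is vm_insert_pfn_prot().

#include <linux/mm.h>
#include <asm/pgtable.h>

/* Hypothetical pfn of the one page that must be mapped uncached. */
static unsigned long hpet_pfn;

static int my_vvar_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	unsigned long addr = (unsigned long)vmf->virtual_address;

	/* Override the cache mode for this single pfn; every other page
	 * in the VMA keeps vma->vm_page_prot. */
	if (vm_insert_pfn_prot(vma, addr, hpet_pfn,
			       pgprot_noncached(vma->vm_page_prot)))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}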

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Quentin Casasnovas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/mm.h | 2 ++
mm/memory.c | 25 +++++++++++++++++++++++--
2 files changed, 25 insertions(+), 2 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2083,6 +2083,8 @@ int remap_pfn_range(struct vm_area_struc
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, pgprot_t pgprot);
int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1605,8 +1605,29 @@ out:
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
+ return vm_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_insert_pfn);
+
+/**
+ * vm_insert_pfn_prot - insert single pfn into user vma with specified pgprot
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pfn: source kernel pfn
+ * @pgprot: pgprot flags for the inserted page
+ *
+ * This is exactly like vm_insert_pfn, except that it allows drivers to
+ * to override pgprot on a per-page basis.
+ *
+ * This only makes sense for IO mappings, and it makes no sense for
+ * cow mappings. In general, using multiple vmas is preferable;
+ * vm_insert_pfn_prot should only be used if using multiple VMAs is
+ * impractical.
+ */
+int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, pgprot_t pgprot)
+{
int ret;
- pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1628,7 +1649,7 @@ int vm_insert_pfn(struct vm_area_struct

return ret;
}
-EXPORT_SYMBOL(vm_insert_pfn);
+EXPORT_SYMBOL(vm_insert_pfn_prot);

int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)



2018-08-14 17:49:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 22/43] x86/mm: Move swap offset/type up in PTE to work around erratum

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Hansen <[email protected]>

commit 00839ee3b299303c6a5e26a0a2485427a3afcbbf upstream

This erratum can result in Accessed/Dirty getting set by the hardware
when we do not expect them to be (on !Present PTEs).

Instead of trying to fix them up after this happens, we just
allow the bits to get set and try to ignore them. We do this by
shifting the layout of the bits we use for swap offset/type in
our 64-bit PTEs.

It looks like this:

bitnrs: | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0|
names: | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
before: | OFFSET (9-63) |0|X|X| TYPE(1-5) |0|
after: | OFFSET (14-63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0|

Note that D was already a don't care (X) even before. We just
move TYPE up and turn its old spot (which could be hit by the
A bit) into all don't cares.

We take 5 bits away from the offset, but that still leaves us
with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
I think that's probably fine for the moment. We could
theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
doesn't gain us anything.

Signed-off-by: Dave Hansen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -163,18 +163,32 @@ static inline int pgd_large(pgd_t pgd) {
#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
#define pte_unmap(pte) ((void)(pte))/* NOP */

-/* Encode and de-code a swap entry */
+/*
+ * Encode and de-code a swap entry
+ *
+ * | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0| <- bit number
+ * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
+ * | OFFSET (14->63) | TYPE (10-13) |0|X|X|X| X| X|X|X|0| <- swp entry
+ *
+ * G (8) is aliased and used as a PROT_NONE indicator for
+ * !present ptes. We need to start storing swap entries above
+ * there. We also need to avoid using A and D because of an
+ * erratum where they can be incorrectly set by hardware on
+ * non-present PTEs.
+ */
+#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
#define SWP_TYPE_BITS 5
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
+/* Place the offset above the type: */
+#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS + 1)

#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)

-#define __swp_type(x) (((x).val >> (_PAGE_BIT_PRESENT + 1)) \
+#define __swp_type(x) (((x).val >> (SWP_TYPE_FIRST_BIT)) \
& ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x) ((x).val >> SWP_OFFSET_SHIFT)
+#define __swp_offset(x) ((x).val >> SWP_OFFSET_FIRST_BIT)
#define __swp_entry(type, offset) ((swp_entry_t) { \
- ((type) << (_PAGE_BIT_PRESENT + 1)) \
- | ((offset) << SWP_OFFSET_SHIFT) })
+ ((type) << (SWP_TYPE_FIRST_BIT)) \
+ | ((offset) << SWP_OFFSET_FIRST_BIT) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
#define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })




2018-08-14 17:49:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 32/43] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 42e4089c7890725fcd329999252dc489b72f2921 upstream

For L1TF, PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure that an unmapped entry does not point to valid cached memory.

Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such a high mapping were exposed to unprivileged users,
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.

To avoid this, forbid PROT_NONE mappings or mprotect for high MMIO mappings.

Valid page mappings are allowed because the system is then unsafe anyway.

It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact, this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set); the check is
also skipped for root.

For mmaps this is straightforward and can be handled in vm_insert_pfn()
and in remap_pfn_range().

For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed, a lot of state has already been changed and it would be difficult
to undo on an error. Since this is an uncommon case, use a separate early
page table walk pass for MMIO PROT_NONE mappings that checks for this
condition early. For non-MMIO and non-PROT_NONE there are no changes.
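
From the userspace side, the expected behaviour after this patch can be
sketched as below; the device node is hypothetical and this is an
illustration of the new check, not a test taken from the patch. On an
affected CPU, an unprivileged mprotect(PROT_NONE) on a mapping whose pfn is
above MAX_PA/2 and not normal RAM should now fail with EACCES, while root
and RAM-backed mappings are unaffected.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	/* Hypothetical device exposing a high MMIO BAR to userspace. */
	int fd = open("/dev/uio0", O_RDWR);
	void *p;

	if (fd < 0)
		return 1;

	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* With the L1TF check in place this is expected to fail with
	 * EACCES for an unprivileged caller on a high, non-RAM pfn. */
	if (mprotect(p, 4096, PROT_NONE))
		perror("mprotect(PROT_NONE)");

	munmap(p, 4096);
	close(fd);
	return 0;
}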

[dwmw2: Backport to 4.9]
[groeck: Backport to 4.4]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable.h | 8 ++++++
arch/x86/mm/mmap.c | 21 +++++++++++++++++
include/asm-generic/pgtable.h | 12 ++++++++++
mm/memory.c | 29 ++++++++++++++++++------
mm/mprotect.c | 49 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 112 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -942,6 +942,14 @@ static inline pte_t pte_swp_clear_soft_d
}
#endif

+#define __HAVE_ARCH_PFN_MODIFY_ALLOWED 1
+extern bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot);
+
+static inline bool arch_has_pfn_modify_check(void)
+{
+ return boot_cpu_has_bug(X86_BUG_L1TF);
+}
+
#include <asm-generic/pgtable.h>
#endif /* __ASSEMBLY__ */

--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -121,3 +121,24 @@ const char *arch_vma_name(struct vm_area
return "[mpx]";
return NULL;
}
+
+/*
+ * Only allow root to set high MMIO mappings to PROT_NONE.
+ * This prevents an unpriv. user to set them to PROT_NONE and invert
+ * them, then pointing to valid memory for L1TF speculation.
+ *
+ * Note: for locked down kernels may want to disable the root override.
+ */
+bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
+{
+ if (!boot_cpu_has_bug(X86_BUG_L1TF))
+ return true;
+ if (!__pte_needs_invert(pgprot_val(prot)))
+ return true;
+ /* If it's real memory always allow */
+ if (pfn_valid(pfn))
+ return true;
+ if (pfn > l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN))
+ return false;
+ return true;
+}
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -805,4 +805,16 @@ static inline int pmd_free_pte_page(pmd_
#define io_remap_pfn_range remap_pfn_range
#endif

+#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED
+static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
+{
+ return true;
+}
+
+static inline bool arch_has_pfn_modify_check(void)
+{
+ return false;
+}
+#endif
+
#endif /* _ASM_GENERIC_PGTABLE_H */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1645,6 +1645,9 @@ int vm_insert_pfn_prot(struct vm_area_st
if (track_pfn_insert(vma, &pgprot, pfn))
return -EINVAL;

+ if (!pfn_modify_allowed(pfn, pgprot))
+ return -EACCES;
+
ret = insert_pfn(vma, addr, pfn, pgprot);

return ret;
@@ -1663,6 +1666,9 @@ int vm_insert_mixed(struct vm_area_struc
if (track_pfn_insert(vma, &pgprot, pfn))
return -EINVAL;

+ if (!pfn_modify_allowed(pfn, pgprot))
+ return -EACCES;
+
/*
* If we don't have pte special, then we have to use the pfn_valid()
* based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
@@ -1691,6 +1697,7 @@ static int remap_pte_range(struct mm_str
{
pte_t *pte;
spinlock_t *ptl;
+ int err = 0;

pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
@@ -1698,12 +1705,16 @@ static int remap_pte_range(struct mm_str
arch_enter_lazy_mmu_mode();
do {
BUG_ON(!pte_none(*pte));
+ if (!pfn_modify_allowed(pfn, prot)) {
+ err = -EACCES;
+ break;
+ }
set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
pfn++;
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
- return 0;
+ return err;
}

static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
@@ -1712,6 +1723,7 @@ static inline int remap_pmd_range(struct
{
pmd_t *pmd;
unsigned long next;
+ int err;

pfn -= addr >> PAGE_SHIFT;
pmd = pmd_alloc(mm, pud, addr);
@@ -1720,9 +1732,10 @@ static inline int remap_pmd_range(struct
VM_BUG_ON(pmd_trans_huge(*pmd));
do {
next = pmd_addr_end(addr, end);
- if (remap_pte_range(mm, pmd, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot))
- return -ENOMEM;
+ err = remap_pte_range(mm, pmd, addr, next,
+ pfn + (addr >> PAGE_SHIFT), prot);
+ if (err)
+ return err;
} while (pmd++, addr = next, addr != end);
return 0;
}
@@ -1733,6 +1746,7 @@ static inline int remap_pud_range(struct
{
pud_t *pud;
unsigned long next;
+ int err;

pfn -= addr >> PAGE_SHIFT;
pud = pud_alloc(mm, pgd, addr);
@@ -1740,9 +1754,10 @@ static inline int remap_pud_range(struct
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
- if (remap_pmd_range(mm, pud, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot))
- return -ENOMEM;
+ err = remap_pmd_range(mm, pud, addr, next,
+ pfn + (addr >> PAGE_SHIFT), prot);
+ if (err)
+ return err;
} while (pud++, addr = next, addr != end);
return 0;
}
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -255,6 +255,42 @@ unsigned long change_protection(struct v
return pages;
}

+static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
+{
+ return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
+ 0 : -EACCES;
+}
+
+static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
+ unsigned long addr, unsigned long next,
+ struct mm_walk *walk)
+{
+ return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
+ 0 : -EACCES;
+}
+
+static int prot_none_test(unsigned long addr, unsigned long next,
+ struct mm_walk *walk)
+{
+ return 0;
+}
+
+static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, unsigned long newflags)
+{
+ pgprot_t new_pgprot = vm_get_page_prot(newflags);
+ struct mm_walk prot_none_walk = {
+ .pte_entry = prot_none_pte_entry,
+ .hugetlb_entry = prot_none_hugetlb_entry,
+ .test_walk = prot_none_test,
+ .mm = current->mm,
+ .private = &new_pgprot,
+ };
+
+ return walk_page_range(start, end, &prot_none_walk);
+}
+
int
mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
unsigned long start, unsigned long end, unsigned long newflags)
@@ -273,6 +309,19 @@ mprotect_fixup(struct vm_area_struct *vm
}

/*
+ * Do PROT_NONE PFN permission checks here when we can still
+ * bail out without undoing a lot of state. This is a rather
+ * uncommon case, so doesn't need to be very optimized.
+ */
+ if (arch_has_pfn_modify_check() &&
+ (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
+ (newflags & (VM_READ|VM_WRITE|VM_EXEC)) == 0) {
+ error = prot_none_walk(vma, start, end, newflags);
+ if (error)
+ return error;
+ }
+
+ /*
* If we make a private mapping writable we increase our commit;
* but (without finer accounting) cannot reduce our commit if we
* make it unwritable again. hugetlb mapping were accounted for



2018-08-14 17:49:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 09/43] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <[email protected]>

commit 1214fd7b497400d200e3f4e64e2338b303a20949 upstream.

Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.

This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:

INFO: task systemd-udevd:650 blocked for more than 120 seconds.
Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D28176 650 513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Maurizio Lombardi <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/scsi/sr.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)

--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -520,18 +520,26 @@ static int sr_init_command(struct scsi_c
static int sr_block_open(struct block_device *bdev, fmode_t mode)
{
struct scsi_cd *cd;
+ struct scsi_device *sdev;
int ret = -ENXIO;

+ cd = scsi_cd_get(bdev->bd_disk);
+ if (!cd)
+ goto out;
+
+ sdev = cd->device;
+ scsi_autopm_get_device(sdev);
check_disk_change(bdev);

mutex_lock(&sr_mutex);
- cd = scsi_cd_get(bdev->bd_disk);
- if (cd) {
- ret = cdrom_open(&cd->cdi, bdev, mode);
- if (ret)
- scsi_cd_put(cd);
- }
+ ret = cdrom_open(&cd->cdi, bdev, mode);
mutex_unlock(&sr_mutex);
+
+ scsi_autopm_put_device(sdev);
+ if (ret)
+ scsi_cd_put(cd);
+
+out:
return ret;
}

@@ -559,6 +567,8 @@ static int sr_block_ioctl(struct block_d
if (ret)
goto out;

+ scsi_autopm_get_device(sdev);
+
/*
* Send SCSI addressing ioctls directly to mid level, send other
* ioctls to cdrom/block level.
@@ -567,15 +577,18 @@ static int sr_block_ioctl(struct block_d
case SCSI_IOCTL_GET_IDLUN:
case SCSI_IOCTL_GET_BUS_NUMBER:
ret = scsi_ioctl(sdev, cmd, argp);
- goto out;
+ goto put;
}

ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg);
if (ret != -ENOSYS)
- goto out;
+ goto put;

ret = scsi_ioctl(sdev, cmd, argp);

+put:
+ scsi_autopm_put_device(sdev);
+
out:
mutex_unlock(&sr_mutex);
return ret;



2018-08-14 17:49:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 38/43] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Hocko <[email protected]>

commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream

Jan noticed that pte_pfn() and friends, as well as pfn_pte(), are incorrect
for CONFIG_PAE because phys_addr_t is wider than unsigned long, so the
pte_val() result, respectively the left shift, would get truncated. Fix
this up by using proper types.
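
The truncation being fixed can be shown with a few lines of standalone C
that model 32-bit PAE, where a pte value is 64 bits wide but unsigned long
is only 32 bits; the constants are illustrative and not taken from the
kernel headers.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Model a PAE pte whose pfn needs a bit above 32 (here bit 35). */
	uint64_t pte_val = 0x0000000800000000ULL | 0x1ULL /* present */;

	uint32_t truncated = (uint32_t)pte_val;	/* old: 32-bit unsigned long */
	uint64_t kept      = pte_val;		/* new: phys_addr_t */

	printf("truncated value: %#x\n", truncated);	/* high bits lost */
	printf("full value:      %#llx\n", (unsigned long long)kept);
	return 0;
}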

[dwmw2: Backport to 4.9]

Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -154,21 +154,21 @@ static inline u64 protnone_mask(u64 val)

static inline unsigned long pte_pfn(pte_t pte)
{
- unsigned long pfn = pte_val(pte);
+ phys_addr_t pfn = pte_val(pte);
pfn ^= protnone_mask(pfn);
return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
}

static inline unsigned long pmd_pfn(pmd_t pmd)
{
- unsigned long pfn = pmd_val(pmd);
+ phys_addr_t pfn = pmd_val(pmd);
pfn ^= protnone_mask(pfn);
return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
}

static inline unsigned long pud_pfn(pud_t pud)
{
- unsigned long pfn = pud_val(pud);
+ phys_addr_t pfn = pud_val(pud);
pfn ^= protnone_mask(pfn);
return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
}
@@ -369,7 +369,7 @@ static inline pgprotval_t massage_pgprot

static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
- phys_addr_t pfn = page_nr << PAGE_SHIFT;
+ phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
pfn ^= protnone_mask(pgprot_val(pgprot));
pfn &= PTE_PFN_MASK;
return __pte(pfn | massage_pgprot(pgprot));
@@ -377,7 +377,7 @@ static inline pte_t pfn_pte(unsigned lon

static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
{
- phys_addr_t pfn = page_nr << PAGE_SHIFT;
+ phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
pfn ^= protnone_mask(pgprot_val(pgprot));
pfn &= PHYSICAL_PMD_PAGE_MASK;
return __pmd(pfn | massage_pgprot(pgprot));



2018-08-14 17:49:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 39/43] x86/speculation/l1tf: Invert all not present mappings

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream

For kernel mappings PAGE_PROTNONE is not necessarily set for a non-present
mapping, but the inversion logic explicitly checks for !PRESENT and
PROT_NONE.

Remove the PROT_NONE check and make the inversion unconditional for all
non-present mappings.
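
A small standalone sketch of the predicate change, using illustrative bit
values for the present and PROT_NONE flags rather than the real kernel
definitions:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PRESENT	0x001ULL	/* illustrative stand-ins */
#define FAKE_PROTNONE	0x100ULL

/* Old rule: invert only PROT_NONE entries that are not present. */
static bool needs_invert_old(uint64_t val)
{
	return (val & (FAKE_PRESENT | FAKE_PROTNONE)) == FAKE_PROTNONE;
}

/* New rule: invert every not-present entry. */
static bool needs_invert_new(uint64_t val)
{
	return !(val & FAKE_PRESENT);
}

int main(void)
{
	/* A kernel not-present mapping without PROT_NONE set:
	 * old rule says no inversion, new rule says invert. */
	uint64_t val = 0;

	printf("old: %d, new: %d\n", needs_invert_old(val),
	       needs_invert_new(val));
	return 0;
}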

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable-invert.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/pgtable-invert.h
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -6,7 +6,7 @@

static inline bool __pte_needs_invert(u64 val)
{
- return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+ return !(val & _PAGE_PRESENT);
}

/* Get a mask to xor with the page table entry to get the correct pfn. */



2018-08-14 17:49:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 04/43] fork: unconditionally clear stack on fork

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream.

One of the classes of kernel stack content leaks[1] is the exposure of
prior heap or stack contents when a new process stack is allocated.
Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.

Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.

Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54

and after:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46

Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.

[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak

I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.

before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean: 221015379122.60
Std Dev: 4662486552.47

after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean: 217745009865.40
Std Dev: 5935559279.99

It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!
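
A minimal, simplified sketch of what the one-line change amounts to at the
allocation site: because __GFP_ZERO is now unconditionally part of
THREADINFO_GFP, any thread stack allocated with that mask comes back zeroed.
The helper below is an illustrative stand-in for the real allocation path in
kernel/fork.c, not the actual code.

#include <linux/gfp.h>
#include <linux/thread_info.h>

/* Illustrative stand-in: with __GFP_ZERO always set in THREADINFO_GFP,
 * the returned pages are cleared, so no prior heap or stack contents can
 * leak into the new process stack. */
static struct page *alloc_thread_stack_pages(int node, int order)
{
	return alloc_pages_node(node, THREADINFO_GFP, order);
}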

Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[ Srivatsa: Backported to 4.4.y ]
Signed-off-by: Srivatsa S. Bhat <[email protected]>
Reviewed-by: Srinidhi Rao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/thread_info.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -55,11 +55,7 @@ extern long do_no_restart_syscall(struct

#ifdef __KERNEL__

-#ifdef CONFIG_DEBUG_STACK_USAGE
-# define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
-#else
-# define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK)
-#endif
+#define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)

/*
* flag set/clear/test wrappers



2018-08-14 17:49:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 05/43] parisc: Enable CONFIG_MLONGCALLS by default

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Helge Deller <[email protected]>

commit 66509a276c8c1d19ee3f661a41b418d101c57d29 upstream.

Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.

Cc: [email protected] # 4.0+
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/parisc/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -177,7 +177,7 @@ config PREFETCH

config MLONGCALLS
bool "Enable the -mlong-calls compiler option for big kernels"
- def_bool y if (!MODULES)
+ default y
depends on PA8X00
help
If you configure the kernel to include many drivers built-in instead



2018-08-14 17:49:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 41/43] x86/mm/pat: Make set_memory_np() L1TF safe

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream

set_memory_np() is used to mark kernel mappings not present, but it has
its own open-coded mechanism which does not have the L1TF protection of
inverting the address bits.

Replace the open-coded PTE manipulation with the L1TF-protecting low-level
PTE routines.

Passes the CPA self test.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
[ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ]
Signed-off-by: David Woodhouse <[email protected]>
[groeck: port to 4.4]
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable.h | 27 +++++++++++++++++++++++++++
arch/x86/mm/pageattr.c | 8 ++++----
2 files changed, 31 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -378,12 +378,39 @@ static inline pmd_t pfn_pmd(unsigned lon
return __pmd(pfn | massage_pgprot(pgprot));
}

+static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
+{
+ phys_addr_t pfn = page_nr << PAGE_SHIFT;
+ pfn ^= protnone_mask(pgprot_val(pgprot));
+ pfn &= PHYSICAL_PUD_PAGE_MASK;
+ return __pud(pfn | massage_pgprot(pgprot));
+}
+
static inline pmd_t pmd_mknotpresent(pmd_t pmd)
{
return pfn_pmd(pmd_pfn(pmd),
__pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
}

+static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
+{
+ pudval_t v = native_pud_val(pud);
+
+ return __pud(v | set);
+}
+
+static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
+{
+ pudval_t v = native_pud_val(pud);
+
+ return __pud(v & ~clear);
+}
+
+static inline pud_t pud_mkhuge(pud_t pud)
+{
+ return pud_set_flags(pud, _PAGE_PSE);
+}
+
static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);

static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1006,8 +1006,8 @@ static int populate_pmd(struct cpa_data

pmd = pmd_offset(pud, start);

- set_pmd(pmd, __pmd(cpa->pfn | _PAGE_PSE |
- massage_pgprot(pmd_pgprot)));
+ set_pmd(pmd, pmd_mkhuge(pfn_pmd(cpa->pfn,
+ canon_pgprot(pmd_pgprot))));

start += PMD_SIZE;
cpa->pfn += PMD_SIZE;
@@ -1079,8 +1079,8 @@ static int populate_pud(struct cpa_data
* Map everything starting from the Gb boundary, possibly with 1G pages
*/
while (end - start >= PUD_SIZE) {
- set_pud(pud, __pud(cpa->pfn | _PAGE_PSE |
- massage_pgprot(pud_pgprot)));
+ set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn,
+ canon_pgprot(pud_pgprot))));

start += PUD_SIZE;
cpa->pfn += PUD_SIZE;



2018-08-14 17:49:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 35/43] x86/speculation/l1tf: Extend 64bit swap file size limit

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <[email protected]>

commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream

The previous patch has limited swap file size so that large offsets cannot
clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.

It assumed that offsets are encoded starting with bit 12, same as pfn. But
on x86_64, offsets are encoded starting with bit 9.

Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
and 256TB with a 46bit MAX_PA.
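
For a quick check of those numbers, here is a small user-space sketch
(editorial addition, not from the patch; it assumes l1tf_pfn_limit() + 1
equals MAX_PA/2 expressed in 4 KiB pages, as in the previous patch):

#include <stdio.h>

int main(void)
{
        const int page_shift = 12;             /* 4 KiB pages */
        const int swp_offset_first_bit = 9;    /* x86_64 swap offset start */
        const int max_pa[] = { 42, 46 };
        int i;

        for (i = 0; i < 2; i++) {
                /* MAX_PA/2 in pages, i.e. l1tf_pfn_limit() + 1 */
                unsigned long long pages = 1ULL << (max_pa[i] - 1 - page_shift);

                /* Offsets start 3 bits below the pfn, so the limit grows 8x */
                pages <<= page_shift - swp_offset_first_bit;

                printf("MAX_PA=%d -> swap limit %llu TB\n",
                       max_pa[i], (pages << page_shift) >> 40);
        }
        return 0;       /* prints 16 TB and 256 TB */
}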

Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/mm/init.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -778,7 +778,15 @@ unsigned long max_swapfile_size(void)

if (boot_cpu_has_bug(X86_BUG_L1TF)) {
/* Limit the swap file size to MAX_PA/2 for L1TF workaround */
- pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages);
+ unsigned long l1tf_limit = l1tf_pfn_limit() + 1;
+ /*
+ * We encode swap offsets also with 3 bits below those for pfn
+ * which makes the usable limit higher.
+ */
+#ifdef CONFIG_X86_64
+ l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT;
+#endif
+ pages = min_t(unsigned long, l1tf_limit, pages);
}
return pages;
}



2018-08-14 17:49:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 36/43] x86/cpufeatures: Add detection of L1D cache flush support.

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <[email protected]>

commit 11e34e64e4103955fc4568750914c75d65ea87ee upstream

336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.

This new MSR "gives software a way to invalidate structures with finer
granularity than other architectural methods like WBINVD."

A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
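
For illustration only (not part of the patch), the same bit can be
probed from user space, assuming a compiler that provides
__get_cpuid_count() in <cpuid.h>:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* CPUID.(EAX=7,ECX=0):EDX[28] advertises IA32_FLUSH_CMD support */
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                return 1;       /* leaf 7 not available */

        printf("L1D cache flush (IA32_FLUSH_CMD): %ssupported\n",
               (edx & (1u << 28)) ? "" : "not ");
        return 0;
}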

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -310,6 +310,7 @@
#define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
+#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */




2018-08-14 17:50:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 42/43] x86/mm/kmmio: Make the tracer robust against L1TF

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream

The mmio tracer sets io mapping PTEs and PMDs to non present when enabled
without inverting the address bits, which makes the PTE entry vulnerable
for L1TF.

Make it use the right low level macros to actually invert the address bits
to protect against L1TF.

In principle this could be avoided because MMIO tracing is not likely to be
enabled on production machines, but the fix is straightforward and for
consistency's sake it's better to get rid of the open-coded PTE manipulation.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/mm/kmmio.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -125,24 +125,29 @@ static struct kmmio_fault_page *get_kmmi

static void clear_pmd_presence(pmd_t *pmd, bool clear, pmdval_t *old)
{
+ pmd_t new_pmd;
pmdval_t v = pmd_val(*pmd);
if (clear) {
- *old = v & _PAGE_PRESENT;
- v &= ~_PAGE_PRESENT;
- } else /* presume this has been called with clear==true previously */
- v |= *old;
- set_pmd(pmd, __pmd(v));
+ *old = v;
+ new_pmd = pmd_mknotpresent(*pmd);
+ } else {
+ /* Presume this has been called with clear==true previously */
+ new_pmd = __pmd(*old);
+ }
+ set_pmd(pmd, new_pmd);
}

static void clear_pte_presence(pte_t *pte, bool clear, pteval_t *old)
{
pteval_t v = pte_val(*pte);
if (clear) {
- *old = v & _PAGE_PRESENT;
- v &= ~_PAGE_PRESENT;
- } else /* presume this has been called with clear==true previously */
- v |= *old;
- set_pte_atomic(pte, __pte(v));
+ *old = v;
+ /* Nothing should care about address */
+ pte_clear(&init_mm, 0, pte);
+ } else {
+ /* Presume this has been called with clear==true previously */
+ set_pte_atomic(pte, __pte(*old));
+ }
}

static int clear_page_presence(struct kmmio_fault_page *f, bool clear)



2018-08-14 17:50:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 43/43] x86/speculation/l1tf: Fix up CPU feature flags

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Guenter Roeck <[email protected]>

In linux-4.4.y, the definition of X86_FEATURE_RETPOLINE and
X86_FEATURE_RETPOLINE_AMD is different from the upstream
definition. The result is an overlap with the newly introduced
X86_FEATURE_L1TF_PTEINV. Update the RETPOLINE definitions to match
the upstream ones and improve alignment with upstream code.

Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -193,12 +193,12 @@
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */

+#define X86_FEATURE_RETPOLINE ( 7*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
+#define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* "" AMD Retpoline mitigation for Spectre variant 2 */
+
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */

-#define X86_FEATURE_RETPOLINE ( 7*32+29) /* "" Generic Retpoline mitigation for Spectre variant 2 */
-#define X86_FEATURE_RETPOLINE_AMD ( 7*32+30) /* "" AMD Retpoline mitigation for Spectre variant 2 */
-
#define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */
#define X86_FEATURE_SSBD ( 7*32+17) /* Speculative Store Bypass Disable */




2018-08-14 17:50:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 08/43] ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit fdcb613d49321b5bf5d5a1bd0fba8e7c241dcc70 upstream.

The LPSS PWM device on Bay Trail and Cherry Trail devices has a set
of private registers at offset 0x800, the current lpss_device_desc for
them already sets the LPSS_SAVE_CTX flag to have these saved/restored
over device-suspend, but the current lpss_device_desc was not setting
the prv_offset field, leading to the regular device registers getting
saved/restored instead.

This causes the PWM controller to stop working after a suspend/resume,
resulting in a black screen, on systems where the firmware clears the
APB clock and reset bits at offset 0x804.

This commit fixes this by properly setting prv_offset to 0x800 for
the PWM devices.

Cc: [email protected]
Fixes: e1c748179754 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
Fixes: 1bfbd8eb8a7f ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/acpi/acpi_lpss.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -154,10 +154,12 @@ static const struct lpss_device_desc lpt

static const struct lpss_device_desc byt_pwm_dev_desc = {
.flags = LPSS_SAVE_CTX,
+ .prv_offset = 0x800,
};

static const struct lpss_device_desc bsw_pwm_dev_desc = {
.flags = LPSS_SAVE_CTX | LPSS_NO_D3_DELAY,
+ .prv_offset = 0x800,
};

static const struct lpss_device_desc byt_uart_dev_desc = {



2018-08-14 17:50:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 37/43] x86/speculation/l1tf: Protect PAE swap entries against L1TF

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <[email protected]>

commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream

The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
offset. The lower word is zeroed, thus systems with less than 4GB memory are
safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
to L1TF; with even more memory, the swap offset also influences the address.
This might be a problem with 32bit PAE guests running on large 64bit hosts.

By continuing to keep the whole swap entry in either high or low 32bit word of
PTE we would limit the swap size too much. Thus this patch uses the whole PAE
PTE with the same layout as the 64bit version does. The macros just become a
bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
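
To make the layout concrete, here is a minimal user-space round-trip of
the new encode/decode macros (editorial sketch, not part of the patch;
SWP_OFFSET_FIRST_BIT = 9 assumes _PAGE_BIT_PROTNONE == 8 as on x86):

#include <assert.h>
#include <stdint.h>

#define SWP_TYPE_BITS           5
#define SWP_OFFSET_FIRST_BIT    9
#define SWP_OFFSET_SHIFT        (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

/* Mirrors __swp_pteval_entry(): offset stored inverted, type in top bits */
static uint64_t swp_pteval(uint64_t type, uint64_t offset)
{
        return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
               (type << (64 - SWP_TYPE_BITS));
}

int main(void)
{
        uint64_t pte = swp_pteval(3, 0x12345);

        /* Mirrors __pteval_swp_type() / __pteval_swp_offset() */
        assert((pte >> (64 - SWP_TYPE_BITS)) == 3);
        assert((~pte << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT) == 0x12345);
        return 0;
}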

Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable-3level.h | 35 ++++++++++++++++++++++++++++++++--
arch/x86/mm/init.c | 2 -
2 files changed, 34 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -177,12 +177,43 @@ static inline pmd_t native_pmdp_get_and_
#endif

/* Encode and de-code a swap entry */
+#define SWP_TYPE_BITS 5
+
+#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)
+
#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5)
#define __swp_type(x) (((x).val) & 0x1f)
#define __swp_offset(x) ((x).val >> 5)
#define __swp_entry(type, offset) ((swp_entry_t){(type) | (offset) << 5})
-#define __pte_to_swp_entry(pte) ((swp_entry_t){ (pte).pte_high })
-#define __swp_entry_to_pte(x) ((pte_t){ { .pte_high = (x).val } })
+
+/*
+ * Normally, __swp_entry() converts from arch-independent swp_entry_t to
+ * arch-dependent swp_entry_t, and __swp_entry_to_pte() just stores the result
+ * to pte. But here we have 32bit swp_entry_t and 64bit pte, and need to use the
+ * whole 64 bits. Thus, we shift the "real" arch-dependent conversion to
+ * __swp_entry_to_pte() through the following helper macro based on 64bit
+ * __swp_entry().
+ */
+#define __swp_pteval_entry(type, offset) ((pteval_t) { \
+ (~(pteval_t)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+ | ((pteval_t)(type) << (64 - SWP_TYPE_BITS)) })
+
+#define __swp_entry_to_pte(x) ((pte_t){ .pte = \
+ __swp_pteval_entry(__swp_type(x), __swp_offset(x)) })
+/*
+ * Analogically, __pte_to_swp_entry() doesn't just extract the arch-dependent
+ * swp_entry_t, but also has to convert it from 64bit to the 32bit
+ * intermediate representation, using the following macros based on 64bit
+ * __swp_type() and __swp_offset().
+ */
+#define __pteval_swp_type(x) ((unsigned long)((x).pte >> (64 - SWP_TYPE_BITS)))
+#define __pteval_swp_offset(x) ((unsigned long)(~((x).pte) << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT))
+
+#define __pte_to_swp_entry(pte) (__swp_entry(__pteval_swp_type(pte), \
+ __pteval_swp_offset(pte)))

#include <asm/pgtable-invert.h>

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -783,7 +783,7 @@ unsigned long max_swapfile_size(void)
* We encode swap offsets also with 3 bits below those for pfn
* which makes the usable limit higher.
*/
-#ifdef CONFIG_X86_64
+#if CONFIG_PGTABLE_LEVELS > 2
l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT;
#endif
pages = min_t(unsigned long, l1tf_limit, pages);



2018-08-14 17:50:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 40/43] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream

Some cases in THP like:
- MADV_FREE
- mprotect
- split

mark the PMD temporarily non-present to prevent races. The window for
an L1TF attack in these contexts is very small, but it wants to be fixed
for correctness' sake.

Use the proper low level functions for pmd/pud_mknotpresent() to address
this.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -315,11 +315,6 @@ static inline pmd_t pmd_mkwrite(pmd_t pm
return pmd_set_flags(pmd, _PAGE_RW);
}

-static inline pmd_t pmd_mknotpresent(pmd_t pmd)
-{
- return pmd_clear_flags(pmd, _PAGE_PRESENT | _PAGE_PROTNONE);
-}
-
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline int pte_soft_dirty(pte_t pte)
{
@@ -383,6 +378,12 @@ static inline pmd_t pfn_pmd(unsigned lon
return __pmd(pfn | massage_pgprot(pgprot));
}

+static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+{
+ return pfn_pmd(pmd_pfn(pmd),
+ __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
+}
+
static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);

static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)



2018-08-14 17:51:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 27/43] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream

When PTEs are set to PROT_NONE the kernel just clears the Present bit and
preserves the PFN, which creates attack surface for L1TF speculation
attacks.

This is important inside guests, because L1TF speculation bypasses physical
page remapping. While the host has its own mitigations that prevent leaking
data from other VMs into the guest, this would still risk leaking the wrong
page inside the current guest.

This uses the same technique as Linus' swap entry patch: while an entry
is in PROTNONE state, invert the complete PFN part of it. This ensures
that the highest bit will point to non-existent memory.

The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
pte/pmd/pud_pfn undo it.

This assumes that no code path touches the PFN part of a PTE directly
without using these primitives.

This doesn't handle the case where MMIO is at the top of the CPU physical
memory. If such an MMIO region were exposed by an unprivileged driver for
mmap, it would be possible to attack some real memory. However, this
situation is rather unlikely.

For 32bit non-PAE the inversion is not done because there are really not
enough bits to protect anything.

Q: Why does the guest need to be protected when the HyperVisor already has
L1TF mitigations?

A: Here's an example:

Physical pages 1 2 get mapped into a guest as
GPA 1 -> PA 2
GPA 2 -> PA 1
through EPT.

The L1TF speculation ignores the EPT remapping.

Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
they belong to different users and should be isolated.

A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
gets read access to the underlying physical page, which in this case
points to PA 2. It can thus read process B's data, if it happened to be
in L1, so isolation inside the guest is broken.

There's nothing the hypervisor can do about this. This mitigation has to
be done in the guest itself.
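
As a standalone illustration of the round trip (editorial sketch, not
the kernel code in the diff below; the flag values are simplified
assumptions), the stored PFN bits of a PROT_NONE entry differ from the
real PFN, while a pte_pfn()-style extraction still recovers it:

#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT   (1ull << 0)
#define _PAGE_PROTNONE  (1ull << 8)     /* assumed: x86 _PAGE_BIT_GLOBAL */
#define PTE_PFN_MASK    0x000ffffffffff000ull
#define PAGE_SHIFT      12

static int pte_needs_invert(uint64_t val)
{
        return (val & (_PAGE_PRESENT | _PAGE_PROTNONE)) == _PAGE_PROTNONE;
}

static uint64_t protnone_mask(uint64_t val)
{
        return pte_needs_invert(val) ? ~0ull : 0;
}

int main(void)
{
        uint64_t pfn = 0x12345;                 /* arbitrary example PFN */
        uint64_t prot = _PAGE_PROTNONE;         /* PROT_NONE: not present */

        /* pfn_pte()-style: store the PFN inverted for PROT_NONE entries */
        uint64_t pte = (((pfn << PAGE_SHIFT) ^ protnone_mask(prot)) &
                        PTE_PFN_MASK) | prot;

        /* pte_pfn()-style: undo the inversion before extracting the PFN */
        uint64_t out = ((pte ^ protnone_mask(pte)) & PTE_PFN_MASK) >> PAGE_SHIFT;

        assert(out == pfn);                                  /* round trip   */
        assert(((pte & PTE_PFN_MASK) >> PAGE_SHIFT) != pfn); /* stored bits  */
        return 0;
}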

[ tglx: Massaged changelog ]
[ dwmw2: backported to 4.9 ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/pgtable-2level.h | 17 +++++++++++++++
arch/x86/include/asm/pgtable-3level.h | 2 +
arch/x86/include/asm/pgtable-invert.h | 32 ++++++++++++++++++++++++++++
arch/x86/include/asm/pgtable.h | 38 ++++++++++++++++++++++++----------
arch/x86/include/asm/pgtable_64.h | 2 +
5 files changed, 80 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/include/asm/pgtable-invert.h

--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -77,4 +77,21 @@ static inline unsigned long pte_bitop(un
#define __pte_to_swp_entry(pte) ((swp_entry_t) { (pte).pte_low })
#define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })

+/* No inverted PFNs on 2 level page tables */
+
+static inline u64 protnone_mask(u64 val)
+{
+ return 0;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+ return val;
+}
+
+static inline bool __pte_needs_invert(u64 val)
+{
+ return false;
+}
+
#endif /* _ASM_X86_PGTABLE_2LEVEL_H */
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -184,4 +184,6 @@ static inline pmd_t native_pmdp_get_and_
#define __pte_to_swp_entry(pte) ((swp_entry_t){ (pte).pte_high })
#define __swp_entry_to_pte(x) ((pte_t){ { .pte_high = (x).val } })

+#include <asm/pgtable-invert.h>
+
#endif /* _ASM_X86_PGTABLE_3LEVEL_H */
--- /dev/null
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PGTABLE_INVERT_H
+#define _ASM_PGTABLE_INVERT_H 1
+
+#ifndef __ASSEMBLY__
+
+static inline bool __pte_needs_invert(u64 val)
+{
+ return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+/* Get a mask to xor with the page table entry to get the correct pfn. */
+static inline u64 protnone_mask(u64 val)
+{
+ return __pte_needs_invert(val) ? ~0ull : 0;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+ /*
+ * When a PTE transitions from NONE to !NONE or vice-versa
+ * invert the PFN part to stop speculation.
+ * pte_pfn undoes this when needed.
+ */
+ if (__pte_needs_invert(oldval) != __pte_needs_invert(val))
+ val = (val & ~mask) | (~val & mask);
+ return val;
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -148,19 +148,29 @@ static inline int pte_special(pte_t pte)
return pte_flags(pte) & _PAGE_SPECIAL;
}

+/* Entries that were set to PROT_NONE are inverted */
+
+static inline u64 protnone_mask(u64 val);
+
static inline unsigned long pte_pfn(pte_t pte)
{
- return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
+ unsigned long pfn = pte_val(pte);
+ pfn ^= protnone_mask(pfn);
+ return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
}

static inline unsigned long pmd_pfn(pmd_t pmd)
{
- return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
+ unsigned long pfn = pmd_val(pmd);
+ pfn ^= protnone_mask(pfn);
+ return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
}

static inline unsigned long pud_pfn(pud_t pud)
{
- return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+ unsigned long pfn = pud_val(pud);
+ pfn ^= protnone_mask(pfn);
+ return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
}

#define pte_page(pte) pfn_to_page(pte_pfn(pte))
@@ -359,19 +369,25 @@ static inline pgprotval_t massage_pgprot

static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
- return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
- massage_pgprot(pgprot));
+ phys_addr_t pfn = page_nr << PAGE_SHIFT;
+ pfn ^= protnone_mask(pgprot_val(pgprot));
+ pfn &= PTE_PFN_MASK;
+ return __pte(pfn | massage_pgprot(pgprot));
}

static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
{
- return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
- massage_pgprot(pgprot));
+ phys_addr_t pfn = page_nr << PAGE_SHIFT;
+ pfn ^= protnone_mask(pgprot_val(pgprot));
+ pfn &= PHYSICAL_PMD_PAGE_MASK;
+ return __pmd(pfn | massage_pgprot(pgprot));
}

+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
+
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
- pteval_t val = pte_val(pte);
+ pteval_t val = pte_val(pte), oldval = val;

/*
* Chop off the NX bit (if present), and add the NX portion of
@@ -379,17 +395,17 @@ static inline pte_t pte_modify(pte_t pte
*/
val &= _PAGE_CHG_MASK;
val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK;
-
+ val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
return __pte(val);
}

static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{
- pmdval_t val = pmd_val(pmd);
+ pmdval_t val = pmd_val(pmd), oldval = val;

val &= _HPAGE_CHG_MASK;
val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
-
+ val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
return __pmd(val);
}

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -235,6 +235,8 @@ extern void cleanup_highmap(void);
extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);

+#include <asm/pgtable-invert.h>
+
#endif /* !__ASSEMBLY__ */

#endif /* _ASM_X86_PGTABLE_64_H */



2018-08-15 06:17:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/43] 4.4.148-stable review

On Tue, Aug 14, 2018 at 07:17:36PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.148 release.
> There are 43 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Aug 16 17:14:59 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.148-rc1.gz

-rc2 is now out:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.148-rc2.gz

2018-08-15 13:11:27

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/43] 4.4.148-stable review

On 08/14/2018 10:17 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.148 release.
> There are 43 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Aug 16 17:14:59 UTC 2018.
> Anything received after that time might be too late.
>

For v4.4.147-46-g41fa2bf:

Build results:
total: 148 pass: 148 fail: 0
Qemu test results:
total: 260 pass: 260 fail: 0

We'll also need https://lore.kernel.org/patchwork/patch/974688/ or an equivalent
after it is available upstream in all stable releases, but that can come later.

Details are available at http://kerneltests.org/builders/.

Guenter

2018-08-15 20:53:46

by Dan Rue

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/43] 4.4.148-stable review

On Tue, Aug 14, 2018 at 07:17:36PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.148 release.
> There are 43 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Aug 16 17:14:59 UTC 2018.
> Anything received after that time might be too late.

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary
------------------------------------------------------------------------

kernel: 4.4.148-rc2
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 41fa2bf55dc979a622e7cb367c6f815d6a508a1a
git describe: v4.4.147-46-g41fa2bf55dc9
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.147-46-g41fa2bf55dc9


No regressions (compared to build v4.4.147)


Ran 9270 total tests in the following environments and test suites.

Environments
--------------
- juno-r2 - arm64
- qemu_arm
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

Summary
------------------------------------------------------------------------

kernel: 4.4.148-rc2
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.148-rc2-hikey-20180815-258
git commit: f76c67628ea73cd7814850723fd3cf4ec681c404
git describe: 4.4.148-rc2-hikey-20180815-258
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.148-rc2-hikey-20180815-258


No regressions (compared to build 4.4.148-rc1-hikey-20180814-257)


Ran 2723 total tests in the following environments and test suites.

Environments
--------------
- hi6220-hikey - arm64
- qemu_arm64

Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests

--
Linaro LKFT
https://lkft.linaro.org

2018-09-07 17:08:39

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 31/43] mm: fix cache mode tracking in vm_insert_mixed()

On Tue, 2018-08-14 at 19:18 +0200, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Dan Williams <[email protected]>
>
> commit 87744ab3832b83ba71b931f86f9cfdb000d07da5 upstream
>
> vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
> fails to check the pgprot_t it uses for the mapping against the one
> recorded in the memtype tracking tree.  Add the missing call to
> track_pfn_insert() to preclude cases where incompatible aliased mappings
> are established for a given physical address range.
[...]

This apparently breaks a number of DRM drivers. The upstream fixes
are:

8ef4227615e1 x86/io: add interface to reserve io memtype for a resource range. (v1.1)
7cf321d118a8 drm/drivers: add support for using the arch wc mapping API.

They appear to apply cleanly to 4.4-stable. They are included in 4.9
so no other stable branch needs them.

Ben.

--
Ben Hutchings, Software Developer   Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom

2018-09-07 20:05:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 31/43] mm: fix cache mode tracking in vm_insert_mixed()

On Fri, Sep 07, 2018 at 06:05:23PM +0100, Ben Hutchings wrote:
> On Tue, 2018-08-14 at 19:18 +0200, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Dan Williams <[email protected]>
> >
> > commit 87744ab3832b83ba71b931f86f9cfdb000d07da5 upstream
> >
> > vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
> > fails to check the pgprot_t it uses for the mapping against the one
> > recorded in the memtype tracking tree.  Add the missing call to
> > track_pfn_insert() to preclude cases where incompatible aliased mappings
> > are established for a given physical address range.
> [...]
>
> This apparently breaks a number of DRM drivers. The upstream fixes
> are:
>
> 8ef4227615e1 x86/io: add interface to reserve io memtype for a resource range. (v1.1)
> 7cf321d118a8 drm/drivers: add support for using the arch wc mapping API.
>
> They appear to apply cleanly to 4.4-stable. They are included in 4.9
> so no other stable branch needs them.

Thanks, now queued up.

greg k-h

2018-09-09 16:48:29

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 41/43] x86/mm/pat: Make set_memory_np() L1TF safe

On Tue, 2018-08-14 at 19:18 +0200, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Andi Kleen <[email protected]>
>
> commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream
[...]
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -378,12 +378,39 @@ static inline pmd_t pfn_pmd(unsigned lon
> >   return __pmd(pfn | massage_pgprot(pgprot));
>  }
>  
> +static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
> +{
> + phys_addr_t pfn = page_nr << PAGE_SHIFT;
> + pfn ^= protnone_mask(pgprot_val(pgprot));
> + pfn &= PHYSICAL_PUD_PAGE_MASK;
> + return __pud(pfn | massage_pgprot(pgprot));
> +}
[...]

This (and the backport to 4.9) is missing the fix from commit
e14d7dfb41f5 "x86/speculation/l1tf: Fix up pte->pfn conversion for
PAE", as that was applied earlier in the series.  But since PAE implies
only 3-level paging I don't know how the PUD functions get used or
whether this actually matters.

Ben.

--
Ben Hutchings, Software Developer   Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom

2018-09-09 17:07:36

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 41/43] x86/mm/pat: Make set_memory_np() L1TF safe

On 09/09/2018 09:46 AM, Ben Hutchings wrote:
> On Tue, 2018-08-14 at 19:18 +0200, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Andi Kleen <[email protected]>
>>
>> commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream
> [...]
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -378,12 +378,39 @@ static inline pmd_t pfn_pmd(unsigned lon
>>>   return __pmd(pfn | massage_pgprot(pgprot));
>>  }
>>
>> +static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
>> +{
>> + phys_addr_t pfn = page_nr << PAGE_SHIFT;
>> + pfn ^= protnone_mask(pgprot_val(pgprot));
>> + pfn &= PHYSICAL_PUD_PAGE_MASK;
>> + return __pud(pfn | massage_pgprot(pgprot));
>> +}
> [...]
>
> This (and the backport to 4.9) are missing the fix from commit
> e14d7dfb41f5 "x86/speculation/l1tf: Fix up pte->pfn conversion for
> PAE", as that was applied earlier in the series.  But since PAE implies
> only 3-level paging I don't know how the PUD functions get used or
> whether this actually matters.
>
Excellent find.

e14d7dfb41f5 (re-)applies cleanly to both 4.4.y and 4.9.y. Since most of its
changes are already applied, the only remaining change is

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index c535012bdb56..5736306bdaab 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -420,7 +420,7 @@ static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)

static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
{
- phys_addr_t pfn = page_nr << PAGE_SHIFT;
+ phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
pfn ^= protnone_mask(pgprot_val(pgprot));
pfn &= PHYSICAL_PUD_PAGE_MASK;
return __pud(pfn | massage_pgprot(pgprot));

after cherry-picking it into both branches.

I think we should re-apply it to both 4.4.y and 4.9.y to be on the safe side.
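
For illustration, a minimal user-space sketch of the truncation the cast
avoids (editorial addition, not from either patch; it assumes a 32-bit
build where unsigned long is 32 bits and phys_addr_t is 64 bits):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        unsigned long page_nr = 0x123456;  /* PFN just above the 4 GiB mark */

        /* On a 32-bit build the shift is done in 32 bits and overflows... */
        uint64_t bad  = page_nr << 12;
        /* ...while widening first, as the fix does, keeps all the bits. */
        uint64_t good = (uint64_t)page_nr << 12;

        printf("without cast: 0x%llx  with cast: 0x%llx\n",
               (unsigned long long)bad, (unsigned long long)good);
        return 0;
}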

Thanks,
Guenter

2018-09-10 07:18:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 41/43] x86/mm/pat: Make set_memory_np() L1TF safe

On Sun, Sep 09, 2018 at 10:06:06AM -0700, Guenter Roeck wrote:
> On 09/09/2018 09:46 AM, Ben Hutchings wrote:
> > On Tue, 2018-08-14 at 19:18 +0200, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Andi Kleen <[email protected]>
> > >
> > > commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream
> > [...]
> > > --- a/arch/x86/include/asm/pgtable.h
> > > +++ b/arch/x86/include/asm/pgtable.h
> > > @@ -378,12 +378,39 @@ static inline pmd_t pfn_pmd(unsigned lon
> > > >   return __pmd(pfn | massage_pgprot(pgprot));
> > >  }
> > > +static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
> > > +{
> > > + phys_addr_t pfn = page_nr << PAGE_SHIFT;
> > > + pfn ^= protnone_mask(pgprot_val(pgprot));
> > > + pfn &= PHYSICAL_PUD_PAGE_MASK;
> > > + return __pud(pfn | massage_pgprot(pgprot));
> > > +}
> > [...]
> >
> > This (and the backport to 4.9) are missing the fix from commit
> > e14d7dfb41f5 "x86/speculation/l1tf: Fix up pte->pfn conversion for
> > PAE", as that was applied earlier in the series.??But since PAE implies
> > only 3-level paging I don't know how the PUD functions get used or
> > whether this actually matters.
> >
> Excellent find.
>
> e14d7dfb41f5 (re-)applies cleanly to both 4.4.y and 4.9.y. Since most of its
> changes are already applied, the only remaining change is
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index c535012bdb56..5736306bdaab 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -420,7 +420,7 @@ static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
>
> static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
> {
> - phys_addr_t pfn = page_nr << PAGE_SHIFT;
> + phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
> pfn ^= protnone_mask(pgprot_val(pgprot));
> pfn &= PHYSICAL_PUD_PAGE_MASK;
> return __pud(pfn | massage_pgprot(pgprot));
>
> after cherry-picking it into both branches.
>
> I think we should re-apply it to both 4.4.y and 4.9.y be on the safe side.

Thanks for finding and figuring out the fix. Now queued up to both
trees.

greg k-h