2018-01-22 08:51:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 00/89] 4.14.15-stable review

This is the start of the stable review cycle for the 4.14.15 release.
There are 89 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jan 24 08:39:25 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.15-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.14.15-rc1

Yan Markman <[email protected]>
net: mvpp2: do not disable GMAC padding

Kirill A. Shutemov <[email protected]>
mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()

Tom Lendacky <[email protected]>
x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()

Andi Kleen <[email protected]>
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB

zhenwei.pi <[email protected]>
x86/pti: Document fix wrong index

Masami Hiramatsu <[email protected]>
kprobes/x86: Disable optimizing on the function jumps to indirect thunk

Masami Hiramatsu <[email protected]>
kprobes/x86: Blacklist indirect thunk functions for kprobes

Masami Hiramatsu <[email protected]>
retpoline: Introduce start/end markers of indirect thunk

Thomas Gleixner <[email protected]>
x86/mce: Make machine check speculation protected

Marc Zyngier <[email protected]>
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls

Punit Agrawal <[email protected]>
KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2

James Hogan <[email protected]>
MIPS: CM: Drop WARN_ON(vp != 0)

Lorenzo Pieralisi <[email protected]>
alpha/PCI: Fix noname IRQ level detection

Laura Abbott <[email protected]>
x86: Use __nostackprotect for sme_encrypt_kernel

Wei Yongjun <[email protected]>
dm crypt: fix error return code in crypt_ctr()

Ondrej Kozina <[email protected]>
dm crypt: wipe kernel key copy after IV initialization

Milan Broz <[email protected]>
dm crypt: fix crash by adding missing check for auth key size

Mikulas Patocka <[email protected]>
dm integrity: don't store cipher request on the stack

Dennis Yang <[email protected]>
dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6

Joe Thornber <[email protected]>
dm btree: fix serious bug in btree_split_beneath()

Rob Clark <[email protected]>
drm/vmwgfx: fix memory corruption with legacy/sou connectors

Sergey Senozhatsky <[email protected]>
workqueue: avoid hard lockups in show_workqueue_state()

Hannes Reinecke <[email protected]>
scsi: libsas: Disable asynchronous aborts for SATA devices

Xinyu Lin <[email protected]>
libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

Alexey Dobriyan <[email protected]>
proc: fix coredump vs read /proc/*/stat race

Xi Kangjie <[email protected]>
scripts/gdb/linux/tasks.py: fix get_thread_info

Jeremy Compostella <[email protected]>
i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA

Marc Kleine-Budde <[email protected]>
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

Marc Kleine-Budde <[email protected]>
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once

Stephane Grosjean <[email protected]>
can: peak: fix potential bug in packet fragmentation

Thomas Petazzoni <[email protected]>
ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7

Maxime Ripard <[email protected]>
ARM: sunxi_defconfig: Enable CMA

Gregory CLEMENT <[email protected]>
ARM64: dts: marvell: armada-cp110: Fix clock resources for various node

Arnd Bergmann <[email protected]>
phy: work around 'phys' references to usb-nop-xceiv devices

Steven Rostedt (VMware) <[email protected]>
tracing: Fix converting enum's from the map in trace_event_eval_update()

Johan Hovold <[email protected]>
Input: twl4030-vibra - fix sibling-node lookup

Johan Hovold <[email protected]>
Input: twl6040-vibra - fix child-node lookup

Johan Hovold <[email protected]>
Input: 88pm860x-ts - fix child-node lookup

Nick Desaulniers <[email protected]>
Input: synaptics-rmi4 - prevent UAF reported by KASAN

Nir Perry <[email protected]>
Input: ALPS - fix multi-touch decoding on SS4 plus touchpads

Tom Lendacky <[email protected]>
x86/mm: Encrypt the initrd earlier for BSP microcode update

Tero Kristo <[email protected]>
ARM: OMAP3: hwmod_data: add missing module_offs for MMC3

Tom Lendacky <[email protected]>
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption

Tom Lendacky <[email protected]>
x86/mm: Centralize PMD flags in sme_encrypt_kernel()

Tom Lendacky <[email protected]>
x86/mm: Use a struct to reduce parameters for SME PGD mapping

Tom Lendacky <[email protected]>
x86/mm: Clean up register saving in the __enc_copy() assembly code

Thomas Gleixner <[email protected]>
x86/apic/vector: Fix off by one in error path

Joe Lawrence <[email protected]>
pipe: avoid round_pipe_size() nr_pages overflow on 32-bit

Len Brown <[email protected]>
x86/tsc: Fix erroneous TSC rate on Skylake Xeon

Len Brown <[email protected]>
x86/tsc: Future-proof native_calibrate_tsc()

Andi Kleen <[email protected]>
x86/idt: Mark IDT tables __initconst

Eric W. Biederman <[email protected]>
x86/mm/pkeys: Fix fill_sig_info_pkey

Thomas Gleixner <[email protected]>
x86/intel_rdt/cqm: Prevent use after free

Andi Kleen <[email protected]>
module: Add retpoline tag to VERMAGIC

Paolo Bonzini <[email protected]>
x86/cpufeature: Move processor tracing out of scattered features

Josh Poimboeuf <[email protected]>
objtool: Improve error message for bad file argument

Tom Lendacky <[email protected]>
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros

David Woodhouse <[email protected]>
x86/retpoline: Fill RSB on context switch for affected CPUs

Andrey Ryabinin <[email protected]>
x86/kasan: Panic if there is not enough memory to boot

Benoît Thébaudeau <[email protected]>
mmc: sdhci-esdhc-imx: Fix i.MX53 eSDHCv3 clock

Josh Poimboeuf <[email protected]>
objtool: Fix seg fault with gold linker

Josh Snyder <[email protected]>
delayacct: Account blkio completion on the correct task

Sagi Grimberg <[email protected]>
iser-target: Fix possible use-after-free in connection establishment error

Eric Biggers <[email protected]>
af_key: fix buffer overread in parse_exthdrs()

Eric Biggers <[email protected]>
af_key: fix buffer overread in verify_address_len()

Thomas Gleixner <[email protected]>
timers: Unconditionally check deferrable base

Leon Romanovsky <[email protected]>
RDMA/mlx5: Fix out-of-bound access while querying AH

Dan Carpenter <[email protected]>
IB/hfi1: Prevent a NULL dereference

Takashi Iwai <[email protected]>
ALSA: hda - Apply the existing quirk to iMac 14,1

Takashi Iwai <[email protected]>
ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant

Takashi Iwai <[email protected]>
ALSA: pcm: Remove yet superfluous WARN_ON()

Takashi Iwai <[email protected]>
ALSA: seq: Make ioctls race-free

Li Jinyue <[email protected]>
futex: Prevent overflow by strengthen input validation

Peter Zijlstra <[email protected]>
futex: Avoid violating the 10th rule of futex

Oliver O'Halloran <[email protected]>
powerpc/powernv: Check device-tree for RFI flush settings

Michael Neuling <[email protected]>
powerpc/pseries: Query hypervisor for RFI flush settings

Michael Ellerman <[email protected]>
powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti

Michael Ellerman <[email protected]>
powerpc/64s: Add support for RFI flush of L1-D cache

Nicholas Piggin <[email protected]>
powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL

Nicholas Piggin <[email protected]>
powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL

Nicholas Piggin <[email protected]>
powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL

Nicholas Piggin <[email protected]>
powerpc/64s: Simple RFI macro conversions

Nicholas Piggin <[email protected]>
powerpc/64: Add macros for annotating the destination of rfid/hrfid

Michael Neuling <[email protected]>
powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper

Simon Ser <[email protected]>
objtool: Fix seg fault caused by missing parameter

Lukas Bulwahn <[email protected]>
objtool: Fix Clang enum conversion warning

Simon Ser <[email protected]>
objtool: Fix seg fault with clang-compiled objects

Rob Clark <[email protected]>
drm/nouveau/disp/gf119: add missing drive vfunc ptr

Andrew Morton <[email protected]>
tools/objtool/Makefile: don't assume sync-check.sh is executable


-------------

Diffstat:

Documentation/x86/pti.txt | 2 +-
Makefile | 4 +-
arch/alpha/kernel/sys_sio.c | 35 +-
arch/arm/boot/dts/kirkwood-openblocks_a7.dts | 10 +-
arch/arm/configs/sunxi_defconfig | 2 +
arch/arm/mach-omap2/omap_hwmod_3xxx_data.c | 1 +
.../boot/dts/marvell/armada-cp110-master.dtsi | 13 +-
.../arm64/boot/dts/marvell/armada-cp110-slave.dtsi | 9 +-
arch/arm64/kvm/handle_exit.c | 4 +-
arch/mips/kernel/mips-cm.c | 1 -
arch/powerpc/include/asm/exception-64e.h | 6 +
arch/powerpc/include/asm/exception-64s.h | 57 +++-
arch/powerpc/include/asm/feature-fixups.h | 13 +
arch/powerpc/include/asm/hvcall.h | 17 +
arch/powerpc/include/asm/paca.h | 10 +
arch/powerpc/include/asm/plpar_wrappers.h | 14 +
arch/powerpc/include/asm/setup.h | 13 +
arch/powerpc/kernel/asm-offsets.c | 5 +
arch/powerpc/kernel/entry_64.S | 44 ++-
arch/powerpc/kernel/exceptions-64s.S | 135 +++++++-
arch/powerpc/kernel/setup_64.c | 101 ++++++
arch/powerpc/kernel/vmlinux.lds.S | 9 +
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 7 +-
arch/powerpc/kvm/book3s_rmhandlers.S | 7 +-
arch/powerpc/kvm/book3s_segment.S | 4 +-
arch/powerpc/lib/feature-fixups.c | 41 +++
arch/powerpc/platforms/powernv/setup.c | 49 +++
arch/powerpc/platforms/pseries/setup.c | 35 ++
arch/x86/entry/entry_32.S | 11 +
arch/x86/entry/entry_64.S | 13 +-
arch/x86/include/asm/cpufeatures.h | 3 +-
arch/x86/include/asm/mem_encrypt.h | 4 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/apic/vector.c | 7 +-
arch/x86/kernel/cpu/bugs.c | 36 +++
arch/x86/kernel/cpu/intel_rdt.c | 8 +-
arch/x86/kernel/cpu/mcheck/mce.c | 5 +
arch/x86/kernel/cpu/scattered.c | 1 -
arch/x86/kernel/head64.c | 4 +-
arch/x86/kernel/idt.c | 12 +-
arch/x86/kernel/kprobes/opt.c | 23 +-
arch/x86/kernel/process.c | 25 +-
arch/x86/kernel/setup.c | 8 -
arch/x86/kernel/tsc.c | 3 +-
arch/x86/kernel/vmlinux.lds.S | 6 +
arch/x86/lib/retpoline.S | 5 +-
arch/x86/mm/fault.c | 7 +-
arch/x86/mm/kasan_init_64.c | 24 +-
arch/x86/mm/mem_encrypt.c | 356 +++++++++++++++------
arch/x86/mm/mem_encrypt_boot.S | 80 ++---
drivers/ata/libata-core.c | 1 +
.../gpu/drm/nouveau/nvkm/engine/disp/sorgf119.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 4 +-
drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 +-
drivers/i2c/i2c-core-smbus.c | 13 +-
drivers/infiniband/hw/hfi1/file_ops.c | 4 +-
drivers/infiniband/hw/mlx5/qp.c | 7 +-
drivers/infiniband/ulp/isert/ib_isert.c | 1 +
drivers/input/misc/twl4030-vibra.c | 6 +-
drivers/input/misc/twl6040-vibra.c | 3 +-
drivers/input/mouse/alps.c | 23 +-
drivers/input/mouse/alps.h | 10 +-
drivers/input/rmi4/rmi_driver.c | 4 +-
drivers/input/touchscreen/88pm860x-ts.c | 16 +-
drivers/md/dm-crypt.c | 20 +-
drivers/md/dm-integrity.c | 49 ++-
drivers/md/dm-thin-metadata.c | 6 +-
drivers/md/persistent-data/dm-btree.c | 19 +-
drivers/mmc/host/sdhci-esdhc-imx.c | 14 +
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 21 +-
drivers/net/ethernet/marvell/mvpp2.c | 9 -
drivers/phy/phy-core.c | 4 +
drivers/scsi/libsas/sas_scsi_host.c | 17 +-
fs/pipe.c | 17 +-
fs/proc/array.c | 7 +-
include/linux/delayacct.h | 8 +-
include/linux/swapops.h | 21 ++
include/linux/vermagic.h | 8 +-
kernel/delayacct.c | 42 ++-
kernel/futex.c | 86 ++++-
kernel/locking/rtmutex.c | 26 +-
kernel/locking/rtmutex_common.h | 1 +
kernel/sched/core.c | 6 +-
kernel/time/timer.c | 2 +-
kernel/trace/trace_events.c | 16 +-
kernel/workqueue.c | 13 +
mm/page_vma_mapped.c | 63 ++--
net/can/af_can.c | 36 +--
net/key/af_key.c | 8 +
scripts/Makefile.build | 14 +-
scripts/gdb/linux/tasks.py | 2 +
sound/core/pcm_lib.c | 1 -
sound/core/seq/seq_clientmgr.c | 3 +
sound/core/seq/seq_clientmgr.h | 1 +
sound/pci/hda/patch_cirrus.c | 1 +
sound/pci/hda/patch_realtek.c | 1 +
tools/objtool/Makefile | 2 +-
tools/objtool/arch/x86/decode.c | 2 +-
tools/objtool/builtin-orc.c | 4 +-
tools/objtool/elf.c | 4 +-
tools/objtool/orc_gen.c | 2 +
virt/kvm/arm/mmu.c | 2 +-
103 files changed, 1514 insertions(+), 447 deletions(-)




2018-01-22 08:53:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 41/89] x86/tsc: Fix erroneous TSC rate on Skylake Xeon

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Len Brown <[email protected]>

commit b511203093489eb1829cb4de86e8214752205ac6 upstream.

The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
problematic:

- SKX workstations (with same model # as server variants) use a 24 MHz
crystal. This results in a -4.0% time drift rate on SKX workstations.

- SKX servers subject the crystal to an EMI reduction circuit that reduces its
actual frequency by (approximately) -0.25%. This results in -1 second per
10 minute time drift as compared to network time.

This issue can also trigger a timer and power problem, on configurations
that use the LAPIC timer (versus the TSC deadline timer). Clock ticks
scheduled with the LAPIC timer arrive a few usec before the time they are
expected (according to the slow TSC). This causes Linux to poll-idle, when
it should be in an idle power saving state. The idle and clock code do not
graciously recover from this error, sometimes resulting in significant
polling and measurable power impact.

Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
and the TSC refined calibration will update tsc_khz to correct for the
difference.

[ tglx: Sanitized change log ]

Fixes: 6baf3d61821f ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Prarit Bhargava <[email protected]>
Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/tsc.c | 1 -
1 file changed, 1 deletion(-)

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -602,7 +602,6 @@ unsigned long native_calibrate_tsc(void)
case INTEL_FAM6_KABYLAKE_DESKTOP:
crystal_khz = 24000; /* 24.0 MHz */
break;
- case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_ATOM_DENVERTON:
crystal_khz = 25000; /* 25.0 MHz */
break;



2018-01-22 08:53:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 27/89] iser-target: Fix possible use-after-free in connection establishment error

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sagi Grimberg <[email protected]>

commit cd52cb26e7ead5093635e98e07e221e4df482d34 upstream.

In case we fail to establish the connection we must drain our pre-posted
login recieve work request before continuing safely with connection
teardown.

Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
Reported-by: Amrani, Ram <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/ulp/isert/ib_isert.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *c
{
struct isert_conn *isert_conn = cma_id->qp->qp_context;

+ ib_drain_qp(isert_conn->qp);
list_del_init(&isert_conn->node);
isert_conn->cm_id = NULL;
isert_put_conn(isert_conn);



2018-01-22 08:53:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 63/89] i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeremy Compostella <[email protected]>

commit 89c6efa61f5709327ecfa24bff18e57a4e80c7fa upstream.

On a I2C_SMBUS_I2C_BLOCK_DATA read request, if data->block[0] is
greater than I2C_SMBUS_BLOCK_MAX + 1, the underlying I2C driver writes
data out of the msgbuf1 array boundary.

It is possible from a user application to run into that issue by
calling the I2C_SMBUS ioctl with data.block[0] greater than
I2C_SMBUS_BLOCK_MAX + 1.

This patch makes the code compliant with
Documentation/i2c/dev-interface by raising an error when the requested
size is larger than 32 bytes.

Call Trace:
[<ffffffff8139f695>] dump_stack+0x67/0x92
[<ffffffff811802a4>] panic+0xc5/0x1eb
[<ffffffff810ecb5f>] ? vprintk_default+0x1f/0x30
[<ffffffff817456d3>] ? i2cdev_ioctl_smbus+0x303/0x320
[<ffffffff8109a68b>] __stack_chk_fail+0x1b/0x20
[<ffffffff817456d3>] i2cdev_ioctl_smbus+0x303/0x320
[<ffffffff81745aed>] i2cdev_ioctl+0x4d/0x1e0
[<ffffffff811f761a>] do_vfs_ioctl+0x2ba/0x490
[<ffffffff81336e43>] ? security_file_ioctl+0x43/0x60
[<ffffffff811f7869>] SyS_ioctl+0x79/0x90
[<ffffffff81a22e97>] entry_SYSCALL_64_fastpath+0x12/0x6a

Signed-off-by: Jeremy Compostella <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/i2c/i2c-core-smbus.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

--- a/drivers/i2c/i2c-core-smbus.c
+++ b/drivers/i2c/i2c-core-smbus.c
@@ -396,16 +396,17 @@ static s32 i2c_smbus_xfer_emulated(struc
the underlying bus driver */
break;
case I2C_SMBUS_I2C_BLOCK_DATA:
+ if (data->block[0] > I2C_SMBUS_BLOCK_MAX) {
+ dev_err(&adapter->dev, "Invalid block %s size %d\n",
+ read_write == I2C_SMBUS_READ ? "read" : "write",
+ data->block[0]);
+ return -EINVAL;
+ }
+
if (read_write == I2C_SMBUS_READ) {
msg[1].len = data->block[0];
} else {
msg[0].len = data->block[0] + 1;
- if (msg[0].len > I2C_SMBUS_BLOCK_MAX + 1) {
- dev_err(&adapter->dev,
- "Invalid block write size %d\n",
- data->block[0]);
- return -EINVAL;
- }
for (i = 1; i <= data->block[0]; i++)
msgbuf0[i] = data->block[i];
}



2018-01-22 08:53:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 39/89] x86/idt: Mark IDT tables __initconst

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 327867faa4d66628fcd92a843adb3345736a5313 upstream.

const variables must use __initconst, not __initdata.

Fix this up for the IDT tables, which got it consistently wrong.

Fixes: 16bc18d895ce ("x86/idt: Move 32-bit idt_descr to C code")
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/idt.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -56,7 +56,7 @@ struct idt_data {
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
-static const __initdata struct idt_data early_idts[] = {
+static const __initconst struct idt_data early_idts[] = {
INTG(X86_TRAP_DB, debug),
SYSG(X86_TRAP_BP, int3),
#ifdef CONFIG_X86_32
@@ -70,7 +70,7 @@ static const __initdata struct idt_data
* the traps which use them are reinitialized with IST after cpu_init() has
* set up TSS.
*/
-static const __initdata struct idt_data def_idts[] = {
+static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_DE, divide_error),
INTG(X86_TRAP_NMI, nmi),
INTG(X86_TRAP_BR, bounds),
@@ -108,7 +108,7 @@ static const __initdata struct idt_data
/*
* The APIC and SMP idt entries
*/
-static const __initdata struct idt_data apic_idts[] = {
+static const __initconst struct idt_data apic_idts[] = {
#ifdef CONFIG_SMP
INTG(RESCHEDULE_VECTOR, reschedule_interrupt),
INTG(CALL_FUNCTION_VECTOR, call_function_interrupt),
@@ -150,7 +150,7 @@ static const __initdata struct idt_data
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
-static const __initdata struct idt_data early_pf_idts[] = {
+static const __initconst struct idt_data early_pf_idts[] = {
INTG(X86_TRAP_PF, page_fault),
};

@@ -158,7 +158,7 @@ static const __initdata struct idt_data
* Override for the debug_idt. Same as the default, but with interrupt
* stack set to DEFAULT_STACK (0). Required for NMI trap handling.
*/
-static const __initdata struct idt_data dbg_idts[] = {
+static const __initconst struct idt_data dbg_idts[] = {
INTG(X86_TRAP_DB, debug),
INTG(X86_TRAP_BP, int3),
};
@@ -180,7 +180,7 @@ gate_desc debug_idt_table[IDT_ENTRIES] _
* The exceptions which use Interrupt stacks. They are setup after
* cpu_init() when the TSS has been initialized.
*/
-static const __initdata struct idt_data ist_idts[] = {
+static const __initconst struct idt_data ist_idts[] = {
ISTG(X86_TRAP_DB, debug, DEBUG_STACK),
ISTG(X86_TRAP_NMI, nmi, NMI_STACK),
SISTG(X86_TRAP_BP, int3, DEBUG_STACK),



2018-01-22 08:53:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 25/89] af_key: fix buffer overread in verify_address_len()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit 06b335cb51af018d5feeff5dd4fd53847ddb675a upstream.

If a message sent to a PF_KEY socket ended with one of the extensions
that takes a 'struct sadb_address' but there were not enough bytes
remaining in the message for the ->sa_family member of the 'struct
sockaddr' which is supposed to follow, then verify_address_len() read
past the end of the message, into uninitialized memory. Fix it by
returning -EINVAL in this case.

This bug was found using syzkaller with KMSAN.

Reproducer:

#include <linux/pfkeyv2.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[24] = { 0 };
struct sadb_msg *msg = (void *)buf;
struct sadb_address *addr = (void *)(msg + 1);

msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 3;
addr->sadb_address_len = 1;
addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;

write(sock, buf, 24);
}

Reported-by: Alexander Potapenko <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/key/af_key.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -401,6 +401,11 @@ static int verify_address_len(const void
#endif
int len;

+ if (sp->sadb_address_len <
+ DIV_ROUND_UP(sizeof(*sp) + offsetofend(typeof(*addr), sa_family),
+ sizeof(uint64_t)))
+ return -EINVAL;
+
switch (addr->sa_family) {
case AF_INET:
len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin), sizeof(uint64_t));



2018-01-22 08:53:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 44/89] x86/mm: Clean up register saving in the __enc_copy() assembly code

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit 1303880179e67c59e801429b7e5d0f6b21137d99 upstream.

Clean up the use of PUSH and POP and when registers are saved in the
__enc_copy() assembly function in order to improve the readability of the code.

Move parameter register saving into general purpose registers earlier
in the code and move all the pushes to the beginning of the function
with corresponding pops at the end.

We do this to prepare fixes.

Tested-by: Gabriel Craciunescu <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/mem_encrypt_boot.S | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -103,20 +103,19 @@ ENTRY(__enc_copy)
orq $X86_CR4_PGE, %rdx
mov %rdx, %cr4

+ push %r15
+
+ movq %rcx, %r9 /* Save kernel length */
+ movq %rdi, %r10 /* Save encrypted kernel address */
+ movq %rsi, %r11 /* Save decrypted kernel address */
+
/* Set the PAT register PA5 entry to write-protect */
- push %rcx
movl $MSR_IA32_CR_PAT, %ecx
rdmsr
- push %rdx /* Save original PAT value */
+ mov %rdx, %r15 /* Save original PAT value */
andl $0xffff00ff, %edx /* Clear PA5 */
orl $0x00000500, %edx /* Set PA5 to WP */
wrmsr
- pop %rdx /* RDX contains original PAT value */
- pop %rcx
-
- movq %rcx, %r9 /* Save kernel length */
- movq %rdi, %r10 /* Save encrypted kernel address */
- movq %rsi, %r11 /* Save decrypted kernel address */

wbinvd /* Invalidate any cache entries */

@@ -138,12 +137,13 @@ ENTRY(__enc_copy)
jnz 1b /* Kernel length not zero? */

/* Restore PAT register */
- push %rdx /* Save original PAT value */
movl $MSR_IA32_CR_PAT, %ecx
rdmsr
- pop %rdx /* Restore original PAT value */
+ mov %r15, %rdx /* Restore original PAT value */
wrmsr

+ pop %r15
+
ret
.L__enc_copy_end:
ENDPROC(__enc_copy)



2018-01-22 08:54:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 55/89] tracing: Fix converting enums from the map in trace_event_eval_update()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <[email protected]>

commit 1ebe1eaf2f02784921759992ae1fde1a9bec8fd0 upstream.

Since enums do not get converted by the TRACE_EVENT macro into their values,
the event format displaces the enum name and not the value. This breaks
tools like perf and trace-cmd that need to interpret the raw binary data. To
solve this, an enum map was created to convert these enums into their actual
numbers on boot up. This is done by TRACE_EVENTS() adding a
TRACE_DEFINE_ENUM() macro.

Some enums were not being converted. This was caused by an optization that
had a bug in it.

All calls get checked against this enum map to see if it should be converted
or not, and it compares the call's system to the system that the enum map
was created under. If they match, then they call is processed.

To cut down on the number of iterations needed to find the maps with a
matching system, since calls and maps are grouped by system, when a match is
made, the index into the map array is saved, so that the next call, if it
belongs to the same system as the previous call, could start right at that
array index and not have to scan all the previous arrays.

The problem was, the saved index was used as the variable to know if this is
a call in a new system or not. If the index was zero, it was assumed that
the call is in a new system and would keep incrementing the saved index
until it found a matching system. The issue arises when the first matching
system was at index zero. The next map, if it belonged to the same system,
would then think it was the first match and increment the index to one. If
the next call belong to the same system, it would begin its search of the
maps off by one, and miss the first enum that should be converted. This left
a single enum not converted properly.

Also add a comment to describe exactly what that index was for. It took me a
bit too long to figure out what I was thinking when debugging this issue.

Link: http://lkml.kernel.org/r/[email protected]

Fixes: 0c564a538aa93 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
Reported-by: Chuck Lever <[email protected]>
Teste-by: Chuck Lever <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace_events.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2213,6 +2213,7 @@ void trace_event_eval_update(struct trac
{
struct trace_event_call *call, *p;
const char *last_system = NULL;
+ bool first = false;
int last_i;
int i;

@@ -2220,15 +2221,28 @@ void trace_event_eval_update(struct trac
list_for_each_entry_safe(call, p, &ftrace_events, list) {
/* events are usually grouped together with systems */
if (!last_system || call->class->system != last_system) {
+ first = true;
last_i = 0;
last_system = call->class->system;
}

+ /*
+ * Since calls are grouped by systems, the likelyhood that the
+ * next call in the iteration belongs to the same system as the
+ * previous call is high. As an optimization, we skip seaching
+ * for a map[] that matches the call's system if the last call
+ * was from the same system. That's what last_i is for. If the
+ * call has the same system as the previous call, then last_i
+ * will be the index of the first map[] that has a matching
+ * system.
+ */
for (i = last_i; i < len; i++) {
if (call->class->system == map[i]->system) {
/* Save the first system if need be */
- if (!last_i)
+ if (first) {
last_i = i;
+ first = false;
+ }
update_event_printk(call, map[i]);
}
}



2018-01-22 08:54:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 49/89] x86/mm: Encrypt the initrd earlier for BSP microcode update

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit 107cd2532181b96c549e8f224cdcca8631c3076b upstream.

Currently the BSP microcode update code examines the initrd very early
in the boot process. If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet. Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.

Tested-by: Gabriel Craciunescu <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/mem_encrypt.h | 4 +-
arch/x86/kernel/head64.c | 4 +-
arch/x86/kernel/setup.c | 8 ----
arch/x86/mm/mem_encrypt.c | 66 ++++++++++++++++++++++++++++++++-----
arch/x86/mm/mem_encrypt_boot.S | 46 ++++++++++++-------------
5 files changed, 85 insertions(+), 43 deletions(-)

--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -39,7 +39,7 @@ void __init sme_unmap_bootdata(char *rea

void __init sme_early_init(void);

-void __init sme_encrypt_kernel(void);
+void __init sme_encrypt_kernel(struct boot_params *bp);
void __init sme_enable(struct boot_params *bp);

/* Architecture __weak replacement functions */
@@ -61,7 +61,7 @@ static inline void __init sme_unmap_boot

static inline void __init sme_early_init(void) { }

-static inline void __init sme_encrypt_kernel(void) { }
+static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
static inline void __init sme_enable(struct boot_params *bp) { }

#endif /* CONFIG_AMD_MEM_ENCRYPT */
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -157,8 +157,8 @@ unsigned long __head __startup_64(unsign
p = fixup_pointer(&phys_base, physaddr);
*p += load_delta - sme_get_me_mask();

- /* Encrypt the kernel (if SME is active) */
- sme_encrypt_kernel();
+ /* Encrypt the kernel and related (if SME is active) */
+ sme_encrypt_kernel(bp);

/*
* Return the SME encryption mask (if SME is active) to be used as a
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -376,14 +376,6 @@ static void __init reserve_initrd(void)
!ramdisk_image || !ramdisk_size)
return; /* No initrd provided by bootloader */

- /*
- * If SME is active, this memory will be marked encrypted by the
- * kernel when it is accessed (including relocation). However, the
- * ramdisk image was loaded decrypted by the bootloader, so make
- * sure that it is encrypted before accessing it.
- */
- sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
-
initrd_start = 0;

mapped_size = memblock_mem_size(max_pfn_mapped);
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -487,11 +487,12 @@ static unsigned long __init sme_pgtable_
return total;
}

-void __init sme_encrypt_kernel(void)
+void __init sme_encrypt_kernel(struct boot_params *bp)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
unsigned long kernel_start, kernel_end, kernel_len;
+ unsigned long initrd_start, initrd_end, initrd_len;
struct sme_populate_pgd_data ppd;
unsigned long pgtable_area_len;
unsigned long decrypted_base;
@@ -500,14 +501,15 @@ void __init sme_encrypt_kernel(void)
return;

/*
- * Prepare for encrypting the kernel by building new pagetables with
- * the necessary attributes needed to encrypt the kernel in place.
+ * Prepare for encrypting the kernel and initrd by building new
+ * pagetables with the necessary attributes needed to encrypt the
+ * kernel in place.
*
* One range of virtual addresses will map the memory occupied
- * by the kernel as encrypted.
+ * by the kernel and initrd as encrypted.
*
* Another range of virtual addresses will map the memory occupied
- * by the kernel as decrypted and write-protected.
+ * by the kernel and initrd as decrypted and write-protected.
*
* The use of write-protect attribute will prevent any of the
* memory from being cached.
@@ -518,6 +520,20 @@ void __init sme_encrypt_kernel(void)
kernel_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
kernel_len = kernel_end - kernel_start;

+ initrd_start = 0;
+ initrd_end = 0;
+ initrd_len = 0;
+#ifdef CONFIG_BLK_DEV_INITRD
+ initrd_len = (unsigned long)bp->hdr.ramdisk_size |
+ ((unsigned long)bp->ext_ramdisk_size << 32);
+ if (initrd_len) {
+ initrd_start = (unsigned long)bp->hdr.ramdisk_image |
+ ((unsigned long)bp->ext_ramdisk_image << 32);
+ initrd_end = PAGE_ALIGN(initrd_start + initrd_len);
+ initrd_len = initrd_end - initrd_start;
+ }
+#endif
+
/* Set the encryption workarea to be immediately after the kernel */
workarea_start = kernel_end;

@@ -540,6 +556,8 @@ void __init sme_encrypt_kernel(void)
*/
pgtable_area_len = sizeof(pgd_t) * PTRS_PER_PGD;
pgtable_area_len += sme_pgtable_calc(execute_end - kernel_start) * 2;
+ if (initrd_len)
+ pgtable_area_len += sme_pgtable_calc(initrd_len) * 2;

/* PUDs and PMDs needed in the current pagetables for the workarea */
pgtable_area_len += sme_pgtable_calc(execute_len + pgtable_area_len);
@@ -578,9 +596,9 @@ void __init sme_encrypt_kernel(void)

/*
* A new pagetable structure is being built to allow for the kernel
- * to be encrypted. It starts with an empty PGD that will then be
- * populated with new PUDs and PMDs as the encrypted and decrypted
- * kernel mappings are created.
+ * and initrd to be encrypted. It starts with an empty PGD that will
+ * then be populated with new PUDs and PMDs as the encrypted and
+ * decrypted kernel mappings are created.
*/
ppd.pgd = ppd.pgtable_area;
memset(ppd.pgd, 0, sizeof(pgd_t) * PTRS_PER_PGD);
@@ -593,6 +611,12 @@ void __init sme_encrypt_kernel(void)
* the base of the mapping.
*/
decrypted_base = (pgd_index(workarea_end) + 1) & (PTRS_PER_PGD - 1);
+ if (initrd_len) {
+ unsigned long check_base;
+
+ check_base = (pgd_index(initrd_end) + 1) & (PTRS_PER_PGD - 1);
+ decrypted_base = max(decrypted_base, check_base);
+ }
decrypted_base <<= PGDIR_SHIFT;

/* Add encrypted kernel (identity) mappings */
@@ -607,6 +631,21 @@ void __init sme_encrypt_kernel(void)
ppd.vaddr_end = kernel_end + decrypted_base;
sme_map_range_decrypted_wp(&ppd);

+ if (initrd_len) {
+ /* Add encrypted initrd (identity) mappings */
+ ppd.paddr = initrd_start;
+ ppd.vaddr = initrd_start;
+ ppd.vaddr_end = initrd_end;
+ sme_map_range_encrypted(&ppd);
+ /*
+ * Add decrypted, write-protected initrd (non-identity) mappings
+ */
+ ppd.paddr = initrd_start;
+ ppd.vaddr = initrd_start + decrypted_base;
+ ppd.vaddr_end = initrd_end + decrypted_base;
+ sme_map_range_decrypted_wp(&ppd);
+ }
+
/* Add decrypted workarea mappings to both kernel mappings */
ppd.paddr = workarea_start;
ppd.vaddr = workarea_start;
@@ -622,6 +661,11 @@ void __init sme_encrypt_kernel(void)
sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
kernel_len, workarea_start, (unsigned long)ppd.pgd);

+ if (initrd_len)
+ sme_encrypt_execute(initrd_start, initrd_start + decrypted_base,
+ initrd_len, workarea_start,
+ (unsigned long)ppd.pgd);
+
/*
* At this point we are running encrypted. Remove the mappings for
* the decrypted areas - all that is needed for this is to remove
@@ -631,6 +675,12 @@ void __init sme_encrypt_kernel(void)
ppd.vaddr_end = kernel_end + decrypted_base;
sme_clear_pgd(&ppd);

+ if (initrd_len) {
+ ppd.vaddr = initrd_start + decrypted_base;
+ ppd.vaddr_end = initrd_end + decrypted_base;
+ sme_clear_pgd(&ppd);
+ }
+
ppd.vaddr = workarea_start + decrypted_base;
ppd.vaddr_end = workarea_end + decrypted_base;
sme_clear_pgd(&ppd);
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -22,9 +22,9 @@ ENTRY(sme_encrypt_execute)

/*
* Entry parameters:
- * RDI - virtual address for the encrypted kernel mapping
- * RSI - virtual address for the decrypted kernel mapping
- * RDX - length of kernel
+ * RDI - virtual address for the encrypted mapping
+ * RSI - virtual address for the decrypted mapping
+ * RDX - length to encrypt
* RCX - virtual address of the encryption workarea, including:
* - stack page (PAGE_SIZE)
* - encryption routine page (PAGE_SIZE)
@@ -41,9 +41,9 @@ ENTRY(sme_encrypt_execute)
addq $PAGE_SIZE, %rax /* Workarea encryption routine */

push %r12
- movq %rdi, %r10 /* Encrypted kernel */
- movq %rsi, %r11 /* Decrypted kernel */
- movq %rdx, %r12 /* Kernel length */
+ movq %rdi, %r10 /* Encrypted area */
+ movq %rsi, %r11 /* Decrypted area */
+ movq %rdx, %r12 /* Area length */

/* Copy encryption routine into the workarea */
movq %rax, %rdi /* Workarea encryption routine */
@@ -52,10 +52,10 @@ ENTRY(sme_encrypt_execute)
rep movsb

/* Setup registers for call */
- movq %r10, %rdi /* Encrypted kernel */
- movq %r11, %rsi /* Decrypted kernel */
+ movq %r10, %rdi /* Encrypted area */
+ movq %r11, %rsi /* Decrypted area */
movq %r8, %rdx /* Pagetables used for encryption */
- movq %r12, %rcx /* Kernel length */
+ movq %r12, %rcx /* Area length */
movq %rax, %r8 /* Workarea encryption routine */
addq $PAGE_SIZE, %r8 /* Workarea intermediate copy buffer */

@@ -71,7 +71,7 @@ ENDPROC(sme_encrypt_execute)

ENTRY(__enc_copy)
/*
- * Routine used to encrypt kernel.
+ * Routine used to encrypt memory in place.
* This routine must be run outside of the kernel proper since
* the kernel will be encrypted during the process. So this
* routine is defined here and then copied to an area outside
@@ -79,19 +79,19 @@ ENTRY(__enc_copy)
* during execution.
*
* On entry the registers must be:
- * RDI - virtual address for the encrypted kernel mapping
- * RSI - virtual address for the decrypted kernel mapping
+ * RDI - virtual address for the encrypted mapping
+ * RSI - virtual address for the decrypted mapping
* RDX - address of the pagetables to use for encryption
- * RCX - length of kernel
+ * RCX - length of area
* R8 - intermediate copy buffer
*
* RAX - points to this routine
*
- * The kernel will be encrypted by copying from the non-encrypted
- * kernel space to an intermediate buffer and then copying from the
- * intermediate buffer back to the encrypted kernel space. The physical
- * addresses of the two kernel space mappings are the same which
- * results in the kernel being encrypted "in place".
+ * The area will be encrypted by copying from the non-encrypted
+ * memory space to an intermediate buffer and then copying from the
+ * intermediate buffer back to the encrypted memory space. The physical
+ * addresses of the two mappings are the same which results in the area
+ * being encrypted "in place".
*/
/* Enable the new page tables */
mov %rdx, %cr3
@@ -106,9 +106,9 @@ ENTRY(__enc_copy)
push %r15
push %r12

- movq %rcx, %r9 /* Save kernel length */
- movq %rdi, %r10 /* Save encrypted kernel address */
- movq %rsi, %r11 /* Save decrypted kernel address */
+ movq %rcx, %r9 /* Save area length */
+ movq %rdi, %r10 /* Save encrypted area address */
+ movq %rsi, %r11 /* Save decrypted area address */

/* Set the PAT register PA5 entry to write-protect */
movl $MSR_IA32_CR_PAT, %ecx
@@ -128,13 +128,13 @@ ENTRY(__enc_copy)
movq %r9, %r12

2:
- movq %r11, %rsi /* Source - decrypted kernel */
+ movq %r11, %rsi /* Source - decrypted area */
movq %r8, %rdi /* Dest - intermediate copy buffer */
movq %r12, %rcx
rep movsb

movq %r8, %rsi /* Source - intermediate copy buffer */
- movq %r10, %rdi /* Dest - encrypted kernel */
+ movq %r10, %rdi /* Dest - encrypted area */
movq %r12, %rcx
rep movsb




2018-01-22 08:55:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 88/89] mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kirill A. Shutemov <[email protected]>

commit 0d665e7b109d512b7cae3ccef6e8654714887844 upstream.

Tetsuo reported random crashes under memory pressure on 32-bit x86
system and tracked down to change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As ->pte may point to an arbitrary page we have to check that they are
belong to the section before doing math. Otherwise it may lead to weird
results.

It wasn't noticed until now as mem_map[] is virtually contiguous on
flatmem or vmemmap sparsemem. Pointer arithmetic just works against all
'struct page' pointers. But with classic sparsemem, it doesn't because
each section memap is allocated separately and so consecutive pfns
crossing two sections might have struct pages at completely unrelated
addresses.

Let's restructure code a bit and replace pointer arithmetic with
operations on pfns.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-and-tested-by: Tetsuo Handa <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/swapops.h | 21 ++++++++++++++++
mm/page_vma_mapped.c | 63 ++++++++++++++++++++++++++++--------------------
2 files changed, 59 insertions(+), 25 deletions(-)

--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -124,6 +124,11 @@ static inline bool is_write_device_priva
return unlikely(swp_type(entry) == SWP_DEVICE_WRITE);
}

+static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry)
+{
+ return swp_offset(entry);
+}
+
static inline struct page *device_private_entry_to_page(swp_entry_t entry)
{
return pfn_to_page(swp_offset(entry));
@@ -154,6 +159,11 @@ static inline bool is_write_device_priva
return false;
}

+static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry)
+{
+ return 0;
+}
+
static inline struct page *device_private_entry_to_page(swp_entry_t entry)
{
return NULL;
@@ -189,6 +199,11 @@ static inline int is_write_migration_ent
return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
}

+static inline unsigned long migration_entry_to_pfn(swp_entry_t entry)
+{
+ return swp_offset(entry);
+}
+
static inline struct page *migration_entry_to_page(swp_entry_t entry)
{
struct page *p = pfn_to_page(swp_offset(entry));
@@ -218,6 +233,12 @@ static inline int is_migration_entry(swp
{
return 0;
}
+
+static inline unsigned long migration_entry_to_pfn(swp_entry_t entry)
+{
+ return 0;
+}
+
static inline struct page *migration_entry_to_page(swp_entry_t entry)
{
return NULL;
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -30,10 +30,29 @@ static bool map_pte(struct page_vma_mapp
return true;
}

+/**
+ * check_pte - check if @pvmw->page is mapped at the @pvmw->pte
+ *
+ * page_vma_mapped_walk() found a place where @pvmw->page is *potentially*
+ * mapped. check_pte() has to validate this.
+ *
+ * @pvmw->pte may point to empty PTE, swap PTE or PTE pointing to arbitrary
+ * page.
+ *
+ * If PVMW_MIGRATION flag is set, returns true if @pvmw->pte contains migration
+ * entry that points to @pvmw->page or any subpage in case of THP.
+ *
+ * If PVMW_MIGRATION flag is not set, returns true if @pvmw->pte points to
+ * @pvmw->page or any subpage in case of THP.
+ *
+ * Otherwise, return false.
+ *
+ */
static bool check_pte(struct page_vma_mapped_walk *pvmw)
{
+ unsigned long pfn;
+
if (pvmw->flags & PVMW_MIGRATION) {
-#ifdef CONFIG_MIGRATION
swp_entry_t entry;
if (!is_swap_pte(*pvmw->pte))
return false;
@@ -41,37 +60,31 @@ static bool check_pte(struct page_vma_ma

if (!is_migration_entry(entry))
return false;
- if (migration_entry_to_page(entry) - pvmw->page >=
- hpage_nr_pages(pvmw->page)) {
- return false;
- }
- if (migration_entry_to_page(entry) < pvmw->page)
- return false;
-#else
- WARN_ON_ONCE(1);
-#endif
- } else {
- if (is_swap_pte(*pvmw->pte)) {
- swp_entry_t entry;

- entry = pte_to_swp_entry(*pvmw->pte);
- if (is_device_private_entry(entry) &&
- device_private_entry_to_page(entry) == pvmw->page)
- return true;
- }
+ pfn = migration_entry_to_pfn(entry);
+ } else if (is_swap_pte(*pvmw->pte)) {
+ swp_entry_t entry;

- if (!pte_present(*pvmw->pte))
+ /* Handle un-addressable ZONE_DEVICE memory */
+ entry = pte_to_swp_entry(*pvmw->pte);
+ if (!is_device_private_entry(entry))
return false;

- /* THP can be referenced by any subpage */
- if (pte_page(*pvmw->pte) - pvmw->page >=
- hpage_nr_pages(pvmw->page)) {
- return false;
- }
- if (pte_page(*pvmw->pte) < pvmw->page)
+ pfn = device_private_entry_to_pfn(entry);
+ } else {
+ if (!pte_present(*pvmw->pte))
return false;
+
+ pfn = pte_pfn(*pvmw->pte);
}

+ if (pfn < page_to_pfn(pvmw->page))
+ return false;
+
+ /* THP can be referenced by any subpage */
+ if (pfn - page_to_pfn(pvmw->page) >= hpage_nr_pages(pvmw->page))
+ return false;
+
return true;
}




2018-01-22 08:57:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 79/89] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Punit Agrawal <[email protected]>

commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.

KVM only supports PMD hugepages at stage 2 but doesn't actually check
that the provided hugepage memory pagesize is PMD_SIZE before populating
stage 2 entries.

In cases where the backing hugepage size is smaller than PMD_SIZE (such
as when using contiguous hugepages), KVM can end up creating stage 2
mappings that extend beyond the supplied memory.

Fix this by checking for the pagesize of userspace vma before creating
PMD hugepage at stage 2.

Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Punit Agrawal <[email protected]>
Cc: Marc Zyngier <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
virt/kvm/arm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1310,7 +1310,7 @@ static int user_mem_abort(struct kvm_vcp
return -EFAULT;
}

- if (is_vm_hugetlb_page(vma) && !logging_active) {
+ if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {



2018-01-22 08:57:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 87/89] x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit f23d74f6c66c3697e032550eeef3f640391a3a7d upstream.

Some issues have been reported with the for loop in stop_this_cpu() that
issues the 'wbinvd; hlt' sequence. Reverting this sequence to halt()
has been shown to resolve the issue.

However, the wbinvd is needed when running with SME. The reason for the
wbinvd is to prevent cache flush races between encrypted and non-encrypted
entries that have the same physical address. This can occur when
kexec'ing from memory encryption active to inactive or vice-versa. The
important thing is to not have outside of kernel text memory references
(such as stack usage), so the usage of the native_*() functions is needed
since these expand as inline asm sequences. So instead of reverting the
change, rework the sequence.

Move the wbinvd instruction outside of the for loop as native_wbinvd()
and make its execution conditional on X86_FEATURE_SME. In the for loop,
change the asm 'wbinvd; hlt' sequence back to a halt sequence but use
the native_halt() call.

Fixes: bba4ed011a52 ("x86/mm, kexec: Allow kexec to be used with SME")
Reported-by: Dave Young <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Dave Young <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Yu Chen <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Borislav Petkov <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dan Williams <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/process.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -380,19 +380,24 @@ void stop_this_cpu(void *dummy)
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

+ /*
+ * Use wbinvd on processors that support SME. This provides support
+ * for performing a successful kexec when going from SME inactive
+ * to SME active (or vice-versa). The cache must be cleared so that
+ * if there are entries with the same physical address, both with and
+ * without the encryption bit, they don't race each other when flushed
+ * and potentially end up with the wrong entry being committed to
+ * memory.
+ */
+ if (boot_cpu_has(X86_FEATURE_SME))
+ native_wbinvd();
for (;;) {
/*
- * Use wbinvd followed by hlt to stop the processor. This
- * provides support for kexec on a processor that supports
- * SME. With kexec, going from SME inactive to SME active
- * requires clearing cache entries so that addresses without
- * the encryption bit set don't corrupt the same physical
- * address that has the encryption bit set when caches are
- * flushed. To achieve this a wbinvd is performed followed by
- * a hlt. Even if the processor is not in the kexec/SME
- * scenario this only adds a wbinvd to a halting processor.
+ * Use native_halt() so that memory contents don't change
+ * (stack usage and variables) after possibly issuing the
+ * native_wbinvd() above.
*/
- asm volatile("wbinvd; hlt" : : : "memory");
+ native_halt();
}
}




2018-01-22 08:57:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 86/89] x86/retpoline: Optimize inline assembler for vmexit_fill_RSB

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 3f7d875566d8e79c5e0b2c9a413e91b2c29e0854 upstream.

The generated assembler for the C fill RSB inline asm operations has
several issues:

- The C code sets up the loop register, which is then immediately
overwritten in __FILL_RETURN_BUFFER with the same value again.

- The C code also passes in the iteration count in another register, which
is not used at all.

Remove these two unnecessary operations. Just rely on the single constant
passed to the macro for the iterations.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/nospec-branch.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -206,16 +206,17 @@ extern char __indirect_thunk_end[];
static inline void vmexit_fill_RSB(void)
{
#ifdef CONFIG_RETPOLINE
- unsigned long loops = RSB_CLEAR_LOOPS / 2;
+ unsigned long loops;

asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
ALTERNATIVE("jmp 910f",
__stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
X86_FEATURE_RETPOLINE)
"910:"
- : "=&r" (loops), ASM_CALL_CONSTRAINT
- : "r" (loops) : "memory" );
+ : "=r" (loops), ASM_CALL_CONSTRAINT
+ : : "memory" );
#endif
}
+
#endif /* __ASSEMBLY__ */
#endif /* __NOSPEC_BRANCH_H__ */



2018-01-22 08:58:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 82/89] retpoline: Introduce start/end markers of indirect thunk

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masami Hiramatsu <[email protected]>

commit 736e80a4213e9bbce40a7c050337047128b472ac upstream.

Introduce start/end markers of __x86_indirect_thunk_* functions.
To make it easy, consolidate .text.__x86.indirect_thunk.* sections
to one .text.__x86.indirect_thunk section and put it in the
end of kernel text section and adds __indirect_thunk_start/end
so that other subsystem (e.g. kprobes) can identify it.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: https://lkml.kernel.org/r/151629206178.10241.6828804696410044771.stgit@devbox
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/nospec-branch.h | 3 +++
arch/x86/kernel/vmlinux.lds.S | 6 ++++++
arch/x86/lib/retpoline.S | 2 +-
3 files changed, 10 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -194,6 +194,9 @@ enum spectre_v2_mitigation {
SPECTRE_V2_IBRS,
};

+extern char __indirect_thunk_start[];
+extern char __indirect_thunk_end[];
+
/*
* On VMEXIT we must ensure that no RSB predictions learned in the guest
* can be followed in the host, by overwriting the RSB completely. Both
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -124,6 +124,12 @@ SECTIONS
ASSERT(. - _entry_trampoline == PAGE_SIZE, "entry trampoline is too big");
#endif

+#ifdef CONFIG_RETPOLINE
+ __indirect_thunk_start = .;
+ *(.text.__x86.indirect_thunk)
+ __indirect_thunk_end = .;
+#endif
+
/* End of text section */
_etext = .;
} :text = 0x9090
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -9,7 +9,7 @@
#include <asm/nospec-branch.h>

.macro THUNK reg
- .section .text.__x86.indirect_thunk.\reg
+ .section .text.__x86.indirect_thunk

ENTRY(__x86_indirect_thunk_\reg)
CFI_STARTPROC



2018-01-22 08:58:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 83/89] kprobes/x86: Blacklist indirect thunk functions for kprobes

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masami Hiramatsu <[email protected]>

commit c1804a236894ecc942da7dc6c5abe209e56cba93 upstream.

Mark __x86_indirect_thunk_* functions as blacklist for kprobes
because those functions can be called from anywhere in the kernel
including blacklist functions of kprobes.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: https://lkml.kernel.org/r/151629209111.10241.5444852823378068683.stgit@devbox
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/lib/retpoline.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -25,7 +25,8 @@ ENDPROC(__x86_indirect_thunk_\reg)
* than one per register with the correct names. So we do it
* the simple and nasty way...
*/
-#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
+#define __EXPORT_THUNK(sym) _ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
+#define EXPORT_THUNK(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)

GENERATE_THUNK(_ASM_AX)



2018-01-22 08:58:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 84/89] kprobes/x86: Disable optimizing on the function jumps to indirect thunk

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masami Hiramatsu <[email protected]>

commit c86a32c09f8ced67971a2310e3b0dda4d1749007 upstream.

Since indirect jump instructions will be replaced by jump
to __x86_indirect_thunk_*, those jmp instruction must be
treated as an indirect jump. Since optprobe prohibits to
optimize probes in the function which uses an indirect jump,
it also needs to find out the function which jump to
__x86_indirect_thunk_* and disable optimization.

Add a check that the jump target address is between the
__indirect_thunk_start/end when optimizing kprobe.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: https://lkml.kernel.org/r/151629212062.10241.6991266100233002273.stgit@devbox
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/kprobes/opt.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -40,6 +40,7 @@
#include <asm/debugreg.h>
#include <asm/set_memory.h>
#include <asm/sections.h>
+#include <asm/nospec-branch.h>

#include "common.h"

@@ -205,7 +206,7 @@ static int copy_optimized_instructions(u
}

/* Check whether insn is indirect jump */
-static int insn_is_indirect_jump(struct insn *insn)
+static int __insn_is_indirect_jump(struct insn *insn)
{
return ((insn->opcode.bytes[0] == 0xff &&
(X86_MODRM_REG(insn->modrm.value) & 6) == 4) || /* Jump */
@@ -239,6 +240,26 @@ static int insn_jump_into_range(struct i
return (start <= target && target <= start + len);
}

+static int insn_is_indirect_jump(struct insn *insn)
+{
+ int ret = __insn_is_indirect_jump(insn);
+
+#ifdef CONFIG_RETPOLINE
+ /*
+ * Jump to x86_indirect_thunk_* is treated as an indirect jump.
+ * Note that even with CONFIG_RETPOLINE=y, the kernel compiled with
+ * older gcc may use indirect jump. So we add this check instead of
+ * replace indirect-jump check.
+ */
+ if (!ret)
+ ret = insn_jump_into_range(insn,
+ (unsigned long)__indirect_thunk_start,
+ (unsigned long)__indirect_thunk_end -
+ (unsigned long)__indirect_thunk_start);
+#endif
+ return ret;
+}
+
/* Decode whole function to ensure any instructions don't jump into target */
static int can_optimize(unsigned long paddr)
{



2018-01-22 08:59:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 85/89] x86/pti: Document fix wrong index

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: zhenwei.pi <[email protected]>

commit 98f0fceec7f84d80bc053e49e596088573086421 upstream.

In section <2. Runtime Cost>, fix wrong index.

Signed-off-by: zhenwei.pi <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
Documentation/x86/pti.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Documentation/x86/pti.txt
+++ b/Documentation/x86/pti.txt
@@ -78,7 +78,7 @@ this protection comes at a cost:
non-PTI SYSCALL entry code, so requires mapping fewer
things into the userspace page tables. The downside is
that stacks must be switched at entry time.
- d. Global pages are disabled for all kernel structures not
+ c. Global pages are disabled for all kernel structures not
mapped into both kernel and userspace page tables. This
feature of the MMU allows different processes to share TLB
entries mapping the kernel. Losing the feature means more



2018-01-22 08:59:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 80/89] arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit acfb3b883f6d6a4b5d27ad7fdded11f6a09ae6dd upstream.

KVM doesn't follow the SMCCC when it comes to unimplemented calls,
and inject an UNDEF instead of returning an error. Since firmware
calls are now used for security mitigation, they are becoming more
common, and the undef is counter productive.

Instead, let's follow the SMCCC which states that -1 must be returned
to the caller when getting an unknown function number.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/kvm/handle_exit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -44,7 +44,7 @@ static int handle_hvc(struct kvm_vcpu *v

ret = kvm_psci_call(vcpu);
if (ret < 0) {
- kvm_inject_undefined(vcpu);
+ vcpu_set_reg(vcpu, 0, ~0UL);
return 1;
}

@@ -53,7 +53,7 @@ static int handle_hvc(struct kvm_vcpu *v

static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- kvm_inject_undefined(vcpu);
+ vcpu_set_reg(vcpu, 0, ~0UL);
return 1;
}




2018-01-22 08:59:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 58/89] ARM: sunxi_defconfig: Enable CMA

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Maxime Ripard <[email protected]>

commit c13e7f313da33d1488355440f1a10feb1897480a upstream.

The DRM driver most notably, but also out of tree drivers (for now) like
the VPU or GPU drivers, are quite big consumers of large, contiguous memory
buffers. However, the sunxi_defconfig doesn't enable CMA in order to
mitigate that, which makes them almost unusable.

Enable it to make sure it somewhat works.

Signed-off-by: Maxime Ripard <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/configs/sunxi_defconfig | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/arm/configs/sunxi_defconfig
+++ b/arch/arm/configs/sunxi_defconfig
@@ -10,6 +10,7 @@ CONFIG_SMP=y
CONFIG_NR_CPUS=8
CONFIG_AEABI=y
CONFIG_HIGHMEM=y
+CONFIG_CMA=y
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CPU_FREQ=y
@@ -33,6 +34,7 @@ CONFIG_CAN_SUN4I=y
# CONFIG_WIRELESS is not set
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_DMA_CMA=y
CONFIG_BLK_DEV_SD=y
CONFIG_ATA=y
CONFIG_AHCI_SUNXI=y



2018-01-22 08:59:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 81/89] x86/mce: Make machine check speculation protected

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 6f41c34d69eb005e7848716bbcafc979b35037d5 upstream.

The machine check idtentry uses an indirect branch directly from the low
level code. This evades the speculation protection.

Replace it by a direct call into C code and issue the indirect call there
so the compiler can apply the proper speculation protection.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by:Borislav Petkov <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
Niced-by: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/mcheck/mce.c | 5 +++++
3 files changed, 7 insertions(+), 1 deletion(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1258,7 +1258,7 @@ idtentry async_page_fault do_async_page_
#endif

#ifdef CONFIG_X86_MCE
-idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(%rip)
+idtentry machine_check do_mce has_error_code=0 paranoid=1
#endif

/*
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -88,6 +88,7 @@ dotraplinkage void do_simd_coprocessor_e
#ifdef CONFIG_X86_32
dotraplinkage void do_iret_error(struct pt_regs *, long);
#endif
+dotraplinkage void do_mce(struct pt_regs *, long);

static inline int get_si_code(unsigned long condition)
{
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1788,6 +1788,11 @@ static void unexpected_machine_check(str
void (*machine_check_vector)(struct pt_regs *, long error_code) =
unexpected_machine_check;

+dotraplinkage void do_mce(struct pt_regs *regs, long error_code)
+{
+ machine_check_vector(regs, error_code);
+}
+
/*
* Called for each booted CPU to set up machine checks.
* Must be called with preempt off:



2018-01-22 09:00:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 56/89] phy: work around phys references to usb-nop-xceiv devices

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <[email protected]>

commit b7563e2796f8b23c98afcfea7363194227fa089d upstream.

Stefan Wahren reports a problem with a warning fix that was merged
for v4.15: we had lots of device nodes with a 'phys' property pointing
to a device node that is not compliant with the binding documented in
Documentation/devicetree/bindings/phy/phy-bindings.txt

This generally works because USB HCD drivers that support both the generic
phy subsystem and the older usb-phy subsystem ignore most errors from
phy_get() and related calls and then use the usb-phy driver instead.

However, it turns out that making the usb-nop-xceiv device compatible with
the generic-phy binding changes the phy_get() return code from -EINVAL to
-EPROBE_DEFER, and the dwc2 usb controller driver for bcm2835 now returns
-EPROBE_DEFER from its probe function rather than ignoring the failure,
breaking all USB support on raspberry-pi when CONFIG_GENERIC_PHY is
enabled. The same code is used in the dwc3 driver and the usb_add_hcd()
function, so a reasonable assumption would be that many other platforms
are affected as well.

I have reviewed all the related patches and concluded that "usb-nop-xceiv"
is the only USB phy that is affected by the change, and since it is by far
the most commonly referenced phy, all the other USB phy drivers appear
to be used in ways that are are either safe in DT (they don't use the
'phys' property), or in the driver (they already ignore -EPROBE_DEFER
from generic-phy when usb-phy is available).

To work around the problem, this adds a special case to _of_phy_get()
so we ignore any PHY node that is compatible with "usb-nop-xceiv",
as we know that this can never load no matter how much we defer. In the
future, we might implement a generic-phy driver for "usb-nop-xceiv"
and then remove this workaround.

Since we generally want older kernels to also want to work with the
fixed devicetree files, it would be good to backport the patch into
stable kernels as well (3.13+ are possibly affected), even though they
don't contain any of the patches that may have caused regressions.

Fixes: 014d6da6cb25 ARM: dts: bcm283x: Fix DTC warnings about missing phy-cells
Fixes: c5bbf358b790 arm: dts: nspire: Add missing #phy-cells to usb-nop-xceiv
Fixes: 44e5dced2ef6 arm: dts: marvell: Add missing #phy-cells to usb-nop-xceiv
Fixes: f568f6f554b8 ARM: dts: omap: Add missing #phy-cells to usb-nop-xceiv
Fixes: d745d5f277bf ARM: dts: imx51-zii-rdu1: Add missing #phy-cells to usb-nop-xceiv
Fixes: 915fbe59cbf2 ARM: dts: imx: Add missing #phy-cells to usb-nop-xceiv
Link: https://marc.info/?l=linux-usb&m=151518314314753&w=2
Link: https://patchwork.kernel.org/patch/10158145/
Cc: Felipe Balbi <[email protected]>
Cc: Eric Anholt <[email protected]>
Tested-by: Stefan Wahren <[email protected]>
Acked-by: Rob Herring <[email protected]>
Tested-by: Hans Verkuil <[email protected]>
Acked-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/phy/phy-core.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/phy/phy-core.c
+++ b/drivers/phy/phy-core.c
@@ -395,6 +395,10 @@ static struct phy *_of_phy_get(struct de
if (ret)
return ERR_PTR(-ENODEV);

+ /* This phy type handled by the usb-phy subsystem for now */
+ if (of_device_is_compatible(args.np, "usb-nop-xceiv"))
+ return ERR_PTR(-ENODEV);
+
mutex_lock(&phy_provider_mutex);
phy_provider = of_phy_provider_lookup(args.np);
if (IS_ERR(phy_provider) || !try_module_get(phy_provider->owner)) {



2018-01-22 09:00:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 89/89] net: mvpp2: do not disable GMAC padding

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yan Markman <[email protected]>

commit e749aca84b10f3987b2ee1f76e0c7d8aacc5653c upstream.

Short fragmented packets may never be sent by the hardware when padding
is disabled. This patch stop modifying the GMAC padding bits, to leave
them to their reset value (disabled).

Fixes: 3919357fb0bb ("net: mvpp2: initialize the GMAC when using a port")
Signed-off-by: Yan Markman <[email protected]>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/ethernet/marvell/mvpp2.c | 9 ---------
1 file changed, 9 deletions(-)

--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -4552,11 +4552,6 @@ static void mvpp2_port_mii_gmac_configur
MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
val &= ~MVPP22_CTRL4_EXT_PIN_GMII_SEL;
writel(val, port->base + MVPP22_GMAC_CTRL_4_REG);
-
- val = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
- val |= MVPP2_GMAC_DISABLE_PADDING;
- val &= ~MVPP2_GMAC_FLOW_CTRL_MASK;
- writel(val, port->base + MVPP2_GMAC_CTRL_2_REG);
} else if (phy_interface_mode_is_rgmii(port->phy_interface)) {
val = readl(port->base + MVPP22_GMAC_CTRL_4_REG);
val |= MVPP22_CTRL4_EXT_PIN_GMII_SEL |
@@ -4564,10 +4559,6 @@ static void mvpp2_port_mii_gmac_configur
MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
val &= ~MVPP22_CTRL4_DP_CLK_SEL;
writel(val, port->base + MVPP22_GMAC_CTRL_4_REG);
-
- val = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
- val &= ~MVPP2_GMAC_DISABLE_PADDING;
- writel(val, port->base + MVPP2_GMAC_CTRL_2_REG);
}

/* The port is connected to a copper PHY */



2018-01-22 09:00:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 57/89] ARM64: dts: marvell: armada-cp110: Fix clock resources for various node

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gregory CLEMENT <[email protected]>

commit e3af9f7c6ece29fdb7fe0aeb83ac5d3077a06edb upstream.

On the CP modules we found on Armada 7K/8K, many IP block actually also
need a "functional" clock (from the bus). This patch add them which allows
to fix some issues hanging the kernel:

If Ethernet and sdhci driver are built as modules and sdhci was loaded
first then the kernel hang.

Fixes: bb16ea1742c8 ("mmc: sdhci-xenon: Fix clock resource by adding an optional bus clock")
Reported-by: Riku Voipio <[email protected]>
Signed-off-by: Gregory CLEMENT <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi | 13 ++++++++-----
arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi | 9 ++++++---
2 files changed, 14 insertions(+), 8 deletions(-)

--- a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
@@ -63,8 +63,10 @@
cpm_ethernet: ethernet@0 {
compatible = "marvell,armada-7k-pp22";
reg = <0x0 0x100000>, <0x129000 0xb000>;
- clocks = <&cpm_clk 1 3>, <&cpm_clk 1 9>, <&cpm_clk 1 5>;
- clock-names = "pp_clk", "gop_clk", "mg_clk";
+ clocks = <&cpm_clk 1 3>, <&cpm_clk 1 9>,
+ <&cpm_clk 1 5>, <&cpm_clk 1 18>;
+ clock-names = "pp_clk", "gop_clk",
+ "mg_clk","axi_clk";
marvell,system-controller = <&cpm_syscon0>;
status = "disabled";
dma-coherent;
@@ -114,7 +116,8 @@
#size-cells = <0>;
compatible = "marvell,orion-mdio";
reg = <0x12a200 0x10>;
- clocks = <&cpm_clk 1 9>, <&cpm_clk 1 5>;
+ clocks = <&cpm_clk 1 9>, <&cpm_clk 1 5>,
+ <&cpm_clk 1 6>, <&cpm_clk 1 18>;
status = "disabled";
};

@@ -295,8 +298,8 @@
compatible = "marvell,armada-cp110-sdhci";
reg = <0x780000 0x300>;
interrupts = <ICU_GRP_NSR 27 IRQ_TYPE_LEVEL_HIGH>;
- clock-names = "core";
- clocks = <&cpm_clk 1 4>;
+ clock-names = "core","axi";
+ clocks = <&cpm_clk 1 4>, <&cpm_clk 1 18>;
dma-coherent;
status = "disabled";
};
--- a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
@@ -63,8 +63,10 @@
cps_ethernet: ethernet@0 {
compatible = "marvell,armada-7k-pp22";
reg = <0x0 0x100000>, <0x129000 0xb000>;
- clocks = <&cps_clk 1 3>, <&cps_clk 1 9>, <&cps_clk 1 5>;
- clock-names = "pp_clk", "gop_clk", "mg_clk";
+ clocks = <&cps_clk 1 3>, <&cps_clk 1 9>,
+ <&cps_clk 1 5>, <&cps_clk 1 18>;
+ clock-names = "pp_clk", "gop_clk",
+ "mg_clk", "axi_clk";
marvell,system-controller = <&cps_syscon0>;
status = "disabled";
dma-coherent;
@@ -114,7 +116,8 @@
#size-cells = <0>;
compatible = "marvell,orion-mdio";
reg = <0x12a200 0x10>;
- clocks = <&cps_clk 1 9>, <&cps_clk 1 5>;
+ clocks = <&cps_clk 1 9>, <&cps_clk 1 5>,
+ <&cps_clk 1 6>, <&cps_clk 1 18>;
status = "disabled";
};




2018-01-22 09:00:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 54/89] Input: twl4030-vibra - fix sibling-node lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 5b189201993ab03001a398de731045bfea90c689 upstream.

A helper purported to look up a child node based on its name was using
the wrong of-helper and ended up prematurely freeing the parent of-node
while searching the whole device tree depth-first starting at the parent
node.

Fixes: 64b9e4d803b1 ("input: twl4030-vibra: Support for DT booted kernel")
Fixes: e661d0a04462 ("Input: twl4030-vibra - fix ERROR: Bad of_node_put() warning")
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/misc/twl4030-vibra.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/input/misc/twl4030-vibra.c
+++ b/drivers/input/misc/twl4030-vibra.c
@@ -178,12 +178,14 @@ static SIMPLE_DEV_PM_OPS(twl4030_vibra_p
twl4030_vibra_suspend, twl4030_vibra_resume);

static bool twl4030_vibra_check_coexist(struct twl4030_vibra_data *pdata,
- struct device_node *node)
+ struct device_node *parent)
{
+ struct device_node *node;
+
if (pdata && pdata->coexist)
return true;

- node = of_find_node_by_name(node, "codec");
+ node = of_get_child_by_name(parent, "codec");
if (node) {
of_node_put(node);
return true;



2018-01-22 09:00:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 52/89] Input: 88pm860x-ts - fix child-node lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 906bf7daa0618d0ef39f4872ca42218c29a3631f upstream.

Fix child node-lookup during probe, which ended up searching the whole
device tree depth-first starting at parent rather than just matching on
its children.

To make things worse, the parent node was prematurely freed, while the
child node was leaked.

Fixes: 2e57d56747e6 ("mfd: 88pm860x: Device tree support")
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/touchscreen/88pm860x-ts.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

--- a/drivers/input/touchscreen/88pm860x-ts.c
+++ b/drivers/input/touchscreen/88pm860x-ts.c
@@ -126,7 +126,7 @@ static int pm860x_touch_dt_init(struct p
int data, n, ret;
if (!np)
return -ENODEV;
- np = of_find_node_by_name(np, "touch");
+ np = of_get_child_by_name(np, "touch");
if (!np) {
dev_err(&pdev->dev, "Can't find touch node\n");
return -EINVAL;
@@ -144,13 +144,13 @@ static int pm860x_touch_dt_init(struct p
if (data) {
ret = pm860x_reg_write(i2c, PM8607_GPADC_MISC1, data);
if (ret < 0)
- return -EINVAL;
+ goto err_put_node;
}
/* set tsi prebias time */
if (!of_property_read_u32(np, "marvell,88pm860x-tsi-prebias", &data)) {
ret = pm860x_reg_write(i2c, PM8607_TSI_PREBIAS, data);
if (ret < 0)
- return -EINVAL;
+ goto err_put_node;
}
/* set prebias & prechg time of pen detect */
data = 0;
@@ -161,10 +161,18 @@ static int pm860x_touch_dt_init(struct p
if (data) {
ret = pm860x_reg_write(i2c, PM8607_PD_PREBIAS, data);
if (ret < 0)
- return -EINVAL;
+ goto err_put_node;
}
of_property_read_u32(np, "marvell,88pm860x-resistor-X", res_x);
+
+ of_node_put(np);
+
return 0;
+
+err_put_node:
+ of_node_put(np);
+
+ return -EINVAL;
}
#else
#define pm860x_touch_dt_init(x, y, z) (-1)



2018-01-22 09:01:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 53/89] Input: twl6040-vibra - fix child-node lookup

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit dcaf12a8b0bbdbfcfa2be8dff2c4948d9844b4ad upstream.

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at parent rather than just matching on
its children.

Later sanity checks on node properties (which would likely be missing)
should prevent this from causing much trouble however, especially as the
original premature free of the parent node has already been fixed
separately (but that "fix" was apparently never backported to stable).

Fixes: e7ec014a47e4 ("Input: twl6040-vibra - update for device tree support")
Fixes: c52c545ead97 ("Input: twl6040-vibra - fix DT node memory management")
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Peter Ujfalusi <[email protected]>
Tested-by: H. Nikolaus Schaller <[email protected]> (on Pyra OMAP5 hardware)
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/misc/twl6040-vibra.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/input/misc/twl6040-vibra.c
+++ b/drivers/input/misc/twl6040-vibra.c
@@ -248,8 +248,7 @@ static int twl6040_vibra_probe(struct pl
int vddvibr_uV = 0;
int error;

- of_node_get(twl6040_core_dev->of_node);
- twl6040_core_node = of_find_node_by_name(twl6040_core_dev->of_node,
+ twl6040_core_node = of_get_child_by_name(twl6040_core_dev->of_node,
"vibra");
if (!twl6040_core_node) {
dev_err(&pdev->dev, "parent of node is missing?\n");



2018-01-22 09:01:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 78/89] MIPS: CM: Drop WARN_ON(vp != 0)

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: James Hogan <[email protected]>

commit c04de7b1ad645b61c141df8ca903ba0cc03a57f7 upstream.

Since commit 68923cdc2eb3 ("MIPS: CM: Add cluster & block args to
mips_cm_lock_other()"), mips_smp_send_ipi_mask() has used
mips_cm_lock_other_cpu() with each CPU number, rather than
mips_cm_lock_other() with the first VPE in each core. Prior to r6,
multicore multithreaded systems such as dual-core dual-thread
interAptivs with CPU Idle enabled (e.g. MIPS Creator Ci40) results in
mips_cm_lock_other() repeatedly hitting WARN_ON(vp != 0).

There doesn't appear to be anything fundamentally wrong about passing a
non-zero VP/VPE number, even if it is a core's region that is locked
into the other region before r6, so remove that particular WARN_ON().

Fixes: 68923cdc2eb3 ("MIPS: CM: Add cluster & block args to mips_cm_lock_other()")
Signed-off-by: James Hogan <[email protected]>
Reviewed-by: Paul Burton <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/17883/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/mips/kernel/mips-cm.c | 1 -
1 file changed, 1 deletion(-)

--- a/arch/mips/kernel/mips-cm.c
+++ b/arch/mips/kernel/mips-cm.c
@@ -292,7 +292,6 @@ void mips_cm_lock_other(unsigned int clu
*this_cpu_ptr(&cm_core_lock_flags));
} else {
WARN_ON(cluster != 0);
- WARN_ON(vp != 0);
WARN_ON(block != CM_GCR_Cx_OTHER_BLOCK_LOCAL);

/*



2018-01-22 09:01:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 75/89] dm crypt: fix error return code in crypt_ctr()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Yongjun <[email protected]>

commit 3cc2e57c4beabcbbaa46e1ac6d77ca8276a4a42d upstream.

Fix to return error code -ENOMEM from the mempool_create_kmalloc_pool()
error handling case instead of 0, as done elsewhere in this function.

Fixes: ef43aa38063a6 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/dm-crypt.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2746,6 +2746,7 @@ static int crypt_ctr(struct dm_target *t
cc->tag_pool_max_sectors * cc->on_disk_tag_size);
if (!cc->tag_pool) {
ti->error = "Cannot allocate integrity tags mempool";
+ ret = -ENOMEM;
goto bad;
}




2018-01-22 09:02:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 72/89] dm integrity: dont store cipher request on the stack

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <[email protected]>

commit 717f4b1c52135f279112df82583e0c77e80f90de upstream.

Some asynchronous cipher implementations may use DMA. The stack may
be mapped in the vmalloc area that doesn't support DMA. Therefore,
the cipher request and initialization vector shouldn't be on the
stack.

Fix this by allocating the request and iv with kmalloc.

Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/dm-integrity.c | 49 ++++++++++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 12 deletions(-)

--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -2558,7 +2558,8 @@ static int create_journal(struct dm_inte
int r = 0;
unsigned i;
__u64 journal_pages, journal_desc_size, journal_tree_size;
- unsigned char *crypt_data = NULL;
+ unsigned char *crypt_data = NULL, *crypt_iv = NULL;
+ struct skcipher_request *req = NULL;

ic->commit_ids[0] = cpu_to_le64(0x1111111111111111ULL);
ic->commit_ids[1] = cpu_to_le64(0x2222222222222222ULL);
@@ -2616,9 +2617,20 @@ static int create_journal(struct dm_inte

if (blocksize == 1) {
struct scatterlist *sg;
- SKCIPHER_REQUEST_ON_STACK(req, ic->journal_crypt);
- unsigned char iv[ivsize];
- skcipher_request_set_tfm(req, ic->journal_crypt);
+
+ req = skcipher_request_alloc(ic->journal_crypt, GFP_KERNEL);
+ if (!req) {
+ *error = "Could not allocate crypt request";
+ r = -ENOMEM;
+ goto bad;
+ }
+
+ crypt_iv = kmalloc(ivsize, GFP_KERNEL);
+ if (!crypt_iv) {
+ *error = "Could not allocate iv";
+ r = -ENOMEM;
+ goto bad;
+ }

ic->journal_xor = dm_integrity_alloc_page_list(ic);
if (!ic->journal_xor) {
@@ -2640,9 +2652,9 @@ static int create_journal(struct dm_inte
sg_set_buf(&sg[i], va, PAGE_SIZE);
}
sg_set_buf(&sg[i], &ic->commit_ids, sizeof ic->commit_ids);
- memset(iv, 0x00, ivsize);
+ memset(crypt_iv, 0x00, ivsize);

- skcipher_request_set_crypt(req, sg, sg, PAGE_SIZE * ic->journal_pages + sizeof ic->commit_ids, iv);
+ skcipher_request_set_crypt(req, sg, sg, PAGE_SIZE * ic->journal_pages + sizeof ic->commit_ids, crypt_iv);
init_completion(&comp.comp);
comp.in_flight = (atomic_t)ATOMIC_INIT(1);
if (do_crypt(true, req, &comp))
@@ -2658,10 +2670,22 @@ static int create_journal(struct dm_inte
crypto_free_skcipher(ic->journal_crypt);
ic->journal_crypt = NULL;
} else {
- SKCIPHER_REQUEST_ON_STACK(req, ic->journal_crypt);
- unsigned char iv[ivsize];
unsigned crypt_len = roundup(ivsize, blocksize);

+ req = skcipher_request_alloc(ic->journal_crypt, GFP_KERNEL);
+ if (!req) {
+ *error = "Could not allocate crypt request";
+ r = -ENOMEM;
+ goto bad;
+ }
+
+ crypt_iv = kmalloc(ivsize, GFP_KERNEL);
+ if (!crypt_iv) {
+ *error = "Could not allocate iv";
+ r = -ENOMEM;
+ goto bad;
+ }
+
crypt_data = kmalloc(crypt_len, GFP_KERNEL);
if (!crypt_data) {
*error = "Unable to allocate crypt data";
@@ -2669,8 +2693,6 @@ static int create_journal(struct dm_inte
goto bad;
}

- skcipher_request_set_tfm(req, ic->journal_crypt);
-
ic->journal_scatterlist = dm_integrity_alloc_journal_scatterlist(ic, ic->journal);
if (!ic->journal_scatterlist) {
*error = "Unable to allocate sg list";
@@ -2694,12 +2716,12 @@ static int create_journal(struct dm_inte
struct skcipher_request *section_req;
__u32 section_le = cpu_to_le32(i);

- memset(iv, 0x00, ivsize);
+ memset(crypt_iv, 0x00, ivsize);
memset(crypt_data, 0x00, crypt_len);
memcpy(crypt_data, &section_le, min((size_t)crypt_len, sizeof(section_le)));

sg_init_one(&sg, crypt_data, crypt_len);
- skcipher_request_set_crypt(req, &sg, &sg, crypt_len, iv);
+ skcipher_request_set_crypt(req, &sg, &sg, crypt_len, crypt_iv);
init_completion(&comp.comp);
comp.in_flight = (atomic_t)ATOMIC_INIT(1);
if (do_crypt(true, req, &comp))
@@ -2757,6 +2779,9 @@ retest_commit_id:
}
bad:
kfree(crypt_data);
+ kfree(crypt_iv);
+ skcipher_request_free(req);
+
return r;
}




2018-01-22 09:02:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 70/89] dm btree: fix serious bug in btree_split_beneath()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joe Thornber <[email protected]>

commit bc68d0a43560e950850fc69b58f0f8254b28f6d6 upstream.

When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:

i) space for a new entry
ii) adjusting the first key entry if the new key is lower than any in the node.

If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them. The root node is
left with 2 entries corresponding to the 2 new nodes.

btree_split_beneath() then adjusts the spine to point to one of the two new
children. This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out. This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.

This is a serious bug, and quite hard to make trigger in normal use. A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine. Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.

Reported-by: Monty Pavel <[email protected]>
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/persistent-data/dm-btree.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)

--- a/drivers/md/persistent-data/dm-btree.c
+++ b/drivers/md/persistent-data/dm-btree.c
@@ -683,23 +683,8 @@ static int btree_split_beneath(struct sh
pn->keys[1] = rn->keys[0];
memcpy_disk(value_ptr(pn, 1), &val, sizeof(__le64));

- /*
- * rejig the spine. This is ugly, since it knows too
- * much about the spine
- */
- if (s->nodes[0] != new_parent) {
- unlock_block(s->info, s->nodes[0]);
- s->nodes[0] = new_parent;
- }
- if (key < le64_to_cpu(rn->keys[0])) {
- unlock_block(s->info, right);
- s->nodes[1] = left;
- } else {
- unlock_block(s->info, left);
- s->nodes[1] = right;
- }
- s->count = 2;
-
+ unlock_block(s->info, left);
+ unlock_block(s->info, right);
return 0;
}




2018-01-22 09:02:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 76/89] x86: Use __nostackprotect for sme_encrypt_kernel

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Laura Abbott <[email protected]>

commit 91cfc88c66bf8ab95937606569670cf67fa73e09 upstream.

Commit bacf6b499e11 ("x86/mm: Use a struct to reduce parameters for SME
PGD mapping") moved some parameters into a structure.

The structure was large enough to trigger the stack protection canary in
sme_encrypt_kernel which doesn't work this early, causing reboots.

Mark sme_encrypt_kernel appropriately to not use the canary.

Fixes: bacf6b499e11 ("x86/mm: Use a struct to reduce parameters for SME PGD mapping")
Signed-off-by: Laura Abbott <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/mem_encrypt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -487,7 +487,7 @@ static unsigned long __init sme_pgtable_
return total;
}

-void __init sme_encrypt_kernel(struct boot_params *bp)
+void __init __nostackprotector sme_encrypt_kernel(struct boot_params *bp)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;



2018-01-22 09:02:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 73/89] dm crypt: fix crash by adding missing check for auth key size

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Milan Broz <[email protected]>

commit 27c7003697fc2c78f965984aa224ef26cd6b2949 upstream.

If dm-crypt uses authenticated mode with separate MAC, there are two
concatenated part of the key structure - key(s) for encryption and
authentication key.

Add a missing check for authenticated key length. If this key length is
smaller than actually provided key, dm-crypt now properly fails instead
of crashing.

Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Reported-by: Salah Coronya <[email protected]>
Signed-off-by: Milan Broz <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/dm-crypt.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1954,10 +1954,15 @@ static int crypt_setkey(struct crypt_con
/* Ignore extra keys (which are used for IV etc) */
subkey_size = crypt_subkey_size(cc);

- if (crypt_integrity_hmac(cc))
+ if (crypt_integrity_hmac(cc)) {
+ if (subkey_size < cc->key_mac_size)
+ return -EINVAL;
+
crypt_copy_authenckey(cc->authenc_key, cc->key,
subkey_size - cc->key_mac_size,
cc->key_mac_size);
+ }
+
for (i = 0; i < cc->tfms_count; i++) {
if (crypt_integrity_hmac(cc))
r = crypto_aead_setkey(cc->cipher_tfm.tfms_aead[i],



2018-01-22 09:02:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 77/89] alpha/PCI: Fix noname IRQ level detection

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lorenzo Pieralisi <[email protected]>

commit 86be89939d11a84800f66e2a283b915b704bf33d upstream.

The conversion of the alpha architecture PCI host bridge legacy IRQ
mapping/swizzling to the new PCI host bridge map/swizzle hooks carried
out through:

commit 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with
host bridge IRQ mapping hooks")

implies that IRQ for devices are now allocated through pci_assign_irq()
function in pci_device_probe() that is called when a driver matching a
device is found in order to probe the device through the device driver.

Alpha noname platforms required IRQ level programming to be executed
in sio_fixup_irq_levels(), that is called in noname_init_pci(), a
platform hook called within a subsys_initcall.

In noname_init_pci(), present IRQs are detected through
sio_collect_irq_levels() that check the struct pci_dev->irq number
to detect if an IRQ has been allocated for the device.

By the time sio_collect_irq_levels() is called, some devices may still
have not a matching driver loaded to match them (eg loadable module)
therefore their IRQ allocation is still pending - which means that
sio_collect_irq_levels() does not programme the correct IRQ level for
those devices, causing their IRQ handling to be broken when the device
driver is actually loaded and the device is probed.

Fix the issue by adding code in the noname map_irq() function
(noname_map_irq()) that, whilst mapping/swizzling the IRQ line, it also
ensures that the correct IRQ level programming is executed at platform
level, fixing the issue.

Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with
host bridge IRQ mapping hooks")
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: Meelis Roos <[email protected]>
Signed-off-by: Matt Turner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/alpha/kernel/sys_sio.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)

--- a/arch/alpha/kernel/sys_sio.c
+++ b/arch/alpha/kernel/sys_sio.c
@@ -102,6 +102,15 @@ sio_pci_route(void)
alpha_mv.sys.sio.route_tab);
}

+static bool sio_pci_dev_irq_needs_level(const struct pci_dev *dev)
+{
+ if ((dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) &&
+ (dev->class >> 8 != PCI_CLASS_BRIDGE_PCMCIA))
+ return false;
+
+ return true;
+}
+
static unsigned int __init
sio_collect_irq_levels(void)
{
@@ -110,8 +119,7 @@ sio_collect_irq_levels(void)

/* Iterate through the devices, collecting IRQ levels. */
for_each_pci_dev(dev) {
- if ((dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) &&
- (dev->class >> 8 != PCI_CLASS_BRIDGE_PCMCIA))
+ if (!sio_pci_dev_irq_needs_level(dev))
continue;

if (dev->irq)
@@ -120,8 +128,7 @@ sio_collect_irq_levels(void)
return level_bits;
}

-static void __init
-sio_fixup_irq_levels(unsigned int level_bits)
+static void __sio_fixup_irq_levels(unsigned int level_bits, bool reset)
{
unsigned int old_level_bits;

@@ -139,12 +146,21 @@ sio_fixup_irq_levels(unsigned int level_
*/
old_level_bits = inb(0x4d0) | (inb(0x4d1) << 8);

- level_bits |= (old_level_bits & 0x71ff);
+ if (reset)
+ old_level_bits &= 0x71ff;
+
+ level_bits |= old_level_bits;

outb((level_bits >> 0) & 0xff, 0x4d0);
outb((level_bits >> 8) & 0xff, 0x4d1);
}

+static inline void
+sio_fixup_irq_levels(unsigned int level_bits)
+{
+ __sio_fixup_irq_levels(level_bits, true);
+}
+
static inline int
noname_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
{
@@ -181,7 +197,14 @@ noname_map_irq(const struct pci_dev *dev
const long min_idsel = 6, max_idsel = 14, irqs_per_slot = 5;
int irq = COMMON_TABLE_LOOKUP, tmp;
tmp = __kernel_extbl(alpha_mv.sys.sio.route_tab, irq);
- return irq >= 0 ? tmp : -1;
+
+ irq = irq >= 0 ? tmp : -1;
+
+ /* Fixup IRQ level if an actual IRQ mapping is detected */
+ if (sio_pci_dev_irq_needs_level(dev) && irq >= 0)
+ __sio_fixup_irq_levels(1 << irq, false);
+
+ return irq;
}

static inline int



2018-01-22 09:02:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 69/89] drm/vmwgfx: fix memory corruption with legacy/sou connectors

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rob Clark <[email protected]>

commit 8a510a5c75261ba0ec39155326982aa786541e29 upstream.

It looks like in all cases 'struct vmw_connector_state' is used. But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed. Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.

Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state"
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Thomas Hellstrom <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 4 ++--
drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -266,8 +266,8 @@ static const struct drm_connector_funcs
.set_property = vmw_du_connector_set_property,
.destroy = vmw_ldu_connector_destroy,
.reset = vmw_du_connector_reset,
- .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
- .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+ .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+ .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
};
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
@@ -420,8 +420,8 @@ static const struct drm_connector_funcs
.set_property = vmw_du_connector_set_property,
.destroy = vmw_sou_connector_destroy,
.reset = vmw_du_connector_reset,
- .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
- .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+ .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+ .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
};



2018-01-22 09:02:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 74/89] dm crypt: wipe kernel key copy after IV initialization

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ondrej Kozina <[email protected]>

commit dc94902bde1e158cd19c4deab208e5d6eb382a44 upstream.

Loading key via kernel keyring service erases the internal
key copy immediately after we pass it in crypto layer. This is
wrong because IV is initialized later and we use wrong key
for the initialization (instead of real key there's just zeroed
block).

The bug may cause data corruption if key is loaded via kernel keyring
service first and later same crypt device is reactivated using exactly
same key in hexbyte representation, or vice versa. The bug (and fix)
affects only ciphers using following IVs: essiv, lmk and tcw.

Fixes: c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service")
Signed-off-by: Ondrej Kozina <[email protected]>
Reviewed-by: Milan Broz <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/dm-crypt.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2058,9 +2058,6 @@ static int crypt_set_keyring_key(struct

ret = crypt_setkey(cc);

- /* wipe the kernel key payload copy in each case */
- memset(cc->key, 0, cc->key_size * sizeof(u8));
-
if (!ret) {
set_bit(DM_CRYPT_KEY_VALID, &cc->flags);
kzfree(cc->key_string);
@@ -2528,6 +2525,10 @@ static int crypt_ctr_cipher(struct dm_ta
}
}

+ /* wipe the kernel key payload copy */
+ if (cc->key_string)
+ memset(cc->key, 0, cc->key_size * sizeof(u8));
+
return ret;
}

@@ -2966,6 +2967,9 @@ static int crypt_message(struct dm_targe
return ret;
if (cc->iv_gen_ops && cc->iv_gen_ops->init)
ret = cc->iv_gen_ops->init(cc);
+ /* wipe the kernel key payload copy */
+ if (cc->key_string)
+ memset(cc->key, 0, cc->key_size * sizeof(u8));
return ret;
}
if (argc == 2 && !strcasecmp(argv[1], "wipe")) {
@@ -3012,7 +3016,7 @@ static void crypt_io_hints(struct dm_tar

static struct target_type crypt_target = {
.name = "crypt",
- .version = {1, 18, 0},
+ .version = {1, 18, 1},
.module = THIS_MODULE,
.ctr = crypt_ctr,
.dtr = crypt_dtr,



2018-01-22 09:02:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 71/89] dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dennis Yang <[email protected]>

commit 490ae017f54e55bde382d45ea24bddfb6d1a0aaf upstream.

For btree removal, there is a corner case that a single thread
could takes 6 locks which is more than THIN_MAX_CONCURRENT_LOCKS(5)
and leads to deadlock.

A btree removal might eventually call
rebalance_children()->rebalance3() to rebalance entries of three
neighbor child nodes when shadow_spine has already acquired two
write locks. In rebalance3(), it tries to shadow and acquire the
write locks of all three child nodes. However, shadowing a child
node requires acquiring a read lock of the original child node and
a write lock of the new block. Although the read lock will be
released after block shadowing, shadowing the third child node
in rebalance3() could still take the sixth lock.
(2 write locks for shadow_spine +
2 write locks for the first two child nodes's shadow +
1 write lock for the last child node's shadow +
1 read lock for the last child node)

Signed-off-by: Dennis Yang <[email protected]>
Acked-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/dm-thin-metadata.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -80,10 +80,14 @@
#define SECTOR_TO_BLOCK_SHIFT 3

/*
+ * For btree insert:
* 3 for btree insert +
* 2 for btree lookup used within space map
+ * For btree remove:
+ * 2 for shadow spine +
+ * 4 for rebalance 3 child node
*/
-#define THIN_MAX_CONCURRENT_LOCKS 5
+#define THIN_MAX_CONCURRENT_LOCKS 6

/* This should be plenty */
#define SPACE_MAP_ROOT_SIZE 128



2018-01-22 09:03:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 51/89] Input: synaptics-rmi4 - prevent UAF reported by KASAN

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nick Desaulniers <[email protected]>

commit 55edde9fff1ae4114c893c572e641620c76c9c21 upstream.

KASAN found a UAF due to dangling pointer. As the report below says,
rmi_f11_attention() accesses drvdata->attn_data.data, which was freed in
rmi_irq_fn.

[ 311.424062] BUG: KASAN: use-after-free in rmi_f11_attention+0x526/0x5e0 [rmi_core]
[ 311.424067] Read of size 27 at addr ffff88041fd610db by task irq/131-i2c_hid/1162
[ 311.424075] CPU: 0 PID: 1162 Comm: irq/131-i2c_hid Not tainted 4.15.0-rc8+ #2
[ 311.424076] Hardware name: Razer Blade Stealth/Razer, BIOS 6.05 01/26/2017
[ 311.424078] Call Trace:
[ 311.424086] dump_stack+0xae/0x12d
[ 311.424090] ? _atomic_dec_and_lock+0x103/0x103
[ 311.424094] ? show_regs_print_info+0xa/0xa
[ 311.424099] ? input_handle_event+0x10b/0x810
[ 311.424104] print_address_description+0x65/0x229
[ 311.424108] kasan_report.cold.5+0xa7/0x281
[ 311.424117] rmi_f11_attention+0x526/0x5e0 [rmi_core]
[ 311.424123] ? memcpy+0x1f/0x50
[ 311.424132] ? rmi_f11_attention+0x526/0x5e0 [rmi_core]
[ 311.424143] ? rmi_f11_probe+0x1e20/0x1e20 [rmi_core]
[ 311.424153] ? rmi_process_interrupt_requests+0x220/0x2a0 [rmi_core]
[ 311.424163] ? rmi_irq_fn+0x22c/0x270 [rmi_core]
[ 311.424173] ? rmi_process_interrupt_requests+0x2a0/0x2a0 [rmi_core]
[ 311.424177] ? free_irq+0xa0/0xa0
[ 311.424180] ? irq_finalize_oneshot.part.39+0xeb/0x180
[ 311.424190] ? rmi_process_interrupt_requests+0x2a0/0x2a0 [rmi_core]
[ 311.424193] ? irq_thread_fn+0x3d/0x80
[ 311.424197] ? irq_finalize_oneshot.part.39+0x180/0x180
[ 311.424200] ? irq_thread+0x21d/0x290
[ 311.424203] ? irq_thread_check_affinity+0x170/0x170
[ 311.424207] ? remove_wait_queue+0x150/0x150
[ 311.424212] ? kasan_unpoison_shadow+0x30/0x40
[ 311.424214] ? __init_waitqueue_head+0xa0/0xd0
[ 311.424218] ? task_non_contending.cold.55+0x18/0x18
[ 311.424221] ? irq_forced_thread_fn+0xa0/0xa0
[ 311.424226] ? irq_thread_check_affinity+0x170/0x170
[ 311.424230] ? kthread+0x19e/0x1c0
[ 311.424233] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 311.424237] ? ret_from_fork+0x32/0x40

[ 311.424244] Allocated by task 899:
[ 311.424249] kasan_kmalloc+0xbf/0xe0
[ 311.424252] __kmalloc_track_caller+0xd9/0x1f0
[ 311.424255] kmemdup+0x17/0x40
[ 311.424264] rmi_set_attn_data+0xa4/0x1b0 [rmi_core]
[ 311.424269] rmi_raw_event+0x10b/0x1f0 [hid_rmi]
[ 311.424278] hid_input_report+0x1a8/0x2c0 [hid]
[ 311.424283] i2c_hid_irq+0x146/0x1d0 [i2c_hid]
[ 311.424286] irq_thread_fn+0x3d/0x80
[ 311.424288] irq_thread+0x21d/0x290
[ 311.424291] kthread+0x19e/0x1c0
[ 311.424293] ret_from_fork+0x32/0x40

[ 311.424296] Freed by task 1162:
[ 311.424300] kasan_slab_free+0x71/0xc0
[ 311.424303] kfree+0x90/0x190
[ 311.424311] rmi_irq_fn+0x1b2/0x270 [rmi_core]
[ 311.424319] rmi_irq_fn+0x257/0x270 [rmi_core]
[ 311.424322] irq_thread_fn+0x3d/0x80
[ 311.424324] irq_thread+0x21d/0x290
[ 311.424327] kthread+0x19e/0x1c0
[ 311.424330] ret_from_fork+0x32/0x40

[ 311.424334] The buggy address belongs to the object at ffff88041fd610c0 which belongs to the cache kmalloc-64 of size 64
[ 311.424340] The buggy address is located 27 bytes inside of 64-byte region [ffff88041fd610c0, ffff88041fd61100)
[ 311.424344] The buggy address belongs to the page:
[ 311.424348] page:ffffea00107f5840 count:1 mapcount:0 mapping: (null) index:0x0
[ 311.424353] flags: 0x17ffffc0000100(slab)
[ 311.424358] raw: 0017ffffc0000100 0000000000000000 0000000000000000 00000001802a002a
[ 311.424363] raw: dead000000000100 dead000000000200 ffff8804228036c0 0000000000000000
[ 311.424366] page dumped because: kasan: bad access detected

[ 311.424369] Memory state around the buggy address:
[ 311.424373] ffff88041fd60f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 311.424377] ffff88041fd61000: fb fb fb fb fb fb fb fb fc fc fc fc fb fb fb fb
[ 311.424381] >ffff88041fd61080: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb
[ 311.424384] ^
[ 311.424387] ffff88041fd61100: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
[ 311.424391] ffff88041fd61180: fb fb fb fb fb fb fb fb fc fc fc fc fb fb fb fb

Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/rmi4/rmi_driver.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/input/rmi4/rmi_driver.c
+++ b/drivers/input/rmi4/rmi_driver.c
@@ -230,8 +230,10 @@ static irqreturn_t rmi_irq_fn(int irq, v
rmi_dbg(RMI_DEBUG_CORE, &rmi_dev->dev,
"Failed to process interrupt request: %d\n", ret);

- if (count)
+ if (count) {
kfree(attn_data.data);
+ attn_data.data = NULL;
+ }

if (!kfifo_is_empty(&drvdata->attn_fifo))
return rmi_irq_fn(irq, dev_id);



2018-01-22 09:03:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 67/89] scsi: libsas: Disable asynchronous aborts for SATA devices

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hannes Reinecke <[email protected]>

commit c9f926000fe3b84135a81602a9f7e63a6a7898e2 upstream.

Handling CD-ROM devices from libsas is decidedly odd, as libata relies
on SCSI EH to be started to figure out that no medium is present. So we
cannot do asynchronous aborts for SATA devices.

Fixes: 909657615d9 ("scsi: libsas: allow async aborts")
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Yves-Alexis Perez <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/scsi/libsas/sas_scsi_host.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -486,15 +486,28 @@ static int sas_queue_reset(struct domain

int sas_eh_abort_handler(struct scsi_cmnd *cmd)
{
- int res;
+ int res = TMF_RESP_FUNC_FAILED;
struct sas_task *task = TO_SAS_TASK(cmd);
struct Scsi_Host *host = cmd->device->host;
+ struct domain_device *dev = cmd_to_domain_dev(cmd);
struct sas_internal *i = to_sas_internal(host->transportt);
+ unsigned long flags;

if (!i->dft->lldd_abort_task)
return FAILED;

- res = i->dft->lldd_abort_task(task);
+ spin_lock_irqsave(host->host_lock, flags);
+ /* We cannot do async aborts for SATA devices */
+ if (dev_is_sata(dev) && !host->host_eh_scheduled) {
+ spin_unlock_irqrestore(host->host_lock, flags);
+ return FAILED;
+ }
+ spin_unlock_irqrestore(host->host_lock, flags);
+
+ if (task)
+ res = i->dft->lldd_abort_task(task);
+ else
+ SAS_DPRINTK("no task to abort\n");
if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
return SUCCESS;




2018-01-22 09:04:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 68/89] workqueue: avoid hard lockups in show_workqueue_state()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sergey Senozhatsky <[email protected]>

commit 62635ea8c18f0f62df4cc58379e4f1d33afd5801 upstream.

show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.

Signed-off-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/workqueue.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,6 +48,7 @@
#include <linux/nodemask.h>
#include <linux/moduleparam.h>
#include <linux/uaccess.h>
+#include <linux/nmi.h>

#include "workqueue_internal.h"

@@ -4479,6 +4480,12 @@ void show_workqueue_state(void)
if (pwq->nr_active || !list_empty(&pwq->delayed_works))
show_pwq(pwq);
spin_unlock_irqrestore(&pwq->pool->lock, flags);
+ /*
+ * We could be printing a lot from atomic context, e.g.
+ * sysrq-t -> show_workqueue_state(). Avoid triggering
+ * hard lockup.
+ */
+ touch_nmi_watchdog();
}
}

@@ -4506,6 +4513,12 @@ void show_workqueue_state(void)
pr_cont("\n");
next_pool:
spin_unlock_irqrestore(&pool->lock, flags);
+ /*
+ * We could be printing a lot from atomic context, e.g.
+ * sysrq-t -> show_workqueue_state(). Avoid triggering
+ * hard lockup.
+ */
+ touch_nmi_watchdog();
}

rcu_read_unlock_sched();



2018-01-22 09:04:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 65/89] proc: fix coredump vs read /proc/*/stat race

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Dobriyan <[email protected]>

commit 8bb2ee192e482c5d500df9f2b1b26a560bd3026f upstream.

do_task_stat() accesses IP and SP of a task without bumping reference
count of a stack (which became an entity with independent lifetime at
some point).

Steps to reproduce:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
setrlimit(RLIMIT_CORE, &(struct rlimit){});

while (1) {
char buf[64];
char buf2[4096];
pid_t pid;
int fd;

pid = fork();
if (pid == 0) {
*(volatile int *)0 = 0;
}

snprintf(buf, sizeof(buf), "/proc/%u/stat", pid);
fd = open(buf, O_RDONLY);
read(fd, buf2, sizeof(buf2));
close(fd);

waitpid(pid, NULL, 0);
}
return 0;
}

BUG: unable to handle kernel paging request at 0000000000003fd8
IP: do_task_stat+0x8b4/0xaf0
PGD 800000003d73e067 P4D 800000003d73e067 PUD 3d558067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
RIP: 0010:do_task_stat+0x8b4/0xaf0
Call Trace:
proc_single_show+0x43/0x70
seq_read+0xe6/0x3b0
__vfs_read+0x1e/0x120
vfs_read+0x84/0x110
SyS_read+0x3d/0xa0
entry_SYSCALL_64_fastpath+0x13/0x6c
RIP: 0033:0x7f4d7928cba0
RSP: 002b:00007ffddb245158 EFLAGS: 00000246
Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef <48> 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24
RIP: do_task_stat+0x8b4/0xaf0 RSP: ffffc90000607cc8
CR2: 0000000000003fd8

John Ogness said: for my tests I added an else case to verify that the
race is hit and correctly mitigated.

Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Reported-by: "Kohli, Gaurav" <[email protected]>
Tested-by: John Ogness <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/proc/array.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -424,8 +424,11 @@ static int do_task_stat(struct seq_file
* safe because the task has stopped executing permanently.
*/
if (permitted && (task->flags & PF_DUMPCORE)) {
- eip = KSTK_EIP(task);
- esp = KSTK_ESP(task);
+ if (try_get_task_stack(task)) {
+ eip = KSTK_EIP(task);
+ esp = KSTK_ESP(task);
+ put_task_stack(task);
+ }
}
}




2018-01-22 09:04:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 66/89] libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xinyu Lin <[email protected]>

commit db5ff909798ef0099004ad50a0ff5fde92426fd1 upstream.

LITEON EP1 has the same timeout issues as CX1 series devices.

Revert max_sectors to the value of 1024.

Fixes: e0edc8c54646 ("libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices")
Signed-off-by: Xinyu Lin <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ata/libata-core.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4439,6 +4439,7 @@ static const struct ata_blacklist_entry
* https://bugzilla.kernel.org/show_bug.cgi?id=121671
*/
{ "LITEON CX1-JB*-HP", NULL, ATA_HORKAGE_MAX_SEC_1024 },
+ { "LITEON EP1-*", NULL, ATA_HORKAGE_MAX_SEC_1024 },

/* Devices we expect to fail diagnostics */




2018-01-22 09:05:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 61/89] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Kleine-Budde <[email protected]>

commit 8cb68751c115d176ec851ca56ecfbb411568c9e8 upstream.

If an invalid CAN frame is received, from a driver or from a tun
interface, a Kernel warning is generated.

This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.

Reported-by: [email protected]
Suggested-by: Dmitry Vyukov <[email protected]>
Acked-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/can/af_can.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)

--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -721,20 +721,16 @@ static int can_rcv(struct sk_buff *skb,
{
struct canfd_frame *cfd = (struct canfd_frame *)skb->data;

- if (WARN_ONCE(dev->type != ARPHRD_CAN ||
- skb->len != CAN_MTU ||
- cfd->len > CAN_MAX_DLEN,
- "PF_CAN: dropped non conform CAN skbuf: "
- "dev type %d, len %d, datalen %d\n",
- dev->type, skb->len, cfd->len))
- goto drop;
+ if (unlikely(dev->type != ARPHRD_CAN || skb->len != CAN_MTU ||
+ cfd->len > CAN_MAX_DLEN)) {
+ pr_warn_once("PF_CAN: dropped non conform CAN skbuf: dev type %d, len %d, datalen %d\n",
+ dev->type, skb->len, cfd->len);
+ kfree_skb(skb);
+ return NET_RX_DROP;
+ }

can_receive(skb, dev);
return NET_RX_SUCCESS;
-
-drop:
- kfree_skb(skb);
- return NET_RX_DROP;
}

static int canfd_rcv(struct sk_buff *skb, struct net_device *dev,



2018-01-22 09:05:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 64/89] scripts/gdb/linux/tasks.py: fix get_thread_info

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xi Kangjie <[email protected]>

commit 883d50f56d263f70fd73c0d96b09eb36c34e9305 upstream.

Since kernel 4.9, the thread_info has been moved into task_struct, no
longer locates at the bottom of kernel stack.

See commits c65eacbe290b ("sched/core: Allow putting thread_info into
task_struct") and 15f4eae70d36 ("x86: Move thread_info into
task_struct").

Before fix:
(gdb) set $current = $lx_current()
(gdb) p $lx_thread_info($current)
$1 = {flags = 1470918301}
(gdb) p $current.thread_info
$2 = {flags = 2147483648}

After fix:
(gdb) p $lx_thread_info($current)
$1 = {flags = 2147483648}
(gdb) p $current.thread_info
$2 = {flags = 2147483648}

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 15f4eae70d36 ("x86: Move thread_info into task_struct")
Signed-off-by: Xi Kangjie <[email protected]>
Acked-by: Jan Kiszka <[email protected]>
Acked-by: Kieran Bingham <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
scripts/gdb/linux/tasks.py | 2 ++
1 file changed, 2 insertions(+)

--- a/scripts/gdb/linux/tasks.py
+++ b/scripts/gdb/linux/tasks.py
@@ -96,6 +96,8 @@ def get_thread_info(task):
thread_info_addr = task.address + ia64_task_size
thread_info = thread_info_addr.cast(thread_info_ptr_type)
else:
+ if task.type.fields()[0].type == thread_info_type.get_type():
+ return task['thread_info']
thread_info = task['stack'].cast(thread_info_ptr_type)
return thread_info.dereference()




2018-01-22 09:06:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 62/89] can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Kleine-Budde <[email protected]>

commit d4689846881d160a4d12a514e991a740bcb5d65a upstream.

If an invalid CANFD frame is received, from a driver or from a tun
interface, a Kernel warning is generated.

This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.

Reported-by: [email protected]
Suggested-by: Dmitry Vyukov <[email protected]>
Acked-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/can/af_can.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)

--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -738,20 +738,16 @@ static int canfd_rcv(struct sk_buff *skb
{
struct canfd_frame *cfd = (struct canfd_frame *)skb->data;

- if (WARN_ONCE(dev->type != ARPHRD_CAN ||
- skb->len != CANFD_MTU ||
- cfd->len > CANFD_MAX_DLEN,
- "PF_CAN: dropped non conform CAN FD skbuf: "
- "dev type %d, len %d, datalen %d\n",
- dev->type, skb->len, cfd->len))
- goto drop;
+ if (unlikely(dev->type != ARPHRD_CAN || skb->len != CANFD_MTU ||
+ cfd->len > CANFD_MAX_DLEN)) {
+ pr_warn_once("PF_CAN: dropped non conform CAN FD skbuf: dev type %d, len %d, datalen %d\n",
+ dev->type, skb->len, cfd->len);
+ kfree_skb(skb);
+ return NET_RX_DROP;
+ }

can_receive(skb, dev);
return NET_RX_SUCCESS;
-
-drop:
- kfree_skb(skb);
- return NET_RX_DROP;
}

/*



2018-01-22 09:06:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 59/89] ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Petazzoni <[email protected]>

commit 56aeb07c914a616ab84357d34f8414a69b140cdf upstream.

MPP7 is currently muxed as "gpio", but this function doesn't exist for
MPP7, only "gpo" is available. This causes the following error:

kirkwood-pinctrl f1010000.pin-controller: unsupported function gpio on pin mpp7
pinctrl core: failed to register map default (6): invalid type given
kirkwood-pinctrl f1010000.pin-controller: error claiming hogs: -22
kirkwood-pinctrl f1010000.pin-controller: could not claim hogs: -22
kirkwood-pinctrl f1010000.pin-controller: unable to register pinctrl driver
kirkwood-pinctrl: probe of f1010000.pin-controller failed with error -22

So the pinctrl driver is not probed, all device drivers (including the
UART driver) do a -EPROBE_DEFER, and therefore the system doesn't
really boot (well, it boots, but with no UART, and no devices that
require pin-muxing).

Back when the Device Tree file for this board was introduced, the
definition was already wrong. The pinctrl driver also always described
as "gpo" this function for MPP7. However, between Linux 4.10 and 4.11,
a hog pin failing to be muxed was turned from a simple warning to a
hard error that caused the entire pinctrl driver probe to bail
out. This is probably the result of commit 6118714275f0a ("pinctrl:
core: Fix pinctrl_register_and_init() with pinctrl_enable()").

This commit fixes the Device Tree to use the proper "gpo" function for
MPP7, which fixes the boot of OpenBlocks A7, which was broken since
Linux 4.11.

Fixes: f24b56cbcd9d ("ARM: kirkwood: add support for OpenBlocks A7 platform")
Signed-off-by: Thomas Petazzoni <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Gregory CLEMENT <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/boot/dts/kirkwood-openblocks_a7.dts | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

--- a/arch/arm/boot/dts/kirkwood-openblocks_a7.dts
+++ b/arch/arm/boot/dts/kirkwood-openblocks_a7.dts
@@ -53,7 +53,8 @@
};

pinctrl: pin-controller@10000 {
- pinctrl-0 = <&pmx_dip_switches &pmx_gpio_header>;
+ pinctrl-0 = <&pmx_dip_switches &pmx_gpio_header
+ &pmx_gpio_header_gpo>;
pinctrl-names = "default";

pmx_uart0: pmx-uart0 {
@@ -85,11 +86,16 @@
* ground.
*/
pmx_gpio_header: pmx-gpio-header {
- marvell,pins = "mpp17", "mpp7", "mpp29", "mpp28",
+ marvell,pins = "mpp17", "mpp29", "mpp28",
"mpp35", "mpp34", "mpp40";
marvell,function = "gpio";
};

+ pmx_gpio_header_gpo: pxm-gpio-header-gpo {
+ marvell,pins = "mpp7";
+ marvell,function = "gpo";
+ };
+
pmx_gpio_init: pmx-init {
marvell,pins = "mpp38";
marvell,function = "gpio";



2018-01-22 09:06:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 19/89] ALSA: pcm: Remove yet superfluous WARN_ON()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 23b19b7b50fe1867da8d431eea9cd3e4b6328c2c upstream.

muldiv32() contains a snd_BUG_ON() (which is morphed as WARN_ON() with
debug option) for checking the case of 0 / 0. This would be helpful
if this happens only as a logical error; however, since the hw refine
is performed with any data set provided by user, the inconsistent
values that can trigger such a condition might be passed easily.
Actually, syzbot caught this by passing some zero'ed old hw_params
ioctl.

So, having snd_BUG_ON() there is simply superfluous and rather
harmful to give unnecessary confusions. Let's get rid of it.

Reported-by: [email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/core/pcm_lib.c | 1 -
1 file changed, 1 deletion(-)

--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -560,7 +560,6 @@ static inline unsigned int muldiv32(unsi
{
u_int64_t n = (u_int64_t) a * b;
if (c == 0) {
- snd_BUG_ON(!n);
*r = 0;
return UINT_MAX;
}



2018-01-22 09:06:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 60/89] can: peak: fix potential bug in packet fragmentation

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stephane Grosjean <[email protected]>

commit d8a243af1a68395e07ac85384a2740d4134c67f4 upstream.

In some rare conditions when running one PEAK USB-FD interface over
a non high-speed USB controller, one useless USB fragment might be sent.
This patch fixes the way a USB command is fragmented when its length is
greater than 64 bytes and when the underlying USB controller is not a
high-speed one.

Signed-off-by: Stephane Grosjean <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -184,7 +184,7 @@ static int pcan_usb_fd_send_cmd(struct p
void *cmd_head = pcan_usb_fd_cmd_buffer(dev);
int err = 0;
u8 *packet_ptr;
- int i, n = 1, packet_len;
+ int packet_len;
ptrdiff_t cmd_len;

/* usb device unregistered? */
@@ -201,17 +201,13 @@ static int pcan_usb_fd_send_cmd(struct p
}

packet_ptr = cmd_head;
+ packet_len = cmd_len;

/* firmware is not able to re-assemble 512 bytes buffer in full-speed */
- if ((dev->udev->speed != USB_SPEED_HIGH) &&
- (cmd_len > PCAN_UFD_LOSPD_PKT_SIZE)) {
- packet_len = PCAN_UFD_LOSPD_PKT_SIZE;
- n += cmd_len / packet_len;
- } else {
- packet_len = cmd_len;
- }
+ if (unlikely(dev->udev->speed != USB_SPEED_HIGH))
+ packet_len = min(packet_len, PCAN_UFD_LOSPD_PKT_SIZE);

- for (i = 0; i < n; i++) {
+ do {
err = usb_bulk_msg(dev->udev,
usb_sndbulkpipe(dev->udev,
PCAN_USBPRO_EP_CMDOUT),
@@ -224,7 +220,12 @@ static int pcan_usb_fd_send_cmd(struct p
}

packet_ptr += packet_len;
- }
+ cmd_len -= packet_len;
+
+ if (cmd_len < PCAN_UFD_LOSPD_PKT_SIZE)
+ packet_len = cmd_len;
+
+ } while (packet_len > 0);

return err;
}



2018-01-22 09:06:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 50/89] Input: ALPS - fix multi-touch decoding on SS4 plus touchpads

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nir Perry <[email protected]>

commit 4d94e776bd29670f01befa27e12df784fa05fa2e upstream.

The fix for handling two-finger scroll (i4a646580f793 - "Input: ALPS -
fix two-finger scroll breakage in right side on ALPS touchpad")
introduced a minor "typo" that broke decoding of multi-touch events are
decoded on some ALPS touchpads. For example, tapping with three-fingers
can no longer be used to emulate middle-mouse-button (the kernel doesn't
recognize this as the proper event, and doesn't report it correctly to
userspace). This affects touchpads that use SS4 "plus" protocol
variant, like those found on Dell E7270 & E7470 laptops (tested on
E7270).

First, probably the code in alps_decode_ss4_v2() for case
SS4_PACKET_ID_MULTI used inconsistent indices to "f->mt[]". You can see
0 & 1 are used for the "if" part but 2 & 3 are used for the "else" part.

Second, in the previous patch, new macros were introduced to decode X
coordinates specific to the SS4 "plus" variant, but the macro to
define the maximum X value wasn't changed accordingly. The macros to
decode X values for "plus" variant are effectively shifted right by 1
bit, but the max wasn't shifted too. This causes the driver to
incorrectly handle "no data" cases, which also interfered with how
multi-touch was handled.

Fixes: 4a646580f793 ("Input: ALPS - fix two-finger scroll breakage...")
Signed-off-by: Nir Perry <[email protected]>
Reviewed-by: Masaki Ota <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/alps.c | 23 +++++++++++++----------
drivers/input/mouse/alps.h | 10 ++++++----
2 files changed, 19 insertions(+), 14 deletions(-)

--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -1250,29 +1250,32 @@ static int alps_decode_ss4_v2(struct alp
case SS4_PACKET_ID_MULTI:
if (priv->flags & ALPS_BUTTONPAD) {
if (IS_SS4PLUS_DEV(priv->dev_id)) {
- f->mt[0].x = SS4_PLUS_BTL_MF_X_V2(p, 0);
- f->mt[1].x = SS4_PLUS_BTL_MF_X_V2(p, 1);
+ f->mt[2].x = SS4_PLUS_BTL_MF_X_V2(p, 0);
+ f->mt[3].x = SS4_PLUS_BTL_MF_X_V2(p, 1);
+ no_data_x = SS4_PLUS_MFPACKET_NO_AX_BL;
} else {
f->mt[2].x = SS4_BTL_MF_X_V2(p, 0);
f->mt[3].x = SS4_BTL_MF_X_V2(p, 1);
+ no_data_x = SS4_MFPACKET_NO_AX_BL;
}
+ no_data_y = SS4_MFPACKET_NO_AY_BL;

f->mt[2].y = SS4_BTL_MF_Y_V2(p, 0);
f->mt[3].y = SS4_BTL_MF_Y_V2(p, 1);
- no_data_x = SS4_MFPACKET_NO_AX_BL;
- no_data_y = SS4_MFPACKET_NO_AY_BL;
} else {
if (IS_SS4PLUS_DEV(priv->dev_id)) {
- f->mt[0].x = SS4_PLUS_STD_MF_X_V2(p, 0);
- f->mt[1].x = SS4_PLUS_STD_MF_X_V2(p, 1);
+ f->mt[2].x = SS4_PLUS_STD_MF_X_V2(p, 0);
+ f->mt[3].x = SS4_PLUS_STD_MF_X_V2(p, 1);
+ no_data_x = SS4_PLUS_MFPACKET_NO_AX;
} else {
- f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
- f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
+ f->mt[2].x = SS4_STD_MF_X_V2(p, 0);
+ f->mt[3].x = SS4_STD_MF_X_V2(p, 1);
+ no_data_x = SS4_MFPACKET_NO_AX;
}
+ no_data_y = SS4_MFPACKET_NO_AY;
+
f->mt[2].y = SS4_STD_MF_Y_V2(p, 0);
f->mt[3].y = SS4_STD_MF_Y_V2(p, 1);
- no_data_x = SS4_MFPACKET_NO_AX;
- no_data_y = SS4_MFPACKET_NO_AY;
}

f->first_mp = 0;
--- a/drivers/input/mouse/alps.h
+++ b/drivers/input/mouse/alps.h
@@ -141,10 +141,12 @@ enum SS4_PACKET_ID {
#define SS4_TS_Z_V2(_b) (s8)(_b[4] & 0x7F)


-#define SS4_MFPACKET_NO_AX 8160 /* X-Coordinate value */
-#define SS4_MFPACKET_NO_AY 4080 /* Y-Coordinate value */
-#define SS4_MFPACKET_NO_AX_BL 8176 /* Buttonless X-Coordinate value */
-#define SS4_MFPACKET_NO_AY_BL 4088 /* Buttonless Y-Coordinate value */
+#define SS4_MFPACKET_NO_AX 8160 /* X-Coordinate value */
+#define SS4_MFPACKET_NO_AY 4080 /* Y-Coordinate value */
+#define SS4_MFPACKET_NO_AX_BL 8176 /* Buttonless X-Coord value */
+#define SS4_MFPACKET_NO_AY_BL 4088 /* Buttonless Y-Coord value */
+#define SS4_PLUS_MFPACKET_NO_AX 4080 /* SS4 PLUS, X */
+#define SS4_PLUS_MFPACKET_NO_AX_BL 4088 /* Buttonless SS4 PLUS, X */

/*
* enum V7_PACKET_ID - defines the packet type for V7



2018-01-22 09:07:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 28/89] delayacct: Account blkio completion on the correct task

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Josh Snyder <[email protected]>

commit c96f5471ce7d2aefd0dda560cc23f08ab00bc65d upstream.

Before commit:

e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")

delayacct_blkio_end() was called after context-switching into the task which
completed I/O.

This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.

With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.

But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.

Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
so that it can update the statistics of the correct task.

Signed-off-by: Josh Snyder <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/delayacct.h | 8 ++++----
kernel/delayacct.c | 42 ++++++++++++++++++++++++++----------------
kernel/sched/core.c | 6 +++---
3 files changed, 33 insertions(+), 23 deletions(-)

--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -71,7 +71,7 @@ extern void delayacct_init(void);
extern void __delayacct_tsk_init(struct task_struct *);
extern void __delayacct_tsk_exit(struct task_struct *);
extern void __delayacct_blkio_start(void);
-extern void __delayacct_blkio_end(void);
+extern void __delayacct_blkio_end(struct task_struct *);
extern int __delayacct_add_tsk(struct taskstats *, struct task_struct *);
extern __u64 __delayacct_blkio_ticks(struct task_struct *);
extern void __delayacct_freepages_start(void);
@@ -122,10 +122,10 @@ static inline void delayacct_blkio_start
__delayacct_blkio_start();
}

-static inline void delayacct_blkio_end(void)
+static inline void delayacct_blkio_end(struct task_struct *p)
{
if (current->delays)
- __delayacct_blkio_end();
+ __delayacct_blkio_end(p);
delayacct_clear_flag(DELAYACCT_PF_BLKIO);
}

@@ -169,7 +169,7 @@ static inline void delayacct_tsk_free(st
{}
static inline void delayacct_blkio_start(void)
{}
-static inline void delayacct_blkio_end(void)
+static inline void delayacct_blkio_end(struct task_struct *p)
{}
static inline int delayacct_add_tsk(struct taskstats *d,
struct task_struct *tsk)
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -51,16 +51,16 @@ void __delayacct_tsk_init(struct task_st
* Finish delay accounting for a statistic using its timestamps (@start),
* accumalator (@total) and @count
*/
-static void delayacct_end(u64 *start, u64 *total, u32 *count)
+static void delayacct_end(spinlock_t *lock, u64 *start, u64 *total, u32 *count)
{
s64 ns = ktime_get_ns() - *start;
unsigned long flags;

if (ns > 0) {
- spin_lock_irqsave(&current->delays->lock, flags);
+ spin_lock_irqsave(lock, flags);
*total += ns;
(*count)++;
- spin_unlock_irqrestore(&current->delays->lock, flags);
+ spin_unlock_irqrestore(lock, flags);
}
}

@@ -69,17 +69,25 @@ void __delayacct_blkio_start(void)
current->delays->blkio_start = ktime_get_ns();
}

-void __delayacct_blkio_end(void)
+/*
+ * We cannot rely on the `current` macro, as we haven't yet switched back to
+ * the process being woken.
+ */
+void __delayacct_blkio_end(struct task_struct *p)
{
- if (current->delays->flags & DELAYACCT_PF_SWAPIN)
- /* Swapin block I/O */
- delayacct_end(&current->delays->blkio_start,
- &current->delays->swapin_delay,
- &current->delays->swapin_count);
- else /* Other block I/O */
- delayacct_end(&current->delays->blkio_start,
- &current->delays->blkio_delay,
- &current->delays->blkio_count);
+ struct task_delay_info *delays = p->delays;
+ u64 *total;
+ u32 *count;
+
+ if (p->delays->flags & DELAYACCT_PF_SWAPIN) {
+ total = &delays->swapin_delay;
+ count = &delays->swapin_count;
+ } else {
+ total = &delays->blkio_delay;
+ count = &delays->blkio_count;
+ }
+
+ delayacct_end(&delays->lock, &delays->blkio_start, total, count);
}

int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
@@ -153,8 +161,10 @@ void __delayacct_freepages_start(void)

void __delayacct_freepages_end(void)
{
- delayacct_end(&current->delays->freepages_start,
- &current->delays->freepages_delay,
- &current->delays->freepages_count);
+ delayacct_end(
+ &current->delays->lock,
+ &current->delays->freepages_start,
+ &current->delays->freepages_delay,
+ &current->delays->freepages_count);
}

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2046,7 +2046,7 @@ try_to_wake_up(struct task_struct *p, un
p->state = TASK_WAKING;

if (p->in_iowait) {
- delayacct_blkio_end();
+ delayacct_blkio_end(p);
atomic_dec(&task_rq(p)->nr_iowait);
}

@@ -2059,7 +2059,7 @@ try_to_wake_up(struct task_struct *p, un
#else /* CONFIG_SMP */

if (p->in_iowait) {
- delayacct_blkio_end();
+ delayacct_blkio_end(p);
atomic_dec(&task_rq(p)->nr_iowait);
}

@@ -2112,7 +2112,7 @@ static void try_to_wake_up_local(struct

if (!task_on_rq_queued(p)) {
if (p->in_iowait) {
- delayacct_blkio_end();
+ delayacct_blkio_end(p);
atomic_dec(&rq->nr_iowait);
}
ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK);



2018-01-22 09:08:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 24/89] timers: Unconditionally check deferrable base

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit ed4bbf7910b28ce3c691aef28d245585eaabda06 upstream.

When the timer base is checked for expired timers then the deferrable base
must be checked as well. This was missed when making the deferrable base
independent of base::nohz_active.

Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/time/timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1656,7 +1656,7 @@ void run_local_timers(void)
hrtimer_run_queues();
/* Raise the softirq only if required. */
if (time_before(jiffies, base->clk)) {
- if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
+ if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
return;
/* CPU is awake, so check the deferrable base. */
base++;



2018-01-22 09:08:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 26/89] af_key: fix buffer overread in parse_exthdrs()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit 4e765b4972af7b07adcb1feb16e7a525ce1f6b28 upstream.

If a message sent to a PF_KEY socket ended with an incomplete extension
header (fewer than 4 bytes remaining), then parse_exthdrs() read past
the end of the message, into uninitialized memory. Fix it by returning
-EINVAL in this case.

Reproducer:

#include <linux/pfkeyv2.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[17] = { 0 };
struct sadb_msg *msg = (void *)buf;

msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 2;

write(sock, buf, 17);
}

Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/key/af_key.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -516,6 +516,9 @@ static int parse_exthdrs(struct sk_buff
uint16_t ext_type;
int ext_len;

+ if (len < sizeof(*ehdr))
+ return -EINVAL;
+
ext_len = ehdr->sadb_ext_len;
ext_len *= sizeof(uint64_t);
ext_type = ehdr->sadb_ext_type;



2018-01-22 09:08:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 22/89] IB/hfi1: Prevent a NULL dereference

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

commit 57194fa763bfa1a0908f30d4c77835beaa118fcb upstream.

In the original code, we set "fd->uctxt" to NULL and then dereference it
which will cause an Oops.

Fixes: f2a3bc00a03c ("IB/hfi1: Protect context array set/clear with spinlock")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Michael J. Ruhl <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/hfi1/file_ops.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -881,11 +881,11 @@ static int complete_subctxt(struct hfi1_
}

if (ret) {
- hfi1_rcd_put(fd->uctxt);
- fd->uctxt = NULL;
spin_lock_irqsave(&fd->dd->uctxt_lock, flags);
__clear_bit(fd->subctxt, fd->uctxt->in_use_ctxts);
spin_unlock_irqrestore(&fd->dd->uctxt_lock, flags);
+ hfi1_rcd_put(fd->uctxt);
+ fd->uctxt = NULL;
}

return ret;



2018-01-22 09:08:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 23/89] RDMA/mlx5: Fix out-of-bound access while querying AH

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Leon Romanovsky <[email protected]>

commit ae59c3f0b6cfd472fed96e50548a799b8971d876 upstream.

The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.

==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409

CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xe9/0x18f
print_address_description+0xa2/0x350
kasan_report+0x3a5/0x400
to_rdma_ah_attr+0xa8/0x3b0
mlx5_ib_query_qp+0xd35/0x1330
ib_query_qp+0x8a/0xb0
ib_uverbs_query_qp+0x237/0x7f0
ib_uverbs_write+0x617/0xd80
__vfs_write+0xf7/0x500
vfs_write+0x149/0x310
SyS_write+0xca/0x190
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560

Allocated by task 1:
__kmalloc+0x3f9/0x430
alloc_mad_private+0x25/0x50
ib_mad_post_receive_mads+0x204/0xa60
ib_mad_init_device+0xa59/0x1020
ib_register_device+0x83a/0xbc0
mlx5_ib_add+0x50e/0x5c0
mlx5_add_device+0x142/0x410
mlx5_register_interface+0x18f/0x210
mlx5_ib_init+0x56/0x63
do_one_initcall+0x15b/0x270
kernel_init_freeable+0x2d8/0x3d0
kernel_init+0x14/0x190
ret_from_fork+0x24/0x30

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff880019ae2000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/mlx5/qp.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -4303,12 +4303,11 @@ static void to_rdma_ah_attr(struct mlx5_

memset(ah_attr, 0, sizeof(*ah_attr));

- ah_attr->type = rdma_ah_find_type(&ibdev->ib_dev, path->port);
- rdma_ah_set_port_num(ah_attr, path->port);
- if (rdma_ah_get_port_num(ah_attr) == 0 ||
- rdma_ah_get_port_num(ah_attr) > MLX5_CAP_GEN(dev, num_ports))
+ if (!path->port || path->port > MLX5_CAP_GEN(dev, num_ports))
return;

+ ah_attr->type = rdma_ah_find_type(&ibdev->ib_dev, path->port);
+
rdma_ah_set_port_num(ah_attr, path->port);
rdma_ah_set_sl(ah_attr, path->dci_cfi_prio_sl & 0xf);




2018-01-22 09:09:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 48/89] ARM: OMAP3: hwmod_data: add missing module_offs for MMC3

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tero Kristo <[email protected]>

commit 3c4d296e58a23687f2076d8ad531e6ae2b725846 upstream.

MMC3 hwmod data is missing the module_offs definition. MMC3 belongs under
core, so add CORE_MOD for it.

Signed-off-by: Tero Kristo <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Cc: Adam Ford <[email protected]>
Fixes: 6c0afb503937 ("clk: ti: convert to use proper register definition for all accesses")
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/mach-omap2/omap_hwmod_3xxx_data.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
@@ -1656,6 +1656,7 @@ static struct omap_hwmod omap3xxx_mmc3_h
.main_clk = "mmchs3_fck",
.prcm = {
.omap2 = {
+ .module_offs = CORE_MOD,
.prcm_reg_id = 1,
.module_bit = OMAP3430_EN_MMC3_SHIFT,
.idlest_reg_id = 1,



2018-01-22 09:09:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 47/89] x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit cc5f01e28d6c60f274fd1e33b245f679f79f543c upstream.

In preparation for encrypting more than just the kernel, the encryption
support in sme_encrypt_kernel() needs to support 4KB page aligned
encryption instead of just 2MB large page aligned encryption.

Update the routines that populate the PGD to support non-2MB aligned
addresses. This is done by creating PTE page tables for the start
and end portion of the address range that fall outside of the 2MB
alignment. This results in, at most, two extra pages to hold the
PTE entries for each mapping of a range.

Tested-by: Gabriel Craciunescu <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/mem_encrypt.c | 123 +++++++++++++++++++++++++++++++++++------
arch/x86/mm/mem_encrypt_boot.S | 20 ++++--
2 files changed, 121 insertions(+), 22 deletions(-)

--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -218,6 +218,7 @@ struct sme_populate_pgd_data {
pgd_t *pgd;

pmdval_t pmd_flags;
+ pteval_t pte_flags;
unsigned long paddr;

unsigned long vaddr;
@@ -242,6 +243,7 @@ static void __init sme_clear_pgd(struct
#define PGD_FLAGS _KERNPG_TABLE_NOENC
#define P4D_FLAGS _KERNPG_TABLE_NOENC
#define PUD_FLAGS _KERNPG_TABLE_NOENC
+#define PMD_FLAGS _KERNPG_TABLE_NOENC

#define PMD_FLAGS_LARGE (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)

@@ -251,7 +253,15 @@ static void __init sme_clear_pgd(struct

#define PMD_FLAGS_ENC (PMD_FLAGS_LARGE | _PAGE_ENC)

-static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+#define PTE_FLAGS (__PAGE_KERNEL_EXEC & ~_PAGE_GLOBAL)
+
+#define PTE_FLAGS_DEC PTE_FLAGS
+#define PTE_FLAGS_DEC_WP ((PTE_FLAGS_DEC & ~_PAGE_CACHE_MASK) | \
+ (_PAGE_PAT | _PAGE_PWT))
+
+#define PTE_FLAGS_ENC (PTE_FLAGS | _PAGE_ENC)
+
+static pmd_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
{
pgd_t *pgd_p;
p4d_t *p4d_p;
@@ -302,7 +312,7 @@ static void __init sme_populate_pgd_larg
pud_p += pud_index(ppd->vaddr);
if (native_pud_val(*pud_p)) {
if (native_pud_val(*pud_p) & _PAGE_PSE)
- return;
+ return NULL;

pmd_p = (pmd_t *)(native_pud_val(*pud_p) & ~PTE_FLAGS_MASK);
} else {
@@ -316,16 +326,55 @@ static void __init sme_populate_pgd_larg
native_set_pud(pud_p, pud);
}

+ return pmd_p;
+}
+
+static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+{
+ pmd_t *pmd_p;
+
+ pmd_p = sme_prepare_pgd(ppd);
+ if (!pmd_p)
+ return;
+
pmd_p += pmd_index(ppd->vaddr);
if (!native_pmd_val(*pmd_p) || !(native_pmd_val(*pmd_p) & _PAGE_PSE))
native_set_pmd(pmd_p, native_make_pmd(ppd->paddr | ppd->pmd_flags));
}

-static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
- pmdval_t pmd_flags)
+static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
{
- ppd->pmd_flags = pmd_flags;
+ pmd_t *pmd_p;
+ pte_t *pte_p;

+ pmd_p = sme_prepare_pgd(ppd);
+ if (!pmd_p)
+ return;
+
+ pmd_p += pmd_index(ppd->vaddr);
+ if (native_pmd_val(*pmd_p)) {
+ if (native_pmd_val(*pmd_p) & _PAGE_PSE)
+ return;
+
+ pte_p = (pte_t *)(native_pmd_val(*pmd_p) & ~PTE_FLAGS_MASK);
+ } else {
+ pmd_t pmd;
+
+ pte_p = ppd->pgtable_area;
+ memset(pte_p, 0, sizeof(*pte_p) * PTRS_PER_PTE);
+ ppd->pgtable_area += sizeof(*pte_p) * PTRS_PER_PTE;
+
+ pmd = native_make_pmd((pteval_t)pte_p + PMD_FLAGS);
+ native_set_pmd(pmd_p, pmd);
+ }
+
+ pte_p += pte_index(ppd->vaddr);
+ if (!native_pte_val(*pte_p))
+ native_set_pte(pte_p, native_make_pte(ppd->paddr | ppd->pte_flags));
+}
+
+static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd_large(ppd);

@@ -334,33 +383,71 @@ static void __init __sme_map_range(struc
}
}

+static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+{
+ while (ppd->vaddr < ppd->vaddr_end) {
+ sme_populate_pgd(ppd);
+
+ ppd->vaddr += PAGE_SIZE;
+ ppd->paddr += PAGE_SIZE;
+ }
+}
+
+static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
+ pmdval_t pmd_flags, pteval_t pte_flags)
+{
+ unsigned long vaddr_end;
+
+ ppd->pmd_flags = pmd_flags;
+ ppd->pte_flags = pte_flags;
+
+ /* Save original end value since we modify the struct value */
+ vaddr_end = ppd->vaddr_end;
+
+ /* If start is not 2MB aligned, create PTE entries */
+ ppd->vaddr_end = ALIGN(ppd->vaddr, PMD_PAGE_SIZE);
+ __sme_map_range_pte(ppd);
+
+ /* Create PMD entries */
+ ppd->vaddr_end = vaddr_end & PMD_PAGE_MASK;
+ __sme_map_range_pmd(ppd);
+
+ /* If end is not 2MB aligned, create PTE entries */
+ ppd->vaddr_end = vaddr_end;
+ __sme_map_range_pte(ppd);
+}
+
static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
{
- __sme_map_range(ppd, PMD_FLAGS_ENC);
+ __sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
}

static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
{
- __sme_map_range(ppd, PMD_FLAGS_DEC);
+ __sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
}

static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
{
- __sme_map_range(ppd, PMD_FLAGS_DEC_WP);
+ __sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
}

static unsigned long __init sme_pgtable_calc(unsigned long len)
{
- unsigned long p4d_size, pud_size, pmd_size;
+ unsigned long p4d_size, pud_size, pmd_size, pte_size;
unsigned long total;

/*
* Perform a relatively simplistic calculation of the pagetable
- * entries that are needed. That mappings will be covered by 2MB
- * PMD entries so we can conservatively calculate the required
+ * entries that are needed. Those mappings will be covered mostly
+ * by 2MB PMD entries so we can conservatively calculate the required
* number of P4D, PUD and PMD structures needed to perform the
- * mappings. Incrementing the count for each covers the case where
- * the addresses cross entries.
+ * mappings. For mappings that are not 2MB aligned, PTE mappings
+ * would be needed for the start and end portion of the address range
+ * that fall outside of the 2MB alignment. This results in, at most,
+ * two extra pages to hold PTE entries for each range that is mapped.
+ * Incrementing the count for each covers the case where the addresses
+ * cross entries.
*/
if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
p4d_size = (ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE) + 1;
@@ -374,8 +461,9 @@ static unsigned long __init sme_pgtable_
}
pmd_size = (ALIGN(len, PUD_SIZE) / PUD_SIZE) + 1;
pmd_size *= sizeof(pmd_t) * PTRS_PER_PMD;
+ pte_size = 2 * sizeof(pte_t) * PTRS_PER_PTE;

- total = p4d_size + pud_size + pmd_size;
+ total = p4d_size + pud_size + pmd_size + pte_size;

/*
* Now calculate the added pagetable structures needed to populate
@@ -458,10 +546,13 @@ void __init sme_encrypt_kernel(void)

/*
* The total workarea includes the executable encryption area and
- * the pagetable area.
+ * the pagetable area. The start of the workarea is already 2MB
+ * aligned, align the end of the workarea on a 2MB boundary so that
+ * we don't try to create/allocate PTE entries from the workarea
+ * before it is mapped.
*/
workarea_len = execute_len + pgtable_area_len;
- workarea_end = workarea_start + workarea_len;
+ workarea_end = ALIGN(workarea_start + workarea_len, PMD_PAGE_SIZE);

/*
* Set the address to the start of where newly created pagetable
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -104,6 +104,7 @@ ENTRY(__enc_copy)
mov %rdx, %cr4

push %r15
+ push %r12

movq %rcx, %r9 /* Save kernel length */
movq %rdi, %r10 /* Save encrypted kernel address */
@@ -119,21 +120,27 @@ ENTRY(__enc_copy)

wbinvd /* Invalidate any cache entries */

- /* Copy/encrypt 2MB at a time */
+ /* Copy/encrypt up to 2MB at a time */
+ movq $PMD_PAGE_SIZE, %r12
1:
+ cmpq %r12, %r9
+ jnb 2f
+ movq %r9, %r12
+
+2:
movq %r11, %rsi /* Source - decrypted kernel */
movq %r8, %rdi /* Dest - intermediate copy buffer */
- movq $PMD_PAGE_SIZE, %rcx /* 2MB length */
+ movq %r12, %rcx
rep movsb

movq %r8, %rsi /* Source - intermediate copy buffer */
movq %r10, %rdi /* Dest - encrypted kernel */
- movq $PMD_PAGE_SIZE, %rcx /* 2MB length */
+ movq %r12, %rcx
rep movsb

- addq $PMD_PAGE_SIZE, %r11
- addq $PMD_PAGE_SIZE, %r10
- subq $PMD_PAGE_SIZE, %r9 /* Kernel length decrement */
+ addq %r12, %r11
+ addq %r12, %r10
+ subq %r12, %r9 /* Kernel length decrement */
jnz 1b /* Kernel length not zero? */

/* Restore PAT register */
@@ -142,6 +149,7 @@ ENTRY(__enc_copy)
mov %r15, %rdx /* Restore original PAT value */
wrmsr

+ pop %r12
pop %r15

ret



2018-01-22 09:09:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 45/89] x86/mm: Use a struct to reduce parameters for SME PGD mapping

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit bacf6b499e11760aef73a3bb5ce4e5eea74a3fd4 upstream.

In preparation for follow-on patches, combine the PGD mapping parameters
into a struct to reduce the number of function arguments and allow for
direct updating of the next pagetable mapping area pointer.

Tested-by: Gabriel Craciunescu <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/mem_encrypt.c | 92 +++++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 45 deletions(-)

--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -213,6 +213,14 @@ void swiotlb_set_mem_attributes(void *va
set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
}

+struct sme_populate_pgd_data {
+ void *pgtable_area;
+ pgd_t *pgd;
+
+ pmdval_t pmd_val;
+ unsigned long vaddr;
+};
+
static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
unsigned long end)
{
@@ -235,15 +243,14 @@ static void __init sme_clear_pgd(pgd_t *
#define PUD_FLAGS _KERNPG_TABLE_NOENC
#define PMD_FLAGS (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)

-static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
- unsigned long vaddr, pmdval_t pmd_val)
+static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
{
pgd_t *pgd_p;
p4d_t *p4d_p;
pud_t *pud_p;
pmd_t *pmd_p;

- pgd_p = pgd_base + pgd_index(vaddr);
+ pgd_p = ppd->pgd + pgd_index(ppd->vaddr);
if (native_pgd_val(*pgd_p)) {
if (IS_ENABLED(CONFIG_X86_5LEVEL))
p4d_p = (p4d_t *)(native_pgd_val(*pgd_p) & ~PTE_FLAGS_MASK);
@@ -253,15 +260,15 @@ static void __init *sme_populate_pgd(pgd
pgd_t pgd;

if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
- p4d_p = pgtable_area;
+ p4d_p = ppd->pgtable_area;
memset(p4d_p, 0, sizeof(*p4d_p) * PTRS_PER_P4D);
- pgtable_area += sizeof(*p4d_p) * PTRS_PER_P4D;
+ ppd->pgtable_area += sizeof(*p4d_p) * PTRS_PER_P4D;

pgd = native_make_pgd((pgdval_t)p4d_p + PGD_FLAGS);
} else {
- pud_p = pgtable_area;
+ pud_p = ppd->pgtable_area;
memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
- pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+ ppd->pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;

pgd = native_make_pgd((pgdval_t)pud_p + PGD_FLAGS);
}
@@ -269,44 +276,41 @@ static void __init *sme_populate_pgd(pgd
}

if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
- p4d_p += p4d_index(vaddr);
+ p4d_p += p4d_index(ppd->vaddr);
if (native_p4d_val(*p4d_p)) {
pud_p = (pud_t *)(native_p4d_val(*p4d_p) & ~PTE_FLAGS_MASK);
} else {
p4d_t p4d;

- pud_p = pgtable_area;
+ pud_p = ppd->pgtable_area;
memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
- pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+ ppd->pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;

p4d = native_make_p4d((pudval_t)pud_p + P4D_FLAGS);
native_set_p4d(p4d_p, p4d);
}
}

- pud_p += pud_index(vaddr);
+ pud_p += pud_index(ppd->vaddr);
if (native_pud_val(*pud_p)) {
if (native_pud_val(*pud_p) & _PAGE_PSE)
- goto out;
+ return;

pmd_p = (pmd_t *)(native_pud_val(*pud_p) & ~PTE_FLAGS_MASK);
} else {
pud_t pud;

- pmd_p = pgtable_area;
+ pmd_p = ppd->pgtable_area;
memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
- pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;
+ ppd->pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;

pud = native_make_pud((pmdval_t)pmd_p + PUD_FLAGS);
native_set_pud(pud_p, pud);
}

- pmd_p += pmd_index(vaddr);
+ pmd_p += pmd_index(ppd->vaddr);
if (!native_pmd_val(*pmd_p) || !(native_pmd_val(*pmd_p) & _PAGE_PSE))
- native_set_pmd(pmd_p, native_make_pmd(pmd_val));
-
-out:
- return pgtable_area;
+ native_set_pmd(pmd_p, native_make_pmd(ppd->pmd_val));
}

static unsigned long __init sme_pgtable_calc(unsigned long len)
@@ -364,11 +368,10 @@ void __init sme_encrypt_kernel(void)
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
unsigned long kernel_start, kernel_end, kernel_len;
+ struct sme_populate_pgd_data ppd;
unsigned long pgtable_area_len;
unsigned long paddr, pmd_flags;
unsigned long decrypted_base;
- void *pgtable_area;
- pgd_t *pgd;

if (!sme_active())
return;
@@ -432,18 +435,18 @@ void __init sme_encrypt_kernel(void)
* pagetables and when the new encrypted and decrypted kernel
* mappings are populated.
*/
- pgtable_area = (void *)execute_end;
+ ppd.pgtable_area = (void *)execute_end;

/*
* Make sure the current pagetable structure has entries for
* addressing the workarea.
*/
- pgd = (pgd_t *)native_read_cr3_pa();
+ ppd.pgd = (pgd_t *)native_read_cr3_pa();
paddr = workarea_start;
while (paddr < workarea_end) {
- pgtable_area = sme_populate_pgd(pgd, pgtable_area,
- paddr,
- paddr + PMD_FLAGS);
+ ppd.pmd_val = paddr + PMD_FLAGS;
+ ppd.vaddr = paddr;
+ sme_populate_pgd_large(&ppd);

paddr += PMD_PAGE_SIZE;
}
@@ -457,17 +460,17 @@ void __init sme_encrypt_kernel(void)
* populated with new PUDs and PMDs as the encrypted and decrypted
* kernel mappings are created.
*/
- pgd = pgtable_area;
- memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
- pgtable_area += sizeof(*pgd) * PTRS_PER_PGD;
+ ppd.pgd = ppd.pgtable_area;
+ memset(ppd.pgd, 0, sizeof(pgd_t) * PTRS_PER_PGD);
+ ppd.pgtable_area += sizeof(pgd_t) * PTRS_PER_PGD;

/* Add encrypted kernel (identity) mappings */
pmd_flags = PMD_FLAGS | _PAGE_ENC;
paddr = kernel_start;
while (paddr < kernel_end) {
- pgtable_area = sme_populate_pgd(pgd, pgtable_area,
- paddr,
- paddr + pmd_flags);
+ ppd.pmd_val = paddr + pmd_flags;
+ ppd.vaddr = paddr;
+ sme_populate_pgd_large(&ppd);

paddr += PMD_PAGE_SIZE;
}
@@ -485,9 +488,9 @@ void __init sme_encrypt_kernel(void)
pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
paddr = kernel_start;
while (paddr < kernel_end) {
- pgtable_area = sme_populate_pgd(pgd, pgtable_area,
- paddr + decrypted_base,
- paddr + pmd_flags);
+ ppd.pmd_val = paddr + pmd_flags;
+ ppd.vaddr = paddr + decrypted_base;
+ sme_populate_pgd_large(&ppd);

paddr += PMD_PAGE_SIZE;
}
@@ -495,30 +498,29 @@ void __init sme_encrypt_kernel(void)
/* Add decrypted workarea mappings to both kernel mappings */
paddr = workarea_start;
while (paddr < workarea_end) {
- pgtable_area = sme_populate_pgd(pgd, pgtable_area,
- paddr,
- paddr + PMD_FLAGS);
-
- pgtable_area = sme_populate_pgd(pgd, pgtable_area,
- paddr + decrypted_base,
- paddr + PMD_FLAGS);
+ ppd.pmd_val = paddr + PMD_FLAGS;
+ ppd.vaddr = paddr;
+ sme_populate_pgd_large(&ppd);
+
+ ppd.vaddr = paddr + decrypted_base;
+ sme_populate_pgd_large(&ppd);

paddr += PMD_PAGE_SIZE;
}

/* Perform the encryption */
sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
- kernel_len, workarea_start, (unsigned long)pgd);
+ kernel_len, workarea_start, (unsigned long)ppd.pgd);

/*
* At this point we are running encrypted. Remove the mappings for
* the decrypted areas - all that is needed for this is to remove
* the PGD entry/entries.
*/
- sme_clear_pgd(pgd, kernel_start + decrypted_base,
+ sme_clear_pgd(ppd.pgd, kernel_start + decrypted_base,
kernel_end + decrypted_base);

- sme_clear_pgd(pgd, workarea_start + decrypted_base,
+ sme_clear_pgd(ppd.pgd, workarea_start + decrypted_base,
workarea_end + decrypted_base);

/* Flush the TLB - no globals so cr3 is enough */



2018-01-22 09:10:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 46/89] x86/mm: Centralize PMD flags in sme_encrypt_kernel()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit 2b5d00b6c2cdd94f6d6a494a6f6c0c0fc7b8e711 upstream.

In preparation for encrypting more than just the kernel during early
boot processing, centralize the use of the PMD flag settings based
on the type of mapping desired. When 4KB aligned encryption is added,
this will allow either PTE flags or large page PMD flags to be used
without requiring the caller to adjust.

Tested-by: Gabriel Craciunescu <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/mem_encrypt.c | 137 ++++++++++++++++++++++++++--------------------
1 file changed, 79 insertions(+), 58 deletions(-)

--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -217,31 +217,39 @@ struct sme_populate_pgd_data {
void *pgtable_area;
pgd_t *pgd;

- pmdval_t pmd_val;
+ pmdval_t pmd_flags;
+ unsigned long paddr;
+
unsigned long vaddr;
+ unsigned long vaddr_end;
};

-static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
- unsigned long end)
+static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
pgd_t *pgd_p;

- pgd_start = start & PGDIR_MASK;
- pgd_end = end & PGDIR_MASK;
+ pgd_start = ppd->vaddr & PGDIR_MASK;
+ pgd_end = ppd->vaddr_end & PGDIR_MASK;

- pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1);
- pgd_size *= sizeof(pgd_t);
+ pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1) * sizeof(pgd_t);

- pgd_p = pgd_base + pgd_index(start);
+ pgd_p = ppd->pgd + pgd_index(ppd->vaddr);

memset(pgd_p, 0, pgd_size);
}

-#define PGD_FLAGS _KERNPG_TABLE_NOENC
-#define P4D_FLAGS _KERNPG_TABLE_NOENC
-#define PUD_FLAGS _KERNPG_TABLE_NOENC
-#define PMD_FLAGS (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
+#define PGD_FLAGS _KERNPG_TABLE_NOENC
+#define P4D_FLAGS _KERNPG_TABLE_NOENC
+#define PUD_FLAGS _KERNPG_TABLE_NOENC
+
+#define PMD_FLAGS_LARGE (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
+
+#define PMD_FLAGS_DEC PMD_FLAGS_LARGE
+#define PMD_FLAGS_DEC_WP ((PMD_FLAGS_DEC & ~_PAGE_CACHE_MASK) | \
+ (_PAGE_PAT | _PAGE_PWT))
+
+#define PMD_FLAGS_ENC (PMD_FLAGS_LARGE | _PAGE_ENC)

static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
{
@@ -310,7 +318,35 @@ static void __init sme_populate_pgd_larg

pmd_p += pmd_index(ppd->vaddr);
if (!native_pmd_val(*pmd_p) || !(native_pmd_val(*pmd_p) & _PAGE_PSE))
- native_set_pmd(pmd_p, native_make_pmd(ppd->pmd_val));
+ native_set_pmd(pmd_p, native_make_pmd(ppd->paddr | ppd->pmd_flags));
+}
+
+static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
+ pmdval_t pmd_flags)
+{
+ ppd->pmd_flags = pmd_flags;
+
+ while (ppd->vaddr < ppd->vaddr_end) {
+ sme_populate_pgd_large(ppd);
+
+ ppd->vaddr += PMD_PAGE_SIZE;
+ ppd->paddr += PMD_PAGE_SIZE;
+ }
+}
+
+static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+{
+ __sme_map_range(ppd, PMD_FLAGS_ENC);
+}
+
+static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+{
+ __sme_map_range(ppd, PMD_FLAGS_DEC);
+}
+
+static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+{
+ __sme_map_range(ppd, PMD_FLAGS_DEC_WP);
}

static unsigned long __init sme_pgtable_calc(unsigned long len)
@@ -370,7 +406,6 @@ void __init sme_encrypt_kernel(void)
unsigned long kernel_start, kernel_end, kernel_len;
struct sme_populate_pgd_data ppd;
unsigned long pgtable_area_len;
- unsigned long paddr, pmd_flags;
unsigned long decrypted_base;

if (!sme_active())
@@ -442,14 +477,10 @@ void __init sme_encrypt_kernel(void)
* addressing the workarea.
*/
ppd.pgd = (pgd_t *)native_read_cr3_pa();
- paddr = workarea_start;
- while (paddr < workarea_end) {
- ppd.pmd_val = paddr + PMD_FLAGS;
- ppd.vaddr = paddr;
- sme_populate_pgd_large(&ppd);
-
- paddr += PMD_PAGE_SIZE;
- }
+ ppd.paddr = workarea_start;
+ ppd.vaddr = workarea_start;
+ ppd.vaddr_end = workarea_end;
+ sme_map_range_decrypted(&ppd);

/* Flush the TLB - no globals so cr3 is enough */
native_write_cr3(__native_read_cr3());
@@ -464,17 +495,6 @@ void __init sme_encrypt_kernel(void)
memset(ppd.pgd, 0, sizeof(pgd_t) * PTRS_PER_PGD);
ppd.pgtable_area += sizeof(pgd_t) * PTRS_PER_PGD;

- /* Add encrypted kernel (identity) mappings */
- pmd_flags = PMD_FLAGS | _PAGE_ENC;
- paddr = kernel_start;
- while (paddr < kernel_end) {
- ppd.pmd_val = paddr + pmd_flags;
- ppd.vaddr = paddr;
- sme_populate_pgd_large(&ppd);
-
- paddr += PMD_PAGE_SIZE;
- }
-
/*
* A different PGD index/entry must be used to get different
* pagetable entries for the decrypted mapping. Choose the next
@@ -484,29 +504,28 @@ void __init sme_encrypt_kernel(void)
decrypted_base = (pgd_index(workarea_end) + 1) & (PTRS_PER_PGD - 1);
decrypted_base <<= PGDIR_SHIFT;

- /* Add decrypted, write-protected kernel (non-identity) mappings */
- pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
- paddr = kernel_start;
- while (paddr < kernel_end) {
- ppd.pmd_val = paddr + pmd_flags;
- ppd.vaddr = paddr + decrypted_base;
- sme_populate_pgd_large(&ppd);
+ /* Add encrypted kernel (identity) mappings */
+ ppd.paddr = kernel_start;
+ ppd.vaddr = kernel_start;
+ ppd.vaddr_end = kernel_end;
+ sme_map_range_encrypted(&ppd);

- paddr += PMD_PAGE_SIZE;
- }
+ /* Add decrypted, write-protected kernel (non-identity) mappings */
+ ppd.paddr = kernel_start;
+ ppd.vaddr = kernel_start + decrypted_base;
+ ppd.vaddr_end = kernel_end + decrypted_base;
+ sme_map_range_decrypted_wp(&ppd);

/* Add decrypted workarea mappings to both kernel mappings */
- paddr = workarea_start;
- while (paddr < workarea_end) {
- ppd.pmd_val = paddr + PMD_FLAGS;
- ppd.vaddr = paddr;
- sme_populate_pgd_large(&ppd);
-
- ppd.vaddr = paddr + decrypted_base;
- sme_populate_pgd_large(&ppd);
-
- paddr += PMD_PAGE_SIZE;
- }
+ ppd.paddr = workarea_start;
+ ppd.vaddr = workarea_start;
+ ppd.vaddr_end = workarea_end;
+ sme_map_range_decrypted(&ppd);
+
+ ppd.paddr = workarea_start;
+ ppd.vaddr = workarea_start + decrypted_base;
+ ppd.vaddr_end = workarea_end + decrypted_base;
+ sme_map_range_decrypted(&ppd);

/* Perform the encryption */
sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
@@ -517,11 +536,13 @@ void __init sme_encrypt_kernel(void)
* the decrypted areas - all that is needed for this is to remove
* the PGD entry/entries.
*/
- sme_clear_pgd(ppd.pgd, kernel_start + decrypted_base,
- kernel_end + decrypted_base);
-
- sme_clear_pgd(ppd.pgd, workarea_start + decrypted_base,
- workarea_end + decrypted_base);
+ ppd.vaddr = kernel_start + decrypted_base;
+ ppd.vaddr_end = kernel_end + decrypted_base;
+ sme_clear_pgd(&ppd);
+
+ ppd.vaddr = workarea_start + decrypted_base;
+ ppd.vaddr_end = workarea_end + decrypted_base;
+ sme_clear_pgd(&ppd);

/* Flush the TLB - no globals so cr3 is enough */
native_write_cr3(__native_read_cr3());



2018-01-22 09:10:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 43/89] x86/apic/vector: Fix off by one in error path

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 45d55e7bac4028af93f5fa324e69958a0b868e96 upstream.

Keith reported the following warning:

WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
x86_vector_free_irqs+0xa1/0x180
x86_vector_alloc_irqs+0x1e4/0x3a0
msi_domain_alloc+0x62/0x130

The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.

Adjust the error path to handle this correctly.

Fixes: b5dc8e6c21e7 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
Reported-by: Keith Busch <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Keith Busch <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
arch/x86/kernel/apic/vector.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -369,8 +369,11 @@ static int x86_vector_alloc_irqs(struct
irq_data->hwirq = virq + i;
err = assign_irq_vector_policy(virq + i, node, data, info,
irq_data);
- if (err)
+ if (err) {
+ irq_data->chip_data = NULL;
+ free_apic_chip_data(data);
goto error;
+ }
/*
* If the apic destination mode is physical, then the
* effective affinity is restricted to a single target
@@ -383,7 +386,7 @@ static int x86_vector_alloc_irqs(struct
return 0;

error:
- x86_vector_free_irqs(domain, virq, i + 1);
+ x86_vector_free_irqs(domain, virq, i);
return err;
}




2018-01-22 09:11:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 42/89] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joe Lawrence <[email protected]>

commit d3f14c485867cfb2e0c48aa88c41d0ef4bf5209c upstream.

round_pipe_size() contains a right-bit-shift expression which may
overflow, which would cause undefined results in a subsequent
roundup_pow_of_two() call.

static inline unsigned int round_pipe_size(unsigned int size)
{
unsigned long nr_pages;

nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
}

PAGE_SIZE is defined as (1UL << PAGE_SHIFT), so:
- 4 bytes wide on 32-bit (0 to 0xffffffff)
- 8 bytes wide on 64-bit (0 to 0xffffffffffffffff)

That means that 32-bit round_pipe_size(), nr_pages may overflow to 0:

size=0x00000000 nr_pages=0x0
size=0x00000001 nr_pages=0x1
size=0xfffff000 nr_pages=0xfffff
size=0xfffff001 nr_pages=0x0 << !
size=0xffffffff nr_pages=0x0 << !

This is bad because roundup_pow_of_two(n) is undefined when n == 0!

64-bit is not a problem as the unsigned int size is 4 bytes wide
(similar to 32-bit) and the larger, 8 byte wide unsigned long, is
sufficient to handle the largest value of the bit shift expression:

size=0xffffffff nr_pages=100000

Modify round_pipe_size() to return 0 if n == 0 and updates its callers to
handle accordingly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Joe Lawrence <[email protected]>
Reported-by: Mikulas Patocka <[email protected]>
Reviewed-by: Mikulas Patocka <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Dong Jinguang <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/pipe.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1018,13 +1018,19 @@ const struct file_operations pipefifo_fo

/*
* Currently we rely on the pipe array holding a power-of-2 number
- * of pages.
+ * of pages. Returns 0 on error.
*/
static inline unsigned int round_pipe_size(unsigned int size)
{
unsigned long nr_pages;

+ if (size < pipe_min_size)
+ size = pipe_min_size;
+
nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ if (nr_pages == 0)
+ return 0;
+
return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
}

@@ -1040,6 +1046,8 @@ static long pipe_set_size(struct pipe_in
long ret = 0;

size = round_pipe_size(arg);
+ if (size == 0)
+ return -EINVAL;
nr_pages = size >> PAGE_SHIFT;

if (!nr_pages)
@@ -1123,13 +1131,18 @@ out_revert_acct:
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
+ unsigned int rounded_pipe_max_size;
int ret;

ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
if (ret < 0 || !write)
return ret;

- pipe_max_size = round_pipe_size(pipe_max_size);
+ rounded_pipe_max_size = round_pipe_size(pipe_max_size);
+ if (rounded_pipe_max_size == 0)
+ return -EINVAL;
+
+ pipe_max_size = rounded_pipe_max_size;
return ret;
}




2018-01-22 09:11:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 38/89] x86/mm/pkeys: Fix fill_sig_info_pkey

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric W. Biederman <[email protected]>

commit beacd6f7ed5e2915959442245b3b2480c2e37490 upstream.

SEGV_PKUERR is a signal specific si_code which happens to have the same
numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP, FPE_FLTOVF,
TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID, as such it is not safe
to just test the si_code the signal number must also be tested to prevent a
false positive in fill_sig_info_pkey.

This error was by inspection, and BUS_MCEERR_AR appears to be a real
candidate for confusion. So pass in si_signo and check for SIG_SEGV to
verify that it is actually a SEGV_PKUERR

Fixes: 019132ff3daf ("x86/mm/pkeys: Fill in pkey field in siginfo")
Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Dave Hansen <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Al Viro <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/fault.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -173,14 +173,15 @@ is_prefetch(struct pt_regs *regs, unsign
* 6. T1 : reaches here, sees vma_pkey(vma)=5, when we really
* faulted on a pte with its pkey=4.
*/
-static void fill_sig_info_pkey(int si_code, siginfo_t *info, u32 *pkey)
+static void fill_sig_info_pkey(int si_signo, int si_code, siginfo_t *info,
+ u32 *pkey)
{
/* This is effectively an #ifdef */
if (!boot_cpu_has(X86_FEATURE_OSPKE))
return;

/* Fault not from Protection Keys: nothing to do */
- if (si_code != SEGV_PKUERR)
+ if ((si_code != SEGV_PKUERR) || (si_signo != SIGSEGV))
return;
/*
* force_sig_info_fault() is called from a number of
@@ -219,7 +220,7 @@ force_sig_info_fault(int si_signo, int s
lsb = PAGE_SHIFT;
info.si_addr_lsb = lsb;

- fill_sig_info_pkey(si_code, &info, pkey);
+ fill_sig_info_pkey(si_signo, si_code, &info, pkey);

force_sig_info(si_signo, &info, tsk);
}



2018-01-22 09:11:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 40/89] x86/tsc: Future-proof native_calibrate_tsc()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Len Brown <[email protected]>

commit da4ae6c4a0b8dee5a5377a385545d2250fa8cddb upstream.

If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
the built-in table then native_calibrate_tsc() will still set the
X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.

As a consequence such systems use cpu_khz for the TSC frequency which is
incorrect when cpu_khz != tsc_khz resulting in time drift.

Return early when the crystal frequency cannot be retrieved without setting
the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
calibration is invoked.

[ tglx: Steam-blastered changelog. Sigh ]

Fixes: 4ca4df0b7eb0 ("x86/tsc: Mark TSC frequency determined by CPUID as known")
Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Bin Gao <[email protected]>
Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/tsc.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -612,6 +612,8 @@ unsigned long native_calibrate_tsc(void)
}
}

+ if (crystal_khz == 0)
+ return 0;
/*
* TSC frequency determined by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This



2018-01-22 09:12:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 21/89] ALSA: hda - Apply the existing quirk to iMac 14,1

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 031f335cda879450095873003abb03ae8ed3b74a upstream.

iMac 14,1 requires the same quirk as iMac 12,2, using GPIO 2 and 3 for
headphone and speaker output amps. Add the codec SSID quirk entry
(106b:0600) accordingly.

BugLink: http://lkml.kernel.org/r/CAEw6Zyteav09VGHRfD5QwsfuWv5a43r0tFBNbfcHXoNrxVz7ew@mail.gmail.com
Reported-by: Freaky <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_cirrus.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_cirrus.c
+++ b/sound/pci/hda/patch_cirrus.c
@@ -408,6 +408,7 @@ static const struct snd_pci_quirk cs420x
/*SND_PCI_QUIRK(0x8086, 0x7270, "IMac 27 Inch", CS420X_IMAC27),*/

/* codec SSID */
+ SND_PCI_QUIRK(0x106b, 0x0600, "iMac 14,1", CS420X_IMAC27_122),
SND_PCI_QUIRK(0x106b, 0x1c00, "MacBookPro 8,1", CS420X_MBP81),
SND_PCI_QUIRK(0x106b, 0x2000, "iMac 12,2", CS420X_IMAC27_122),
SND_PCI_QUIRK(0x106b, 0x2800, "MacBookPro 10,1", CS420X_MBP101),



2018-01-22 09:12:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 37/89] x86/intel_rdt/cqm: Prevent use after free

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit d47924417319e3b6a728c0b690f183e75bc2a702 upstream.

intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.

BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510

Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510

Do the teardown first and then free memory.

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Peter Zilstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vikas Shivappa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "Roderick W. Smith" <[email protected]>
Cc: [email protected]
Cc: Fenghua Yu <[email protected]>
Cc: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/cpu/intel_rdt.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, s
*/
if (static_branch_unlikely(&rdt_mon_enable_key))
rmdir_mondata_subdir_allrdtgrp(r, d->id);
- kfree(d->ctrl_val);
- kfree(d->rmid_busy_llc);
- kfree(d->mbm_total);
- kfree(d->mbm_local);
list_del(&d->list);
if (is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
@@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, s
cancel_delayed_work(&d->cqm_limbo);
}

+ kfree(d->ctrl_val);
+ kfree(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ kfree(d->mbm_local);
kfree(d);
return;
}



2018-01-22 09:13:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 36/89] module: Add retpoline tag to VERMAGIC

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andi Kleen <[email protected]>

commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12 upstream.

Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.

It doesn't handle the case when retpoline has been runtime disabled. Even
in this case the match of the retcompile status will be enforced. This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/vermagic.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/include/linux/vermagic.h
+++ b/include/linux/vermagic.h
@@ -31,11 +31,17 @@
#else
#define MODULE_RANDSTRUCT_PLUGIN
#endif
+#ifdef RETPOLINE
+#define MODULE_VERMAGIC_RETPOLINE "retpoline "
+#else
+#define MODULE_VERMAGIC_RETPOLINE ""
+#endif

#define VERMAGIC_STRING \
UTS_RELEASE " " \
MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \
MODULE_ARCH_VERMAGIC \
- MODULE_RANDSTRUCT_PLUGIN
+ MODULE_RANDSTRUCT_PLUGIN \
+ MODULE_VERMAGIC_RETPOLINE




2018-01-22 09:14:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 34/89] objtool: Improve error message for bad file argument

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <[email protected]>

commit 385d11b152c4eb638eeb769edcb3249533bb9a00 upstream.

If a nonexistent file is supplied to objtool, it complains with a
non-helpful error:

open: No such file or directory

Improve it to:

objtool: Can't open 'foo': No such file or directory

Reported-by: Markus <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/objtool/elf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -26,6 +26,7 @@
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
+#include <errno.h>

#include "elf.h"
#include "warn.h"
@@ -358,7 +359,8 @@ struct elf *elf_open(const char *name, i

elf->fd = open(name, flags);
if (elf->fd == -1) {
- perror("open");
+ fprintf(stderr, "objtool: Can't open '%s': %s\n",
+ name, strerror(errno));
goto err;
}




2018-01-22 09:14:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 32/89] x86/retpoline: Fill RSB on context switch for affected CPUs

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Woodhouse <[email protected]>

commit c995efd5a740d9cbafbf58bde4973e8b50b4d761 upstream.

On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.

This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.

Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.

On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.

[ tglx: Added missing vendor check and slighty massaged comments and
changelog ]

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Paul Turner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/entry/entry_32.S | 11 +++++++++++
arch/x86/entry/entry_64.S | 11 +++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/bugs.c | 36 ++++++++++++++++++++++++++++++++++++
4 files changed, 59 insertions(+)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -244,6 +244,17 @@ ENTRY(__switch_to_asm)
movl %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
#endif

+#ifdef CONFIG_RETPOLINE
+ /*
+ * When switching from a shallower to a deeper call stack
+ * the RSB may either underflow or use entries populated
+ * with userspace addresses. On CPUs where those concerns
+ * exist, overwrite the RSB with entries which capture
+ * speculative execution to prevent attack.
+ */
+ FILL_RETURN_BUFFER %ebx, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
+#endif
+
/* restore callee-saved registers */
popl %esi
popl %edi
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -487,6 +487,17 @@ ENTRY(__switch_to_asm)
movq %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
#endif

+#ifdef CONFIG_RETPOLINE
+ /*
+ * When switching from a shallower to a deeper call stack
+ * the RSB may either underflow or use entries populated
+ * with userspace addresses. On CPUs where those concerns
+ * exist, overwrite the RSB with entries which capture
+ * speculative execution to prevent attack.
+ */
+ FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
+#endif
+
/* restore callee-saved registers */
popq %r15
popq %r14
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */

#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
+#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -23,6 +23,7 @@
#include <asm/alternative.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/intel-family.h>

static void __init spectre_v2_select_mitigation(void);

@@ -155,6 +156,23 @@ disable:
return SPECTRE_V2_CMD_NONE;
}

+/* Check for Skylake-like CPUs (for RSB handling) */
+static bool __init is_skylake_era(void)
+{
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+ boot_cpu_data.x86 == 6) {
+ switch (boot_cpu_data.x86_model) {
+ case INTEL_FAM6_SKYLAKE_MOBILE:
+ case INTEL_FAM6_SKYLAKE_DESKTOP:
+ case INTEL_FAM6_SKYLAKE_X:
+ case INTEL_FAM6_KABYLAKE_MOBILE:
+ case INTEL_FAM6_KABYLAKE_DESKTOP:
+ return true;
+ }
+ }
+ return false;
+}
+
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -213,6 +231,24 @@ retpoline_auto:

spectre_v2_enabled = mode;
pr_info("%s\n", spectre_v2_strings[mode]);
+
+ /*
+ * If neither SMEP or KPTI are available, there is a risk of
+ * hitting userspace addresses in the RSB after a context switch
+ * from a shallow call stack to a deeper one. To prevent this fill
+ * the entire RSB, even when using IBRS.
+ *
+ * Skylake era CPUs have a separate issue with *underflow* of the
+ * RSB, when they will predict 'ret' targets from the generic BTB.
+ * The proper mitigation for this is IBRS. If IBRS is not supported
+ * or deactivated in favour of retpolines the RSB fill on context
+ * switch is required.
+ */
+ if ((!boot_cpu_has(X86_FEATURE_PTI) &&
+ !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+ pr_info("Filling RSB on context switch\n");
+ }
}

#undef pr_fmt



2018-01-22 09:14:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 29/89] objtool: Fix seg fault with gold linker

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <[email protected]>

commit 2a0098d70640dda192a79966c14d449e7a34d675 upstream.

Objtool segfaults when the gold linker is used with
CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.

With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
being passed to objtool. The gold linker seems to strip unused ELF
symbols by default, which confuses objtool and causes the seg fault when
it's trying to generate ORC metadata.

Objtool should really be running immediately after GCC anyway, without a
linker call in between. Change the makefile ordering so that objtool is
called before the linker.

Reported-and-tested-by: Markus <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
scripts/Makefile.build | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -270,12 +270,18 @@ else
objtool_args += $(call cc-ifversion, -lt, 0405, --no-unreachable)
endif

+ifdef CONFIG_MODVERSIONS
+objtool_o = $(@D)/.tmp_$(@F)
+else
+objtool_o = $(@)
+endif
+
# 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
# 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file
# 'OBJECT_FILES_NON_STANDARD_foo.o := 'n': override directory skip for a file
cmd_objtool = $(if $(patsubst y%,, \
$(OBJECT_FILES_NON_STANDARD_$(basetarget).o)$(OBJECT_FILES_NON_STANDARD)n), \
- $(__objtool_obj) $(objtool_args) "$(@)";)
+ $(__objtool_obj) $(objtool_args) "$(objtool_o)";)
objtool_obj = $(if $(patsubst y%,, \
$(OBJECT_FILES_NON_STANDARD_$(basetarget).o)$(OBJECT_FILES_NON_STANDARD)n), \
$(__objtool_obj))
@@ -291,15 +297,15 @@ objtool_dep = $(objtool_obj) \
define rule_cc_o_c
$(call echo-cmd,checksrc) $(cmd_checksrc) \
$(call cmd_and_fixdep,cc_o_c) \
- $(cmd_modversions_c) \
$(call echo-cmd,objtool) $(cmd_objtool) \
+ $(cmd_modversions_c) \
$(call echo-cmd,record_mcount) $(cmd_record_mcount)
endef

define rule_as_o_S
$(call cmd_and_fixdep,as_o_S) \
- $(cmd_modversions_S) \
- $(call echo-cmd,objtool) $(cmd_objtool)
+ $(call echo-cmd,objtool) $(cmd_objtool) \
+ $(cmd_modversions_S)
endef

# List module undefined symbols (or empty line if not enabled)



2018-01-22 09:14:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 33/89] x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <[email protected]>

commit 28d437d550e1e39f805d99f9f8ac399c778827b7 upstream.

The PAUSE instruction is currently used in the retpoline and RSB filling
macros as a speculation trap. The use of PAUSE was originally suggested
because it showed a very, very small difference in the amount of
cycles/time used to execute the retpoline as compared to LFENCE. On AMD,
the PAUSE instruction is not a serializing instruction, so the pause/jmp
loop will use excess power as it is speculated over waiting for return
to mispredict to the correct target.

The RSB filling macro is applicable to AMD, and, if software is unable to
verify that LFENCE is serializing on AMD (possible when running under a
hypervisor), the generic retpoline support will be used and, so, is also
applicable to AMD. Keep the current usage of PAUSE for Intel, but add an
LFENCE instruction to the speculation trap for AMD.

The same sequence has been adopted by GCC for the GCC generated retpolines.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/nospec-branch.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -11,7 +11,7 @@
* Fill the CPU return stack buffer.
*
* Each entry in the RSB, if used for a speculative 'ret', contains an
- * infinite 'pause; jmp' loop to capture speculative execution.
+ * infinite 'pause; lfence; jmp' loop to capture speculative execution.
*
* This is required in various cases for retpoline and IBRS-based
* mitigations for the Spectre variant 2 vulnerability. Sometimes to
@@ -38,11 +38,13 @@
call 772f; \
773: /* speculation trap */ \
pause; \
+ lfence; \
jmp 773b; \
772: \
call 774f; \
775: /* speculation trap */ \
pause; \
+ lfence; \
jmp 775b; \
774: \
dec reg; \
@@ -73,6 +75,7 @@
call .Ldo_rop_\@
.Lspec_trap_\@:
pause
+ lfence
jmp .Lspec_trap_\@
.Ldo_rop_\@:
mov \reg, (%_ASM_SP)
@@ -165,6 +168,7 @@
" .align 16\n" \
"901: call 903f;\n" \
"902: pause;\n" \
+ " lfence;\n" \
" jmp 902b;\n" \
" .align 16\n" \
"903: addl $4, %%esp;\n" \



2018-01-22 09:14:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 31/89] x86/kasan: Panic if there is not enough memory to boot

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andrey Ryabinin <[email protected]>

commit 0d39e2669d7b0fefd2d8f9e7868ae669b364d9ba upstream.

Currently KASAN doesn't panic in case it don't have enough memory
to boot. Instead, it crashes in some random place:

kernel BUG at arch/x86/mm/physaddr.c:27!

RIP: 0010:__phys_addr+0x268/0x276
Call Trace:
kasan_populate_shadow+0x3f2/0x497
kasan_init+0x12e/0x2b2
setup_arch+0x2825/0x2a2c
start_kernel+0xc8/0x15f4
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x72/0x75
secondary_startup_64+0xa5/0xb0

Use memblock_virt_alloc_try_nid() for allocations without failure
fallback. It will panic with an out of memory message.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: [email protected]
Cc: Alexander Potapenko <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/kasan_init_64.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -21,10 +21,14 @@ extern struct range pfn_mapped[E820_MAX_

static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);

-static __init void *early_alloc(size_t size, int nid)
+static __init void *early_alloc(size_t size, int nid, bool panic)
{
- return memblock_virt_alloc_try_nid_nopanic(size, size,
- __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid);
+ if (panic)
+ return memblock_virt_alloc_try_nid(size, size,
+ __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid);
+ else
+ return memblock_virt_alloc_try_nid_nopanic(size, size,
+ __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid);
}

static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr,
@@ -38,14 +42,14 @@ static void __init kasan_populate_pmd(pm
if (boot_cpu_has(X86_FEATURE_PSE) &&
((end - addr) == PMD_SIZE) &&
IS_ALIGNED(addr, PMD_SIZE)) {
- p = early_alloc(PMD_SIZE, nid);
+ p = early_alloc(PMD_SIZE, nid, false);
if (p && pmd_set_huge(pmd, __pa(p), PAGE_KERNEL))
return;
else if (p)
memblock_free(__pa(p), PMD_SIZE);
}

- p = early_alloc(PAGE_SIZE, nid);
+ p = early_alloc(PAGE_SIZE, nid, true);
pmd_populate_kernel(&init_mm, pmd, p);
}

@@ -57,7 +61,7 @@ static void __init kasan_populate_pmd(pm
if (!pte_none(*pte))
continue;

- p = early_alloc(PAGE_SIZE, nid);
+ p = early_alloc(PAGE_SIZE, nid, true);
entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL);
set_pte_at(&init_mm, addr, pte, entry);
} while (pte++, addr += PAGE_SIZE, addr != end);
@@ -75,14 +79,14 @@ static void __init kasan_populate_pud(pu
if (boot_cpu_has(X86_FEATURE_GBPAGES) &&
((end - addr) == PUD_SIZE) &&
IS_ALIGNED(addr, PUD_SIZE)) {
- p = early_alloc(PUD_SIZE, nid);
+ p = early_alloc(PUD_SIZE, nid, false);
if (p && pud_set_huge(pud, __pa(p), PAGE_KERNEL))
return;
else if (p)
memblock_free(__pa(p), PUD_SIZE);
}

- p = early_alloc(PAGE_SIZE, nid);
+ p = early_alloc(PAGE_SIZE, nid, true);
pud_populate(&init_mm, pud, p);
}

@@ -101,7 +105,7 @@ static void __init kasan_populate_p4d(p4
unsigned long next;

if (p4d_none(*p4d)) {
- void *p = early_alloc(PAGE_SIZE, nid);
+ void *p = early_alloc(PAGE_SIZE, nid, true);

p4d_populate(&init_mm, p4d, p);
}
@@ -122,7 +126,7 @@ static void __init kasan_populate_pgd(pg
unsigned long next;

if (pgd_none(*pgd)) {
- p = early_alloc(PAGE_SIZE, nid);
+ p = early_alloc(PAGE_SIZE, nid, true);
pgd_populate(&init_mm, pgd, p);
}




2018-01-22 09:14:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 20/89] ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit e4c9fd10eb21376f44723c40ad12395089251c28 upstream.

There is another Dell XPS 13 variant (SSID 1028:082a) that requires
the existing fixup for reducing the headphone noise.
This patch adds the quirk entry for that.

BugLink: http://lkml.kernel.org/r/CAHXyb9ZCZJzVisuBARa+UORcjRERV8yokez=DP1_5O5isTz0ZA@mail.gmail.com
Reported-and-tested-by: Francisco G. <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6173,6 +6173,7 @@ static const struct snd_pci_quirk alc269
SND_PCI_QUIRK(0x1028, 0x075b, "Dell XPS 13 9360", ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
SND_PCI_QUIRK(0x1028, 0x075d, "Dell AIO", ALC298_FIXUP_SPK_VOLUME),
SND_PCI_QUIRK(0x1028, 0x0798, "Dell Inspiron 17 7000 Gaming", ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER),
+ SND_PCI_QUIRK(0x1028, 0x082a, "Dell XPS 13 9360", ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
SND_PCI_QUIRK(0x1028, 0x164a, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1028, 0x164b, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2),



2018-01-22 09:15:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 08/89] powerpc/64s: Simple RFI macro conversions

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit 222f20f140623ef6033491d0103ee0875fe87d35 upstream.

This commit does simple conversions of rfi/rfid to the new macros that
include the expected destination context. By simple we mean cases
where there is a single well known destination context, and it's
simply a matter of substituting the instruction for the appropriate
macro.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
arch/powerpc/include/asm/exception-64s.h | 4 ++--
arch/powerpc/kernel/entry_64.S | 14 +++++++++-----
arch/powerpc/kernel/exceptions-64s.S | 22 +++++++++++-----------
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 7 +++----
arch/powerpc/kvm/book3s_rmhandlers.S | 7 +++++--
arch/powerpc/kvm/book3s_segment.S | 4 ++--
6 files changed, 32 insertions(+), 26 deletions(-)

--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -242,7 +242,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
mtspr SPRN_##h##SRR0,r12; \
mfspr r12,SPRN_##h##SRR1; /* and SRR1 */ \
mtspr SPRN_##h##SRR1,r10; \
- h##rfid; \
+ h##RFI_TO_KERNEL; \
b . /* prevent speculative execution */
#define EXCEPTION_PROLOG_PSERIES_1(label, h) \
__EXCEPTION_PROLOG_PSERIES_1(label, h)
@@ -256,7 +256,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
mtspr SPRN_##h##SRR0,r12; \
mfspr r12,SPRN_##h##SRR1; /* and SRR1 */ \
mtspr SPRN_##h##SRR1,r10; \
- h##rfid; \
+ h##RFI_TO_KERNEL; \
b . /* prevent speculative execution */

#define EXCEPTION_PROLOG_PSERIES_1_NORI(label, h) \
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -37,6 +37,11 @@
#include <asm/tm.h>
#include <asm/ppc-opcode.h>
#include <asm/export.h>
+#ifdef CONFIG_PPC_BOOK3S
+#include <asm/exception-64s.h>
+#else
+#include <asm/exception-64e.h>
+#endif

/*
* System calls.
@@ -397,8 +402,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mtmsrd r10, 1
mtspr SPRN_SRR0, r11
mtspr SPRN_SRR1, r12
-
- rfid
+ RFI_TO_USER
b . /* prevent speculative execution */
#endif
_ASM_NOKPROBE_SYMBOL(system_call_common);
@@ -1073,7 +1077,7 @@ __enter_rtas:

mtspr SPRN_SRR0,r5
mtspr SPRN_SRR1,r6
- rfid
+ RFI_TO_KERNEL
b . /* prevent speculative execution */

rtas_return_loc:
@@ -1098,7 +1102,7 @@ rtas_return_loc:

mtspr SPRN_SRR0,r3
mtspr SPRN_SRR1,r4
- rfid
+ RFI_TO_KERNEL
b . /* prevent speculative execution */
_ASM_NOKPROBE_SYMBOL(__enter_rtas)
_ASM_NOKPROBE_SYMBOL(rtas_return_loc)
@@ -1171,7 +1175,7 @@ _GLOBAL(enter_prom)
LOAD_REG_IMMEDIATE(r12, MSR_SF | MSR_ISF | MSR_LE)
andc r11,r11,r12
mtsrr1 r11
- rfid
+ RFI_TO_KERNEL
#endif /* CONFIG_PPC_BOOK3E */

1: /* Return from OF */
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -254,7 +254,7 @@ BEGIN_FTR_SECTION
LOAD_HANDLER(r12, machine_check_handle_early)
1: mtspr SPRN_SRR0,r12
mtspr SPRN_SRR1,r11
- rfid
+ RFI_TO_KERNEL
b . /* prevent speculative execution */
2:
/* Stack overflow. Stay on emergency stack and panic.
@@ -443,7 +443,7 @@ EXC_COMMON_BEGIN(machine_check_handle_ea
li r3,MSR_ME
andc r10,r10,r3 /* Turn off MSR_ME */
mtspr SPRN_SRR1,r10
- rfid
+ RFI_TO_KERNEL
b .
2:
/*
@@ -461,7 +461,7 @@ EXC_COMMON_BEGIN(machine_check_handle_ea
*/
bl machine_check_queue_event
MACHINE_CHECK_HANDLER_WINDUP
- rfid
+ RFI_TO_USER_OR_KERNEL
9:
/* Deliver the machine check to host kernel in V mode. */
MACHINE_CHECK_HANDLER_WINDUP
@@ -649,7 +649,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_R
mtspr SPRN_SRR0,r10
ld r10,PACAKMSR(r13)
mtspr SPRN_SRR1,r10
- rfid
+ RFI_TO_KERNEL
b .

8: std r3,PACA_EXSLB+EX_DAR(r13)
@@ -660,7 +660,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_R
mtspr SPRN_SRR0,r10
ld r10,PACAKMSR(r13)
mtspr SPRN_SRR1,r10
- rfid
+ RFI_TO_KERNEL
b .

EXC_COMMON_BEGIN(unrecov_slb)
@@ -905,7 +905,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
mtspr SPRN_SRR0,r10 ; \
ld r10,PACAKMSR(r13) ; \
mtspr SPRN_SRR1,r10 ; \
- rfid ; \
+ RFI_TO_KERNEL ; \
b . ; /* prevent speculative execution */

#define SYSCALL_FASTENDIAN \
@@ -914,7 +914,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
xori r12,r12,MSR_LE ; \
mtspr SPRN_SRR1,r12 ; \
mr r13,r9 ; \
- rfid ; /* return to userspace */ \
+ RFI_TO_USER ; /* return to userspace */ \
b . ; /* prevent speculative execution */

#if defined(CONFIG_RELOCATABLE)
@@ -1299,7 +1299,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
ld r11,PACA_EXGEN+EX_R11(r13)
ld r12,PACA_EXGEN+EX_R12(r13)
ld r13,PACA_EXGEN+EX_R13(r13)
- HRFID
+ HRFI_TO_UNKNOWN
b .
#endif

@@ -1403,7 +1403,7 @@ masked_##_H##interrupt: \
ld r10,PACA_EXGEN+EX_R10(r13); \
ld r11,PACA_EXGEN+EX_R11(r13); \
/* returns to kernel where r13 must be set up, so don't restore it */ \
- ##_H##rfid; \
+ ##_H##RFI_TO_KERNEL; \
b .; \
MASKED_DEC_HANDLER(_H)

@@ -1426,7 +1426,7 @@ TRAMP_REAL_BEGIN(kvmppc_skip_interrupt)
addi r13, r13, 4
mtspr SPRN_SRR0, r13
GET_SCRATCH0(r13)
- rfid
+ RFI_TO_KERNEL
b .

TRAMP_REAL_BEGIN(kvmppc_skip_Hinterrupt)
@@ -1438,7 +1438,7 @@ TRAMP_REAL_BEGIN(kvmppc_skip_Hinterrupt)
addi r13, r13, 4
mtspr SPRN_HSRR0, r13
GET_SCRATCH0(r13)
- hrfid
+ HRFI_TO_KERNEL
b .
#endif

--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -78,7 +78,7 @@ _GLOBAL_TOC(kvmppc_hv_entry_trampoline)
mtmsrd r0,1 /* clear RI in MSR */
mtsrr0 r5
mtsrr1 r6
- RFI
+ RFI_TO_KERNEL

kvmppc_call_hv_entry:
ld r4, HSTATE_KVM_VCPU(r13)
@@ -187,7 +187,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
mtmsrd r6, 1 /* Clear RI in MSR */
mtsrr0 r8
mtsrr1 r7
- RFI
+ RFI_TO_KERNEL

/* Virtual-mode return */
.Lvirt_return:
@@ -1131,8 +1131,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)

ld r0, VCPU_GPR(R0)(r4)
ld r4, VCPU_GPR(R4)(r4)
-
- hrfid
+ HRFI_TO_GUEST
b .

secondary_too_late:
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -46,6 +46,9 @@

#define FUNC(name) name

+#define RFI_TO_KERNEL RFI
+#define RFI_TO_GUEST RFI
+
.macro INTERRUPT_TRAMPOLINE intno

.global kvmppc_trampoline_\intno
@@ -141,7 +144,7 @@ kvmppc_handler_skip_ins:
GET_SCRATCH0(r13)

/* And get back into the code */
- RFI
+ RFI_TO_KERNEL
#endif

/*
@@ -164,6 +167,6 @@ _GLOBAL_TOC(kvmppc_entry_trampoline)
ori r5, r5, MSR_EE
mtsrr0 r7
mtsrr1 r6
- RFI
+ RFI_TO_KERNEL

#include "book3s_segment.S"
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -156,7 +156,7 @@ no_dcbz32_on:
PPC_LL r9, SVCPU_R9(r3)
PPC_LL r3, (SVCPU_R3)(r3)

- RFI
+ RFI_TO_GUEST
kvmppc_handler_trampoline_enter_end:


@@ -407,5 +407,5 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
cmpwi r12, BOOK3S_INTERRUPT_DOORBELL
beqa BOOK3S_INTERRUPT_DOORBELL

- RFI
+ RFI_TO_KERNEL
kvmppc_handler_trampoline_exit_end:



2018-01-22 09:15:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 09/89] powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit b8e90cb7bc04a509e821e82ab6ed7a8ef11ba333 upstream.

In the syscall exit path we may be returning to user or kernel
context. We already have a test for that, because we conditionally
restore r13. So use that existing test and branch, and bifurcate the
return based on that.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/entry_64.S | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -267,13 +267,23 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)

ld r13,GPR13(r1) /* only restore r13 if returning to usermode */
+ ld r2,GPR2(r1)
+ ld r1,GPR1(r1)
+ mtlr r4
+ mtcr r5
+ mtspr SPRN_SRR0,r7
+ mtspr SPRN_SRR1,r8
+ RFI_TO_USER
+ b . /* prevent speculative execution */
+
+ /* exit to kernel */
1: ld r2,GPR2(r1)
ld r1,GPR1(r1)
mtlr r4
mtcr r5
mtspr SPRN_SRR0,r7
mtspr SPRN_SRR1,r8
- RFI
+ RFI_TO_KERNEL
b . /* prevent speculative execution */

.Lsyscall_error:



2018-01-22 09:16:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 05/89] objtool: Fix seg fault caused by missing parameter

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Simon Ser <[email protected]>

commit d89e426499cf36b96161bd32970d6783f1fbcb0e upstream.

Fix a seg fault when no parameter is provided to 'objtool orc'.

Signed-off-by: Simon Ser <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/9172803ec7ebb72535bcd0b7f966ae96d515968e.1514666459.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/objtool/builtin-orc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/tools/objtool/builtin-orc.c
+++ b/tools/objtool/builtin-orc.c
@@ -44,6 +44,9 @@ int cmd_orc(int argc, const char **argv)
const char *objname;

argc--; argv++;
+ if (argc <= 0)
+ usage_with_options(orc_usage, check_options);
+
if (!strncmp(argv[0], "gen", 3)) {
argc = parse_options(argc, argv, check_options, orc_usage, 0);
if (argc != 1)
@@ -52,7 +55,6 @@ int cmd_orc(int argc, const char **argv)
objname = argv[0];

return check(objname, no_fp, no_unreachable, true);
-
}

if (!strcmp(argv[0], "dump")) {



2018-01-22 09:16:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 07/89] powerpc/64: Add macros for annotating the destination of rfid/hrfid

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit 50e51c13b3822d14ff6df4279423e4b7b2269bc3 upstream.

The rfid/hrfid ((Hypervisor) Return From Interrupt) instruction is
used for switching from the kernel to userspace, and from the
hypervisor to the guest kernel. However it can and is also used for
other transitions, eg. from real mode kernel code to virtual mode
kernel code, and it's not always clear from the code what the
destination context is.

To make it clearer when reading the code, add macros which encode the
expected destination context.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/exception-64e.h | 6 ++++++
arch/powerpc/include/asm/exception-64s.h | 29 +++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)

--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -209,5 +209,11 @@ exc_##label##_book3e:
ori r3,r3,vector_offset@l; \
mtspr SPRN_IVOR##vector_number,r3;

+#define RFI_TO_KERNEL \
+ rfi
+
+#define RFI_TO_USER \
+ rfi
+
#endif /* _ASM_POWERPC_EXCEPTION_64E_H */

--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -69,6 +69,35 @@
*/
#define EX_R3 EX_DAR

+/* Macros for annotating the expected destination of (h)rfid */
+
+#define RFI_TO_KERNEL \
+ rfid
+
+#define RFI_TO_USER \
+ rfid
+
+#define RFI_TO_USER_OR_KERNEL \
+ rfid
+
+#define RFI_TO_GUEST \
+ rfid
+
+#define HRFI_TO_KERNEL \
+ hrfid
+
+#define HRFI_TO_USER \
+ hrfid
+
+#define HRFI_TO_USER_OR_KERNEL \
+ hrfid
+
+#define HRFI_TO_GUEST \
+ hrfid
+
+#define HRFI_TO_UNKNOWN \
+ hrfid
+
#ifdef CONFIG_RELOCATABLE
#define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h) \
mfspr r11,SPRN_##h##SRR0; /* save SRR0 */ \



2018-01-22 09:16:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 06/89] powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Neuling <[email protected]>

commit 191eccb1580939fb0d47deb405b82a85b0379070 upstream.

A new hypervisor call has been defined to communicate various
characteristics of the CPU to guests. Add definitions for the hcall
number, flags and a wrapper function.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/hvcall.h | 17 +++++++++++++++++
arch/powerpc/include/asm/plpar_wrappers.h | 14 ++++++++++++++
2 files changed, 31 insertions(+)

--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -241,6 +241,7 @@
#define H_GET_HCA_INFO 0x1B8
#define H_GET_PERF_COUNT 0x1BC
#define H_MANAGE_TRACE 0x1C0
+#define H_GET_CPU_CHARACTERISTICS 0x1C8
#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
#define H_QUERY_INT_STATE 0x1E4
#define H_POLL_PENDING 0x1D8
@@ -330,6 +331,17 @@
#define H_SIGNAL_SYS_RESET_ALL_OTHERS -2
/* >= 0 values are CPU number */

+/* H_GET_CPU_CHARACTERISTICS return values */
+#define H_CPU_CHAR_SPEC_BAR_ORI31 (1ull << 63) // IBM bit 0
+#define H_CPU_CHAR_BCCTRL_SERIALISED (1ull << 62) // IBM bit 1
+#define H_CPU_CHAR_L1D_FLUSH_ORI30 (1ull << 61) // IBM bit 2
+#define H_CPU_CHAR_L1D_FLUSH_TRIG2 (1ull << 60) // IBM bit 3
+#define H_CPU_CHAR_L1D_THREAD_PRIV (1ull << 59) // IBM bit 4
+
+#define H_CPU_BEHAV_FAVOUR_SECURITY (1ull << 63) // IBM bit 0
+#define H_CPU_BEHAV_L1D_FLUSH_PR (1ull << 62) // IBM bit 1
+#define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR (1ull << 61) // IBM bit 2
+
/* Flag values used in H_REGISTER_PROC_TBL hcall */
#define PROC_TABLE_OP_MASK 0x18
#define PROC_TABLE_DEREG 0x10
@@ -436,6 +448,11 @@ static inline unsigned int get_longbusy_
}
}

+struct h_cpu_char_result {
+ u64 character;
+ u64 behaviour;
+};
+
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_HVCALL_H */
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -326,4 +326,18 @@ static inline long plapr_signal_sys_rese
return plpar_hcall_norets(H_SIGNAL_SYS_RESET, cpu);
}

+static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
+{
+ unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+ long rc;
+
+ rc = plpar_hcall(H_GET_CPU_CHARACTERISTICS, retbuf);
+ if (rc == H_SUCCESS) {
+ p->character = retbuf[0];
+ p->behaviour = retbuf[1];
+ }
+
+ return rc;
+}
+
#endif /* _ASM_POWERPC_PLPAR_WRAPPERS_H */



2018-01-22 09:16:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 04/89] objtool: Fix Clang enum conversion warning

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lukas Bulwahn <[email protected]>

commit e7e83dd3ff1dd2f9e60213f6eedc7e5b08192062 upstream.

Fix the following Clang enum conversion warning:

arch/x86/decode.c:141:20: error: implicit conversion from enumeration
type 'enum op_src_type' to different enumeration
type 'enum op_dest_type' [-Werror,-Wenum-conversion]

op->dest.type = OP_SRC_REG;
~ ^~~~~~~~~~

It just happened to work before because OP_SRC_REG and OP_DEST_REG have
the same value.

Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Nicholas Mc Guire <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: baa41469a7b9 ("objtool: Implement stack validation 2.0")
Link: http://lkml.kernel.org/r/b4156c5738bae781c392e7a3691aed4514ebbdf2.1514323568.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/objtool/arch/x86/decode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -138,7 +138,7 @@ int arch_decode_instruction(struct elf *
*type = INSN_STACK;
op->src.type = OP_SRC_ADD;
op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
- op->dest.type = OP_SRC_REG;
+ op->dest.type = OP_DEST_REG;
op->dest.reg = CFI_SP;
}
break;



2018-01-22 09:17:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 02/89] drm/nouveau/disp/gf119: add missing drive vfunc ptr

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rob Clark <[email protected]>

commit 1b5c7ef3d0d0610bda9b63263f7c5b7178d11015 upstream.

Fixes broken dp on GF119:

Call Trace:
? nvkm_dp_train_drive+0x183/0x2c0 [nouveau]
nvkm_dp_acquire+0x4f3/0xcd0 [nouveau]
nv50_disp_super_2_2+0x5d/0x470 [nouveau]
? nvkm_devinit_pll_set+0xf/0x20 [nouveau]
gf119_disp_super+0x19c/0x2f0 [nouveau]
process_one_work+0x193/0x3c0
worker_thread+0x35/0x3b0
kthread+0x125/0x140
? process_one_work+0x3c0/0x3c0
? kthread_park+0x60/0x60
ret_from_fork+0x25/0x30
Code: Bad RIP value.
RIP: (null) RSP: ffffb1e243e4bc38
CR2: 0000000000000000

Fixes: af85389c614a drm/nouveau/disp: shuffle functions around
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103421
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>
Cc: Sven Joachim <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/nouveau/nvkm/engine/disp/sorgf119.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/sorgf119.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/sorgf119.c
@@ -174,6 +174,7 @@ gf119_sor = {
.links = gf119_sor_dp_links,
.power = g94_sor_dp_power,
.pattern = gf119_sor_dp_pattern,
+ .drive = gf119_sor_dp_drive,
.vcpi = gf119_sor_dp_vcpi,
.audio = gf119_sor_dp_audio,
.audio_sym = gf119_sor_dp_audio_sym,



2018-01-22 09:17:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 03/89] objtool: Fix seg fault with clang-compiled objects

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Simon Ser <[email protected]>

commit ce90aaf5cde4ce057b297bb6c955caf16ef00ee6 upstream.

Fix a seg fault which happens when an input file provided to 'objtool
orc generate' doesn't have a '.shstrtab' section (for instance, object
files produced by clang don't have this section).

Signed-off-by: Simon Ser <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/c0f2231683e9bed40fac1f13ce2c33b8389854bc.1514666459.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
tools/objtool/orc_gen.c | 2 ++
1 file changed, 2 insertions(+)

--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -165,6 +165,8 @@ int create_orc_sections(struct objtool_f

/* create .orc_unwind_ip and .rela.orc_unwind_ip sections */
sec = elf_create_section(file->elf, ".orc_unwind_ip", sizeof(int), idx);
+ if (!sec)
+ return -1;

ip_relasec = elf_create_rela_section(file->elf, sec);
if (!ip_relasec)



2018-01-22 09:18:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 18/89] ALSA: seq: Make ioctls race-free

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit b3defb791b26ea0683a93a4f49c77ec45ec96f10 upstream.

The ALSA sequencer ioctls have no protection against racy calls while
the concurrent operations may lead to interfere with each other. As
reported recently, for example, the concurrent calls of setting client
pool with a combination of write calls may lead to either the
unkillable dead-lock or UAF.

As a slightly big hammer solution, this patch introduces the mutex to
make each ioctl exclusive. Although this may reduce performance via
parallel ioctl calls, usually it's not demanded for sequencer usages,
hence it should be negligible.

Reported-by: Luo Quan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/core/seq/seq_clientmgr.c | 3 +++
sound/core/seq/seq_clientmgr.h | 1 +
2 files changed, 4 insertions(+)

--- a/sound/core/seq/seq_clientmgr.c
+++ b/sound/core/seq/seq_clientmgr.c
@@ -221,6 +221,7 @@ static struct snd_seq_client *seq_create
rwlock_init(&client->ports_lock);
mutex_init(&client->ports_mutex);
INIT_LIST_HEAD(&client->ports_list_head);
+ mutex_init(&client->ioctl_mutex);

/* find free slot in the client table */
spin_lock_irqsave(&clients_lock, flags);
@@ -2126,7 +2127,9 @@ static long snd_seq_ioctl(struct file *f
return -EFAULT;
}

+ mutex_lock(&client->ioctl_mutex);
err = handler->func(client, &buf);
+ mutex_unlock(&client->ioctl_mutex);
if (err >= 0) {
/* Some commands includes a bug in 'dir' field. */
if (handler->cmd == SNDRV_SEQ_IOCTL_SET_QUEUE_CLIENT ||
--- a/sound/core/seq/seq_clientmgr.h
+++ b/sound/core/seq/seq_clientmgr.h
@@ -61,6 +61,7 @@ struct snd_seq_client {
struct list_head ports_list_head;
rwlock_t ports_lock;
struct mutex ports_mutex;
+ struct mutex ioctl_mutex;
int convert32; /* convert 32->64bit */

/* output pool */



2018-01-22 09:18:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 16/89] futex: Avoid violating the 10th rule of futex

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 upstream.

Julia reported futex state corruption in the following scenario:

waiter waker stealer (prio > waiter)

futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
timeout=[N ms])
futex_wait_requeue_pi()
futex_wait_queue_me()
freezable_schedule()
<scheduled out>
futex(LOCK_PI, uaddr2)
futex(CMP_REQUEUE_PI, uaddr,
uaddr2, 1, 0)
/* requeues waiter to uaddr2 */
futex(UNLOCK_PI, uaddr2)
wake_futex_pi()
cmp_futex_value_locked(uaddr2, waiter)
wake_up_q()
<woken by waker>
<hrtimer_wakeup() fires,
clears sleeper->task>
futex(LOCK_PI, uaddr2)
__rt_mutex_start_proxy_lock()
try_to_take_rt_mutex() /* steals lock */
rt_mutex_set_owner(lock, stealer)
<preempted>
<scheduled in>
rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
try_to_take_rt_mutex() /* fails, lock held by stealer */
if (timeout && !timeout->task)
return -ETIMEDOUT;
fixup_owner()
/* lock wasn't acquired, so,
fixup_pi_state_owner skipped */

return -ETIMEDOUT;

/* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */

futex_lock(LOCK_PI, uaddr2)
-> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

- if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
- locked = 1;
- goto out;
- }

- raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
- owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- if (!owner)
- owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
- raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
- ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright <[email protected]>
Reported-by: Gratian Crisan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Julia Cartwright <[email protected]>
Tested-by: Gratian Crisan <[email protected]>
Cc: Darren Hart <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/futex.c | 83 ++++++++++++++++++++++++++++++++--------
kernel/locking/rtmutex.c | 26 +++++++++---
kernel/locking/rtmutex_common.h | 1
3 files changed, 87 insertions(+), 23 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q
spin_unlock(q->lock_ptr);
}

-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
- struct task_struct *newowner)
+ struct task_struct *argowner)
{
- u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
u32 uval, uninitialized_var(curval), newval;
- struct task_struct *oldowner;
+ struct task_struct *oldowner, *newowner;
+ u32 newtid;
int ret;

+ lockdep_assert_held(q->lock_ptr);
+
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __us
newtid |= FUTEX_OWNER_DIED;

/*
- * We are here either because we stole the rtmutex from the
- * previous highest priority waiter or we are the highest priority
- * waiter but have failed to get the rtmutex the first time.
+ * We are here because either:
+ *
+ * - we stole the lock and pi_state->owner needs updating to reflect
+ * that (@argowner == current),
+ *
+ * or:
*
- * We have to replace the newowner TID in the user space variable.
+ * - someone stole our lock and we need to fix things to point to the
+ * new owner (@argowner == NULL).
+ *
+ * Either way, we have to replace the TID in the user space variable.
* This must be atomic as we have to preserve the owner died bit here.
*
* Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __us
* in the PID check in lookup_pi_state.
*/
retry:
+ if (!argowner) {
+ if (oldowner != current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+ /* We got the lock after all, nothing to fix. */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * Since we just failed the trylock; there must be an owner.
+ */
+ newowner = rt_mutex_owner(&pi_state->pi_mutex);
+ BUG_ON(!newowner);
+ } else {
+ WARN_ON_ONCE(argowner != current);
+ if (oldowner == current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+ newowner = argowner;
+ }
+
+ newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;

@@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr
* Got the lock. We might not be the anticipated owner if we
* did a lock-steal - fix up the PI-state in that case:
*
- * We can safely read pi_state->owner without holding wait_lock
- * because we now own the rt_mutex, only the owner will attempt
- * to change it.
+ * Speculative pi_state->owner read (we don't hold wait_lock);
+ * since we own the lock pi_state->owner == current is the
+ * stable state, anything else needs more attention.
*/
if (q->pi_state->owner != current)
ret = fixup_pi_state_owner(uaddr, q, current);
goto out;
}

+ /*
+ * If we didn't get the lock; check if anybody stole it from us. In
+ * that case, we need to fix up the uval to point to them instead of
+ * us, otherwise bad things happen. [10]
+ *
+ * Another speculative read; pi_state->owner == current is unstable
+ * but needs our attention.
+ */
+ if (q->pi_state->owner == current) {
+ ret = fixup_pi_state_owner(uaddr, q, NULL);
+ goto out;
+ }
+
/*
* Paranoia check. If we did not take the lock, then we should not be
* the owner of the rt_mutex.
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock,
return ret;
}

+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+ int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+ /*
+ * try_to_take_rt_mutex() sets the lock waiters bit
+ * unconditionally. Clean this up.
+ */
+ fixup_rt_mutex_waiters(lock);
+
+ return ret;
+}
+
/*
* Slow path try-lock function:
*/
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(s
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);

- ret = try_to_take_rt_mutex(lock, current, NULL);
-
- /*
- * try_to_take_rt_mutex() sets the lock waiters bit
- * unconditionally. Clean this up.
- */
- fixup_rt_mutex_waiters(lock);
+ ret = __rt_mutex_slowtrylock(lock);

raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struc
return rt_mutex_slowtrylock(lock);
}

+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+ return __rt_mutex_slowtrylock(lock);
+}
+
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(
struct rt_mutex_waiter *waiter);

extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);

extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,



2018-01-22 09:19:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 14/89] powerpc/pseries: Query hypervisor for RFI flush settings

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Neuling <[email protected]>

commit 8989d56878a7735dfdb234707a2fee6faf631085 upstream.

A new hypervisor call is available which tells the guest settings
related to the RFI flush. Use it to query the appropriate flush
instruction(s), and whether the flush is required.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/platforms/pseries/setup.c | 35 +++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -459,6 +459,39 @@ static void __init find_and_init_phbs(vo
of_pci_check_probe_only();
}

+static void pseries_setup_rfi_flush(void)
+{
+ struct h_cpu_char_result result;
+ enum l1d_flush_type types;
+ bool enable;
+ long rc;
+
+ /* Enable by default */
+ enable = true;
+
+ rc = plpar_get_cpu_characteristics(&result);
+ if (rc == H_SUCCESS) {
+ types = L1D_FLUSH_NONE;
+
+ if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2)
+ types |= L1D_FLUSH_MTTRIG;
+ if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30)
+ types |= L1D_FLUSH_ORI;
+
+ /* Use fallback if nothing set in hcall */
+ if (types == L1D_FLUSH_NONE)
+ types = L1D_FLUSH_FALLBACK;
+
+ if (!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR))
+ enable = false;
+ } else {
+ /* Default to fallback if case hcall is not available */
+ types = L1D_FLUSH_FALLBACK;
+ }
+
+ setup_rfi_flush(types, enable);
+}
+
static void __init pSeries_setup_arch(void)
{
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -476,6 +509,8 @@ static void __init pSeries_setup_arch(vo

fwnmi_init();

+ pseries_setup_rfi_flush();
+
/* By default, only probe PCI (can be overridden by rtas_pci) */
pci_add_flags(PCI_PROBE_ONLY);




2018-01-22 09:19:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 15/89] powerpc/powernv: Check device-tree for RFI flush settings

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Oliver O'Halloran <[email protected]>

commit 6e032b350cd1fdb830f18f8320ef0e13b4e24094 upstream.

New device-tree properties are available which tell the hypervisor
settings related to the RFI flush. Use them to determine the
appropriate flush instruction to use, and whether the flush is
required.

Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/platforms/powernv/setup.c | 49 +++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)

--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -36,13 +36,62 @@
#include <asm/opal.h>
#include <asm/kexec.h>
#include <asm/smp.h>
+#include <asm/setup.h>

#include "powernv.h"

+static void pnv_setup_rfi_flush(void)
+{
+ struct device_node *np, *fw_features;
+ enum l1d_flush_type type;
+ int enable;
+
+ /* Default to fallback in case fw-features are not available */
+ type = L1D_FLUSH_FALLBACK;
+ enable = 1;
+
+ np = of_find_node_by_name(NULL, "ibm,opal");
+ fw_features = of_get_child_by_name(np, "fw-features");
+ of_node_put(np);
+
+ if (fw_features) {
+ np = of_get_child_by_name(fw_features, "inst-l1d-flush-trig2");
+ if (np && of_property_read_bool(np, "enabled"))
+ type = L1D_FLUSH_MTTRIG;
+
+ of_node_put(np);
+
+ np = of_get_child_by_name(fw_features, "inst-l1d-flush-ori30,30,0");
+ if (np && of_property_read_bool(np, "enabled"))
+ type = L1D_FLUSH_ORI;
+
+ of_node_put(np);
+
+ /* Enable unless firmware says NOT to */
+ enable = 2;
+ np = of_get_child_by_name(fw_features, "needs-l1d-flush-msr-hv-1-to-0");
+ if (np && of_property_read_bool(np, "disabled"))
+ enable--;
+
+ of_node_put(np);
+
+ np = of_get_child_by_name(fw_features, "needs-l1d-flush-msr-pr-0-to-1");
+ if (np && of_property_read_bool(np, "disabled"))
+ enable--;
+
+ of_node_put(np);
+ of_node_put(fw_features);
+ }
+
+ setup_rfi_flush(type, enable > 0);
+}
+
static void __init pnv_setup_arch(void)
{
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);

+ pnv_setup_rfi_flush();
+
/* Initialize SMP */
pnv_smp_init();




2018-01-22 09:19:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 12/89] powerpc/64s: Add support for RFI flush of L1-D cache

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Ellerman <[email protected]>

commit aa8a5e0062ac940f7659394f4817c948dc8c0667 upstream.

On some CPUs we can prevent the Meltdown vulnerability by flushing the
L1-D cache on exit from kernel to user mode, and from hypervisor to
guest.

This is known to be the case on at least Power7, Power8 and Power9. At
this time we do not know the status of the vulnerability on other CPUs
such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
CPUs. As more information comes to light we can enable this, or other
mechanisms on those CPUs.

The vulnerability occurs when the load of an architecturally
inaccessible memory region (eg. userspace load of kernel memory) is
speculatively executed to the point where its result can influence the
address of a subsequent speculatively executed load.

In order for that to happen, the first load must hit in the L1,
because before the load is sent to the L2 the permission check is
performed. Therefore if no kernel addresses hit in the L1 the
vulnerability can not occur. We can ensure that is the case by
flushing the L1 whenever we return to userspace. Similarly for
hypervisor vs guest.

In order to flush the L1-D cache on exit, we add a section of nops at
each (h)rfi location that returns to a lower privileged context, and
patch that with some sequence. Newer firmwares are able to advertise
to us that there is a special nop instruction that flushes the L1-D.
If we do not see that advertised, we fall back to doing a displacement
flush in software.

For guest kernels we support migration between some CPU versions, and
different CPUs may use different flush instructions. So that we are
prepared to migrate to a machine with a different flush instruction
activated, we may have to patch more than one flush instruction at
boot if the hypervisor tells us to.

In the end this patch is mostly the work of Nicholas Piggin and
Michael Ellerman. However a cast of thousands contributed to analysis
of the issue, earlier versions of the patch, back ports testing etc.
Many thanks to all of them.

Tested-by: Jon Masters <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/exception-64s.h | 40 +++++++++++---
arch/powerpc/include/asm/feature-fixups.h | 13 ++++
arch/powerpc/include/asm/paca.h | 10 +++
arch/powerpc/include/asm/setup.h | 13 ++++
arch/powerpc/kernel/asm-offsets.c | 5 +
arch/powerpc/kernel/exceptions-64s.S | 84 ++++++++++++++++++++++++++++++
arch/powerpc/kernel/setup_64.c | 79 ++++++++++++++++++++++++++++
arch/powerpc/kernel/vmlinux.lds.S | 9 +++
arch/powerpc/lib/feature-fixups.c | 41 ++++++++++++++
9 files changed, 286 insertions(+), 8 deletions(-)

--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -69,34 +69,58 @@
*/
#define EX_R3 EX_DAR

-/* Macros for annotating the expected destination of (h)rfid */
+/*
+ * Macros for annotating the expected destination of (h)rfid
+ *
+ * The nop instructions allow us to insert one or more instructions to flush the
+ * L1-D cache when returning to userspace or a guest.
+ */
+#define RFI_FLUSH_SLOT \
+ RFI_FLUSH_FIXUP_SECTION; \
+ nop; \
+ nop; \
+ nop

#define RFI_TO_KERNEL \
rfid

#define RFI_TO_USER \
- rfid
+ RFI_FLUSH_SLOT; \
+ rfid; \
+ b rfi_flush_fallback

#define RFI_TO_USER_OR_KERNEL \
- rfid
+ RFI_FLUSH_SLOT; \
+ rfid; \
+ b rfi_flush_fallback

#define RFI_TO_GUEST \
- rfid
+ RFI_FLUSH_SLOT; \
+ rfid; \
+ b rfi_flush_fallback

#define HRFI_TO_KERNEL \
hrfid

#define HRFI_TO_USER \
- hrfid
+ RFI_FLUSH_SLOT; \
+ hrfid; \
+ b hrfi_flush_fallback

#define HRFI_TO_USER_OR_KERNEL \
- hrfid
+ RFI_FLUSH_SLOT; \
+ hrfid; \
+ b hrfi_flush_fallback

#define HRFI_TO_GUEST \
- hrfid
+ RFI_FLUSH_SLOT; \
+ hrfid; \
+ b hrfi_flush_fallback

#define HRFI_TO_UNKNOWN \
- hrfid
+ RFI_FLUSH_SLOT; \
+ hrfid; \
+ b hrfi_flush_fallback

#ifdef CONFIG_RELOCATABLE
#define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h) \
--- a/arch/powerpc/include/asm/feature-fixups.h
+++ b/arch/powerpc/include/asm/feature-fixups.h
@@ -187,7 +187,20 @@ label##3: \
FTR_ENTRY_OFFSET label##1b-label##3b; \
.popsection;

+#define RFI_FLUSH_FIXUP_SECTION \
+951: \
+ .pushsection __rfi_flush_fixup,"a"; \
+ .align 2; \
+952: \
+ FTR_ENTRY_OFFSET 951b-952b; \
+ .popsection;
+
+
#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup;
+
void apply_feature_fixups(void);
void setup_feature_keys(void);
#endif
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -231,6 +231,16 @@ struct paca_struct {
struct sibling_subcore_state *sibling_subcore_state;
#endif
#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+ /*
+ * rfi fallback flush must be in its own cacheline to prevent
+ * other paca data leaking into the L1d
+ */
+ u64 exrfi[EX_SIZE] __aligned(0x80);
+ void *rfi_flush_fallback_area;
+ u64 l1d_flush_congruence;
+ u64 l1d_flush_sets;
+#endif
};

extern void copy_mm_to_paca(struct mm_struct *mm);
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -39,6 +39,19 @@ static inline void pseries_big_endian_ex
static inline void pseries_little_endian_exceptions(void) {}
#endif /* CONFIG_PPC_PSERIES */

+void rfi_flush_enable(bool enable);
+
+/* These are bit flags */
+enum l1d_flush_type {
+ L1D_FLUSH_NONE = 0x1,
+ L1D_FLUSH_FALLBACK = 0x2,
+ L1D_FLUSH_ORI = 0x4,
+ L1D_FLUSH_MTTRIG = 0x8,
+};
+
+void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
+void do_rfi_flush_fixups(enum l1d_flush_type types);
+
#endif /* !__ASSEMBLY__ */

#endif /* _ASM_POWERPC_SETUP_H */
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -237,6 +237,11 @@ int main(void)
OFFSET(PACA_NMI_EMERG_SP, paca_struct, nmi_emergency_sp);
OFFSET(PACA_IN_MCE, paca_struct, in_mce);
OFFSET(PACA_IN_NMI, paca_struct, in_nmi);
+ OFFSET(PACA_RFI_FLUSH_FALLBACK_AREA, paca_struct, rfi_flush_fallback_area);
+ OFFSET(PACA_EXRFI, paca_struct, exrfi);
+ OFFSET(PACA_L1D_FLUSH_CONGRUENCE, paca_struct, l1d_flush_congruence);
+ OFFSET(PACA_L1D_FLUSH_SETS, paca_struct, l1d_flush_sets);
+
#endif
OFFSET(PACAHWCPUID, paca_struct, hw_cpu_id);
OFFSET(PACAKEXECSTATE, paca_struct, kexec_state);
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1434,6 +1434,90 @@ masked_##_H##interrupt: \
b .; \
MASKED_DEC_HANDLER(_H)

+TRAMP_REAL_BEGIN(rfi_flush_fallback)
+ SET_SCRATCH0(r13);
+ GET_PACA(r13);
+ std r9,PACA_EXRFI+EX_R9(r13)
+ std r10,PACA_EXRFI+EX_R10(r13)
+ std r11,PACA_EXRFI+EX_R11(r13)
+ std r12,PACA_EXRFI+EX_R12(r13)
+ std r8,PACA_EXRFI+EX_R13(r13)
+ mfctr r9
+ ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
+ ld r11,PACA_L1D_FLUSH_SETS(r13)
+ ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13)
+ /*
+ * The load adresses are at staggered offsets within cachelines,
+ * which suits some pipelines better (on others it should not
+ * hurt).
+ */
+ addi r12,r12,8
+ mtctr r11
+ DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
+
+ /* order ld/st prior to dcbt stop all streams with flushing */
+ sync
+1: li r8,0
+ .rept 8 /* 8-way set associative */
+ ldx r11,r10,r8
+ add r8,r8,r12
+ xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not
+ add r8,r8,r11 // Add 0, this creates a dependency on the ldx
+ .endr
+ addi r10,r10,128 /* 128 byte cache line */
+ bdnz 1b
+
+ mtctr r9
+ ld r9,PACA_EXRFI+EX_R9(r13)
+ ld r10,PACA_EXRFI+EX_R10(r13)
+ ld r11,PACA_EXRFI+EX_R11(r13)
+ ld r12,PACA_EXRFI+EX_R12(r13)
+ ld r8,PACA_EXRFI+EX_R13(r13)
+ GET_SCRATCH0(r13);
+ rfid
+
+TRAMP_REAL_BEGIN(hrfi_flush_fallback)
+ SET_SCRATCH0(r13);
+ GET_PACA(r13);
+ std r9,PACA_EXRFI+EX_R9(r13)
+ std r10,PACA_EXRFI+EX_R10(r13)
+ std r11,PACA_EXRFI+EX_R11(r13)
+ std r12,PACA_EXRFI+EX_R12(r13)
+ std r8,PACA_EXRFI+EX_R13(r13)
+ mfctr r9
+ ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
+ ld r11,PACA_L1D_FLUSH_SETS(r13)
+ ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13)
+ /*
+ * The load adresses are at staggered offsets within cachelines,
+ * which suits some pipelines better (on others it should not
+ * hurt).
+ */
+ addi r12,r12,8
+ mtctr r11
+ DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
+
+ /* order ld/st prior to dcbt stop all streams with flushing */
+ sync
+1: li r8,0
+ .rept 8 /* 8-way set associative */
+ ldx r11,r10,r8
+ add r8,r8,r12
+ xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not
+ add r8,r8,r11 // Add 0, this creates a dependency on the ldx
+ .endr
+ addi r10,r10,128 /* 128 byte cache line */
+ bdnz 1b
+
+ mtctr r9
+ ld r9,PACA_EXRFI+EX_R9(r13)
+ ld r10,PACA_EXRFI+EX_R10(r13)
+ ld r11,PACA_EXRFI+EX_R11(r13)
+ ld r12,PACA_EXRFI+EX_R12(r13)
+ ld r8,PACA_EXRFI+EX_R13(r13)
+ GET_SCRATCH0(r13);
+ hrfid
+
/*
* Real mode exceptions actually use this too, but alternate
* instruction code patches (which end up in the common .text area)
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -784,3 +784,82 @@ static int __init disable_hardlockup_det
return 0;
}
early_initcall(disable_hardlockup_detector);
+
+#ifdef CONFIG_PPC_BOOK3S_64
+static enum l1d_flush_type enabled_flush_types;
+static void *l1d_flush_fallback_area;
+bool rfi_flush;
+
+static void do_nothing(void *unused)
+{
+ /*
+ * We don't need to do the flush explicitly, just enter+exit kernel is
+ * sufficient, the RFI exit handlers will do the right thing.
+ */
+}
+
+void rfi_flush_enable(bool enable)
+{
+ if (rfi_flush == enable)
+ return;
+
+ if (enable) {
+ do_rfi_flush_fixups(enabled_flush_types);
+ on_each_cpu(do_nothing, NULL, 1);
+ } else
+ do_rfi_flush_fixups(L1D_FLUSH_NONE);
+
+ rfi_flush = enable;
+}
+
+static void init_fallback_flush(void)
+{
+ u64 l1d_size, limit;
+ int cpu;
+
+ l1d_size = ppc64_caches.l1d.size;
+ limit = min(safe_stack_limit(), ppc64_rma_size);
+
+ /*
+ * Align to L1d size, and size it at 2x L1d size, to catch possible
+ * hardware prefetch runoff. We don't have a recipe for load patterns to
+ * reliably avoid the prefetcher.
+ */
+ l1d_flush_fallback_area = __va(memblock_alloc_base(l1d_size * 2, l1d_size, limit));
+ memset(l1d_flush_fallback_area, 0, l1d_size * 2);
+
+ for_each_possible_cpu(cpu) {
+ /*
+ * The fallback flush is currently coded for 8-way
+ * associativity. Different associativity is possible, but it
+ * will be treated as 8-way and may not evict the lines as
+ * effectively.
+ *
+ * 128 byte lines are mandatory.
+ */
+ u64 c = l1d_size / 8;
+
+ paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area;
+ paca[cpu].l1d_flush_congruence = c;
+ paca[cpu].l1d_flush_sets = c / 128;
+ }
+}
+
+void __init setup_rfi_flush(enum l1d_flush_type types, bool enable)
+{
+ if (types & L1D_FLUSH_FALLBACK) {
+ pr_info("rfi-flush: Using fallback displacement flush\n");
+ init_fallback_flush();
+ }
+
+ if (types & L1D_FLUSH_ORI)
+ pr_info("rfi-flush: Using ori type flush\n");
+
+ if (types & L1D_FLUSH_MTTRIG)
+ pr_info("rfi-flush: Using mttrig type flush\n");
+
+ enabled_flush_types = types;
+
+ rfi_flush_enable(enable);
+}
+#endif /* CONFIG_PPC_BOOK3S_64 */
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -132,6 +132,15 @@ SECTIONS
/* Read-only data */
RO_DATA(PAGE_SIZE)

+#ifdef CONFIG_PPC64
+ . = ALIGN(8);
+ __rfi_flush_fixup : AT(ADDR(__rfi_flush_fixup) - LOAD_OFFSET) {
+ __start___rfi_flush_fixup = .;
+ *(__rfi_flush_fixup)
+ __stop___rfi_flush_fixup = .;
+ }
+#endif
+
EXCEPTION_TABLE(0)

NOTES :kernel :notes
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -116,6 +116,47 @@ void do_feature_fixups(unsigned long val
}
}

+#ifdef CONFIG_PPC_BOOK3S_64
+void do_rfi_flush_fixups(enum l1d_flush_type types)
+{
+ unsigned int instrs[3], *dest;
+ long *start, *end;
+ int i;
+
+ start = PTRRELOC(&__start___rfi_flush_fixup),
+ end = PTRRELOC(&__stop___rfi_flush_fixup);
+
+ instrs[0] = 0x60000000; /* nop */
+ instrs[1] = 0x60000000; /* nop */
+ instrs[2] = 0x60000000; /* nop */
+
+ if (types & L1D_FLUSH_FALLBACK)
+ /* b .+16 to fallback flush */
+ instrs[0] = 0x48000010;
+
+ i = 0;
+ if (types & L1D_FLUSH_ORI) {
+ instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */
+ instrs[i++] = 0x63de0000; /* ori 30,30,0 L1d flush*/
+ }
+
+ if (types & L1D_FLUSH_MTTRIG)
+ instrs[i++] = 0x7c12dba6; /* mtspr TRIG2,r0 (SPR #882) */
+
+ for (i = 0; start < end; start++, i++) {
+ dest = (void *)start + *start;
+
+ pr_devel("patching dest %lx\n", (unsigned long)dest);
+
+ patch_instruction(dest, instrs[0]);
+ patch_instruction(dest + 1, instrs[1]);
+ patch_instruction(dest + 2, instrs[2]);
+ }
+
+ printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i);
+}
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
{
long *start, *end;



2018-01-22 09:20:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 10/89] powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit a08f828cf47e6c605af21d2cdec68f84e799c318 upstream.

Similar to the syscall return path, in fast_exception_return we may be
returning to user or kernel context. We already have a test for that,
because we conditionally restore r13. So use that existing test and
branch, and bifurcate the return based on that.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/entry_64.S | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -892,7 +892,7 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ACCOUNT_CPU_USER_EXIT(r13, r2, r4)
REST_GPR(13, r1)
-1:
+
mtspr SPRN_SRR1,r3

ld r2,_CCR(r1)
@@ -905,8 +905,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld r3,GPR3(r1)
ld r4,GPR4(r1)
ld r1,GPR1(r1)
+ RFI_TO_USER
+ b . /* prevent speculative execution */

- rfid
+1: mtspr SPRN_SRR1,r3
+
+ ld r2,_CCR(r1)
+ mtcrf 0xFF,r2
+ ld r2,_NIP(r1)
+ mtspr SPRN_SRR0,r2
+
+ ld r0,GPR0(r1)
+ ld r2,GPR2(r1)
+ ld r3,GPR3(r1)
+ ld r4,GPR4(r1)
+ ld r1,GPR1(r1)
+ RFI_TO_KERNEL
b . /* prevent speculative execution */

#endif /* CONFIG_PPC_BOOK3E */



2018-01-22 09:20:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Li Jinyue <[email protected]>

commit fbe0e839d1e22d88810f3ee3e2f1479be4c0aa4a upstream.

UBSAN reports signed integer overflow in kernel/futex.c:

UBSAN: Undefined behaviour in kernel/futex.c:2041:18
signed integer overflow:
0 - -2147483648 cannot be represented in type 'int'

Add a sanity check to catch negative values of nr_wake and nr_requeue.

Signed-off-by: Li Jinyue <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/futex.c | 3 +++
1 file changed, 3 insertions(+)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1878,6 +1878,9 @@ static int futex_requeue(u32 __user *uad
struct futex_q *this, *next;
DEFINE_WAKE_Q(wake_q);

+ if (nr_wake < 0 || nr_requeue < 0)
+ return -EINVAL;
+
/*
* When PI not supported: return -ENOSYS if requeue_pi is true,
* consequently the compiler knows requeue_pi is always false past



2018-01-22 09:20:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 13/89] powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Ellerman <[email protected]>

commit bc9c9304a45480797e13a8e1df96ffcf44fb62fe upstream.

Because there may be some performance overhead of the RFI flush, add
kernel command line options to disable it.

We add a sensibly named 'no_rfi_flush' option, but we also hijack the
x86 option 'nopti'. The RFI flush is not the same as KPTI, but if we
see 'nopti' we can guess that the user is trying to avoid any overhead
of Meltdown mitigations, and it means we don't have to educate every
one about a different command line option.

Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/setup_64.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -788,8 +788,29 @@ early_initcall(disable_hardlockup_detect
#ifdef CONFIG_PPC_BOOK3S_64
static enum l1d_flush_type enabled_flush_types;
static void *l1d_flush_fallback_area;
+static bool no_rfi_flush;
bool rfi_flush;

+static int __init handle_no_rfi_flush(char *p)
+{
+ pr_info("rfi-flush: disabled on command line.");
+ no_rfi_flush = true;
+ return 0;
+}
+early_param("no_rfi_flush", handle_no_rfi_flush);
+
+/*
+ * The RFI flush is not KPTI, but because users will see doco that says to use
+ * nopti we hijack that option here to also disable the RFI flush.
+ */
+static int __init handle_no_pti(char *p)
+{
+ pr_info("rfi-flush: disabling due to 'nopti' on command line.\n");
+ handle_no_rfi_flush(NULL);
+ return 0;
+}
+early_param("nopti", handle_no_pti);
+
static void do_nothing(void *unused)
{
/*
@@ -860,6 +881,7 @@ void __init setup_rfi_flush(enum l1d_flu

enabled_flush_types = types;

- rfi_flush_enable(enable);
+ if (!no_rfi_flush)
+ rfi_flush_enable(enable);
}
#endif /* CONFIG_PPC_BOOK3S_64 */



2018-01-22 09:21:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 11/89] powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit c7305645eb0c1621351cfc104038831ae87c0053 upstream.

In the SLB miss handler we may be returning to user or kernel. We need
to add a check early on and save the result in the cr4 register, and
then we bifurcate the return path based on that.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/exceptions-64s.S | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -596,6 +596,9 @@ EXC_COMMON_BEGIN(slb_miss_common)
stw r9,PACA_EXSLB+EX_CCR(r13) /* save CR in exc. frame */
std r10,PACA_EXSLB+EX_LR(r13) /* save LR */

+ andi. r9,r11,MSR_PR // Check for exception from userspace
+ cmpdi cr4,r9,MSR_PR // And save the result in CR4 for later
+
/*
* Test MSR_RI before calling slb_allocate_realmode, because the
* MSR in r11 gets clobbered. However we still want to allocate
@@ -622,9 +625,32 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_R

/* All done -- return from exception. */

+ bne cr4,1f /* returning to kernel */
+
+.machine push
+.machine "power4"
+ mtcrf 0x80,r9
+ mtcrf 0x08,r9 /* MSR[PR] indication is in cr4 */
+ mtcrf 0x04,r9 /* MSR[RI] indication is in cr5 */
+ mtcrf 0x02,r9 /* I/D indication is in cr6 */
+ mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */
+.machine pop
+
+ RESTORE_CTR(r9, PACA_EXSLB)
+ RESTORE_PPR_PACA(PACA_EXSLB, r9)
+ mr r3,r12
+ ld r9,PACA_EXSLB+EX_R9(r13)
+ ld r10,PACA_EXSLB+EX_R10(r13)
+ ld r11,PACA_EXSLB+EX_R11(r13)
+ ld r12,PACA_EXSLB+EX_R12(r13)
+ ld r13,PACA_EXSLB+EX_R13(r13)
+ RFI_TO_USER
+ b . /* prevent speculative execution */
+1:
.machine push
.machine "power4"
mtcrf 0x80,r9
+ mtcrf 0x08,r9 /* MSR[PR] indication is in cr4 */
mtcrf 0x04,r9 /* MSR[RI] indication is in cr5 */
mtcrf 0x02,r9 /* I/D indication is in cr6 */
mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */
@@ -638,9 +664,10 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_R
ld r11,PACA_EXSLB+EX_R11(r13)
ld r12,PACA_EXSLB+EX_R12(r13)
ld r13,PACA_EXSLB+EX_R13(r13)
- rfid
+ RFI_TO_KERNEL
b . /* prevent speculative execution */

+
2: std r3,PACA_EXSLB+EX_DAR(r13)
mr r3,r12
mfspr r11,SPRN_SRR0



2018-01-22 09:49:34

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 4.14 16/89] futex: Avoid violating the 10th rule of futex

Hi Greg,

On Mon, Jan 22, 2018 at 9:44 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> 4.14-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Peter Zijlstra <[email protected]>
>
> commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 upstream.

May be a bit premature, given the fishy use of newtid, cfr. my comment
in https://lkml.org/lkml/2018/1/22/274

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-22 09:54:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 16/89] futex: Avoid violating the 10th rule of futex

On Mon, Jan 22, 2018 at 10:48:55AM +0100, Geert Uytterhoeven wrote:
> Hi Greg,
>
> On Mon, Jan 22, 2018 at 9:44 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > 4.14-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Peter Zijlstra <[email protected]>
> >
> > commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 upstream.
>
> May be a bit premature, given the fishy use of newtid, cfr. my comment
> in https://lkml.org/lkml/2018/1/22/274

Not really a bug, but I will be glad to take any future patches that end
up in Linus's tree to resolve your issue with obsolete compilers like
this :)

thanks,

greg k-h

2018-01-22 10:05:55

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 4.14 16/89] futex: Avoid violating the 10th rule of futex

Hi Greg,

On Mon, Jan 22, 2018 at 10:53 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Mon, Jan 22, 2018 at 10:48:55AM +0100, Geert Uytterhoeven wrote:
>> On Mon, Jan 22, 2018 at 9:44 AM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > 4.14-stable review patch. If anyone has any objections, please let me know.
>> >
>> > ------------------
>> >
>> > From: Peter Zijlstra <[email protected]>
>> >
>> > commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 upstream.
>>
>> May be a bit premature, given the fishy use of newtid, cfr. my comment
>> in https://lkml.org/lkml/2018/1/22/274
>
> Not really a bug, but I will be glad to take any future patches that end
> up in Linus's tree to resolve your issue with obsolete compilers like
> this :)

I'm not worried about the old compiler.
I'm worried about a real bug related to the no longer used assignment.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-22 19:11:12

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.14 00/89] 4.14.15-stable review

On Mon, Jan 22, 2018 at 09:44:40AM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.15 release.
> There are 89 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>

Note: This is for v4.14.14-91-gf7e703b.

Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 126 pass: 126 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

2018-01-22 20:39:43

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.14 00/89] 4.14.15-stable review

On 22 January 2018 at 14:14, Greg Kroah-Hartman
<[email protected]> wrote:
> This is the start of the stable review cycle for the 4.14.15 release.
> There are 89 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jan 24 08:39:25 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.15-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary
------------------------------------------------------------------------

kernel: 4.14.15-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: f7e703b5b8e7cef9b95a462a9a2bef85bc40d16a
git describe: v4.14.14-91-gf7e703b5b8e7
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.14-91-gf7e703b5b8e7


No regressions (compared to build v4.14.14-90-g9554eb9fb784)

Boards, architectures and test suites:
-------------------------------------

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 16, pass: 46
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 121, pass: 983
* ltp-timers-tests - pass: 12

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 16, pass: 46
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 121, pass: 987
* ltp-timers-tests - pass: 12

x15 - arm
* boot - pass: 20
* kselftest - skip: 18, pass: 43
* libhugetlbfs - skip: 1, pass: 87
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 2, pass: 20
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 1, pass: 13
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 66, pass: 1037
* ltp-timers-tests - pass: 12

x86_64
* boot - pass: 20
* kselftest - fail: 1, skip: 17, pass: 57
* libhugetlbfs - skip: 1, pass: 89
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 1, pass: 61
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 1, pass: 9
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 116, pass: 1016
* ltp-timers-tests - pass: 12

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju <[email protected]>

2018-01-22 21:01:42

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.14 00/89] 4.14.15-stable review

On 01/22/2018 01:44 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.15 release.
> There are 89 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jan 24 08:39:25 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.15-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah


2018-01-23 06:37:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 00/89] 4.14.15-stable review

On Mon, Jan 22, 2018 at 11:10:20AM -0800, Guenter Roeck wrote:
> On Mon, Jan 22, 2018 at 09:44:40AM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.15 release.
> > There are 89 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
>
> Note: This is for v4.14.14-91-gf7e703b.
>
> Build results:
> total: 145 pass: 145 fail: 0
> Qemu test results:
> total: 126 pass: 126 fail: 0
>
> Details are available at http://kerneltests.org/builders.

Thanks for testing all of these and letting me know.

greg k-h

2018-01-25 13:46:07

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On 01/22/2018, 09:44 AM, Greg Kroah-Hartman wrote:
> 4.14-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Li Jinyue <[email protected]>
>
> commit fbe0e839d1e22d88810f3ee3e2f1479be4c0aa4a upstream.
>
> UBSAN reports signed integer overflow in kernel/futex.c:
>
> UBSAN: Undefined behaviour in kernel/futex.c:2041:18
> signed integer overflow:
> 0 - -2147483648 cannot be represented in type 'int'
>
> Add a sanity check to catch negative values of nr_wake and nr_requeue.
>
> Signed-off-by: Li Jinyue <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> kernel/futex.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1878,6 +1878,9 @@ static int futex_requeue(u32 __user *uad
> struct futex_q *this, *next;
> DEFINE_WAKE_Q(wake_q);
>
> + if (nr_wake < 0 || nr_requeue < 0)
> + return -EINVAL;

This breaks strace's test suite on 4.14.15 (and is present in upstream
obviously too):
futex(0x7ff568b44ffc, 0x3, 0xfacefeed, 0xbadda7a0ca7b100d,
0x7ff568b44ffc, 0x9caffee1) = -1: Invalid argument

strace uses weird values in the testkit to pass down to futex as can be
seen. I think like in:
commit e78c38f6bdd900b2ad9ac9df8eff58b745dc5b3c
Author: Jiri Slaby <[email protected]>
Date: Mon Oct 23 13:41:51 2017 +0200

futex: futex_wake_op, do not fail on invalid op

something similar should be done here too. Maybe:
if (nr_wake < 0)
nr_wake = 0;
if (nr_requeue < 0)
nr_requeue = 0;
?

Maybe also with some pr_info_ratelimited like in the above commit?

thanks,
--
js
suse labs

2018-01-25 14:04:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On Thu, 25 Jan 2018, Jiri Slaby wrote:
> On 01/22/2018, 09:44 AM, Greg Kroah-Hartman wrote:
> > + if (nr_wake < 0 || nr_requeue < 0)
> > + return -EINVAL;
>
> This breaks strace's test suite on 4.14.15 (and is present in upstream
> obviously too):
> futex(0x7ff568b44ffc, 0x3, 0xfacefeed, 0xbadda7a0ca7b100d,
> 0x7ff568b44ffc, 0x9caffee1) = -1: Invalid argument

And why the hell is strace expecting this to be valid?

Thanks,

tglx

2018-01-25 14:08:16

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On 01/25/2018, 03:03 PM, Thomas Gleixner wrote:
> On Thu, 25 Jan 2018, Jiri Slaby wrote:
>> On 01/22/2018, 09:44 AM, Greg Kroah-Hartman wrote:
>>> + if (nr_wake < 0 || nr_requeue < 0)
>>> + return -EINVAL;
>>
>> This breaks strace's test suite on 4.14.15 (and is present in upstream
>> obviously too):
>> futex(0x7ff568b44ffc, 0x3, 0xfacefeed, 0xbadda7a0ca7b100d,
>> 0x7ff568b44ffc, 0x9caffee1) = -1: Invalid argument
>
> And why the hell is strace expecting this to be valid?

You ought to ask somebody else, I was confused the very same way:

My FIX:
https://github.com/strace/strace/pull/16/commits/777587ea509481666274df88671949b390f05cc3

Their NACK:
https://github.com/strace/strace/pull/16#issuecomment-341614984

thanks,
--
js
suse labs

2018-01-25 14:40:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On Thu, 25 Jan 2018, Jiri Slaby wrote:

> On 01/25/2018, 03:03 PM, Thomas Gleixner wrote:
> > On Thu, 25 Jan 2018, Jiri Slaby wrote:
> >> On 01/22/2018, 09:44 AM, Greg Kroah-Hartman wrote:
> >>> + if (nr_wake < 0 || nr_requeue < 0)
> >>> + return -EINVAL;
> >>
> >> This breaks strace's test suite on 4.14.15 (and is present in upstream
> >> obviously too):
> >> futex(0x7ff568b44ffc, 0x3, 0xfacefeed, 0xbadda7a0ca7b100d,
> >> 0x7ff568b44ffc, 0x9caffee1) = -1: Invalid argument
> >
> > And why the hell is strace expecting this to be valid?
>
> You ought to ask somebody else, I was confused the very same way:
>
> My FIX:
> https://github.com/strace/strace/pull/16/commits/777587ea509481666274df88671949b390f05cc3
>
> Their NACK:
> https://github.com/strace/strace/pull/16#issuecomment-341614984

https://github.com/strace/strace/commit/79d10dfc20985225e4ea044d3875c4cea09053d7

Update futex test in accordance with kernel's v4.15-rc7-202-gfbe0e83

* futex.c (VALP, VALP_PR, VAL2P, VAL2P_PR): New macro definitions.
(main): Allow EINVAL on *REQUEUE* checks with VAL/VAL2 with higher bit
being set, check that the existing behaviour preserved with VALP/VAL2P
where higher bit is unset.

So what's the problem?

Thanks,

tglx

2018-01-25 14:49:32

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On 01/25/2018, 03:30 PM, Thomas Gleixner wrote:
> So what's the problem?

The problem I see is that every stable kernel now requires updated
strace with their commit from yesterday to build correctly. In
particular, the new stable kernels cause rpm build failures of strace in
all our distros (based on those stable kernels). Sure, we can patch
strace in every distro every nth kernel update, but it's mere
impractical. Kernel should not break userspace, right?

BTW why was the patch applied to stable? We actually do pass
-fno-strict-overflow.

thanks,
--
js
suse labs

2018-01-25 15:13:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On Thu, Jan 25, 2018 at 03:47:32PM +0100, Jiri Slaby wrote:
> On 01/25/2018, 03:30 PM, Thomas Gleixner wrote:
> > So what's the problem?
>
> The problem I see is that every stable kernel now requires updated
> strace with their commit from yesterday to build correctly. In
> particular, the new stable kernels cause rpm build failures of strace in
> all our distros (based on those stable kernels). Sure, we can patch
> strace in every distro every nth kernel update, but it's mere
> impractical. Kernel should not break userspace, right?

Well, when userspace is doing something stupid... :)

> BTW why was the patch applied to stable? We actually do pass
> -fno-strict-overflow.

The same reason it was applied upstream, it fixes a reported
issue.

Does that mean that all UBSAN overflow error reports are not valid
because of how we build the kernel?

thanks,

greg k-h

2018-01-25 15:24:12

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On 01/25/2018, 04:12 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 25, 2018 at 03:47:32PM +0100, Jiri Slaby wrote:
>> On 01/25/2018, 03:30 PM, Thomas Gleixner wrote:
>>> So what's the problem?
>>
>> The problem I see is that every stable kernel now requires updated
>> strace with their commit from yesterday to build correctly. In
>> particular, the new stable kernels cause rpm build failures of strace in
>> all our distros (based on those stable kernels). Sure, we can patch
>> strace in every distro every nth kernel update, but it's mere
>> impractical. Kernel should not break userspace, right?
>
> Well, when userspace is doing something stupid... :)

No doubt... But does that mean we no longer maintain the "no userspace
breakage even if it is stupid" rule?

>> BTW why was the patch applied to stable? We actually do pass
>> -fno-strict-overflow.
>
> The same reason it was applied upstream, it fixes a reported
> issue.
>
> Does that mean that all UBSAN overflow error reports are not valid
> because of how we build the kernel?

IMO yes, because with the option, signed overflow is not undefined.

In the long term, it would be nice to get rid of *all* signed integer
overflows and kill the compiler option from Makefile. Therefore the
fixes are indeed very valid in upstream.

thanks,
--
js
suse labs

2018-01-25 15:33:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On Thu, Jan 25, 2018 at 04:21:51PM +0100, Jiri Slaby wrote:
> > The same reason it was applied upstream, it fixes a reported
> > issue.
> >
> > Does that mean that all UBSAN overflow error reports are not valid
> > because of how we build the kernel?
>
> IMO yes, because with the option, signed overflow is not undefined.
>
> In the long term, it would be nice to get rid of *all* signed integer
> overflows and kill the compiler option from Makefile. Therefore the
> fixes are indeed very valid in upstream.

I actually think the option is unconditionally good. Undefined behaviour
in a language is bad. Sadly C has lots of it, but any reduction we can
have we must take.

2018-01-25 21:43:21

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH 4.14 17/89] futex: Prevent overflow by strengthen input validation

On Thu, Jan 25, 2018 at 04:21:51PM +0100, Jiri Slaby wrote:
> On 01/25/2018, 04:12 PM, Greg Kroah-Hartman wrote:
> > On Thu, Jan 25, 2018 at 03:47:32PM +0100, Jiri Slaby wrote:
> >> On 01/25/2018, 03:30 PM, Thomas Gleixner wrote:
> >>> So what's the problem?
> >>
> >> The problem I see is that every stable kernel now requires updated
> >> strace with their commit from yesterday to build correctly. In
> >> particular, the new stable kernels cause rpm build failures of strace in
> >> all our distros (based on those stable kernels). Sure, we can patch
> >> strace in every distro every nth kernel update, but it's mere
> >> impractical. Kernel should not break userspace, right?
> >
> > Well, when userspace is doing something stupid... :)
>
> No doubt... But does that mean we no longer maintain the "no userspace
> breakage even if it is stupid" rule?

One of the reasons we have been adding these earlier input validation checks to
futex has been to mitigate security exploits taking advantage of the complex
nature of the system call. Granted we should have done this initially, but if we
avoid some of these nasty exploits (and the real harm they enable), then yeah,
this is worth fixing userspace which is relying on undefined behavior.

I'd still like to out why various distros are sending garbage to uadd2 for
network setup (but that's another topic).

--
Darren Hart
VMware Open Source Technology Center