2019-06-09 18:09:46

by Greg Kroah-Hartman

Subject: [PATCH 4.9 00/83] 4.9.181-stable review

This is the start of the stable review cycle for the 4.9.181 release.
There are 83 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied,
please let me know.

Responses should be made by Tue 11 Jun 2019 04:39:58 PM UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.181-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.9.181-rc1

Kirill Smelkov <[email protected]>
fuse: Add FOPEN_STREAM to use stream_open()

Kirill Smelkov <[email protected]>
fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock

Jiri Slaby <[email protected]>
TTY: serial_core, add ->install

Chris Wilson <[email protected]>
drm/i915: Fix I915_EXEC_RING_MASK

Christian König <[email protected]>
drm/radeon: prefer lower reference dividers

Patrik Jakobsson <[email protected]>
drm/gma500/cdv: Check vbt config bits when detecting lvds panels

Dan Carpenter <[email protected]>
genwqe: Prevent an integer overflow in the ioctl

Greg Kroah-Hartman <[email protected]>
Revert "MIPS: perf: ath79: Fix perfcount IRQ assignment"

Paul Burton <[email protected]>
MIPS: pistachio: Build uImage.gz by default

Jiri Kosina <[email protected]>
x86/power: Fix 'nosmt' vs hibernation triple fault during resume

Miklos Szeredi <[email protected]>
fuse: fallocate: fix return with locked inode

John David Anglin <[email protected]>
parisc: Use implicit space register selection for loading the coherence index of I/O pdirs

Linus Torvalds <[email protected]>
rcu: locking and unlocking need to always be at least barriers

Hangbin Liu <[email protected]>
Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"

Greg Kroah-Hartman <[email protected]>
Revert "fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")"

Olivier Matz <[email protected]>
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4

Olivier Matz <[email protected]>
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl

Paolo Abeni <[email protected]>
pktgen: do not sleep with the thread lock held.

Zhu Yanjun <[email protected]>
net: rds: fix memory leak in rds_ib_flush_mr_pool

Erez Alfasi <[email protected]>
net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query

David Ahern <[email protected]>
neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit

Vivien Didelot <[email protected]>
ethtool: fix potential userspace buffer overflow

Nadav Amit <[email protected]>
media: uvcvideo: Fix uvc_alloc_entity() allocation alignment

Ard Biesheuvel <[email protected]>
efi/libstub: Unify command line param parsing

Greg Kroah-Hartman <[email protected]>
Revert "x86/build: Move _etext to actual end of .text"

Linus Torvalds <[email protected]>
mm: make page ref count overflow check tighter and more explicit

Linus Torvalds <[email protected]>
mm: prevent get_user_pages() from overflowing page refcount

Punit Agrawal <[email protected]>
mm, gup: ensure real head page is ref-counted when using hugepages

Will Deacon <[email protected]>
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages

Matthew Wilcox <[email protected]>
fs: prevent page refcount overflow in pipe_buf_get

Todd Kjos <[email protected]>
binder: replace "%p" with "%pK"

Ben Hutchings <[email protected]>
binder: Replace "%p" with "%pK" for stable

Arend van Spriel <[email protected]>
brcmfmac: add subtype check for event handling in data path

Arend van Spriel <[email protected]>
brcmfmac: assure SSID length from firmware is limited

Arend Van Spriel <[email protected]>
brcmfmac: add length checks in scheduled scan result handler

Thomas Hellstrom <[email protected]>
drm/vmwgfx: Don't send drm sysfs hotplug events on initial master set

Kees Cook <[email protected]>
gcc-plugins: Fix build failures under Darwin host

Roberto Bergantinos Corpas <[email protected]>
CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM

Dan Carpenter <[email protected]>
staging: vc04_services: prevent integer overflow in create_pagelist()

Jonathan Corbet <[email protected]>
docs: Fix conf.py for Sphinx 2.0

Zhenliang Wei <[email protected]>
kernel/signal.c: trace_signal_deliver when signal_group_exit

Jiri Slaby <[email protected]>
memcg: make it work on sparse non-0-node systems

Joe Burmeister <[email protected]>
tty: max310x: Fix external crystal register setup

Jorge Ramirez-Ortiz <[email protected]>
tty: serial: msm_serial: Fix XON/XOFF

Lyude Paul <[email protected]>
drm/nouveau/i2c: Disable i2c bus access after ->fini()

Kailang Yang <[email protected]>
ALSA: hda/realtek - Set default power save node to 0

Ravi Bangoria <[email protected]>
powerpc/perf: Fix MMCRA corruption by bhrb_filter

Filipe Manana <[email protected]>
Btrfs: fix race updating log root item during fsync

Steffen Maier <[email protected]>
scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)

Steffen Maier <[email protected]>
scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove

Mauro Carvalho Chehab <[email protected]>
media: smsusb: better handle optional alignment

Alan Stern <[email protected]>
media: usb: siano: Fix false-positive "uninitialized variable" warning

Alan Stern <[email protected]>
media: usb: siano: Fix general protection fault in smsusb

Oliver Neukum <[email protected]>
USB: rio500: fix memory leak in close after disconnect

Oliver Neukum <[email protected]>
USB: rio500: refuse more than one device at a time

Maximilian Luz <[email protected]>
USB: Add LPM quirk for Surface Dock GigE adapter

Oliver Neukum <[email protected]>
USB: sisusbvga: fix oops in error path of sisusb_probe

Alan Stern <[email protected]>
USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor

Shuah Khan <[email protected]>
usbip: usbip_host: fix stub_dev lock context imbalance regression

Shuah Khan <[email protected]>
usbip: usbip_host: fix BUG: sleeping function called from invalid context

Carsten Schmid <[email protected]>
usb: xhci: avoid null pointer deref when bos field is NULL

Andrey Smirnov <[email protected]>
xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()

Fabio Estevam <[email protected]>
xhci: Use %zu for printing size_t type

Henry Lin <[email protected]>
xhci: update bounce buffer with correct sg num

Rasmus Villemoes <[email protected]>
include/linux/bitops.h: sanitize rotate primitives

James Clarke <[email protected]>
sparc64: Fix regression in non-hypervisor TLB flush xcall

Junwei Hu <[email protected]>
tipc: fix modprobe tipc failed after switch order of device registration

David S. Miller <[email protected]>
Revert "tipc: fix modprobe tipc failed after switch order of device registration"

Konrad Rzeszutek Wilk <[email protected]>
xen/pciback: Don't disable PCI_COMMAND on PCI device reset.

Daniel Axtens <[email protected]>
crypto: vmx - ghash: do nosimd fallback manually

Antoine Tenart <[email protected]>
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value

Jisheng Zhang <[email protected]>
net: mvneta: Fix err code path of probe

Rasmus Villemoes <[email protected]>
net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT

Eric Dumazet <[email protected]>
ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST

Eric Dumazet <[email protected]>
ipv4/igmp: fix another memory leak in igmpv3_del_delrec()

Michael Chan <[email protected]>
bnxt_en: Fix aggregation buffer leak under OOM condition.

Chris Packham <[email protected]>
tipc: Avoid copying bytes beyond the supplied data

Kloetzke Jan <[email protected]>
usbnet: fix kernel crash after disconnect

Jisheng Zhang <[email protected]>
net: stmmac: fix reset gpio free missing

Eric Dumazet <[email protected]>
net-gro: fix use-after-free read in napi_gro_frags()

Andy Duan <[email protected]>
net: fec: fix the clk mismatch in failed_reset path

Eric Dumazet <[email protected]>
llc: fix skb leak in llc_build_and_send_ui_pkt()

Mike Manning <[email protected]>
ipv6: Consider sk_bound_dev_if when binding a raw socket to an address


-------------

Diffstat:

Documentation/conf.py | 2 +-
Makefile | 4 +-
arch/mips/ath79/setup.c | 6 +
arch/mips/pistachio/Platform | 1 +
arch/powerpc/perf/core-book3s.c | 6 +-
arch/powerpc/perf/power8-pmu.c | 3 +
arch/powerpc/perf/power9-pmu.c | 3 +
arch/sparc/mm/ultra.S | 4 +-
arch/x86/kernel/vmlinux.lds.S | 6 +-
arch/x86/power/cpu.c | 10 +
arch/x86/power/hibernate_64.c | 33 ++
drivers/android/binder.c | 36 +-
drivers/crypto/vmx/ghash.c | 213 +++++-------
drivers/firmware/efi/libstub/arm-stub.c | 23 +-
drivers/firmware/efi/libstub/arm64-stub.c | 4 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 19 +-
drivers/firmware/efi/libstub/efistub.h | 2 +
drivers/gpu/drm/gma500/cdv_intel_lvds.c | 3 +
drivers/gpu/drm/gma500/intel_bios.c | 3 +
drivers/gpu/drm/gma500/psb_drv.h | 1 +
drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h | 2 +
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 26 +-
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.h | 2 +
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c | 15 +
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 21 +-
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.h | 1 +
drivers/gpu/drm/radeon/radeon_display.c | 4 +-
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 8 +-
drivers/irqchip/irq-ath79-misc.c | 11 -
drivers/media/usb/siano/smsusb.c | 33 +-
drivers/media/usb/uvc/uvc_driver.c | 2 +-
drivers/misc/genwqe/card_dev.c | 2 +
drivers/misc/genwqe/card_utils.c | 4 +
drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +
drivers/net/ethernet/freescale/fec_main.c | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 4 +-
drivers/net/ethernet/marvell/mvpp2.c | 10 +-
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/port.c | 5 -
drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 3 +-
drivers/net/usb/usbnet.c | 6 +
.../broadcom/brcm80211/brcmfmac/cfg80211.c | 16 +-
.../wireless/broadcom/brcm80211/brcmfmac/core.c | 5 +-
.../wireless/broadcom/brcm80211/brcmfmac/fweh.h | 16 +-
.../wireless/broadcom/brcm80211/brcmfmac/msgbuf.c | 2 +-
drivers/parisc/ccio-dma.c | 4 +-
drivers/parisc/sba_iommu.c | 3 +-
drivers/s390/scsi/zfcp_ext.h | 1 +
drivers/s390/scsi/zfcp_scsi.c | 9 +
drivers/s390/scsi/zfcp_sysfs.c | 55 +++-
drivers/s390/scsi/zfcp_unit.c | 8 +-
.../interface/vchiq_arm/vchiq_2835_arm.c | 9 +
drivers/tty/serial/max310x.c | 2 +-
drivers/tty/serial/msm_serial.c | 5 +-
drivers/tty/serial/serial_core.c | 24 +-
drivers/usb/core/config.c | 4 +-
drivers/usb/core/quirks.c | 3 +
drivers/usb/host/xhci-ring.c | 17 +-
drivers/usb/host/xhci.c | 24 +-
drivers/usb/misc/rio500.c | 41 ++-
drivers/usb/misc/sisusbvga/sisusb.c | 15 +-
drivers/usb/usbip/stub_dev.c | 75 +++--
drivers/xen/xen-pciback/pciback_ops.c | 2 -
drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
fs/btrfs/tree-log.c | 8 +-
fs/cifs/file.c | 4 +-
fs/fuse/dev.c | 12 +-
fs/fuse/file.c | 6 +-
fs/open.c | 18 +
fs/pipe.c | 4 +-
fs/read_write.c | 5 +-
fs/splice.c | 12 +-
include/linux/bitops.h | 16 +-
include/linux/cpu.h | 4 +
include/linux/efi.h | 2 +-
include/linux/fs.h | 4 +
include/linux/list_lru.h | 1 +
include/linux/mm.h | 6 +-
include/linux/pipe_fs_i.h | 10 +-
include/linux/rcupdate.h | 6 +-
include/uapi/drm/i915_drm.h | 2 +-
include/uapi/linux/fuse.h | 2 +
include/uapi/linux/tipc_config.h | 10 +-
kernel/cpu.c | 4 +-
kernel/power/hibernate.c | 9 +
kernel/signal.c | 2 +
kernel/trace/trace.c | 6 +-
mm/gup.c | 54 ++-
mm/hugetlb.c | 16 +-
mm/list_lru.c | 8 +-
net/core/dev.c | 2 +-
net/core/ethtool.c | 5 +-
net/core/fib_rules.c | 7 +-
net/core/neighbour.c | 9 +-
net/core/pktgen.c | 11 +
net/ipv4/igmp.c | 47 ++-
net/ipv6/raw.c | 27 +-
net/llc/llc_output.c | 2 +
net/rds/ib_rdma.c | 10 +-
net/tipc/core.c | 32 +-
net/tipc/subscr.c | 14 +-
net/tipc/subscr.h | 5 +-
scripts/coccinelle/api/stream_open.cocci | 363 +++++++++++++++++++++
scripts/gcc-plugins/gcc-common.h | 4 +
sound/pci/hda/patch_realtek.c | 2 +-
106 files changed, 1223 insertions(+), 441 deletions(-)



2019-06-09 18:09:54

by Greg Kroah-Hartman

Subject: [PATCH 4.9 71/83] rcu: locking and unlocking need to always be at least barriers

From: Linus Torvalds <[email protected]>

commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream.

Herbert Xu pointed out that commit bb73c52bad36 ("rcu: Don't disable
preemption for Tiny and Tree RCU readers") was incorrect in making the
preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.

If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
a no-op, but still is a compiler barrier.

And RCU locking still _needs_ that compiler barrier.

It is simply fundamentally not true that RCU locking would be a complete
no-op: we still need to guarantee (for example) that things that can
trap and cause preemption cannot migrate into the RCU locked region.

The way we do that is by making it a barrier.

See for example commit 386afc91144b ("spinlocks and preemption points
need to be at least compiler barriers") from back in 2013 that had
similar issues with spinlocks that become no-ops on UP: they must still
constrain the compiler from moving other operations into the critical
region.

Now, it is true that a lot of RCU operations already use READ_ONCE() and
WRITE_ONCE() (which in practice likely would never be re-ordered wrt
anything remotely interesting), but it is also true that that is not
globally the case, and that it's not even necessarily always possible
(ie bitfields etc).

Reported-by: Herbert Xu <[email protected]>
Fixes: bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
Cc: [email protected]
Cc: Boqun Feng <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/rcupdate.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -306,14 +306,12 @@ void synchronize_rcu(void);

 static inline void __rcu_read_lock(void)
 {
-	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
-		preempt_disable();
+	preempt_disable();
 }

 static inline void __rcu_read_unlock(void)
 {
-	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
-		preempt_enable();
+	preempt_enable();
 }

 static inline void synchronize_rcu(void)


2019-06-09 18:09:57

by Greg Kroah-Hartman

Subject: [PATCH 4.9 75/83] MIPS: pistachio: Build uImage.gz by default

From: Paul Burton <[email protected]>

commit e4f2d1af7163becb181419af9dece9206001e0a6 upstream.

The pistachio platform uses the U-Boot bootloader & generally boots a
kernel in the uImage format. As such it's useful to build one when
building the kernel, but to do so currently requires the user to
manually specify a uImage target on the make command line.

Make uImage.gz the pistachio platform's default build target, so that
the default is to build a kernel image that we can actually boot on a
board such as the MIPS Creator Ci40.

Marked for stable backport as far as v4.1 where pistachio support was
introduced. This is primarily useful for CI systems such as kernelci.org
which will benefit from us building a suitable image which can then be
booted as part of automated testing, extending our test coverage to the
affected stable branches.

Signed-off-by: Paul Burton <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Kevin Hilman <[email protected]>
Tested-by: Kevin Hilman <[email protected]>
URL: https://groups.io/g/kernelci/message/388
Cc: [email protected] # v4.1+
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/mips/pistachio/Platform | 1 +
1 file changed, 1 insertion(+)

--- a/arch/mips/pistachio/Platform
+++ b/arch/mips/pistachio/Platform
@@ -6,3 +6,4 @@ cflags-$(CONFIG_MACH_PISTACHIO) += \
-I$(srctree)/arch/mips/include/asm/mach-pistachio
load-$(CONFIG_MACH_PISTACHIO) += 0xffffffff80400000
zload-$(CONFIG_MACH_PISTACHIO) += 0xffffffff81000000
+all-$(CONFIG_MACH_PISTACHIO) := uImage.gz


2019-06-09 18:10:08

by Greg Kroah-Hartman

Subject: [PATCH 4.9 79/83] drm/radeon: prefer lower reference dividers

From: Christian König <[email protected]>

commit 2e26ccb119bde03584be53406bbd22e711b0d6e6 upstream.

Instead of the closest reference divider prefer the lowest,
this fixes flickering issues on HP Compaq nx9420.

Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108514
Suggested-by: Paul Dufresne <[email protected]>
Signed-off-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/radeon/radeon_display.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -935,12 +935,12 @@ static void avivo_get_fb_ref_div(unsigne
 	ref_div_max = max(min(100 / post_div, ref_div_max), 1u);

 	/* get matching reference and feedback divider */
-	*ref_div = min(max(DIV_ROUND_CLOSEST(den, post_div), 1u), ref_div_max);
+	*ref_div = min(max(den/post_div, 1u), ref_div_max);
 	*fb_div = DIV_ROUND_CLOSEST(nom * *ref_div * post_div, den);

 	/* limit fb divider to its maximum */
 	if (*fb_div > fb_div_max) {
-		*ref_div = DIV_ROUND_CLOSEST(*ref_div * fb_div_max, *fb_div);
+		*ref_div = (*ref_div * fb_div_max)/(*fb_div);
 		*fb_div = fb_div_max;
 	}
 }


2019-06-09 18:10:12

by Greg Kroah-Hartman

Subject: [PATCH 4.9 57/83] mm: prevent get_user_pages() from overflowing page refcount

From: Linus Torvalds <[email protected]>

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it. One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times. This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn <[email protected]>
Acked-by: Matthew Wilcox <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[bwh: Backported to 4.9:
- Add the "err" variable in follow_hugetlb_page()
- Adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/gup.c | 45 ++++++++++++++++++++++++++++++++++-----------
mm/hugetlb.c | 16 +++++++++++++++-
2 files changed, 49 insertions(+), 12 deletions(-)

--- a/mm/gup.c
+++ b/mm/gup.c
@@ -153,7 +153,10 @@ retry:
}

if (flags & FOLL_GET) {
- get_page(page);
+ if (unlikely(!try_get_page(page))) {
+ page = ERR_PTR(-ENOMEM);
+ goto out;
+ }

/* drop the pgmap reference now that we hold the page */
if (pgmap) {
@@ -292,7 +295,10 @@ struct page *follow_page_mask(struct vm_
if (pmd_trans_unstable(pmd))
ret = -EBUSY;
} else {
- get_page(page);
+ if (unlikely(!try_get_page(page))) {
+ spin_unlock(ptl);
+ return ERR_PTR(-ENOMEM);
+ }
spin_unlock(ptl);
lock_page(page);
ret = split_huge_page(page);
@@ -348,7 +354,10 @@ static int get_gate_page(struct mm_struc
goto unmap;
*page = pte_page(*pte);
}
- get_page(*page);
+ if (unlikely(!try_get_page(*page))) {
+ ret = -ENOMEM;
+ goto unmap;
+ }
out:
ret = 0;
unmap:
@@ -1231,6 +1240,20 @@ struct page *get_dump_page(unsigned long
*/
#ifdef CONFIG_HAVE_GENERIC_RCU_GUP

+/*
+ * Return the compund head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+ struct page *head = compound_head(page);
+ if (WARN_ON_ONCE(page_ref_count(head) < 0))
+ return NULL;
+ if (unlikely(!page_cache_add_speculative(head, refs)))
+ return NULL;
+ return head;
+}
+
#ifdef __HAVE_ARCH_PTE_SPECIAL
static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
int write, struct page **pages, int *nr)
@@ -1263,9 +1286,9 @@ static int gup_pte_range(pmd_t pmd, unsi

VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- head = compound_head(page);

- if (!page_cache_get_speculative(head))
+ head = try_get_compound_head(page, 1);
+ if (!head)
goto pte_unmap;

if (unlikely(pte_val(pte) != pte_val(*ptep))) {
@@ -1321,8 +1344,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_
refs++;
} while (addr += PAGE_SIZE, addr != end);

- head = compound_head(pmd_page(orig));
- if (!page_cache_add_speculative(head, refs)) {
+ head = try_get_compound_head(pmd_page(orig), refs);
+ if (!head) {
*nr -= refs;
return 0;
}
@@ -1355,8 +1378,8 @@ static int gup_huge_pud(pud_t orig, pud_
refs++;
} while (addr += PAGE_SIZE, addr != end);

- head = compound_head(pud_page(orig));
- if (!page_cache_add_speculative(head, refs)) {
+ head = try_get_compound_head(pud_page(orig), refs);
+ if (!head) {
*nr -= refs;
return 0;
}
@@ -1390,8 +1413,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_
refs++;
} while (addr += PAGE_SIZE, addr != end);

- head = compound_head(pgd_page(orig));
- if (!page_cache_add_speculative(head, refs)) {
+ head = try_get_compound_head(pgd_page(orig), refs);
+ if (!head) {
*nr -= refs;
return 0;
}
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3984,6 +3984,7 @@ long follow_hugetlb_page(struct mm_struc
unsigned long vaddr = *position;
unsigned long remainder = *nr_pages;
struct hstate *h = hstate_vma(vma);
+ int err = -EFAULT;

while (vaddr < vma->vm_end && remainder) {
pte_t *pte;
@@ -4055,6 +4056,19 @@ long follow_hugetlb_page(struct mm_struc

pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
page = pte_page(huge_ptep_get(pte));
+
+ /*
+ * Instead of doing 'try_get_page()' below in the same_page
+ * loop, just check the count once here.
+ */
+ if (unlikely(page_count(page) <= 0)) {
+ if (pages) {
+ spin_unlock(ptl);
+ remainder = 0;
+ err = -ENOMEM;
+ break;
+ }
+ }
same_page:
if (pages) {
pages[i] = mem_map_offset(page, pfn_offset);
@@ -4081,7 +4095,7 @@ same_page:
*nr_pages = remainder;
*position = vaddr;

- return i ? i : -EFAULT;
+ return i ? i : err;
}

#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE


2019-06-09 18:10:12

by Greg Kroah-Hartman

Subject: [PATCH 4.9 82/83] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock

From: Kirill Smelkov <[email protected]>

commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream.

Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.

This caused a regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such a regression for the particular
case of /proc/xen/xenbus.

The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.

However, even though the 2006 version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655e3 -
does so regardless of whether a file is seekable or not.

See

https://lore.kernel.org/lkml/[email protected]/
https://lwn.net/Articles/180387
https://lwn.net/Articles/180396

for historic context.

The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:

kernel/power/user.c snapshot_read
fs/debugfs/file.c u32_array_read
fs/fuse/control.c fuse_conn_waiting_read + ...
drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
arch/s390/hypfs/inode.c hypfs_read_iter
...

Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on the passed ppos at all. And
for those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write cannot proceed until the read is
done, and the read is waiting for some, potentially external, event for
a potentially unbounded time -> deadlock.

Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):

drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

In addition to the cases above, another regression caused by f_pos
locking is that FUSE filesystems that implement open with the
FOPEN_NONSEEKABLE flag can no longer implement bidirectional
stream-like files - for the same reason as above: e.g. read can deadlock
write on the file.f_pos lock in the kernel.

FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:

See

https://github.com/libfuse/osspd
https://lwn.net/Articles/308445

https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:

https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:

https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

Let's fix this regression. The plan is:

1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
doing so would break many in-kernel nonseekable_open users which
actually use ppos in read/write handlers.

2. Add stream_open() to kernel to open stream-like non-seekable file
descriptors. Read and write on such file descriptors would never use
nor change ppos. And with that property on stream-like files read and
write will be running without taking f_pos lock - i.e. read and write
could be running simultaneously.

3. With semantic patch search and convert to stream_open all in-kernel
nonseekable_open users for which read and write actually do not
depend on ppos and where there is no other methods in file_operations
which assume @offset access.

4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
stream_open if that bit is present in filesystem open reply.

It was tempting to change fs/fuse/ open handler to use stream_open
instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and
write handlers

https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

so such a change would break a real user.

5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
from v3.14+ (the kernel where 9c225f2655 first appeared).

This will allow OSSPD and other FUSE filesystems that provide
stream-like files to be patched to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in their open handler, and this way avoid the deadlock
on all kernel versions. This should work because fs/fuse/ ignores
unknown open flags returned from a filesystem, so passing FOPEN_STREAM
to a kernel that is not aware of this flag cannot hurt. In turn, a
kernel that is not aware of FOPEN_STREAM will be < v3.14, where just
FOPEN_NONSEEKABLE is sufficient to implement streams without the read vs
write deadlock.

This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.

Regarding the semantic patch: I've manually verified that each generated
change is correct to convert, and that each remaining nonseekable_open
instance is either not correct to convert or is left unconverted due to
current stream_open.cocci limitations.

The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
an unknown reason despite the file being opened with nonseekable_open
(e.g. drivers/input/mousedev.c).

Cc: Michael Kerrisk <[email protected]>
Cc: Yongzhi Pan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Nikolaus Rath <[email protected]>
Cc: Han-Wen Nienhuys <[email protected]>
Signed-off-by: Kirill Smelkov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/xen/xenbus/xenbus_dev_frontend.c | 2
fs/open.c | 18 +
fs/read_write.c | 5
include/linux/fs.h | 4
scripts/coccinelle/api/stream_open.cocci | 363 +++++++++++++++++++++++++++++++
5 files changed, 389 insertions(+), 3 deletions(-)

--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -536,7 +536,7 @@ static int xenbus_file_open(struct inode
if (xen_store_evtchn == 0)
return -ENOENT;

- nonseekable_open(inode, filp);
+ stream_open(inode, filp);

u = kzalloc(sizeof(*u), GFP_KERNEL);
if (u == NULL)
--- a/fs/open.c
+++ b/fs/open.c
@@ -1192,3 +1192,21 @@ int nonseekable_open(struct inode *inode
}

EXPORT_SYMBOL(nonseekable_open);
+
+/*
+ * stream_open is used by subsystems that want stream-like file descriptors.
+ * Such file descriptors are not seekable and don't have notion of position
+ * (file.f_pos is always 0). Contrary to file descriptors of other regular
+ * files, .read() and .write() can run simultaneously.
+ *
+ * stream_open never fails and is marked to return int so that it could be
+ * directly used as file_operations.open .
+ */
+int stream_open(struct inode *inode, struct file *filp)
+{
+ filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE | FMODE_ATOMIC_POS);
+ filp->f_mode |= FMODE_STREAM;
+ return 0;
+}
+
+EXPORT_SYMBOL(stream_open);
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -575,12 +575,13 @@ EXPORT_SYMBOL(vfs_write);

static inline loff_t file_pos_read(struct file *file)
{
- return file->f_pos;
+ return file->f_mode & FMODE_STREAM ? 0 : file->f_pos;
}

static inline void file_pos_write(struct file *file, loff_t pos)
{
- file->f_pos = pos;
+ if ((file->f_mode & FMODE_STREAM) == 0)
+ file->f_pos = pos;
}

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -143,6 +143,9 @@ typedef int (dio_iodone_t)(struct kiocb
/* Has write method(s) */
#define FMODE_CAN_WRITE ((__force fmode_t)0x40000)

+/* File is stream-like */
+#define FMODE_STREAM ((__force fmode_t)0x200000)
+
/* File was opened by fanotify and shouldn't generate fanotify events */
#define FMODE_NONOTIFY ((__force fmode_t)0x4000000)

@@ -2843,6 +2846,7 @@ extern loff_t no_seek_end_llseek_size(st
extern loff_t no_seek_end_llseek(struct file *, loff_t, int);
extern int generic_file_open(struct inode * inode, struct file * filp);
extern int nonseekable_open(struct inode * inode, struct file * filp);
+extern int stream_open(struct inode * inode, struct file * filp);

#ifdef CONFIG_BLOCK
typedef void (dio_submit_t)(struct bio *bio, struct inode *inode,
--- /dev/null
+++ b/scripts/coccinelle/api/stream_open.cocci
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0
+// Author: Kirill Smelkov ([email protected])
+//
+// Search for stream-like files that are using nonseekable_open and convert
+// them to stream_open. A stream-like file is a file that does not use ppos in
+// its read and write. Rationale for the conversion is to avoid deadlock in
+// between read and write.
+
+virtual report
+virtual patch
+virtual explain // explain decisions in the patch (SPFLAGS="-D explain")
+
+// stream-like reader & writer - ones that do not depend on f_pos.
+@ stream_reader @
+identifier readstream, ppos;
+identifier f, buf, len;
+type loff_t;
+@@
+ ssize_t readstream(struct file *f, char *buf, size_t len, loff_t *ppos)
+ {
+ ... when != ppos
+ }
+
+@ stream_writer @
+identifier writestream, ppos;
+identifier f, buf, len;
+type loff_t;
+@@
+ ssize_t writestream(struct file *f, const char *buf, size_t len, loff_t *ppos)
+ {
+ ... when != ppos
+ }
+
+
+// a function that blocks
+@ blocks @
+identifier block_f;
+identifier wait_event =~ "^wait_event_.*";
+@@
+ block_f(...) {
+ ... when exists
+ wait_event(...)
+ ... when exists
+ }
+
+// stream_reader that can block inside.
+//
+// XXX wait_* can be called not directly from current function (e.g. func -> f -> g -> wait())
+// XXX currently reader_blocks supports only direct and 1-level indirect cases.
+@ reader_blocks_direct @
+identifier stream_reader.readstream;
+identifier wait_event =~ "^wait_event_.*";
+@@
+ readstream(...)
+ {
+ ... when exists
+ wait_event(...)
+ ... when exists
+ }
+
+@ reader_blocks_1 @
+identifier stream_reader.readstream;
+identifier blocks.block_f;
+@@
+ readstream(...)
+ {
+ ... when exists
+ block_f(...)
+ ... when exists
+ }
+
+@ reader_blocks depends on reader_blocks_direct || reader_blocks_1 @
+identifier stream_reader.readstream;
+@@
+ readstream(...) {
+ ...
+ }
+
+
+// file_operations + whether they have _any_ .read, .write, .llseek ... at all.
+//
+// XXX add support for file_operations xxx[N] = ... (sound/core/pcm_native.c)
+@ fops0 @
+identifier fops;
+@@
+ struct file_operations fops = {
+ ...
+ };
+
+@ has_read @
+identifier fops0.fops;
+identifier read_f;
+@@
+ struct file_operations fops = {
+ .read = read_f,
+ };
+
+@ has_read_iter @
+identifier fops0.fops;
+identifier read_iter_f;
+@@
+ struct file_operations fops = {
+ .read_iter = read_iter_f,
+ };
+
+@ has_write @
+identifier fops0.fops;
+identifier write_f;
+@@
+ struct file_operations fops = {
+ .write = write_f,
+ };
+
+@ has_write_iter @
+identifier fops0.fops;
+identifier write_iter_f;
+@@
+ struct file_operations fops = {
+ .write_iter = write_iter_f,
+ };
+
+@ has_llseek @
+identifier fops0.fops;
+identifier llseek_f;
+@@
+ struct file_operations fops = {
+ .llseek = llseek_f,
+ };
+
+@ has_no_llseek @
+identifier fops0.fops;
+@@
+ struct file_operations fops = {
+ .llseek = no_llseek,
+ };
+
+@ has_mmap @
+identifier fops0.fops;
+identifier mmap_f;
+@@
+ struct file_operations fops = {
+ .mmap = mmap_f,
+ };
+
+@ has_copy_file_range @
+identifier fops0.fops;
+identifier copy_file_range_f;
+@@
+ struct file_operations fops = {
+ .copy_file_range = copy_file_range_f,
+ };
+
+@ has_remap_file_range @
+identifier fops0.fops;
+identifier remap_file_range_f;
+@@
+ struct file_operations fops = {
+ .remap_file_range = remap_file_range_f,
+ };
+
+@ has_splice_read @
+identifier fops0.fops;
+identifier splice_read_f;
+@@
+ struct file_operations fops = {
+ .splice_read = splice_read_f,
+ };
+
+@ has_splice_write @
+identifier fops0.fops;
+identifier splice_write_f;
+@@
+ struct file_operations fops = {
+ .splice_write = splice_write_f,
+ };
+
+
+// file_operations that is candidate for stream_open conversion - it does not
+// use mmap and other methods that assume @offset access to file.
+//
+// XXX for simplicity require no .{read/write}_iter and no .splice_{read/write} for now.
+// XXX maybe_steam.fops cannot be used in other rules - it gives "bad rule maybe_stream or bad variable fops".
+@ maybe_stream depends on (!has_llseek || has_no_llseek) && !has_mmap && !has_copy_file_range && !has_remap_file_range && !has_read_iter && !has_write_iter && !has_splice_read && !has_splice_write @
+identifier fops0.fops;
+@@
+ struct file_operations fops = {
+ };
+
+
+// ---- conversions ----
+
+// XXX .open = nonseekable_open -> .open = stream_open
+// XXX .open = func -> openfunc -> nonseekable_open
+
+// read & write
+//
+// if both are used in the same file_operations together with an opener -
+// under that conditions we can use stream_open instead of nonseekable_open.
+@ fops_rw depends on maybe_stream @
+identifier fops0.fops, openfunc;
+identifier stream_reader.readstream;
+identifier stream_writer.writestream;
+@@
+ struct file_operations fops = {
+ .open = openfunc,
+ .read = readstream,
+ .write = writestream,
+ };
+
+@ report_rw depends on report @
+identifier fops_rw.openfunc;
+position p1;
+@@
+ openfunc(...) {
+ <...
+ nonseekable_open@p1
+ ...>
+ }
+
+@ script:python depends on report && reader_blocks @
+fops << fops0.fops;
+p << report_rw.p1;
+@@
+coccilib.report.print_report(p[0],
+ "ERROR: %s: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix." % (fops,))
+
+@ script:python depends on report && !reader_blocks @
+fops << fops0.fops;
+p << report_rw.p1;
+@@
+coccilib.report.print_report(p[0],
+ "WARNING: %s: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open." % (fops,))
+
+
+@ explain_rw_deadlocked depends on explain && reader_blocks @
+identifier fops_rw.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ nonseekable_open /* read & write (was deadlock) */
+ ...>
+ }
+
+
+@ explain_rw_nodeadlock depends on explain && !reader_blocks @
+identifier fops_rw.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ nonseekable_open /* read & write (no direct deadlock) */
+ ...>
+ }
+
+@ patch_rw depends on patch @
+identifier fops_rw.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ stream_open
+ ...>
+ }
+
+
+// read, but not write
+@ fops_r depends on maybe_stream && !has_write @
+identifier fops0.fops, openfunc;
+identifier stream_reader.readstream;
+@@
+ struct file_operations fops = {
+ .open = openfunc,
+ .read = readstream,
+ };
+
+@ report_r depends on report @
+identifier fops_r.openfunc;
+position p1;
+@@
+ openfunc(...) {
+ <...
+ nonseekable_open@p1
+ ...>
+ }
+
+@ script:python depends on report @
+fops << fops0.fops;
+p << report_r.p1;
+@@
+coccilib.report.print_report(p[0],
+ "WARNING: %s: .read() has stream semantic; safe to change nonseekable_open -> stream_open." % (fops,))
+
+@ explain_r depends on explain @
+identifier fops_r.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ nonseekable_open /* read only */
+ ...>
+ }
+
+@ patch_r depends on patch @
+identifier fops_r.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ stream_open
+ ...>
+ }
+
+
+// write, but not read
+@ fops_w depends on maybe_stream && !has_read @
+identifier fops0.fops, openfunc;
+identifier stream_writer.writestream;
+@@
+ struct file_operations fops = {
+ .open = openfunc,
+ .write = writestream,
+ };
+
+@ report_w depends on report @
+identifier fops_w.openfunc;
+position p1;
+@@
+ openfunc(...) {
+ <...
+ nonseekable_open@p1
+ ...>
+ }
+
+@ script:python depends on report @
+fops << fops0.fops;
+p << report_w.p1;
+@@
+coccilib.report.print_report(p[0],
+ "WARNING: %s: .write() has stream semantic; safe to change nonseekable_open -> stream_open." % (fops,))
+
+@ explain_w depends on explain @
+identifier fops_w.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ nonseekable_open /* write only */
+ ...>
+ }
+
+@ patch_w depends on patch @
+identifier fops_w.openfunc;
+@@
+ openfunc(...) {
+ <...
+- nonseekable_open
++ stream_open
+ ...>
+ }
+
+
+// no read, no write - don't change anything


2019-06-09 18:10:18

by Greg Kroah-Hartman

Subject: [PATCH 4.9 83/83] fuse: Add FOPEN_STREAM to use stream_open()

From: Kirill Smelkov <[email protected]>

commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.

Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per
POSIX"), files opened even via nonseekable_open gate read and write via
a lock and do not allow them to run simultaneously. This can create a
read vs write deadlock if a filesystem is trying to implement a
socket-like file which is intended to be used simultaneously for both
read and write by a filesystem client. See commit 10dce8af3422 ("fs:
stream_open - opener for stream-like files so that read and write can
run simultaneously without deadlock") for details and e.g. commit
581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for
a similar deadlock example on /proc/xen/xenbus.

To avoid such a deadlock it was tempting to adjust fuse_finish_open to
use stream_open instead of nonseekable_open whenever FOPEN_NONSEEKABLE
is set, but grepping through Debian codesearch shows users of
FOPEN_NONSEEKABLE, and in particular GVFS, which actually uses the
offset in its read and write handlers:

https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

so making such a change would break a real user.

Add another flag (FOPEN_STREAM) for filesystem servers to indicate that
the opened handle has stream-like semantics: it does not use file
position, and thus the kernel is free to issue simultaneous read and
write requests on the opened file handle.

This patch together with stream_open() should be added to stable kernels
starting from v3.14. This will allow OSSPD and other FUSE filesystems
that provide stream-like files to be patched to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock
on all kernel versions. This should work because fuse_finish_open
ignores unknown open flags returned from a filesystem, so passing
FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In
turn, a kernel that is not aware of FOPEN_STREAM will be < v3.14, where
just FOPEN_NONSEEKABLE is sufficient to implement streams without read
vs write deadlock.
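The compatibility argument can be illustrated with a small userspace
model of the flag dispatch. The FOPEN_* values are taken from the diff
in this patch; enum open_kind and the two classify_* functions are
hypothetical names used only for this sketch.

```c
#define FOPEN_DIRECT_IO   (1u << 0)
#define FOPEN_KEEP_CACHE  (1u << 1)
#define FOPEN_NONSEEKABLE (1u << 2)
#define FOPEN_STREAM      (1u << 4)

enum open_kind { OPEN_REGULAR, OPEN_NONSEEKABLE, OPEN_STREAM };

/* Kernel with this patch: FOPEN_STREAM takes precedence. */
enum open_kind classify_patched(unsigned int open_flags)
{
    if (open_flags & FOPEN_STREAM)
        return OPEN_STREAM;
    if (open_flags & FOPEN_NONSEEKABLE)
        return OPEN_NONSEEKABLE;
    return OPEN_REGULAR;
}

/* Kernel without this patch: the FOPEN_STREAM bit is never examined,
 * so the server's reply degrades to plain nonseekable behaviour. */
enum open_kind classify_unpatched(unsigned int open_flags)
{
    if (open_flags & FOPEN_NONSEEKABLE)
        return OPEN_NONSEEKABLE;
    return OPEN_REGULAR;
}
```

A server that always replies with FOPEN_STREAM | FOPEN_NONSEEKABLE thus
gets stream semantics on a patched kernel and nonseekable semantics on
an older one.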

Cc: [email protected] # v3.14+
Signed-off-by: Kirill Smelkov <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
fs/fuse/file.c | 4 +++-
include/uapi/linux/fuse.h | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -178,7 +178,9 @@ void fuse_finish_open(struct inode *inod
file->f_op = &fuse_direct_io_file_operations;
if (!(ff->open_flags & FOPEN_KEEP_CACHE))
invalidate_inode_pages2(inode->i_mapping);
- if (ff->open_flags & FOPEN_NONSEEKABLE)
+ if (ff->open_flags & FOPEN_STREAM)
+ stream_open(inode, file);
+ else if (ff->open_flags & FOPEN_NONSEEKABLE)
nonseekable_open(inode, file);
if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
struct fuse_inode *fi = get_fuse_inode(inode);
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -215,10 +215,12 @@ struct fuse_file_lock {
* FOPEN_DIRECT_IO: bypass page cache for this open file
* FOPEN_KEEP_CACHE: don't invalidate the data cache on open
* FOPEN_NONSEEKABLE: the file is not seekable
+ * FOPEN_STREAM: the file is stream-like (no file position at all)
*/
#define FOPEN_DIRECT_IO (1 << 0)
#define FOPEN_KEEP_CACHE (1 << 1)
#define FOPEN_NONSEEKABLE (1 << 2)
+#define FOPEN_STREAM (1 << 4)

/**
* INIT request/reply flags


2019-06-09 18:10:18

by Greg Kroah-Hartman

Subject: [PATCH 4.9 25/83] usbip: usbip_host: fix stub_dev lock context imbalance regression

From: Shuah Khan <[email protected]>

commit 3ea3091f1bd8586125848c62be295910e9802af0 upstream.

Fix the following sparse context imbalance regression introduced in
the patch that fixed the "sleeping function called from invalid
context" bug.

kbuild test robot reported on:

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-linus

Regressions in current branch:

drivers/usb/usbip/stub_dev.c:399:9: sparse: sparse: context imbalance in 'stub_probe' - different lock contexts for basic block
drivers/usb/usbip/stub_dev.c:418:13: sparse: sparse: context imbalance in 'stub_disconnect' - different lock contexts for basic block
drivers/usb/usbip/stub_dev.c:464:1-10: second lock on line 476

Error ids grouped by kconfigs:

recent_errors
├── i386-allmodconfig
│ └── drivers-usb-usbip-stub_dev.c:second-lock-on-line
├── x86_64-allmodconfig
│ ├── drivers-usb-usbip-stub_dev.c:sparse:sparse:context-imbalance-in-stub_disconnect-different-lock-contexts-for-basic-block
│ └── drivers-usb-usbip-stub_dev.c:sparse:sparse:context-imbalance-in-stub_probe-different-lock-contexts-for-basic-block
└── x86_64-allyesconfig
└── drivers-usb-usbip-stub_dev.c:second-lock-on-line

This is a real problem in an error leg where spin_lock() is called on an
already held lock.

Fix the imbalance in stub_probe() and stub_disconnect().

Signed-off-by: Shuah Khan <[email protected]>
Fixes: 0c9e8b3cad65 ("usbip: usbip_host: fix BUG: sleeping function called from invalid context")
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/usbip/stub_dev.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)

--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -340,14 +340,17 @@ static int stub_probe(struct usb_device
* See driver_probe_device() in driver/base/dd.c
*/
rc = -ENODEV;
- goto sdev_free;
+ if (!busid_priv)
+ goto sdev_free;
+
+ goto call_put_busid_priv;
}

if (udev->descriptor.bDeviceClass == USB_CLASS_HUB) {
dev_dbg(&udev->dev, "%s is a usb hub device... skip!\n",
udev_busid);
rc = -ENODEV;
- goto sdev_free;
+ goto call_put_busid_priv;
}

if (!strcmp(udev->bus->bus_name, "vhci_hcd")) {
@@ -356,7 +359,7 @@ static int stub_probe(struct usb_device
udev_busid);

rc = -ENODEV;
- goto sdev_free;
+ goto call_put_busid_priv;
}


@@ -375,6 +378,9 @@ static int stub_probe(struct usb_device
save_status = busid_priv->status;
busid_priv->status = STUB_BUSID_ALLOC;

+ /* release the busid_lock */
+ put_busid_priv(busid_priv);
+
/*
* Claim this hub port.
* It doesn't matter what value we pass as owner
@@ -387,9 +393,6 @@ static int stub_probe(struct usb_device
goto err_port;
}

- /* release the busid_lock */
- put_busid_priv(busid_priv);
-
rc = stub_add_files(&udev->dev);
if (rc) {
dev_err(&udev->dev, "stub_add_files for %s\n", udev_busid);
@@ -409,11 +412,17 @@ err_port:
spin_lock(&busid_priv->busid_lock);
busid_priv->sdev = NULL;
busid_priv->status = save_status;
-sdev_free:
- stub_device_free(sdev);
+ spin_unlock(&busid_priv->busid_lock);
+ /* lock is released - go to free */
+ goto sdev_free;
+
+call_put_busid_priv:
/* release the busid_lock */
put_busid_priv(busid_priv);

+sdev_free:
+ stub_device_free(sdev);
+
return rc;
}

@@ -449,7 +458,9 @@ static void stub_disconnect(struct usb_d
/* get stub_device */
if (!sdev) {
dev_err(&udev->dev, "could not get device");
- goto call_put_busid_priv;
+ /* release busid_lock */
+ put_busid_priv(busid_priv);
+ return;
}

dev_set_drvdata(&udev->dev, NULL);
@@ -479,7 +490,7 @@ static void stub_disconnect(struct usb_d
if (!busid_priv->shutdown_busid)
busid_priv->shutdown_busid = 1;
/* release busid_lock */
- put_busid_priv(busid_priv);
+ spin_unlock(&busid_priv->busid_lock);

/* shutdown the current connection */
shutdown_busid(busid_priv);
@@ -494,10 +505,9 @@ static void stub_disconnect(struct usb_d

if (busid_priv->status == STUB_BUSID_ALLOC)
busid_priv->status = STUB_BUSID_ADDED;
-
-call_put_busid_priv:
/* release busid_lock */
- put_busid_priv(busid_priv);
+ spin_unlock(&busid_priv->busid_lock);
+ return;
}

#ifdef CONFIG_PM


2019-06-09 18:10:30

by Greg Kroah-Hartman

Subject: [PATCH 4.9 70/83] Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"

From: Hangbin Liu <[email protected]>

[ Upstream commit 4970b42d5c362bf873982db7d93245c5281e58f4 ]

This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849.

Nathan reported that the new behaviour breaks Android, as Android just
adds new rules and deletes old ones.

If we return 0 without adding duplicate rules, Android will remove the
newly added rules, causing the system to soft-reboot.
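The difference between the two behaviours can be modelled in a few
lines. This is a sketch only: the function names and the "added" output
parameter are hypothetical, and MODEL_NLM_F_EXCL is a local constant
assumed to match the netlink flag value.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NLM_F_EXCL 0x200u  /* assumed value of NLM_F_EXCL */

/* Behaviour being reverted: an existing identical rule short-circuits
 * the add and returns 0 when NLM_F_EXCL is not set. */
int newrule_before_revert(unsigned int flags, bool exists, bool *added)
{
    *added = false;
    if (exists)
        return (flags & MODEL_NLM_F_EXCL) ? -EEXIST : 0;
    *added = true;
    return 0;
}

/* Behaviour after the revert: a duplicate rule is added unless the
 * caller asked for exclusivity with NLM_F_EXCL. */
int newrule_after_revert(unsigned int flags, bool exists, bool *added)
{
    *added = false;
    if ((flags & MODEL_NLM_F_EXCL) && exists)
        return -EEXIST;
    *added = true;
    return 0;
}
```

With the reverted behaviour, a caller that adds a rule and then deletes
the "old" one ends up with no rule at all, because the add was silently
skipped.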

Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: Yaro Slav <[email protected]>
Reported-by: Maciej Żenczykowski <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/fib_rules.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -429,9 +429,9 @@ int fib_nl_newrule(struct sk_buff *skb,
if (rule->l3mdev && rule->table)
goto errout_free;

- if (rule_exists(ops, frh, tb, rule)) {
- if (nlh->nlmsg_flags & NLM_F_EXCL)
- err = -EEXIST;
+ if ((nlh->nlmsg_flags & NLM_F_EXCL) &&
+ rule_exists(ops, frh, tb, rule)) {
+ err = -EEXIST;
goto errout_free;
}



2019-06-09 18:10:32

by Greg Kroah-Hartman

Subject: [PATCH 4.9 64/83] net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query

From: Erez Alfasi <[email protected]>

[ Upstream commit 135dd9594f127c8a82d141c3c8430e9e2143216a ]

Querying EEPROM high pages data for an SFP module is currently
not supported by our driver but is still attempted, resulting in
invalid FW queries.

Set the EEPROM ethtool data length to 256 for SFP modules to
limit reads to page 0 only and prevent invalid FW queries.

Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support")
Signed-off-by: Erez Alfasi <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 +++-
drivers/net/ethernet/mellanox/mlx4/port.c | 5 -----
2 files changed, 3 insertions(+), 6 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1930,6 +1930,8 @@ static int mlx4_en_set_tunable(struct ne
return ret;
}

+#define MLX4_EEPROM_PAGE_LEN 256
+
static int mlx4_en_get_module_info(struct net_device *dev,
struct ethtool_modinfo *modinfo)
{
@@ -1964,7 +1966,7 @@ static int mlx4_en_get_module_info(struc
break;
case MLX4_MODULE_ID_SFP:
modinfo->type = ETH_MODULE_SFF_8472;
- modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+ modinfo->eeprom_len = MLX4_EEPROM_PAGE_LEN;
break;
default:
return -ENOSYS;
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -1960,11 +1960,6 @@ int mlx4_get_module_info(struct mlx4_dev
size -= offset + size - I2C_PAGE_SIZE;

i2c_addr = I2C_ADDR_LOW;
- if (offset >= I2C_PAGE_SIZE) {
- /* Reset offset to high page */
- i2c_addr = I2C_ADDR_HIGH;
- offset -= I2C_PAGE_SIZE;
- }

cable_info = (struct mlx4_cable_info *)inmad->data;
cable_info->dev_mem_address = cpu_to_be16(offset);


2019-06-09 18:10:57

by Greg Kroah-Hartman

Subject: [PATCH 4.9 77/83] genwqe: Prevent an integer overflow in the ioctl

From: Dan Carpenter <[email protected]>

commit 110080cea0d0e4dfdb0b536e7f8a5633ead6a781 upstream.

There are a couple of potential integer overflows here.

round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE);

The first thing is that the "m->size + (...)" addition could overflow,
and the second is that round_up() overflows to zero if the result is
within PAGE_SIZE of the type max.

In this code, the "m->size" variable is a u64 but we're saving the
result in "map_size" which is an unsigned long and genwqe_user_vmap()
takes an unsigned long as well. So I have used ULONG_MAX as the upper
bound. From a practical perspective unsigned long is fine/better than
trying to change all the types to u64.
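The bound can be tried out in a userspace sketch. MODEL_PAGE_SIZE and
model_map_size() are hypothetical names, and a 4096-byte page is assumed
purely for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumed page size for this sketch */

/* Returns true and stores the page-rounded mapping size, or returns
 * false when the request is invalid or the arithmetic could overflow. */
bool model_map_size(uint64_t addr, uint64_t size, unsigned long *map_size)
{
    unsigned long offs = (unsigned long)(addr & (MODEL_PAGE_SIZE - 1));
    unsigned long total;

    if (addr == 0 || size == 0)
        return false;

    /* Same shape as the check added by the patch: leave room for the
     * in-page offset plus one page of rounding below ULONG_MAX. */
    if (size > ULONG_MAX - MODEL_PAGE_SIZE - offs)
        return false;

    total = (unsigned long)size + offs;
    *map_size = (total + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
    return true;
}
```

Subtracting on the right-hand side instead of adding on the left keeps
the comparison itself free of overflow.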

Fixes: eaf4722d4645 ("GenWQE Character device and DDCB queue")
Signed-off-by: Dan Carpenter <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/misc/genwqe/card_dev.c | 2 ++
drivers/misc/genwqe/card_utils.c | 4 ++++
2 files changed, 6 insertions(+)

--- a/drivers/misc/genwqe/card_dev.c
+++ b/drivers/misc/genwqe/card_dev.c
@@ -782,6 +782,8 @@ static int genwqe_pin_mem(struct genwqe_

if ((m->addr == 0x0) || (m->size == 0))
return -EINVAL;
+ if (m->size > ULONG_MAX - PAGE_SIZE - (m->addr & ~PAGE_MASK))
+ return -EINVAL;

map_addr = (m->addr & PAGE_MASK);
map_size = round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE);
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -582,6 +582,10 @@ int genwqe_user_vmap(struct genwqe_dev *
/* determine space needed for page_list. */
data = (unsigned long)uaddr;
offs = offset_in_page(data);
+ if (size > ULONG_MAX - PAGE_SIZE - offs) {
+ m->size = 0; /* mark unused and not added */
+ return -EINVAL;
+ }
m->nr_pages = DIV_ROUND_UP(offs + size, PAGE_SIZE);

m->page_list = kcalloc(m->nr_pages,


2019-06-09 18:11:02

by Greg Kroah-Hartman

Subject: [PATCH 4.9 62/83] ethtool: fix potential userspace buffer overflow

From: Vivien Didelot <[email protected]>

[ Upstream commit 0ee4e76937d69128a6a66861ba393ebdc2ffc8a2 ]

ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
and passes it to the kernel driver via ops->get_regs() for filling.

There is no restriction about what the kernel drivers can or cannot do
with the open ethtool_regs structure. They usually set regs->version
and ignore regs->len or set it to the same size as ops->get_regs_len().

But if userspace allocates a smaller buffer for the registers dump,
we would cause a userspace buffer overflow in the final copy_to_user()
call, which uses the regs.len value potentially reset by the driver.

To fix this, make this case obvious and store regs.len before calling
ops->get_regs(), to only copy as much data as requested by userspace,
up to the value returned by ops->get_regs_len().
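The clamping step can be sketched in userspace, with memcpy() standing
in for copy_to_user(). model_copy_regs() and its parameters are
hypothetical names for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most min(user_len, driver_len) bytes of the register dump.
 * user_len models the regs.len supplied by userspace, captured before
 * the driver gets a chance to overwrite it; driver_len models the
 * value returned by ops->get_regs_len(). */
size_t model_copy_regs(unsigned char *user_buf, size_t user_len,
                       const unsigned char *regbuf, size_t driver_len)
{
    size_t reglen = driver_len;

    if (user_len < reglen)
        reglen = user_len;

    memcpy(user_buf, regbuf, reglen);
    return reglen;
}
```

Whatever the driver later writes into regs.len no longer influences how
many bytes are copied out.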

While at it, remove the redundant check for non-null regbuf.

Signed-off-by: Vivien Didelot <[email protected]>
Reviewed-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/ethtool.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1390,13 +1390,16 @@ static int ethtool_get_regs(struct net_d
return -ENOMEM;
}

+ if (regs.len < reglen)
+ reglen = regs.len;
+
ops->get_regs(dev, &regs, regbuf);

ret = -EFAULT;
if (copy_to_user(useraddr, &regs, sizeof(regs)))
goto out;
useraddr += offsetof(struct ethtool_regs, data);
- if (regbuf && copy_to_user(useraddr, regbuf, regs.len))
+ if (copy_to_user(useraddr, regbuf, reglen))
goto out;
ret = 0;



2019-06-09 18:11:16

by Greg Kroah-Hartman

Subject: [PATCH 4.9 67/83] ipv6: fix EFAULT on sendto with icmpv6 and hdrincl

From: Olivier Matz <[email protected]>

[ Upstream commit b9aa52c4cb457e7416cc0c95f475e72ef4a61336 ]

The following code returns EFAULT (Bad address):

s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */

The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
instead of IPPROTO_ICMPV6.

The failure happens because 2 bytes are eaten from the msghdr by
rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6
equivalent of "ipv4: Avoid reading user iov twice after
raw_probe_proto_opt""), but at that time it was not a problem because
IPV6_HDRINCL was not yet introduced.

Only eat these 2 bytes if hdrincl == 0.

Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets")
Signed-off-by: Olivier Matz <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/raw.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -880,11 +880,14 @@ static int rawv6_sendmsg(struct sock *sk
opt = ipv6_fixup_options(&opt_space, opt);

fl6.flowi6_proto = proto;
- rfv.msg = msg;
- rfv.hlen = 0;
- err = rawv6_probe_proto_opt(&rfv, &fl6);
- if (err)
- goto out;
+
+ if (!hdrincl) {
+ rfv.msg = msg;
+ rfv.hlen = 0;
+ err = rawv6_probe_proto_opt(&rfv, &fl6);
+ if (err)
+ goto out;
+ }

if (!ipv6_addr_any(daddr))
fl6.daddr = *daddr;


2019-06-09 18:44:14

by Greg Kroah-Hartman

Subject: [PATCH 4.9 42/83] memcg: make it work on sparse non-0-node systems

From: Jiri Slaby <[email protected]>

commit 3e8589963773a5c23e2f1fe4bcad0e9a90b7f471 upstream.

We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]

This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100

It is reproducible as far as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.

The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.

The root cause is the list_lru_memcg_aware checks in the list_lru code.
The test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but that is not true on some systems, as can be seen above.

So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
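A reduced model shows why the old test misfires when node 0 is absent.
The model_* types and the aware_old()/aware_new() names are hypothetical
and keep only the fields relevant to the check.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_lru_node {
    void *memcg_lrus;             /* NULL when never set up */
};

struct model_list_lru {
    struct model_lru_node *node;  /* indexed by NUMA node id */
    bool memcg_aware;             /* flag added by this patch */
};

/* Old test: inspects node 0, which a sparse system may have skipped. */
bool aware_old(const struct model_list_lru *lru)
{
    return lru->node[0].memcg_lrus != NULL;
}

/* New test: a flag recorded once at init, independent of any node. */
bool aware_new(const struct model_list_lru *lru)
{
    return lru->memcg_aware;
}
```

On a system where only node 1 is populated, aware_old() reports a
memcg-aware lru as unaware, so the per-memcg arrays are never resized.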

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Suggested-by: Vladimir Davydov <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Raghavendra K T <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/list_lru.h | 1 +
mm/list_lru.c | 8 +++-----
2 files changed, 4 insertions(+), 5 deletions(-)

--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -51,6 +51,7 @@ struct list_lru {
struct list_lru_node *node;
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
struct list_head list;
+ bool memcg_aware;
#endif
};

--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -42,11 +42,7 @@ static void list_lru_unregister(struct l
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
static inline bool list_lru_memcg_aware(struct list_lru *lru)
{
- /*
- * This needs node 0 to be always present, even
- * in the systems supporting sparse numa ids.
- */
- return !!lru->node[0].memcg_lrus;
+ return lru->memcg_aware;
}

static inline struct list_lru_one *
@@ -389,6 +385,8 @@ static int memcg_init_list_lru(struct li
{
int i;

+ lru->memcg_aware = memcg_aware;
+
if (!memcg_aware)
return 0;



2019-06-09 19:09:34

by Greg Kroah-Hartman

Subject: [PATCH 4.9 45/83] staging: vc04_services: prevent integer overflow in create_pagelist()

From: Dan Carpenter <[email protected]>

commit ca641bae6da977d638458e78cd1487b6160a2718 upstream.

The create_pagelist() "count" parameter comes from the user in
vchiq_ioctl() and it could overflow. If you look at how create_pagelist()
is called in vchiq_prepare_bulk_data(), then the "size" variable is an
int so it doesn't make sense to allow negatives or larger than INT_MAX.

I don't know this code terribly well, but I believe that values of
"count" are typically quite low and I don't think this check will
affect normal valid uses at all.

The "pagelist_size" calculation can also overflow on 32 bit systems, but
not on 64 bit systems. I have added an integer overflow check for that
as well.
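Both checks can be sketched in userspace. The names and the two size
constants below are hypothetical placeholders; the real driver derives
them from sizeof() on its own structures.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE      4096u  /* assumed page size */
#define MODEL_HEADER_BYTES   64u    /* placeholder for the fixed headers */
#define MODEL_PER_PAGE_BYTES 40u    /* placeholder for per-page storage */

/* Returns true and stores the allocation size for the page list, or
 * returns false when count is too large or the size would overflow. */
bool model_pagelist_size(uintptr_t buf, size_t count, size_t *pagelist_size)
{
    size_t offset, num_pages;

    /* First check: reject counts that do not fit the later int maths. */
    if (count >= INT_MAX - MODEL_PAGE_SIZE)
        return false;

    offset = buf & (MODEL_PAGE_SIZE - 1);
    num_pages = (count + offset + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

    /* Second check: divide instead of multiply so the test itself
     * cannot overflow. */
    if (num_pages > (SIZE_MAX - MODEL_HEADER_BYTES) / MODEL_PER_PAGE_BYTES)
        return false;

    *pagelist_size = MODEL_HEADER_BYTES + num_pages * MODEL_PER_PAGE_BYTES;
    return true;
}
```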

The Raspberry PI doesn't offer the same level of memory protection that
x86 does so these sorts of bugs are probably not super critical to fix.

Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver")
Signed-off-by: Dan Carpenter <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
@@ -381,9 +381,18 @@ create_pagelist(char __user *buf, size_t
int run, addridx, actual_pages;
unsigned long *need_release;

+ if (count >= INT_MAX - PAGE_SIZE)
+ return NULL;
+
offset = (unsigned int)buf & (PAGE_SIZE - 1);
num_pages = (count + offset + PAGE_SIZE - 1) / PAGE_SIZE;

+ if (num_pages > (SIZE_MAX - sizeof(PAGELIST_T) -
+ sizeof(struct vchiq_pagelist_info)) /
+ (sizeof(u32) + sizeof(pages[0]) +
+ sizeof(struct scatterlist)))
+ return NULL;
+
*ppagelist = NULL;

/* Allocate enough storage to hold the page pointers and the page
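
For readers following the arithmetic, here is a minimal userspace sketch of the two guards in this patch. It is not the driver code: SKETCH_PAGE_SIZE, count_to_pages(), checked_alloc_size() and overflow_sketch_self_check() are hypothetical names, and the 4096-byte page size is an assumption. The first helper rejects a count close to INT_MAX before rounding up to pages; the second compares n against (SIZE_MAX - header) / per_item before computing header + n * per_item, which is the same shape as the num_pages test above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, not taken from the driver */

/* Round a byte count up to pages; returns -1 if count is too close to INT_MAX. */
int count_to_pages(size_t count, size_t offset, size_t *num_pages)
{
	if (count >= (size_t)INT_MAX - SKETCH_PAGE_SIZE)
		return -1;

	/* offset is expected to be smaller than one page, so this cannot wrap */
	*num_pages = (count + offset + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	return 0;
}

/* Compute header + n * per_item; returns -1 if the result would not fit in size_t. */
int checked_alloc_size(size_t header, size_t per_item, size_t n, size_t *out)
{
	/* Divide instead of multiplying, so the check itself cannot overflow. */
	if (per_item != 0 && n > (SIZE_MAX - header) / per_item)
		return -1;

	*out = header + n * per_item;
	return 0;
}

/* Small usage example. */
void overflow_sketch_self_check(void)
{
	size_t pages = 0, size = 0;

	assert(count_to_pages(4096, 1, &pages) == 0 && pages == 2);
	assert(checked_alloc_size(16, 8, pages, &size) == 0 && size == 32);
	assert(checked_alloc_size(16, 8, SIZE_MAX / 8, &size) == -1);
}
```

On a 64-bit build the second guard is hard to trigger once the first has passed, which matches the commit message's remark that the "pagelist_size" overflow is a 32-bit concern.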


2019-06-09 19:09:37

by Greg Kroah-Hartman

Subject: [PATCH 4.9 76/83] Revert "MIPS: perf: ath79: Fix perfcount IRQ assignment"

From: Greg Kroah-Hartman <[email protected]>

This reverts commit f9b1baac265600a61d36ebaf9ba657119303b5b5 which is
commit a1e8783db8e0d58891681bc1e6d9ada66eae8e20 upstream.

Petr writes:
Karl reported to me today that he's experiencing a weird
reboot hang on his devices with the 4.9.180 kernel and that he has
bisected it down to my backported patch.

I would like to kindly ask you to remove this patch. This
patch should be reverted from all stable kernels up to 5.1,
because perf counters were not broken on those kernels, and this
patch won't work on the ath79 legacy IRQ code anyway; it needs the
new irqchip driver, which was enabled on ath79 with commit
51fa4f8912c0 ("MIPS: ath79: drop legacy IRQ code").

Reported-by: Petr Štetiar <[email protected]>
Cc: Kevin 'ldir' Darbyshire-Bryant <[email protected]>
Cc: John Crispin <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/mips/ath79/setup.c | 6 ++++++
drivers/irqchip/irq-ath79-misc.c | 11 -----------
2 files changed, 6 insertions(+), 11 deletions(-)

--- a/arch/mips/ath79/setup.c
+++ b/arch/mips/ath79/setup.c
@@ -183,6 +183,12 @@ const char *get_system_type(void)
return ath79_sys_type;
}

+int get_c0_perfcount_int(void)
+{
+ return ATH79_MISC_IRQ(5);
+}
+EXPORT_SYMBOL_GPL(get_c0_perfcount_int);
+
unsigned int get_c0_compare_int(void)
{
return CP0_LEGACY_COMPARE_IRQ;
--- a/drivers/irqchip/irq-ath79-misc.c
+++ b/drivers/irqchip/irq-ath79-misc.c
@@ -22,15 +22,6 @@
#define AR71XX_RESET_REG_MISC_INT_ENABLE 4

#define ATH79_MISC_IRQ_COUNT 32
-#define ATH79_MISC_PERF_IRQ 5
-
-static int ath79_perfcount_irq;
-
-int get_c0_perfcount_int(void)
-{
- return ath79_perfcount_irq;
-}
-EXPORT_SYMBOL_GPL(get_c0_perfcount_int);

static void ath79_misc_irq_handler(struct irq_desc *desc)
{
@@ -122,8 +113,6 @@ static void __init ath79_misc_intc_domai
{
void __iomem *base = domain->host_data;

- ath79_perfcount_irq = irq_create_mapping(domain, ATH79_MISC_PERF_IRQ);
-
/* Disable and clear all interrupts */
__raw_writel(0, base + AR71XX_RESET_REG_MISC_INT_ENABLE);
__raw_writel(0, base + AR71XX_RESET_REG_MISC_INT_STATUS);


2019-06-09 19:09:38

by Greg Kroah-Hartman

Subject: [PATCH 4.9 68/83] ipv6: use READ_ONCE() for inet->hdrincl as in ipv4

From: Olivier Matz <[email protected]>

[ Upstream commit 59e3e4b52663a9d97efbce7307f62e4bc5c9ce91 ]

As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl in a local variable, to avoid introducing a race
condition in the next commit.

Signed-off-by: Olivier Matz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/raw.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -774,6 +774,7 @@ static int rawv6_sendmsg(struct sock *sk
struct sockcm_cookie sockc;
struct ipcm6_cookie ipc6;
int addr_len = msg->msg_namelen;
+ int hdrincl;
u16 proto;
int err;

@@ -787,6 +788,13 @@ static int rawv6_sendmsg(struct sock *sk
if (msg->msg_flags & MSG_OOB)
return -EOPNOTSUPP;

+ /* hdrincl should be READ_ONCE(inet->hdrincl)
+ * but READ_ONCE() doesn't work with bit fields.
+ * Doing this indirectly yields the same result.
+ */
+ hdrincl = inet->hdrincl;
+ hdrincl = READ_ONCE(hdrincl);
+
/*
* Get and verify the address.
*/
@@ -904,7 +912,7 @@ static int rawv6_sendmsg(struct sock *sk
fl6.flowi6_oif = np->ucast_oif;
security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));

- if (inet->hdrincl)
+ if (hdrincl)
fl6.flowi6_flags |= FLOWI_FLAG_KNOWN_NH;

if (ipc6.tclass < 0)
@@ -927,7 +935,7 @@ static int rawv6_sendmsg(struct sock *sk
goto do_confirm;

back_from_confirm:
- if (inet->hdrincl)
+ if (hdrincl)
err = rawv6_send_hdrinc(sk, msg, len, &fl6, &dst, msg->msg_flags);
else {
ipc6.opt = opt;
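
The snapshot pattern in this patch can be tried outside the kernel. The sketch below is an assumption-laden stand-in, not kernel code: READ_ONCE_SKETCH() is a simplified volatile read (the kernel macro is more involved), struct fake_inet_sock and the function names are hypothetical, and __typeof__ is a GCC/Clang extension. It shows why the copy goes through a local variable: a bit-field has no address, so the volatile read is applied to the local copy, and every later decision uses that one snapshot.

```c
#include <assert.h>

/* Simplified userspace stand-in for READ_ONCE(): force a single volatile load. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical socket structure with hdrincl stored as a bit-field. */
struct fake_inet_sock {
	unsigned int hdrincl:1;
	unsigned int other:7;
};

/* Take one snapshot of the bit-field, the same two-step way the patch does. */
int snapshot_hdrincl(const struct fake_inet_sock *inet)
{
	int hdrincl;

	/* &inet->hdrincl is not valid C, so copy into an addressable local first */
	hdrincl = inet->hdrincl;
	hdrincl = READ_ONCE_SKETCH(hdrincl);
	return hdrincl;
}

/* Two decision points that must agree even if the flag changes in between. */
int both_checks_agree(struct fake_inet_sock *inet)
{
	int hdrincl = snapshot_hdrincl(inet);
	int first = hdrincl != 0;	/* e.g. choosing the flow flags */

	inet->hdrincl ^= 1;		/* simulate another thread flipping the option */

	int second = hdrincl != 0;	/* e.g. choosing the send path */
	return first == second;
}

/* Small usage example. */
void hdrincl_sketch_self_check(void)
{
	struct fake_inet_sock s = { .hdrincl = 1, .other = 0 };

	assert(snapshot_hdrincl(&s) == 1);
	assert(both_checks_agree(&s));
}
```

Reading inet->hdrincl separately at each decision point, as the code did before this patch, would let the two checks disagree if the flag changed in between.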


2019-06-09 22:11:21

by kernelci.org bot

Subject: Re: [PATCH 4.9 00/83] 4.9.181-stable review

stable-rc/linux-4.9.y boot: 107 boots: 1 failed, 106 passed (v4.9.180-84-g4fcf72df7bc7)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.9.y/kernel/v4.9.180-84-g4fcf72df7bc7/
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.180-84-g4fcf72df7bc7/

Tree: stable-rc
Branch: linux-4.9.y
Git Describe: v4.9.180-84-g4fcf72df7bc7
Git Commit: 4fcf72df7bc71264d86e616874a0a0cd382f1b12
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 53 unique boards, 23 SoC families, 15 builds out of 197

Boot Regressions Detected:

arm:

omap2plus_defconfig:
gcc-8:
omap3-beagle-xm:
lab-baylibre: new failure (last pass: v4.9.180-62-gd9b5fd7ab17b)

Boot Failure Detected:

arm:
omap2plus_defconfig:
gcc-8:
omap3-beagle-xm: 1 failed lab

---
For more info write to <[email protected]>

2019-06-10 06:39:26

by Naresh Kamboju

Subject: Re: [PATCH 4.9 00/83] 4.9.181-stable review

On Sun, 9 Jun 2019 at 22:22, Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 4.9.181 release.
> There are 83 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 11 Jun 2019 04:39:58 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.181-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

NOTE:
selftest sources version updated to 5.1
LTP version upgrade to 20190517

Summary
------------------------------------------------------------------------

kernel: 4.9.181-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 4fcf72df7bc71264d86e616874a0a0cd382f1b12
git describe: v4.9.180-84-g4fcf72df7bc7
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.180-84-g4fcf72df7bc7

No regressions (compared to build v4.9.180)

No fixes (compared to build v4.9.180)

Ran 23615 total tests in the following environments and test suites.

Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* network-basic-tests
* ltp-open-posix-tests
* prep-tmp-disk
* kvm-unit-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

--
Linaro LKFT
https://lkft.linaro.org

2019-06-10 08:52:23

by Jon Hunter

Subject: Re: [PATCH 4.9 00/83] 4.9.181-stable review


On 09/06/2019 17:41, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.181 release.
> There are 83 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 11 Jun 2019 04:39:58 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.181-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

All tests are passing for Tegra ...

Test results for stable-v4.9:
8 builds: 8 pass, 0 fail
16 boots: 16 pass, 0 fail
24 tests: 24 pass, 0 fail

Linux version: 4.9.181-rc1-g4fcf72d
Boards tested: tegra124-jetson-tk1, tegra20-ventana,
tegra210-p2371-2180, tegra30-cardhu-a04

Cheers
Jon

--
nvpublic

2019-06-10 16:06:20

by Guenter Roeck

Subject: Re: [PATCH 4.9 00/83] 4.9.181-stable review

On Sun, Jun 09, 2019 at 06:41:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.181 release.
> There are 83 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 11 Jun 2019 04:39:58 PM UTC.
> Anything received after that time might be too late.
>
Build results:
total: 172 pass: 172 fail: 0
Qemu test results:
total: 322 pass: 322 fail: 0

Guenter

2019-06-10 21:50:52

by Shuah Khan

Subject: Re: [PATCH 4.9 00/83] 4.9.181-stable review

On 6/9/19 10:41 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.181 release.
> There are 83 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 11 Jun 2019 04:39:58 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.181-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2019-06-19 16:10:58

by Martin Weinelt

Subject: Re: [PATCH 4.9 45/83] staging: vc04_services: prevent integer overflow in create_pagelist()

Hi.

On 6/9/19 6:42 PM, Greg Kroah-Hartman wrote:
> From: Dan Carpenter <[email protected]>
>
> commit ca641bae6da977d638458e78cd1487b6160a2718 upstream.

This commit breaks the kernel build because the vchiq_pagelist_info
struct is not defined in v4.9.182.

It was only added in v4.10, in commit
4807f2c0e684e907c501cb96049809d7a957dbc2.


Best regards,

Martin Weinelt


In file included from ./include/uapi/linux/posix_types.h:4:0,
from ./include/uapi/linux/types.h:13,
from ./include/linux/compiler.h:224,
from ./include/linux/linkage.h:4,
from ./include/linux/kernel.h:6,
from drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:34:
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c: In function 'create_pagelist':
./include/linux/stddef.h:7:14: warning: return makes integer from pointer without a cast [-Wint-conversion]
#define NULL ((void *)0)
^
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:385:10: note: in expansion of macro 'NULL'
return NULL;
^~~~
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:391:12: error: invalid application of 'sizeof' to incomplete type 'struct vchiq_pagelist_info'
sizeof(struct vchiq_pagelist_info)) /
^~~~~~
In file included from ./include/uapi/linux/posix_types.h:4:0,
from ./include/uapi/linux/types.h:13,
from ./include/linux/compiler.h:224,
from ./include/linux/linkage.h:4,
from ./include/linux/kernel.h:6,
from drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:34:
./include/linux/stddef.h:7:14: warning: return makes integer from pointer without a cast [-Wint-conversion]
#define NULL ((void *)0)
^
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:394:10: note: in expansion of macro 'NULL'
return NULL;
^~~~
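
The two kinds of diagnostics in this log can be reproduced in isolation. The sketch below is illustrative only and assumes nothing about the real driver beyond what the log shows: struct pagelist_info_sketch, create_sketch() and the use of -EINVAL are hypothetical. It shows that a forward-declared (incomplete) struct supports pointers but not sizeof until its definition is visible, and that a function returning int, as the -Wint-conversion warning suggests create_pagelist() does in 4.9, would report failure with an error value rather than NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pagelist_info_sketch;	/* forward declaration only: incomplete type */

/* Taking the size of a pointer to an incomplete type is allowed. */
size_t pointer_size(void)
{
	return sizeof(struct pagelist_info_sketch *);
}

/*
 * At this point sizeof(struct pagelist_info_sketch) would be rejected with
 * "invalid application of 'sizeof' to incomplete type", as in the log.
 */

struct pagelist_info_sketch {	/* the definition completes the type */
	void *pagelist;
	size_t size;
};

size_t struct_size(void)
{
	return sizeof(struct pagelist_info_sketch);	/* fine now */
}

/* An int-returning function signals failure with a negative errno, not NULL. */
int create_sketch(size_t count, size_t limit)
{
	if (count >= limit)
		return -EINVAL;	/* hypothetical choice of error code */
	return 0;
}

/* Small usage example. */
void incomplete_type_self_check(void)
{
	assert(pointer_size() > 0);
	assert(struct_size() >= sizeof(void *));
	assert(create_sketch(10, 5) == -EINVAL);
	assert(create_sketch(1, 5) == 0);
}
```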


>
> The create_pagelist() "count" parameter comes from the user in
> vchiq_ioctl() and it could overflow. If you look at how create_page()
> is called in vchiq_prepare_bulk_data(), then the "size" variable is an
> int so it doesn't make sense to allow negatives or larger than INT_MAX.
>
> I don't know this code terribly well, but I believe that typical values
> of "count" are typically quite low and I don't think this check will
> affect normal valid uses at all.
>
> The "pagelist_size" calculation can also overflow on 32 bit systems, but
> not on 64 bit systems. I have added an integer overflow check for that
> as well.
>
> The Raspberry PI doesn't offer the same level of memory protection that
> x86 does so these sorts of bugs are probably not super critical to fix.
>
> Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver")
> Signed-off-by: Dan Carpenter <[email protected]>
> Cc: stable <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
> @@ -381,9 +381,18 @@ create_pagelist(char __user *buf, size_t
> int run, addridx, actual_pages;
> unsigned long *need_release;
>
> + if (count >= INT_MAX - PAGE_SIZE)
> + return NULL;
> +
> offset = (unsigned int)buf & (PAGE_SIZE - 1);
> num_pages = (count + offset + PAGE_SIZE - 1) / PAGE_SIZE;
>
> + if (num_pages > (SIZE_MAX - sizeof(PAGELIST_T) -
> + sizeof(struct vchiq_pagelist_info)) /
> + (sizeof(u32) + sizeof(pages[0]) +
> + sizeof(struct scatterlist)))
> + return NULL;
> +
> *ppagelist = NULL;
>
> /* Allocate enough storage to hold the page pointers and the page
>

2019-06-19 17:14:15

by Greg Kroah-Hartman

Subject: Re: [PATCH 4.9 45/83] staging: vc04_services: prevent integer overflow in create_pagelist()

On Wed, Jun 19, 2019 at 06:02:07PM +0200, Martin Weinelt wrote:
> Hi.
>
> On 6/9/19 6:42 PM, Greg Kroah-Hartman wrote:
> > From: Dan Carpenter <[email protected]>
> >
> > commit ca641bae6da977d638458e78cd1487b6160a2718 upstream.
>
> This commit breaks the kernel build because the vchiq_pagelist_info
> struct is not defined in v4.9.182.
>
> It was only added in v4.10, in commit
> 4807f2c0e684e907c501cb96049809d7a957dbc2.
>
>
> Best regards,
>
> Martin Weinelt
>
>
> In file included from ./include/uapi/linux/posix_types.h:4:0,
> from ./include/uapi/linux/types.h:13,
> from ./include/linux/compiler.h:224,
> from ./include/linux/linkage.h:4,
> from ./include/linux/kernel.h:6,
> from drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:34:
> drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c: In function 'create_pagelist':
> ./include/linux/stddef.h:7:14: warning: return makes integer from pointer without a cast [-Wint-conversion]
> #define NULL ((void *)0)
> ^
> drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:385:10: note: in expansion of macro 'NULL'
> return NULL;
> ^~~~
> drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:391:12: error: invalid application of 'sizeof' to incomplete type 'struct vchiq_pagelist_info'
> sizeof(struct vchiq_pagelist_info)) /
> ^~~~~~
> In file included from ./include/uapi/linux/posix_types.h:4:0,
> from ./include/uapi/linux/types.h:13,
> from ./include/linux/compiler.h:224,
> from ./include/linux/linkage.h:4,
> from ./include/linux/kernel.h:6,
> from drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:34:
> ./include/linux/stddef.h:7:14: warning: return makes integer from pointer without a cast [-Wint-conversion]
> #define NULL ((void *)0)
> ^
> drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:394:10: note: in expansion of macro 'NULL'
> return NULL;
> ^~~~

Really? How come all of the build tests still succeed?

Ah, arm systems :(

Odd that we didn't catch this already, sorry about that. And that was
my fault in the backport, which the build tests did catch. Odd that they
didn't catch the failure after that...

Anyway, thanks, I'll go revert this.

greg k-h

2019-07-31 15:35:48

by Vlastimil Babka

Subject: Re: [PATCH 4.9 57/83] mm: prevent get_user_pages() from overflowing page refcount

On 6/9/19 6:42 PM, Greg Kroah-Hartman wrote:
> From: Linus Torvalds <[email protected]>
>
> commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.
>
> If the page refcount wraps around past zero, it will be freed while
> there are still four billion references to it. One of the possible
> avenues for an attacker to try to make this happen is by doing direct IO
> on a page multiple times. This patch makes get_user_pages() refuse to
> take a new page reference if there are already more than two billion
> references to the page.
>
> Reported-by: Jann Horn <[email protected]>
> Acked-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
> [bwh: Backported to 4.9:
> - Add the "err" variable in follow_hugetlb_page()
> - Adjust context]
> Signed-off-by: Ben Hutchings <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> mm/gup.c | 45 ++++++++++++++++++++++++++++++++++-----------
> mm/hugetlb.c | 16 +++++++++++++++-
> 2 files changed, 49 insertions(+), 12 deletions(-)
>

...

> @@ -1231,6 +1240,20 @@ struct page *get_dump_page(unsigned long
> */
> #ifdef CONFIG_HAVE_GENERIC_RCU_GUP
>
> +/*
> + * Return the compound head page with ref appropriately incremented,
> + * or NULL if that failed.
> + */
> +static inline struct page *try_get_compound_head(struct page *page, int refs)
> +{
> + struct page *head = compound_head(page);
> + if (WARN_ON_ONCE(page_ref_count(head) < 0))
> + return NULL;
> + if (unlikely(!page_cache_add_speculative(head, refs)))
> + return NULL;
> + return head;
> +}
> +
> #ifdef __HAVE_ARCH_PTE_SPECIAL
> static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> int write, struct page **pages, int *nr)
> @@ -1263,9 +1286,9 @@ static int gup_pte_range(pmd_t pmd, unsi
>
> VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> page = pte_page(pte);
> - head = compound_head(page);
>
> - if (!page_cache_get_speculative(head))
> + head = try_get_compound_head(page, 1);

BTW, several arches in 4.9, including x86, have arch-specific fast gup
implementations, which are not touched by this backport. I didn't check whether
Jann's exploit ends up using the fast or the non-fast one, though.
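
As a rough illustration of the rule described in the quoted commit message (refuse a new reference once more than two billion are held), here is a userspace sketch. It is an assumption-based model, not the mm code: struct fake_page, try_get_ref() and the plain non-atomic counter are hypothetical, and the real kernel uses atomic page refcount helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page with a 32-bit reference counter (not atomic in this sketch). */
struct fake_page {
	uint32_t refcount;
};

/*
 * Take one more reference unless the counter is zero (page not in use) or
 * already above INT32_MAX, i.e. negative when viewed as a signed int.
 * Returns 1 on success, 0 if the reference was refused.
 */
int try_get_ref(struct fake_page *page)
{
	if (page->refcount == 0 || page->refcount > (uint32_t)INT32_MAX)
		return 0;

	page->refcount++;
	return 1;
}

/* Small usage example. */
void refcount_sketch_self_check(void)
{
	struct fake_page p = { .refcount = (uint32_t)INT32_MAX };

	assert(try_get_ref(&p) == 1);	/* last reference that is still granted */
	assert(try_get_ref(&p) == 0);	/* counter is now past the limit */
}
```

Stopping at the halfway point leaves roughly two billion increments of headroom before the counter could wrap to zero, so paths that do not perform the check cannot easily push it over.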