2018-09-17 23:01:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 000/126] 4.14.71-stable review

This is the start of the stable review cycle for the 4.14.71 release.
There are 126 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Sep 19 21:16:12 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.71-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.14.71-rc1

Linus Torvalds <[email protected]>
mm: get rid of vmacache_flush_all() entirely

Ian Kent <[email protected]>
autofs: fix autofs_sbi() does not check super block type

Jason Wang <[email protected]>
tuntap: fix use after free during release

Jason Wang <[email protected]>
tun: fix use after free for ptr_ring

Wei Yongjun <[email protected]>
mtd: ubi: wl: Fix error return code in ubi_wl_init()

Taehee Yoo <[email protected]>
ip: frags: fix crash in ip_do_fragment()

Peter Oskolkov <[email protected]>
ip: process in-order fragments efficiently

Peter Oskolkov <[email protected]>
ip: add helpers to process in-order fragments faster.

Dan Carpenter <[email protected]>
ipv4: frags: precedence bug in ip_expire()

Eric Dumazet <[email protected]>
net: sk_buff rbnode reorg

Eric Dumazet <[email protected]>
net: add rb_to_skb() and other rb tree helpers

Eric Dumazet <[email protected]>
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

Florian Westphal <[email protected]>
ipv6: defrag: drop non-last frags smaller than min mtu

Peter Oskolkov <[email protected]>
net: modify skb_rbtree_purge to return the truesize of all purged skbs.

Eric Dumazet <[email protected]>
net: speed up skb_rbtree_purge()

Peter Oskolkov <[email protected]>
ip: discard IPv4 datagrams with overlapping segments.

Eric Dumazet <[email protected]>
inet: frags: fix ip6frag_low_thresh boundary

Eric Dumazet <[email protected]>
inet: frags: get rid of ipfrag_skb_cb/FRAG_CB

Eric Dumazet <[email protected]>
inet: frags: reorganize struct netns_frags

Eric Dumazet <[email protected]>
rhashtable: reorganize struct rhashtable layout

Eric Dumazet <[email protected]>
ipv6: frags: rewrite ip6_expire_frag_queue()

Eric Dumazet <[email protected]>
inet: frags: do not clone skb in ip_expire()

Eric Dumazet <[email protected]>
inet: frags: break the 2GB limit for frags storage

Eric Dumazet <[email protected]>
inet: frags: remove inet_frag_maybe_warn_overflow()

Eric Dumazet <[email protected]>
inet: frags: get rif of inet_frag_evicting()

Eric Dumazet <[email protected]>
inet: frags: remove some helpers

Eric Dumazet <[email protected]>
inet: frags: use rhashtables for reassembly units

Eric Dumazet <[email protected]>
rhashtable: add schedule points

Eric Dumazet <[email protected]>
ipv6: export ip6 fragments sysctl to unprivileged users

Eric Dumazet <[email protected]>
inet: frags: refactor lowpan_net_frag_init()

Eric Dumazet <[email protected]>
inet: frags: refactor ipv6_frag_init()

Kees Cook <[email protected]>
inet: frags: Convert timers to use timer_setup()

Eric Dumazet <[email protected]>
inet: frags: refactor ipfrag_init()

Eric Dumazet <[email protected]>
inet: frags: add a pointer to struct netns_frags

Eric Dumazet <[email protected]>
inet: frags: change inet_frags_init_net() return value

Jani Nikula <[email protected]>
drm/i915: set DP Main Stream Attribute for color range on DDI platforms

Parav Pandit <[email protected]>
RDMA/cma: Do not ignore net namespace for unbound cm_id

Paul Burton <[email protected]>
MIPS: WARN_ON invalid DMA cache maintenance, not BUG_ON

Trond Myklebust <[email protected]>
NFSv4.1: Fix a potential layoutget/layoutrecall deadlock

Chao Yu <[email protected]>
f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize

Zumeng Chen <[email protected]>
mfd: ti_am335x_tscadc: Fix struct clk memory leak

Geert Uytterhoeven <[email protected]>
iommu/ipmmu-vmsa: Fix allocation in atomic context

Dan Carpenter <[email protected]>
f2fs: Fix uninitialized return in f2fs_ioc_shutdown()

Chao Yu <[email protected]>
f2fs: fix to wait on page writeback before updating page

Katsuhiro Suzuki <[email protected]>
media: helene: fix xtal frequency setting at power on

Mauricio Faria de Oliveira <[email protected]>
partitions/aix: fix usage of uninitialized lv_info and lvname structures

Mauricio Faria de Oliveira <[email protected]>
partitions/aix: append null character to print data from disk

Sylwester Nawrocki <[email protected]>
media: s5p-mfc: Fix buffer look up in s5p_mfc_handle_frame_{new, copy_time} functions

Nick Dyer <[email protected]>
Input: atmel_mxt_ts - only use first T9 instance

John Pittman <[email protected]>
dm cache: only allow a single io_mode cache feature to be requested

Petr Machata <[email protected]>
net: dcb: For wild-card lookups, use priority -1, not 0

Nicholas Mc Guire <[email protected]>
MIPS: generic: fix missing of_node_put()

Nicholas Mc Guire <[email protected]>
MIPS: Octeon: add missing of_node_put()

Chao Yu <[email protected]>
f2fs: fix to do sanity check with reserved blkaddr of inline inode

Peter Rosin <[email protected]>
tpm/tpm_i2c_infineon: switch to i2c_lock_bus(..., I2C_LOCK_SEGMENT)

Linus Walleij <[email protected]>
tpm_tis_spi: Pass the SPI IRQ down to the driver

Chao Yu <[email protected]>
f2fs: fix to skip GC if type in SSA and SIT is inconsistent

Jinbum Park <[email protected]>
pktcdvd: Fix possible Spectre-v1 for pkt_devs

Chao Yu <[email protected]>
f2fs: try grabbing node page lock aggressively in sync scenario

Yelena Krivosheev <[email protected]>
net: mvneta: fix mtu change on port without link

Daniel Kurtz <[email protected]>
pinctrl/amd: only handle irq if it is pending and unmasked

Anton Vasilyev <[email protected]>
gpio: ml-ioh: Fix buffer underwrite on probe error path

Dan Carpenter <[email protected]>
pinctrl: imx: off by one in imx_pinconf_group_dbg_show()

Joerg Roedel <[email protected]>
x86/mm: Remove in_nmi() warning from vmalloc_fault()

Marcel Holtmann <[email protected]>
Bluetooth: hidp: Fix handling of strncpy for hid->name information

Surabhi Vishnoi <[email protected]>
ath10k: disable bundle mgmt tx completion event support

Huaisheng Ye <[email protected]>
tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access()

Anton Vasilyev <[email protected]>
scsi: 3ware: fix return 0 on the error path of probe

Srinivas Pandruvada <[email protected]>
ata: libahci: Correct setting of DEVSLP register

Srinivas Pandruvada <[email protected]>
ata: libahci: Allow reconfigure of DEVSLP register

Paul Burton <[email protected]>
MIPS: Fix ISA virt/bus conversion for non-zero PHYS_OFFSET

Srinivas Kandagatla <[email protected]>
rpmsg: core: add support to power domains for devices

Loic Poulain <[email protected]>
wlcore: Set rx_status boottime_ns field on rx

Sven Eckelmann <[email protected]>
ath10k: prevent active scans on potential unusable channels

Felix Fietkau <[email protected]>
ath9k_hw: fix channel maximum power level test

Felix Fietkau <[email protected]>
ath9k: report tx status on EOSP

Finn Thain <[email protected]>
macintosh/via-pmu: Add missing mmio accessors

Kan Liang <[email protected]>
perf evlist: Fix error out while applying initial delay and LBR

Jiri Olsa <[email protected]>
perf c2c report: Fix crash for empty browser

Olga Kornievskaia <[email protected]>
NFSv4.0 fix client reference leak in callback

Christophe Leroy <[email protected]>
perf tools: Allow overriding MAX_NR_CPUS at compile time

Randy Dunlap <[email protected]>
f2fs: fix defined but not used build warnings

Yunlong Song <[email protected]>
f2fs: do not set free of current section

Chao Yu <[email protected]>
f2fs: fix to active page in lru list for read path

Anton Vasilyev <[email protected]>
tty: rocket: Fix possible buffer overwrite on register_PCI

Michael Kelley <[email protected]>
Drivers: hv: vmbus: Cleanup synic memory free path

Anton Vasilyev <[email protected]>
firmware: vpd: Fix section enabled flag on vpd_section_destroy

Dan Carpenter <[email protected]>
uio: potential double frees if __uio_register_device() fails

Anton Vasilyev <[email protected]>
misc: ti-st: Fix memory leak in the error path of probe()

Philipp Zabel <[email protected]>
gpu: ipu-v3: default to id 0 on missing OF alias

Todor Tomov <[email protected]>
media: camss: csid: Configure data type and decode format properly

Gaurav Kohli <[email protected]>
timers: Clear timer_base::must_forward_clk with timer_base::lock held

BingJing Chang <[email protected]>
md/raid5: fix data corruption of replacements after originals dropped

Mike Christie <[email protected]>
scsi: target: fix __transport_register_session locking

Ming Lei <[email protected]>
blk-mq: fix updating tags depth

Arun Parameswaran <[email protected]>
net: phy: Fix the register offsets in Broadcom iProc mdio mux driver

Anton Vasilyev <[email protected]>
media: dw2102: Fix memleak on sequence of probes

Anton Vasilyev <[email protected]>
media: davinci: vpif_display: Mix memory leak on probe error path

Roman Gushchin <[email protected]>
selftests/bpf: fix a typo in map in map test

Reza Arbab <[email protected]>
powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage

Dmitry Osipenko <[email protected]>
gpio: tegra: Move driver registration to subsys_init level

Johan Hedberg <[email protected]>
Bluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV

Jae Hyun Yoo <[email protected]>
i2c: aspeed: Add an explicit type casting for *get_clk_reg_val

Florian Fainelli <[email protected]>
ethtool: Remove trailing semicolon for static inline

Dan Carpenter <[email protected]>
misc: mic: SCIF Fix scif_get_new_port() error handling

Alexey Brodkin <[email protected]>
ARC: [plat-axs*]: Enable SWAP

Tomas Winkler <[email protected]>
tpm: separate cmd_ready/go_idle from runtime_pm

Arnd Bergmann <[email protected]>
crypto: aes-generic - fix aes-generic regression on powerpc

Gustavo A. R. Silva <[email protected]>
switchtec: Fix Spectre v1 vulnerability

Filippo Sironi <[email protected]>
x86/microcode: Update the new microcode revision unconditionally

Prarit Bhargava <[email protected]>
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date

Thomas Gleixner <[email protected]>
cpu/hotplug: Prevent state corruption on error rollback

Neeraj Upadhyay <[email protected]>
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()

Takashi Iwai <[email protected]>
ALSA: hda - Fix cancel_work_sync() stall from jackpoll work

Sean Christopherson <[email protected]>
KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr

Pierre Morel <[email protected]>
KVM: s390: vsie: copy wrapping keys to right place

Filipe Manana <[email protected]>
Btrfs: fix data corruption when deduplicating between different files

Steve French <[email protected]>
smb3: check for and properly advertise directory lease support

Steve French <[email protected]>
SMB3: Backup intent flag missing for directory opens with backupuid mounts

Paul Burton <[email protected]>
MIPS: VDSO: Match data page cache colouring when D$ aliases

Minchan Kim <[email protected]>
android: binder: fix the race mmap and alloc_new_buf_locked

Konstantin Khlebnikov <[email protected]>
block: bfq: swap puts in bfqg_and_blkg_put

Jens Axboe <[email protected]>
nbd: don't allow invalid blocksize settings

James Smart <[email protected]>
scsi: lpfc: Correct MDS diag and nvmet configuration

Felipe Balbi <[email protected]>
i2c: i801: fix DNV's SMBCTRL register offset

Shubhrajyoti Datta <[email protected]>
i2c: xiic: Make the start and the byte count write atomic


-------------

Diffstat:

Documentation/networking/ip-sysctl.txt | 13 +-
Makefile | 4 +-
arch/arc/configs/axs101_defconfig | 1 -
arch/arc/configs/axs103_defconfig | 1 -
arch/arc/configs/axs103_smp_defconfig | 1 -
arch/mips/cavium-octeon/octeon-platform.c | 2 +
arch/mips/generic/init.c | 1 +
arch/mips/include/asm/io.h | 8 +-
arch/mips/kernel/vdso.c | 20 +
arch/mips/mm/c-r4k.c | 6 +-
arch/powerpc/platforms/powernv/npu-dma.c | 5 +-
arch/s390/kvm/vsie.c | 3 +-
arch/x86/kernel/cpu/microcode/amd.c | 24 +-
arch/x86/kernel/cpu/microcode/intel.c | 17 +-
arch/x86/kvm/vmx.c | 4 +-
arch/x86/mm/fault.c | 2 -
block/bfq-cgroup.c | 4 +-
block/blk-mq-tag.c | 8 +-
block/partitions/aix.c | 13 +-
crypto/Makefile | 2 +-
drivers/android/binder_alloc.c | 42 +-
drivers/ata/libahci.c | 20 +-
drivers/block/nbd.c | 3 +
drivers/block/pktcdvd.c | 4 +-
drivers/bluetooth/Kconfig | 1 +
drivers/char/tpm/tpm-interface.c | 50 +-
drivers/char/tpm/tpm.h | 12 +-
drivers/char/tpm/tpm2-space.c | 16 +-
drivers/char/tpm/tpm_crb.c | 101 +---
drivers/char/tpm/tpm_i2c_infineon.c | 8 +-
drivers/char/tpm/tpm_tis_spi.c | 9 +-
drivers/firmware/google/vpd.c | 5 +-
drivers/gpio/gpio-ml-ioh.c | 3 +-
drivers/gpio/gpio-tegra.c | 2 +-
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_ddi.c | 4 +
drivers/gpu/ipu-v3/ipu-common.c | 2 +
drivers/hv/hv.c | 14 +-
drivers/i2c/busses/i2c-aspeed.c | 2 +-
drivers/i2c/busses/i2c-i801.c | 7 +-
drivers/i2c/busses/i2c-xiic.c | 4 +
drivers/infiniband/core/cma.c | 13 +-
drivers/input/touchscreen/atmel_mxt_ts.c | 7 +-
drivers/iommu/ipmmu-vmsa.c | 9 +-
drivers/macintosh/via-pmu.c | 9 +-
drivers/md/dm-cache-target.c | 19 +-
drivers/md/raid5.c | 6 +
drivers/media/dvb-frontends/helene.c | 5 +-
drivers/media/platform/davinci/vpif_display.c | 24 +-
.../media/platform/qcom/camss-8x16/camss-csid.c | 16 +-
drivers/media/platform/s5p-mfc/s5p_mfc.c | 23 +-
drivers/media/usb/dvb-usb/dw2102.c | 19 +-
drivers/mfd/ti_am335x_tscadc.c | 3 +-
drivers/misc/mic/scif/scif_api.c | 20 +-
drivers/misc/ti-st/st_kim.c | 4 +-
drivers/mtd/ubi/wl.c | 8 +-
drivers/net/ethernet/marvell/mvneta.c | 1 -
drivers/net/phy/mdio-mux-bcm-iproc.c | 20 +-
drivers/net/tun.c | 21 +-
drivers/net/wireless/ath/ath10k/mac.c | 7 +
drivers/net/wireless/ath/ath10k/wmi-tlv.c | 5 +
drivers/net/wireless/ath/ath10k/wmi-tlv.h | 5 +
drivers/net/wireless/ath/ath9k/hw.c | 7 +-
drivers/net/wireless/ath/ath9k/xmit.c | 3 +-
drivers/net/wireless/ti/wlcore/rx.c | 8 +-
drivers/pci/switch/switchtec.c | 4 +
drivers/pinctrl/freescale/pinctrl-imx.c | 2 +-
drivers/pinctrl/pinctrl-amd.c | 3 +-
drivers/rpmsg/rpmsg_core.c | 7 +
drivers/scsi/3w-9xxx.c | 6 +-
drivers/scsi/3w-sas.c | 3 +
drivers/scsi/3w-xxxx.c | 2 +
drivers/scsi/lpfc/lpfc.h | 2 +-
drivers/target/target_core_transport.c | 5 +-
drivers/tty/rocket.c | 2 +-
drivers/uio/uio.c | 3 +-
fs/autofs4/autofs_i.h | 4 +-
fs/autofs4/inode.c | 1 -
fs/btrfs/ioctl.c | 19 +
fs/cifs/inode.c | 2 +
fs/cifs/smb2ops.c | 35 +-
fs/cifs/smb2pdu.c | 3 +
fs/f2fs/f2fs.h | 7 +-
fs/f2fs/file.c | 2 +-
fs/f2fs/gc.c | 8 +-
fs/f2fs/inline.c | 22 +
fs/f2fs/node.c | 4 +-
fs/f2fs/segment.h | 3 +
fs/f2fs/super.c | 21 +-
fs/f2fs/sysfs.c | 10 +-
fs/nfs/callback_proc.c | 4 +-
fs/nfs/callback_xdr.c | 11 +-
include/linux/mm_types.h | 2 +-
include/linux/mm_types_task.h | 2 +-
include/linux/rhashtable.h | 8 +-
include/linux/skbuff.h | 50 +-
include/linux/tpm.h | 2 +
include/linux/vm_event_item.h | 1 -
include/linux/vmacache.h | 5 -
include/net/inet_frag.h | 135 +++--
include/net/ip.h | 1 -
include/net/ipv6.h | 26 +-
include/uapi/linux/ethtool.h | 4 +-
include/uapi/linux/snmp.h | 1 +
kernel/cpu.c | 11 +-
kernel/time/timer.c | 29 +-
lib/rhashtable.c | 2 +
mm/debug.c | 4 +-
mm/vmacache.c | 38 --
net/bluetooth/hidp/core.c | 2 +-
net/core/skbuff.c | 31 +-
net/dcb/dcbnl.c | 11 +-
net/ieee802154/6lowpan/6lowpan_i.h | 26 +-
net/ieee802154/6lowpan/reassembly.c | 153 +++---
net/ipv4/inet_fragment.c | 378 +++-----------
net/ipv4/ip_fragment.c | 578 ++++++++++++---------
net/ipv4/proc.c | 7 +-
net/ipv4/tcp_fastopen.c | 8 +-
net/ipv4/tcp_input.c | 33 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 105 ++--
net/ipv6/proc.c | 5 +-
net/ipv6/reassembly.c | 217 ++++----
net/sched/sch_netem.c | 14 +-
sound/pci/hda/hda_codec.c | 3 +-
tools/perf/builtin-c2c.c | 3 +
tools/perf/perf.h | 2 +
tools/perf/util/evsel.c | 14 +
tools/testing/nvdimm/pmem-dax.c | 12 +-
tools/testing/selftests/bpf/test_verifier.c | 6 +-
129 files changed, 1473 insertions(+), 1362 deletions(-)




2018-09-17 23:00:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 001/126] i2c: xiic: Make the start and the byte count write atomic

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shubhrajyoti Datta <[email protected]>

commit ae7304c3ea28a3ba47a7a8312c76c654ef24967e upstream.

Disable interrupts while configuring the transfer and enable them back.

We have below as the programming sequence
1. start and slave address
2. byte count and stop

In some customer platform there was a lot of interrupts between 1 and 2
and after slave address (around 7 clock cyles) if 2 is not executed
then the transaction is nacked.

To fix this case make the 2 writes atomic.

Signed-off-by: Shubhrajyoti Datta <[email protected]>
Signed-off-by: Michal Simek <[email protected]>
[wsa: added a newline for better readability]
Signed-off-by: Wolfram Sang <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/i2c/busses/i2c-xiic.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/i2c/busses/i2c-xiic.c
+++ b/drivers/i2c/busses/i2c-xiic.c
@@ -538,6 +538,7 @@ static void xiic_start_recv(struct xiic_
{
u8 rx_watermark;
struct i2c_msg *msg = i2c->rx_msg = i2c->tx_msg;
+ unsigned long flags;

/* Clear and enable Rx full interrupt. */
xiic_irq_clr_en(i2c, XIIC_INTR_RX_FULL_MASK | XIIC_INTR_TX_ERROR_MASK);
@@ -553,6 +554,7 @@ static void xiic_start_recv(struct xiic_
rx_watermark = IIC_RX_FIFO_DEPTH;
xiic_setreg8(i2c, XIIC_RFD_REG_OFFSET, rx_watermark - 1);

+ local_irq_save(flags);
if (!(msg->flags & I2C_M_NOSTART))
/* write the address */
xiic_setreg16(i2c, XIIC_DTR_REG_OFFSET,
@@ -563,6 +565,8 @@ static void xiic_start_recv(struct xiic_

xiic_setreg16(i2c, XIIC_DTR_REG_OFFSET,
msg->len | ((i2c->nmsgs == 1) ? XIIC_TX_DYN_STOP_MASK : 0));
+ local_irq_restore(flags);
+
if (i2c->nmsgs == 1)
/* very last, enable bus not busy as well */
xiic_irq_clr_en(i2c, XIIC_INTR_BNB_MASK);



2018-09-17 23:00:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 003/126] scsi: lpfc: Correct MDS diag and nvmet configuration

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: James Smart <[email protected]>

commit 53e13ee087a80e8d4fc95436318436e5c2c1f8c2 upstream.

A recent change added some MDS processing in the lpfc_drain_txq routine
that relies on the fcp_wq being allocated. For nvmet operation the fcp_wq
is not allocated because it can only be an nvme-target. When the original
MDS support was added LS_MDS_LOOPBACK was defined wrong, (0x16) it should
have been 0x10 (decimal value used for hex setting). This incorrect value
allowed MDS_LOOPBACK to be set simultaneously with LS_NPIV_FAB_SUPPORTED,
causing the driver to crash when it accesses the non-existent fcp_wq.

Correct the bad value setting for LS_MDS_LOOPBACK.

Fixes: ae9e28f36a6c ("lpfc: Add MDS Diagnostic support.")
Cc: <[email protected]> # v4.12+
Signed-off-by: Dick Kennedy <[email protected]>
Signed-off-by: James Smart <[email protected]>
Tested-by: Ewan D. Milne <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/scsi/lpfc/lpfc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -676,7 +676,7 @@ struct lpfc_hba {
#define LS_NPIV_FAB_SUPPORTED 0x2 /* Fabric supports NPIV */
#define LS_IGNORE_ERATT 0x4 /* intr handler should ignore ERATT */
#define LS_MDS_LINK_DOWN 0x8 /* MDS Diagnostics Link Down */
-#define LS_MDS_LOOPBACK 0x16 /* MDS Diagnostics Link Up (Loopback) */
+#define LS_MDS_LOOPBACK 0x10 /* MDS Diagnostics Link Up (Loopback) */

uint32_t hba_flag; /* hba generic flags */
#define HBA_ERATT_HANDLED 0x1 /* This flag is set when eratt handled */



2018-09-17 23:00:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 005/126] block: bfq: swap puts in bfqg_and_blkg_put

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <[email protected]>

commit d5274b3cd6a814ccb2f56d81ee87cbbf51bd4cf7 upstream.

Fix trivial use-after-free. This could be last reference to bfqg.

Fixes: 8f9bebc33dd7 ("block, bfq: access and cache blkg data only when safe")
Acked-by: Paolo Valente <[email protected]>
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
block/bfq-cgroup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -224,9 +224,9 @@ static void bfqg_and_blkg_get(struct bfq

void bfqg_and_blkg_put(struct bfq_group *bfqg)
{
- bfqg_put(bfqg);
-
blkg_put(bfqg_to_blkg(bfqg));
+
+ bfqg_put(bfqg);
}

void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,



2018-09-17 23:00:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 006/126] android: binder: fix the race mmap and alloc_new_buf_locked

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Minchan Kim <[email protected]>

commit da1b9564e85b1d7baf66cbfabcab27e183a1db63 upstream.

There is RaceFuzzer report like below because we have no lock to close
below the race between binder_mmap and binder_alloc_new_buf_locked.
To close the race, let's use memory barrier so that if someone see
alloc->vma is not NULL, alloc->vma_vm_mm should be never NULL.

(I didn't add stable mark intentionallybecause standard android
userspace libraries that interact with binder (libbinder & libhwbinder)
prevent the mmap/ioctl race. - from Todd)

"
Thread interleaving:
CPU0 (binder_alloc_mmap_handler) CPU1 (binder_alloc_new_buf_locked)
===== =====
// drivers/android/binder_alloc.c
// #L718 (v4.18-rc3)
alloc->vma = vma;
// drivers/android/binder_alloc.c
// #L346 (v4.18-rc3)
if (alloc->vma == NULL) {
...
// alloc->vma is not NULL at this point
return ERR_PTR(-ESRCH);
}
...
// #L438
binder_update_page_range(alloc, 0,
(void *)PAGE_ALIGN((uintptr_t)buffer->data),
end_page_addr);

// In binder_update_page_range() #L218
// But still alloc->vma_vm_mm is NULL here
if (need_mm && mmget_not_zero(alloc->vma_vm_mm))
alloc->vma_vm_mm = vma->vm_mm;

Crash Log:
==================================================================
BUG: KASAN: null-ptr-deref in __atomic_add_unless include/asm-generic/atomic-instrumented.h:89 [inline]
BUG: KASAN: null-ptr-deref in atomic_add_unless include/linux/atomic.h:533 [inline]
BUG: KASAN: null-ptr-deref in mmget_not_zero include/linux/sched/mm.h:75 [inline]
BUG: KASAN: null-ptr-deref in binder_update_page_range+0xece/0x18e0 drivers/android/binder_alloc.c:218
Write of size 4 at addr 0000000000000058 by task syz-executor0/11184

CPU: 1 PID: 11184 Comm: syz-executor0 Not tainted 4.18.0-rc3 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x16e/0x22c lib/dump_stack.c:113
kasan_report_error mm/kasan/report.c:352 [inline]
kasan_report+0x163/0x380 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x140/0x1a0 mm/kasan/kasan.c:267
kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
__atomic_add_unless include/asm-generic/atomic-instrumented.h:89 [inline]
atomic_add_unless include/linux/atomic.h:533 [inline]
mmget_not_zero include/linux/sched/mm.h:75 [inline]
binder_update_page_range+0xece/0x18e0 drivers/android/binder_alloc.c:218
binder_alloc_new_buf_locked drivers/android/binder_alloc.c:443 [inline]
binder_alloc_new_buf+0x467/0xc30 drivers/android/binder_alloc.c:513
binder_transaction+0x125b/0x4fb0 drivers/android/binder.c:2957
binder_thread_write+0xc08/0x2770 drivers/android/binder.c:3528
binder_ioctl_write_read.isra.39+0x24f/0x8e0 drivers/android/binder.c:4456
binder_ioctl+0xa86/0xf34 drivers/android/binder.c:4596
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x154/0xd40 fs/ioctl.c:686
ksys_ioctl+0x94/0xb0 fs/ioctl.c:701
__do_sys_ioctl fs/ioctl.c:708 [inline]
__se_sys_ioctl fs/ioctl.c:706 [inline]
__x64_sys_ioctl+0x43/0x50 fs/ioctl.c:706
do_syscall_64+0x167/0x4b0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
"

Signed-off-by: Todd Kjos <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: Martijn Coenen <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/android/binder_alloc.c | 42 +++++++++++++++++++++++++++++++++--------
1 file changed, 34 insertions(+), 8 deletions(-)

--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -324,6 +324,34 @@ err_no_vma:
return vma ? -ENOMEM : -ESRCH;
}

+static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
+ struct vm_area_struct *vma)
+{
+ if (vma)
+ alloc->vma_vm_mm = vma->vm_mm;
+ /*
+ * If we see alloc->vma is not NULL, buffer data structures set up
+ * completely. Look at smp_rmb side binder_alloc_get_vma.
+ * We also want to guarantee new alloc->vma_vm_mm is always visible
+ * if alloc->vma is set.
+ */
+ smp_wmb();
+ alloc->vma = vma;
+}
+
+static inline struct vm_area_struct *binder_alloc_get_vma(
+ struct binder_alloc *alloc)
+{
+ struct vm_area_struct *vma = NULL;
+
+ if (alloc->vma) {
+ /* Look at description in binder_alloc_set_vma */
+ smp_rmb();
+ vma = alloc->vma;
+ }
+ return vma;
+}
+
struct binder_buffer *binder_alloc_new_buf_locked(struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
@@ -339,7 +367,7 @@ struct binder_buffer *binder_alloc_new_b
size_t size, data_offsets_size;
int ret;

- if (alloc->vma == NULL) {
+ if (!binder_alloc_get_vma(alloc)) {
pr_err("%d: binder_alloc_buf, no vma\n",
alloc->pid);
return ERR_PTR(-ESRCH);
@@ -712,9 +740,7 @@ int binder_alloc_mmap_handler(struct bin
buffer->free = 1;
binder_insert_free_buffer(alloc, buffer);
alloc->free_async_space = alloc->buffer_size / 2;
- barrier();
- alloc->vma = vma;
- alloc->vma_vm_mm = vma->vm_mm;
+ binder_alloc_set_vma(alloc, vma);
mmgrab(alloc->vma_vm_mm);

return 0;
@@ -741,10 +767,10 @@ void binder_alloc_deferred_release(struc
int buffers, page_count;
struct binder_buffer *buffer;

- BUG_ON(alloc->vma);
-
buffers = 0;
mutex_lock(&alloc->mutex);
+ BUG_ON(alloc->vma);
+
while ((n = rb_first(&alloc->allocated_buffers))) {
buffer = rb_entry(n, struct binder_buffer, rb_node);

@@ -886,7 +912,7 @@ int binder_alloc_get_allocated_count(str
*/
void binder_alloc_vma_close(struct binder_alloc *alloc)
{
- WRITE_ONCE(alloc->vma, NULL);
+ binder_alloc_set_vma(alloc, NULL);
}

/**
@@ -921,7 +947,7 @@ enum lru_status binder_alloc_free_page(s

index = page - alloc->pages;
page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
- vma = alloc->vma;
+ vma = binder_alloc_get_vma(alloc);
if (vma) {
if (!mmget_not_zero(alloc->vma_vm_mm))
goto err_mmget;



2018-09-17 23:00:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 018/126] switchtec: Fix Spectre v1 vulnerability

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gustavo A. R. Silva <[email protected]>

commit 46feb6b495f7628a6dbf36c4e6d80faf378372d4 upstream.

p.port can is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/pci/switch/switchtec.c:912 ioctl_port_to_pff() warn: potential spectre issue 'pcfg->dsp_pff_inst_id' [r]

Fix this by sanitizing p.port before using it to index
pcfg->dsp_pff_inst_id

Notice that given that speculation windows are large, the policy is to kill
the speculation on the first load and not worry if it can be completed with
a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Logan Gunthorpe <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/pci/switch/switchtec.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -24,6 +24,8 @@
#include <linux/cdev.h>
#include <linux/wait.h>

+#include <linux/nospec.h>
+
MODULE_DESCRIPTION("Microsemi Switchtec(tm) PCIe Management Driver");
MODULE_VERSION("0.1");
MODULE_LICENSE("GPL");
@@ -1173,6 +1175,8 @@ static int ioctl_port_to_pff(struct swit
default:
if (p.port > ARRAY_SIZE(pcfg->dsp_pff_inst_id))
return -EINVAL;
+ p.port = array_index_nospec(p.port,
+ ARRAY_SIZE(pcfg->dsp_pff_inst_id) + 1);
p.pff = ioread32(&pcfg->dsp_pff_inst_id[p.port - 1]);
break;
}



2018-09-17 23:00:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 021/126] ARC: [plat-axs*]: Enable SWAP

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Brodkin <[email protected]>

commit c83532fb0fe053d2e43e9387354cb1b52ba26427 upstream.

SWAP support on ARC was fixed earlier by
commit 6e3761145a9b ("ARC: Fix CONFIG_SWAP")
so now we may safely enable it on platforms that
have external media like USB and SD-card.

Note: it was already allowed for HSDK

Signed-off-by: Alexey Brodkin <[email protected]>
Cc: [email protected] # 6e3761145a9b: ARC: Fix CONFIG_SWAP
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arc/configs/axs101_defconfig | 1 -
arch/arc/configs/axs103_defconfig | 1 -
arch/arc/configs/axs103_smp_defconfig | 1 -
3 files changed, 3 deletions(-)

--- a/arch/arc/configs/axs101_defconfig
+++ b/arch/arc/configs/axs101_defconfig
@@ -1,5 +1,4 @@
CONFIG_DEFAULT_HOSTNAME="ARCLinux"
-# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
--- a/arch/arc/configs/axs103_defconfig
+++ b/arch/arc/configs/axs103_defconfig
@@ -1,5 +1,4 @@
CONFIG_DEFAULT_HOSTNAME="ARCLinux"
-# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
--- a/arch/arc/configs/axs103_smp_defconfig
+++ b/arch/arc/configs/axs103_smp_defconfig
@@ -1,5 +1,4 @@
CONFIG_DEFAULT_HOSTNAME="ARCLinux"
-# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_CROSS_MEMORY_ATTACH is not set



2018-09-17 23:00:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 022/126] misc: mic: SCIF Fix scif_get_new_port() error handling

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

[ Upstream commit a39284ae9d2ad09975c8ae33f1bd0f05fbfbf6ee ]

There are only 2 callers of scif_get_new_port() and both appear to get
the error handling wrong. Both treat zero returns as error, but it
actually returns negative error codes and >= 0 on success.

Fixes: e9089f43c9a7 ("misc: mic: SCIF open close bind and listen APIs")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/misc/mic/scif/scif_api.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)

--- a/drivers/misc/mic/scif/scif_api.c
+++ b/drivers/misc/mic/scif/scif_api.c
@@ -370,11 +370,10 @@ int scif_bind(scif_epd_t epd, u16 pn)
goto scif_bind_exit;
}
} else {
- pn = scif_get_new_port();
- if (!pn) {
- ret = -ENOSPC;
+ ret = scif_get_new_port();
+ if (ret < 0)
goto scif_bind_exit;
- }
+ pn = ret;
}

ep->state = SCIFEP_BOUND;
@@ -648,13 +647,12 @@ int __scif_connect(scif_epd_t epd, struc
err = -EISCONN;
break;
case SCIFEP_UNBOUND:
- ep->port.port = scif_get_new_port();
- if (!ep->port.port) {
- err = -ENOSPC;
- } else {
- ep->port.node = scif_info.nodeid;
- ep->conn_async_state = ASYNC_CONN_IDLE;
- }
+ err = scif_get_new_port();
+ if (err < 0)
+ break;
+ ep->port.port = err;
+ ep->port.node = scif_info.nodeid;
+ ep->conn_async_state = ASYNC_CONN_IDLE;
/* Fall through */
case SCIFEP_BOUND:
/*



2018-09-17 23:01:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 023/126] ethtool: Remove trailing semicolon for static inline

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>

[ Upstream commit d89d41556141a527030a15233135ba622ba3350d ]

Android's header sanitization tool chokes on static inline functions having a
trailing semicolon, leading to an incorrectly parsed header file. While the
tool should obviously be fixed, also fix the header files for the two affected
functions: ethtool_get_flow_spec_ring() and ethtool_get_flow_spec_ring_vf().

Fixes: 8cf6f497de40 ("ethtool: Add helper routines to pass vf to rx_flow_spec")
Reporetd-by: Blair Prescott <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/uapi/linux/ethtool.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -898,13 +898,13 @@ struct ethtool_rx_flow_spec {
static inline __u64 ethtool_get_flow_spec_ring(__u64 ring_cookie)
{
return ETHTOOL_RX_FLOW_SPEC_RING & ring_cookie;
-};
+}

static inline __u64 ethtool_get_flow_spec_ring_vf(__u64 ring_cookie)
{
return (ETHTOOL_RX_FLOW_SPEC_RING_VF & ring_cookie) >>
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
-};
+}

/**
* struct ethtool_rxnfc - command to get or set RX flow classification rules



2018-09-17 23:01:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 025/126] Bluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hedberg <[email protected]>

[ Upstream commit 6c3711ec64fd23a9abc8aaf59a9429569a6282df ]

This driver was recently updated to use serdev, so add the appropriate
dependency. Without this one can get compiler warnings like this if
CONFIG_SERIAL_DEV_BUS is not enabled:

CC [M] drivers/bluetooth/hci_h5.o
drivers/bluetooth/hci_h5.c:934:36: warning: ‘h5_serdev_driver’ defined but not used [-Wunused-variable]
static struct serdev_device_driver h5_serdev_driver = {
^~~~~~~~~~~~~~~~

Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/bluetooth/Kconfig | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/bluetooth/Kconfig
+++ b/drivers/bluetooth/Kconfig
@@ -146,6 +146,7 @@ config BT_HCIUART_LL
config BT_HCIUART_3WIRE
bool "Three-wire UART (H5) protocol support"
depends on BT_HCIUART
+ depends on BT_HCIUART_SERDEV
help
The HCI Three-wire UART Transport Layer makes it possible to
user the Bluetooth HCI over a serial port interface. The HCI



2018-09-17 23:01:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 009/126] smb3: check for and properly advertise directory lease support

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steve French <[email protected]>

commit f801568332321e2b1e7a8bd26c3e4913a312a2ec upstream.

Although servers will typically ignore unsupported features,
we should advertise the support for directory leases (as
Windows e.g. does) in the negotiate protocol capabilities we
pass to the server, and should check for the server capability
(CAP_DIRECTORY_LEASING) before sending a lease request for an
open of a directory. This will prevent us from accidentally
sending directory leases to SMB2.1 or SMB2 server for example.

Signed-off-by: Steve French <[email protected]>
CC: Stable <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/cifs/smb2ops.c | 10 +++++-----
fs/cifs/smb2pdu.c | 3 +++
2 files changed, 8 insertions(+), 5 deletions(-)

--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -3215,7 +3215,7 @@ struct smb_version_values smb21_values =
struct smb_version_values smb3any_values = {
.version_string = SMB3ANY_VERSION_STRING,
.protocol_id = SMB302_PROT_ID, /* doesn't matter, send protocol array */
- .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION,
+ .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION | SMB2_GLOBAL_CAP_DIRECTORY_LEASING,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE_LOCK,
.shared_lock_type = SMB2_LOCKFLAG_SHARED_LOCK,
@@ -3235,7 +3235,7 @@ struct smb_version_values smb3any_values
struct smb_version_values smbdefault_values = {
.version_string = SMBDEFAULT_VERSION_STRING,
.protocol_id = SMB302_PROT_ID, /* doesn't matter, send protocol array */
- .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION,
+ .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION | SMB2_GLOBAL_CAP_DIRECTORY_LEASING,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE_LOCK,
.shared_lock_type = SMB2_LOCKFLAG_SHARED_LOCK,
@@ -3255,7 +3255,7 @@ struct smb_version_values smbdefault_val
struct smb_version_values smb30_values = {
.version_string = SMB30_VERSION_STRING,
.protocol_id = SMB30_PROT_ID,
- .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION,
+ .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION | SMB2_GLOBAL_CAP_DIRECTORY_LEASING,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE_LOCK,
.shared_lock_type = SMB2_LOCKFLAG_SHARED_LOCK,
@@ -3275,7 +3275,7 @@ struct smb_version_values smb30_values =
struct smb_version_values smb302_values = {
.version_string = SMB302_VERSION_STRING,
.protocol_id = SMB302_PROT_ID,
- .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION,
+ .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION | SMB2_GLOBAL_CAP_DIRECTORY_LEASING,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE_LOCK,
.shared_lock_type = SMB2_LOCKFLAG_SHARED_LOCK,
@@ -3296,7 +3296,7 @@ struct smb_version_values smb302_values
struct smb_version_values smb311_values = {
.version_string = SMB311_VERSION_STRING,
.protocol_id = SMB311_PROT_ID,
- .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION,
+ .req_capabilities = SMB2_GLOBAL_CAP_DFS | SMB2_GLOBAL_CAP_LEASING | SMB2_GLOBAL_CAP_LARGE_MTU | SMB2_GLOBAL_CAP_PERSISTENT_HANDLES | SMB2_GLOBAL_CAP_ENCRYPTION | SMB2_GLOBAL_CAP_DIRECTORY_LEASING,
.large_lock_type = 0,
.exclusive_lock_type = SMB2_LOCKFLAG_EXCLUSIVE_LOCK,
.shared_lock_type = SMB2_LOCKFLAG_SHARED_LOCK,
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1816,6 +1816,9 @@ SMB2_open(const unsigned int xid, struct
if (!(server->capabilities & SMB2_GLOBAL_CAP_LEASING) ||
*oplock == SMB2_OPLOCK_LEVEL_NONE)
req->RequestedOplockLevel = *oplock;
+ else if (!(server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING) &&
+ (oparms->create_options & CREATE_NOT_FILE))
+ req->RequestedOplockLevel = *oplock; /* no srv lease support */
else {
rc = add_lease_context(server, iov, &n_iov, oplock);
if (rc) {



2018-09-17 23:01:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 024/126] i2c: aspeed: Add an explicit type casting for *get_clk_reg_val

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jae Hyun Yoo <[email protected]>

[ Upstream commit 5799c4b2f1dbc0166d9b1d94443deaafc6e7a070 ]

This commit fixes this sparse warning:
drivers/i2c/busses/i2c-aspeed.c:875:38: warning: incorrect type in assignment (different modifiers)
drivers/i2c/busses/i2c-aspeed.c:875:38: expected unsigned int ( *get_clk_reg_val )( ... )
drivers/i2c/busses/i2c-aspeed.c:875:38: got void const *const data

Reported-by: Wolfram Sang <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Reviewed-by: Brendan Higgins <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/i2c/busses/i2c-aspeed.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -859,7 +859,7 @@ static int aspeed_i2c_probe_bus(struct p
if (!match)
bus->get_clk_reg_val = aspeed_i2c_24xx_get_clk_reg_val;
else
- bus->get_clk_reg_val = match->data;
+ bus->get_clk_reg_val = (u32 (*)(u32))match->data;

/* Initialize the I2C adapter */
spin_lock_init(&bus->lock);



2018-09-17 23:01:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 028/126] selftests/bpf: fix a typo in map in map test

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roman Gushchin <[email protected]>

[ Upstream commit 0069fb854364da79fd99236ea620affc8e1152d5 ]

Commit fbeb1603bf4e ("bpf: verifier: MOV64 don't mark dst reg unbounded")
revealed a typo in commit fb30d4b71214 ("bpf: Add tests for map-in-map"):
BPF_MOV64_REG(BPF_REG_0, 0) was used instead of
BPF_MOV64_IMM(BPF_REG_0, 0).

I've noticed the problem by running bpf kselftests.

Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map")
Signed-off-by: Roman Gushchin <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Arthur Fabre <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/testing/selftests/bpf/test_verifier.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -5895,7 +5895,7 @@ static struct bpf_test tests[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_map_lookup_elem),
- BPF_MOV64_REG(BPF_REG_0, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.fixup_map_in_map = { 3 },
@@ -5918,7 +5918,7 @@ static struct bpf_test tests[] = {
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_map_lookup_elem),
- BPF_MOV64_REG(BPF_REG_0, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.fixup_map_in_map = { 3 },
@@ -5941,7 +5941,7 @@ static struct bpf_test tests[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_map_lookup_elem),
- BPF_MOV64_REG(BPF_REG_0, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.fixup_map_in_map = { 3 },



2018-09-17 23:01:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 026/126] gpio: tegra: Move driver registration to subsys_init level

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Osipenko <[email protected]>

[ Upstream commit 40b25bce0adbe641a744d1291bc0e51fb7f3c3d8 ]

There is a bug in regards to deferred probing within the drivers core
that causes GPIO-driver to suspend after its users. The bug appears if
GPIO-driver probe is getting deferred, which happens after introducing
dependency on PINCTRL-driver for the GPIO-driver by defining "gpio-ranges"
property in device-tree. The bug in the drivers core is old (more than 4
years now) and is well known, unfortunately there is no easy fix for it.
The good news is that we can workaround the deferred probe issue by
changing GPIO / PINCTRL drivers registration order and hence by moving
PINCTRL driver registration to the arch_init level and GPIO to the
subsys_init.

Signed-off-by: Dmitry Osipenko <[email protected]>
Acked-by: Stefan Agner <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpio/gpio-tegra.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpio/gpio-tegra.c
+++ b/drivers/gpio/gpio-tegra.c
@@ -728,4 +728,4 @@ static int __init tegra_gpio_init(void)
{
return platform_driver_register(&tegra_gpio_driver);
}
-postcore_initcall(tegra_gpio_init);
+subsys_initcall(tegra_gpio_init);



2018-09-17 23:01:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 029/126] media: davinci: vpif_display: Mix memory leak on probe error path

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 61e641f36ed81ae473177c085f0bfd83ad3b55ed ]

If vpif_probe() fails on v4l2_device_register() then memory allocated
at initialize_vpif() for global vpif_obj.dev[i] become unreleased.

The patch adds deallocation of vpif_obj.dev[i] on the error path and
removes duplicated check on platform_data presence.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/media/platform/davinci/vpif_display.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

--- a/drivers/media/platform/davinci/vpif_display.c
+++ b/drivers/media/platform/davinci/vpif_display.c
@@ -1114,6 +1114,14 @@ vpif_init_free_channel_objects:
return err;
}

+static void free_vpif_objs(void)
+{
+ int i;
+
+ for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++)
+ kfree(vpif_obj.dev[i]);
+}
+
static int vpif_async_bound(struct v4l2_async_notifier *notifier,
struct v4l2_subdev *subdev,
struct v4l2_async_subdev *asd)
@@ -1250,11 +1258,6 @@ static __init int vpif_probe(struct plat
return -EINVAL;
}

- if (!pdev->dev.platform_data) {
- dev_warn(&pdev->dev, "Missing platform data. Giving up.\n");
- return -EINVAL;
- }
-
vpif_dev = &pdev->dev;
err = initialize_vpif();

@@ -1266,7 +1269,7 @@ static __init int vpif_probe(struct plat
err = v4l2_device_register(vpif_dev, &vpif_obj.v4l2_dev);
if (err) {
v4l2_err(vpif_dev->driver, "Error registering v4l2 device\n");
- return err;
+ goto vpif_free;
}

while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, res_idx))) {
@@ -1309,7 +1312,10 @@ static __init int vpif_probe(struct plat
if (vpif_obj.sd[i])
vpif_obj.sd[i]->grp_id = 1 << i;
}
- vpif_probe_complete();
+ err = vpif_probe_complete();
+ if (err) {
+ goto probe_subdev_out;
+ }
} else {
vpif_obj.notifier.subdevs = vpif_obj.config->asd;
vpif_obj.notifier.num_subdevs = vpif_obj.config->asd_sizes[0];
@@ -1330,6 +1336,8 @@ probe_subdev_out:
kfree(vpif_obj.sd);
vpif_unregister:
v4l2_device_unregister(&vpif_obj.v4l2_dev);
+vpif_free:
+ free_vpif_objs();

return err;
}
@@ -1351,8 +1359,8 @@ static int vpif_remove(struct platform_d
ch = vpif_obj.dev[i];
/* Unregister video device */
video_unregister_device(&ch->video_dev);
- kfree(vpif_obj.dev[i]);
}
+ free_vpif_objs();

return 0;
}



2018-09-17 23:01:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 004/126] nbd: dont allow invalid blocksize settings

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jens Axboe <[email protected]>

commit bc811f05d77f47059c197a98b6ad242eb03999cb upstream.

syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl.
We need proper validation of the input here. Not just if it's
zero, but also if the value is a power-of-2 and in a valid
range. Add that.

Cc: [email protected]
Reported-by: syzbot <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/block/nbd.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1228,6 +1228,9 @@ static int __nbd_ioctl(struct block_devi
case NBD_SET_SOCK:
return nbd_add_socket(nbd, arg, false);
case NBD_SET_BLKSIZE:
+ if (!arg || !is_power_of_2(arg) || arg < 512 ||
+ arg > PAGE_SIZE)
+ return -EINVAL;
nbd_size_set(nbd, arg,
div_s64(config->bytesize, arg));
return 0;



2018-09-17 23:01:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 031/126] net: phy: Fix the register offsets in Broadcom iProc mdio mux driver

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Arun Parameswaran <[email protected]>

[ Upstream commit 77fefa93bfebe4df44f154f2aa5938e32630d0bf ]

Modify the register offsets in the Broadcom iProc mdio mux to start
from the top of the register address space.

Earlier, the base address pointed to the end of the block's register
space. The base address will now point to the start of the mdio's
address space. The offsets have been fixed to match this.

Signed-off-by: Arun Parameswaran <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/mdio-mux-bcm-iproc.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)

--- a/drivers/net/phy/mdio-mux-bcm-iproc.c
+++ b/drivers/net/phy/mdio-mux-bcm-iproc.c
@@ -22,7 +22,7 @@
#include <linux/mdio-mux.h>
#include <linux/delay.h>

-#define MDIO_PARAM_OFFSET 0x00
+#define MDIO_PARAM_OFFSET 0x23c
#define MDIO_PARAM_MIIM_CYCLE 29
#define MDIO_PARAM_INTERNAL_SEL 25
#define MDIO_PARAM_BUS_ID 22
@@ -30,20 +30,22 @@
#define MDIO_PARAM_PHY_ID 16
#define MDIO_PARAM_PHY_DATA 0

-#define MDIO_READ_OFFSET 0x04
+#define MDIO_READ_OFFSET 0x240
#define MDIO_READ_DATA_MASK 0xffff
-#define MDIO_ADDR_OFFSET 0x08
+#define MDIO_ADDR_OFFSET 0x244

-#define MDIO_CTRL_OFFSET 0x0C
+#define MDIO_CTRL_OFFSET 0x248
#define MDIO_CTRL_WRITE_OP 0x1
#define MDIO_CTRL_READ_OP 0x2

-#define MDIO_STAT_OFFSET 0x10
+#define MDIO_STAT_OFFSET 0x24c
#define MDIO_STAT_DONE 1

#define BUS_MAX_ADDR 32
#define EXT_BUS_START_ADDR 16

+#define MDIO_REG_ADDR_SPACE_SIZE 0x250
+
struct iproc_mdiomux_desc {
void *mux_handle;
void __iomem *base;
@@ -169,6 +171,14 @@ static int mdio_mux_iproc_probe(struct p
md->dev = &pdev->dev;

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (res->start & 0xfff) {
+ /* For backward compatibility in case the
+ * base address is specified with an offset.
+ */
+ dev_info(&pdev->dev, "fix base address in dt-blob\n");
+ res->start &= ~0xfff;
+ res->end = res->start + MDIO_REG_ADDR_SPACE_SIZE - 1;
+ }
md->base = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(md->base)) {
dev_err(&pdev->dev, "failed to ioremap register\n");



2018-09-17 23:01:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 033/126] scsi: target: fix __transport_register_session locking

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mike Christie <[email protected]>

[ Upstream commit 6a64f6e1591322beb8ce16e952a53582caf2a15c ]

When __transport_register_session is called from transport_register_session
irqs will already have been disabled, so we do not want the unlock irq call
to enable them until the higher level has done the final
spin_unlock_irqrestore/ spin_unlock_irq.

This has __transport_register_session use the save/restore call.

Signed-off-by: Mike Christie <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/target/target_core_transport.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -317,6 +317,7 @@ void __transport_register_session(
{
const struct target_core_fabric_ops *tfo = se_tpg->se_tpg_tfo;
unsigned char buf[PR_REG_ISID_LEN];
+ unsigned long flags;

se_sess->se_tpg = se_tpg;
se_sess->fabric_sess_ptr = fabric_sess_ptr;
@@ -353,7 +354,7 @@ void __transport_register_session(
se_sess->sess_bin_isid = get_unaligned_be64(&buf[0]);
}

- spin_lock_irq(&se_nacl->nacl_sess_lock);
+ spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags);
/*
* The se_nacl->nacl_sess pointer will be set to the
* last active I_T Nexus for each struct se_node_acl.
@@ -362,7 +363,7 @@ void __transport_register_session(

list_add_tail(&se_sess->sess_acl_list,
&se_nacl->acl_sess_list);
- spin_unlock_irq(&se_nacl->nacl_sess_lock);
+ spin_unlock_irqrestore(&se_nacl->nacl_sess_lock, flags);
}
list_add_tail(&se_sess->sess_list, &se_tpg->tpg_sess_list);




2018-09-17 23:01:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 008/126] SMB3: Backup intent flag missing for directory opens with backupuid mounts

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steve French <[email protected]>

commit 5e19697b56a64004e2d0ff1bb952ea05493c088f upstream.

When "backup intent" is requested on the mount (e.g. backupuid or
backupgid mount options), the corresponding flag needs to be set
on opens of directories (and files) but was missing in some
places causing access denied trying to enumerate and backup
servers.

Fixes kernel bugzilla #200953
https://bugzilla.kernel.org/show_bug.cgi?id=200953

Reported-and-tested-by: <[email protected]>
Signed-off-by: Steve French <[email protected]>
CC: Stable <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/cifs/inode.c | 2 ++
fs/cifs/smb2ops.c | 25 ++++++++++++++++++++-----
2 files changed, 22 insertions(+), 5 deletions(-)

--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -467,6 +467,8 @@ cifs_sfu_type(struct cifs_fattr *fattr,
oparms.cifs_sb = cifs_sb;
oparms.desired_access = GENERIC_READ;
oparms.create_options = CREATE_NOT_DIR;
+ if (backup_cred(cifs_sb))
+ oparms.create_options |= CREATE_OPEN_BACKUP_INTENT;
oparms.disposition = FILE_OPEN;
oparms.path = path;
oparms.fid = &fid;
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -385,7 +385,10 @@ smb2_is_path_accessible(const unsigned i
oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
oparms.disposition = FILE_OPEN;
- oparms.create_options = 0;
+ if (backup_cred(cifs_sb))
+ oparms.create_options = CREATE_OPEN_BACKUP_INTENT;
+ else
+ oparms.create_options = 0;
oparms.fid = &fid;
oparms.reconnect = false;

@@ -534,7 +537,10 @@ smb2_query_eas(const unsigned int xid, s
oparms.tcon = tcon;
oparms.desired_access = FILE_READ_EA;
oparms.disposition = FILE_OPEN;
- oparms.create_options = 0;
+ if (backup_cred(cifs_sb))
+ oparms.create_options = CREATE_OPEN_BACKUP_INTENT;
+ else
+ oparms.create_options = 0;
oparms.fid = &fid;
oparms.reconnect = false;

@@ -613,7 +619,10 @@ smb2_set_ea(const unsigned int xid, stru
oparms.tcon = tcon;
oparms.desired_access = FILE_WRITE_EA;
oparms.disposition = FILE_OPEN;
- oparms.create_options = 0;
+ if (backup_cred(cifs_sb))
+ oparms.create_options = CREATE_OPEN_BACKUP_INTENT;
+ else
+ oparms.create_options = 0;
oparms.fid = &fid;
oparms.reconnect = false;

@@ -1215,7 +1224,10 @@ smb2_query_dir_first(const unsigned int
oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES | FILE_READ_DATA;
oparms.disposition = FILE_OPEN;
- oparms.create_options = 0;
+ if (backup_cred(cifs_sb))
+ oparms.create_options = CREATE_OPEN_BACKUP_INTENT;
+ else
+ oparms.create_options = 0;
oparms.fid = fid;
oparms.reconnect = false;

@@ -1491,7 +1503,10 @@ smb2_query_symlink(const unsigned int xi
oparms.tcon = tcon;
oparms.desired_access = FILE_READ_ATTRIBUTES;
oparms.disposition = FILE_OPEN;
- oparms.create_options = 0;
+ if (backup_cred(cifs_sb))
+ oparms.create_options = CREATE_OPEN_BACKUP_INTENT;
+ else
+ oparms.create_options = 0;
oparms.fid = &fid;
oparms.reconnect = false;




2018-09-17 23:01:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 034/126] md/raid5: fix data corruption of replacements after originals dropped

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: BingJing Chang <[email protected]>

[ Upstream commit d63e2fc804c46e50eee825c5d3a7228e07048b47 ]

During raid5 replacement, the stripes can be marked with R5_NeedReplace
flag. Data can be read from being-replaced devices and written to
replacing spares without reading all other devices. (It's 'replace'
mode. s.replacing = 1) If a being-replaced device is dropped, the
replacement progress will be interrupted and resumed with pure recovery
mode. However, existing stripes before being interrupted cannot read
from the dropped device anymore. It prints lots of WARN_ON messages.
And it results in data corruption because existing stripes write
problematic data into its replacement device and update the progress.

\# Erase disks (1MB + 2GB)
dd if=/dev/zero of=/dev/sda bs=1MB count=2049
dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
\# Ensure array stores non-zero data
dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
\# Start replacement
mdadm /dev/md0 -a /dev/sdd
mdadm /dev/md0 --replace /dev/sda

Then, Hot-plug out /dev/sda during recovery, and wait for recovery done.
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.

Soon after you hot-plug out /dev/sda, you will see many WARN_ON
messages. The replacement recovery will be interrupted shortly. After
the recovery finishes, it will result in data corruption.

Actually, it's just an unhandled case of replacement. In commit
<f94c0b6658c7> (md/raid5: fix interaction of 'replace' and 'recovery'.),
if a NeedReplace device is not UPTODATE then that is an error, the
commit just simply print WARN_ON but also mark these corrupted stripes
with R5_WantReplace. (it means it's ready for writes.)

To fix this case, we can leverage 'sync and replace' mode mentioned in
commit <9a3e1101b827> (md/raid5: detect and handle replacements during
recovery.). We can add logics to detect and use 'sync and replace' mode
for these stripes.

Reported-by: Alex Chen <[email protected]>
Reviewed-by: Alex Wu <[email protected]>
Reviewed-by: Chung-Chiang Cheng <[email protected]>
Signed-off-by: BingJing Chang <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/raid5.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4516,6 +4516,12 @@ static void analyse_stripe(struct stripe
s->failed++;
if (rdev && !test_bit(Faulty, &rdev->flags))
do_recovery = 1;
+ else if (!rdev) {
+ rdev = rcu_dereference(
+ conf->disks[i].replacement);
+ if (rdev && !test_bit(Faulty, &rdev->flags))
+ do_recovery = 1;
+ }
}

if (test_bit(R5_InJournal, &dev->flags))



2018-09-17 23:02:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 036/126] media: camss: csid: Configure data type and decode format properly

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Todor Tomov <[email protected]>

[ Upstream commit c628e78899ff8006b5f9d8206da54ed3bb994342 ]

The CSID decodes the input data stream. When the input comes from
the Test Generator the format of the stream is set on the source
media pad. When the input comes from the CSIPHY the format is the
one on the sink media pad. Use the proper format for each case.

Signed-off-by: Todor Tomov <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/media/platform/qcom/camss-8x16/camss-csid.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

--- a/drivers/media/platform/qcom/camss-8x16/camss-csid.c
+++ b/drivers/media/platform/qcom/camss-8x16/camss-csid.c
@@ -392,9 +392,6 @@ static int csid_set_stream(struct v4l2_s
!media_entity_remote_pad(&csid->pads[MSM_CSID_PAD_SINK]))
return -ENOLINK;

- dt = csid_get_fmt_entry(csid->fmt[MSM_CSID_PAD_SRC].code)->
- data_type;
-
if (tg->enabled) {
/* Config Test Generator */
struct v4l2_mbus_framefmt *f =
@@ -416,6 +413,9 @@ static int csid_set_stream(struct v4l2_s
writel_relaxed(val, csid->base +
CAMSS_CSID_TG_DT_n_CGG_0(0));

+ dt = csid_get_fmt_entry(
+ csid->fmt[MSM_CSID_PAD_SRC].code)->data_type;
+
/* 5:0 data type */
val = dt;
writel_relaxed(val, csid->base +
@@ -425,6 +425,9 @@ static int csid_set_stream(struct v4l2_s
val = tg->payload_mode;
writel_relaxed(val, csid->base +
CAMSS_CSID_TG_DT_n_CGG_2(0));
+
+ df = csid_get_fmt_entry(
+ csid->fmt[MSM_CSID_PAD_SRC].code)->decode_format;
} else {
struct csid_phy_config *phy = &csid->phy;

@@ -439,13 +442,16 @@ static int csid_set_stream(struct v4l2_s

writel_relaxed(val,
csid->base + CAMSS_CSID_CORE_CTRL_1);
+
+ dt = csid_get_fmt_entry(
+ csid->fmt[MSM_CSID_PAD_SINK].code)->data_type;
+ df = csid_get_fmt_entry(
+ csid->fmt[MSM_CSID_PAD_SINK].code)->decode_format;
}

/* Config LUT */

dt_shift = (cid % 4) * 8;
- df = csid_get_fmt_entry(csid->fmt[MSM_CSID_PAD_SINK].code)->
- decode_format;

val = readl_relaxed(csid->base + CAMSS_CSID_CID_LUT_VC_n(vc));
val &= ~(0xff << dt_shift);



2018-09-17 23:02:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 035/126] timers: Clear timer_base::must_forward_clk with timer_base::lock held

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gaurav Kohli <[email protected]>

[ Upstream commit 363e934d8811d799c88faffc5bfca782fd728334 ]

timer_base::must_forward_clock is indicating that the base clock might be
stale due to a long idle sleep.

The forwarding of the base clock takes place in the timer softirq or when a
timer is enqueued to a base which is idle. If the enqueue of timer to an
idle base happens from a remote CPU, then the following race can happen:

CPU0 CPU1
run_timer_softirq mod_timer

base = lock_timer_base(timer);
base->must_forward_clk = false
if (base->must_forward_clk)
forward(base); -> skipped

enqueue_timer(base, timer, idx);
-> idx is calculated high due to
stale base
unlock_timer_base(timer);
base = lock_timer_base(timer);
forward(base);

The root cause is that timer_base::must_forward_clk is cleared outside the
timer_base::lock held region, so the remote queuing CPU observes it as
cleared, but the base clock is still stale. This can cause large
granularity values for timers, i.e. the accuracy of the expiry time
suffers.

Prevent this by clearing the flag with timer_base::lock held, so that the
forwarding takes place before the cleared flag is observable by a remote
CPU.

Signed-off-by: Gaurav Kohli <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/time/timer.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1609,6 +1609,22 @@ static inline void __run_timers(struct t

raw_spin_lock_irq(&base->lock);

+ /*
+ * timer_base::must_forward_clk must be cleared before running
+ * timers so that any timer functions that call mod_timer() will
+ * not try to forward the base. Idle tracking / clock forwarding
+ * logic is only used with BASE_STD timers.
+ *
+ * The must_forward_clk flag is cleared unconditionally also for
+ * the deferrable base. The deferrable base is not affected by idle
+ * tracking and never forwarded, so clearing the flag is a NOOP.
+ *
+ * The fact that the deferrable base is never forwarded can cause
+ * large variations in granularity for deferrable timers, but they
+ * can be deferred for long periods due to idle anyway.
+ */
+ base->must_forward_clk = false;
+
while (time_after_eq(jiffies, base->clk)) {

levels = collect_expired_timers(base, heads);
@@ -1628,19 +1644,6 @@ static __latent_entropy void run_timer_s
{
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

- /*
- * must_forward_clk must be cleared before running timers so that any
- * timer functions that call mod_timer will not try to forward the
- * base. idle trcking / clock forwarding logic is only used with
- * BASE_STD timers.
- *
- * The deferrable base does not do idle tracking at all, so we do
- * not forward it. This can result in very large variations in
- * granularity for deferrable timers, but they can be deferred for
- * long periods due to idle.
- */
- base->must_forward_clk = false;
-
__run_timers(base);
if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));



2018-09-17 23:02:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 010/126] Btrfs: fix data corruption when deduplicating between different files

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[email protected]>

commit de02b9f6bb65a6a1848f346f7a3617b7a9b930c0 upstream.

If we deduplicate extents between two different files we can end up
corrupting data if the source range ends at the size of the source file,
the source file's size is not aligned to the filesystem's block size
and the destination range does not go past the size of the destination
file size.

Example:

$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt

$ xfs_io -f -c "pwrite -S 0x6b 0 2518890" /mnt/foo
# The first byte with a value of 0xae starts at an offset (2518890)
# which is not a multiple of the sector size.
$ xfs_io -c "pwrite -S 0xae 2518890 102398" /mnt/foo

# Confirm the file content is full of bytes with values 0x6b and 0xae.
$ od -t x1 /mnt/foo
0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
*
11467540 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ae ae ae ae ae ae
11467560 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae
*
11777540 ae ae ae ae ae ae ae ae
11777550

# Create a second file with a length not aligned to the sector size,
# whose bytes all have the value 0x6b, so that its extent(s) can be
# deduplicated with the first file.
$ xfs_io -f -c "pwrite -S 0x6b 0 557771" /mnt/bar

# Now deduplicate the entire second file into a range of the first file
# that also has all bytes with the value 0x6b. The destination range's
# end offset must not be aligned to the sector size and must be less
# then the offset of the first byte with the value 0xae (byte at offset
# 2518890).
$ xfs_io -c "dedupe /mnt/bar 0 1957888 557771" /mnt/foo

# The bytes in the range starting at offset 2515659 (end of the
# deduplication range) and ending at offset 2519040 (start offset
# rounded up to the block size) must all have the value 0xae (and not
# replaced with 0x00 values). In other words, we should have exactly
# the same data we had before we asked for deduplication.
$ od -t x1 /mnt/foo
0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
*
11467540 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ae ae ae ae ae ae
11467560 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae
*
11777540 ae ae ae ae ae ae ae ae
11777550

# Unmount the filesystem and mount it again. This guarantees any file
# data in the page cache is dropped.
$ umount /dev/sdb
$ mount /dev/sdb /mnt

$ od -t x1 /mnt/foo
0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
*
11461300 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00
11461320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
11470000 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae
*
11777540 ae ae ae ae ae ae ae ae
11777550

# The bytes in range 2515659 to 2519040 have a value of 0x00 and not a
# value of 0xae, data corruption happened due to the deduplication
# operation.

So fix this by rounding down, to the sector size, the length used for the
deduplication when the following conditions are met:

1) Source file's range ends at its i_size;
2) Source file's i_size is not aligned to the sector size;
3) Destination range does not cross the i_size of the destination file.

Fixes: e1d227a42ea2 ("btrfs: Handle unaligned length in extent_same")
CC: [email protected] # 4.2+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/btrfs/ioctl.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3158,6 +3158,25 @@ static int btrfs_extent_same(struct inod

same_lock_start = min_t(u64, loff, dst_loff);
same_lock_len = max_t(u64, loff, dst_loff) + len - same_lock_start;
+ } else {
+ /*
+ * If the source and destination inodes are different, the
+ * source's range end offset matches the source's i_size, that
+ * i_size is not a multiple of the sector size, and the
+ * destination range does not go past the destination's i_size,
+ * we must round down the length to the nearest sector size
+ * multiple. If we don't do this adjustment we end replacing
+ * with zeroes the bytes in the range that starts at the
+ * deduplication range's end offset and ends at the next sector
+ * size multiple.
+ */
+ if (loff + olen == i_size_read(src) &&
+ dst_loff + len < i_size_read(dst)) {
+ const u64 sz = BTRFS_I(src)->root->fs_info->sectorsize;
+
+ len = round_down(i_size_read(src), sz) - loff;
+ olen = len;
+ }
}

/* don't make the dst file partly checksummed */



2018-09-17 23:02:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 019/126] crypto: aes-generic - fix aes-generic regression on powerpc

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <[email protected]>

commit 6e36719fbe90213fbba9f50093fa2d4d69b0e93c upstream.

My last bugfix added -Os on the command line, which unfortunately caused
a build regression on powerpc in some configurations.

I've done some more analysis of the original problem and found slightly
different workaround that avoids this regression and also results in
better performance on gcc-7.0: -fcode-hoisting is an optimization step
that got added in gcc-7 and that for all gcc-7 versions causes worse
performance.

This disables -fcode-hoisting on all compilers that understand the option.
For gcc-7.1 and 7.2 I found the same performance as my previous patch
(using -Os), in gcc-7.0 it was even better. On gcc-8 I could see no
change in performance from this patch. In theory, code hoisting should
not be able make things better for the AES cipher, so leaving it
disabled for gcc-8 only serves to simplify the Makefile change.

Reported-by: kbuild test robot <[email protected]>
Link: https://www.mail-archive.com/[email protected]/msg30418.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Fixes: 148b974deea9 ("crypto: aes-generic - build with -Os on gcc-7+")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Cc: Horia Geanta <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
crypto/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -98,7 +98,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += t
obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
-CFLAGS_aes_generic.o := $(call cc-ifversion, -ge, 0701, -Os) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
+CFLAGS_aes_generic.o := $(call cc-option,-fno-code-hoisting) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
obj-$(CONFIG_CRYPTO_AES_TI) += aes_ti.o
obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o



2018-09-17 23:02:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 039/126] uio: potential double frees if __uio_register_device() fails

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

[ Upstream commit f019f07ecf6a6b8bd6d7853bce70925d90af02d1 ]

The uio_unregister_device() function assumes that if "info->uio_dev" is
non-NULL that means "info" is fully allocated. Setting info->uio_de
has to be the last thing in the function.

In the current code, if request_threaded_irq() fails then we return with
info->uio_dev set to non-NULL but info is not fully allocated and it can
lead to double frees.

Fixes: beafc54c4e2f ("UIO: Add the User IO core code")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/uio/uio.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -841,8 +841,6 @@ int __uio_register_device(struct module
if (ret)
goto err_uio_dev_add_attributes;

- info->uio_dev = idev;
-
if (info->irq && (info->irq != UIO_IRQ_CUSTOM)) {
/*
* Note that we deliberately don't use devm_request_irq
@@ -858,6 +856,7 @@ int __uio_register_device(struct module
goto err_request_irq;
}

+ info->uio_dev = idev;
return 0;

err_request_irq:



2018-09-17 23:02:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 040/126] firmware: vpd: Fix section enabled flag on vpd_section_destroy

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 45ca3f76de0507ecf143f770570af2942f263812 ]

static struct ro_vpd and rw_vpd are initialized by vpd_sections_init()
in vpd_probe() based on header's ro and rw sizes.
In vpd_remove() vpd_section_destroy() performs deinitialization based
on enabled flag, which is set to true by vpd_sections_init().
This leads to call of vpd_section_destroy() on already destroyed section
for probe-release-probe-release sequence if first probe performs
ro_vpd initialization and second probe does not initialize it.

The patch adds changing enabled flag on vpd_section_destroy and adds
cleanup on the error path of vpd_sections_init.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/firmware/google/vpd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/firmware/google/vpd.c
+++ b/drivers/firmware/google/vpd.c
@@ -246,6 +246,7 @@ static int vpd_section_destroy(struct vp
sysfs_remove_bin_file(vpd_kobj, &sec->bin_attr);
kfree(sec->raw_name);
memunmap(sec->baseaddr);
+ sec->enabled = false;
}

return 0;
@@ -279,8 +280,10 @@ static int vpd_sections_init(phys_addr_t
ret = vpd_section_init("rw", &rw_vpd,
physaddr + sizeof(struct vpd_cbmem) +
header.ro_size, header.rw_size);
- if (ret)
+ if (ret) {
+ vpd_section_destroy(&ro_vpd);
return ret;
+ }
}

return 0;



2018-09-17 23:02:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 011/126] KVM: s390: vsie: copy wrapping keys to right place

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pierre Morel <[email protected]>

commit 204c97245612b6c255edf4e21e24d417c4a0c008 upstream.

Copy the key mask to the right offset inside the shadow CRYCB

Fixes: bbeaa58b3 ("KVM: s390: vsie: support aes dea wrapping keys")
Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Cc: [email protected] # v4.8+
Message-Id: <[email protected]>
Signed-off-by: Janosch Frank <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/s390/kvm/vsie.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -170,7 +170,8 @@ static int shadow_crycb(struct kvm_vcpu
return set_validity_icpt(scb_s, 0x0039U);

/* copy only the wrapping keys */
- if (read_guest_real(vcpu, crycb_addr + 72, &vsie_page->crycb, 56))
+ if (read_guest_real(vcpu, crycb_addr + 72,
+ vsie_page->crycb.dea_wrapping_key_mask, 56))
return set_validity_icpt(scb_s, 0x0035U);

scb_s->ecb3 |= ecb3_flags;



2018-09-17 23:02:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 027/126] powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Reza Arbab <[email protected]>

[ Upstream commit 9eab9901b015f489199105c470de1ffc337cfabb ]

We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.

The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.

What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.

There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab <[email protected]>
Acked-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/platforms/powernv/npu-dma.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -427,8 +427,9 @@ static int get_mmio_atsd_reg(struct npu
int i;

for (i = 0; i < npu->mmio_atsd_count; i++) {
- if (!test_and_set_bit_lock(i, &npu->mmio_atsd_usage))
- return i;
+ if (!test_bit(i, &npu->mmio_atsd_usage))
+ if (!test_and_set_bit_lock(i, &npu->mmio_atsd_usage))
+ return i;
}

return -ENOSPC;



2018-09-17 23:02:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 013/126] ALSA: hda - Fix cancel_work_sync() stall from jackpoll work

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 16037643969e095509cd8446a3f8e406a6dc3a2c upstream.

On AMD/ATI controllers, the HD-audio controller driver allows a bus
reset upon the error recovery, and its procedure includes the
cancellation of pending jack polling work as found in
snd_hda_bus_codec_reset(). This works usually fine, but it becomes a
problem when the reset happens from the jack poll work itself; then
calling cancel_work_sync() from the work being processed tries to wait
the finish endlessly.

As a workaround, this patch adds the check of current_work() and
applies the cancel_work_sync() only when it's not from the
jackpoll_work.

This doesn't fix the root cause of the reported error below, but at
least, it eases the unexpected stall of the whole system.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200937
Cc: <[email protected]>
Cc: Lukas Wunner <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/hda_codec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -3923,7 +3923,8 @@ void snd_hda_bus_reset_codecs(struct hda

list_for_each_codec(codec, bus) {
/* FIXME: maybe a better way needed for forced reset */
- cancel_delayed_work_sync(&codec->jackpoll_work);
+ if (current_work() != &codec->jackpoll_work.work)
+ cancel_delayed_work_sync(&codec->jackpoll_work);
#ifdef CONFIG_PM
if (hda_codec_is_power_on(codec)) {
hda_call_codec_suspend(codec);



2018-09-17 23:02:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 014/126] cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neeraj Upadhyay <[email protected]>

commit f8b7530aa0a1def79c93101216b5b17cf408a70a upstream.

The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
load of st->should_run to prevent reordering of the later load/stores
w.r.t. the load of st->should_run.

Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
Signed-off-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/cpu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -612,15 +612,15 @@ static void cpuhp_thread_fun(unsigned in
bool bringup = st->bringup;
enum cpuhp_state state;

+ if (WARN_ON_ONCE(!st->should_run))
+ return;
+
/*
* ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
* that if we see ->should_run we also see the rest of the state.
*/
smp_mb();

- if (WARN_ON_ONCE(!st->should_run))
- return;
-
cpuhp_lock_acquire(bringup);

if (st->single) {



2018-09-17 23:02:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 015/126] cpu/hotplug: Prevent state corruption on error rollback

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 69fa6eb7d6a64801ea261025cce9723d9442d773 upstream.

When a teardown callback fails, the CPU hotplug code brings the CPU back to
the previous state. The previous state becomes the new target state. The
rollback happens in undo_cpu_down() which increments the state
unconditionally even if the state is already the same as the target.

As a consequence the next CPU hotplug operation will start at the wrong
state. This is easily to observe when __cpu_disable() fails.

Prevent the unconditional undo by checking the state vs. target before
incrementing state and fix up the consequently wrong conditional in the
unplug code which handles the failure of the final CPU take down on the
control CPU side.

Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
Reported-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
Tested-by: Neeraj Upadhyay <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

----

---
kernel/cpu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -932,7 +932,8 @@ static int cpuhp_down_callbacks(unsigned
ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
if (ret) {
st->target = prev_state;
- undo_cpu_down(cpu, st);
+ if (st->state < prev_state)
+ undo_cpu_down(cpu, st);
break;
}
}
@@ -985,7 +986,7 @@ static int __ref _cpu_down(unsigned int
* to do the further cleanups.
*/
ret = cpuhp_down_callbacks(cpu, st, target);
- if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+ if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}



2018-09-17 23:02:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 042/126] tty: rocket: Fix possible buffer overwrite on register_PCI

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 0419056ec8fd01ddf5460d2dba0491aad22657dd ]

If number of isa and pci boards exceed NUM_BOARDS on the path
rp_init()->init_PCI()->register_PCI() then buffer overwrite occurs
in register_PCI() on assign rcktpt_io_addr[i].

The patch adds check on upper bound for index of registered
board in register_PCI.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/tty/rocket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/tty/rocket.c
+++ b/drivers/tty/rocket.c
@@ -1894,7 +1894,7 @@ static __init int register_PCI(int i, st
ByteIO_t UPCIRingInd = 0;

if (!dev || !pci_match_id(rocket_pci_ids, dev) ||
- pci_enable_device(dev))
+ pci_enable_device(dev) || i >= NUM_BOARDS)
return 0;

rcktpt_io_addr[i] = pci_resource_start(dev, 0);



2018-09-17 23:02:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 007/126] MIPS: VDSO: Match data page cache colouring when D$ aliases

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Burton <[email protected]>

commit 0f02cfbc3d9e413d450d8d0fd660077c23f67eff upstream.

When a system suffers from dcache aliasing a user program may observe
stale VDSO data from an aliased cache line. Notably this can break the
expectation that clock_gettime(CLOCK_MONOTONIC, ...) is, as its name
suggests, monotonic.

In order to ensure that users observe updates to the VDSO data page as
intended, align the user mappings of the VDSO data page such that their
cache colouring matches that of the virtual address range which the
kernel will use to update the data page - typically its unmapped address
within kseg0.

This ensures that we don't introduce aliasing cache lines for the VDSO
data page, and therefore that userland will observe updates without
requiring cache invalidation.

Signed-off-by: Paul Burton <[email protected]>
Reported-by: Hauke Mehrtens <[email protected]>
Reported-by: Rene Nielsen <[email protected]>
Reported-by: Alexandre Belloni <[email protected]>
Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
Patchwork: https://patchwork.linux-mips.org/patch/20344/
Tested-by: Alexandre Belloni <[email protected]>
Tested-by: Hauke Mehrtens <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected] # v4.4+
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/mips/kernel/vdso.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -13,6 +13,7 @@
#include <linux/err.h>
#include <linux/init.h>
#include <linux/ioport.h>
+#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/slab.h>
@@ -20,6 +21,7 @@

#include <asm/abi.h>
#include <asm/mips-cps.h>
+#include <asm/page.h>
#include <asm/vdso.h>

/* Kernel-provided data used by the VDSO. */
@@ -128,12 +130,30 @@ int arch_setup_additional_pages(struct l
vvar_size = gic_size + PAGE_SIZE;
size = vvar_size + image->size;

+ /*
+ * Find a region that's large enough for us to perform the
+ * colour-matching alignment below.
+ */
+ if (cpu_has_dc_aliases)
+ size += shm_align_mask + 1;
+
base = get_unmapped_area(NULL, 0, size, 0, 0);
if (IS_ERR_VALUE(base)) {
ret = base;
goto out;
}

+ /*
+ * If we suffer from dcache aliasing, ensure that the VDSO data page
+ * mapping is coloured the same as the kernel's mapping of that memory.
+ * This ensures that when the kernel updates the VDSO data userland
+ * will observe it without requiring cache invalidations.
+ */
+ if (cpu_has_dc_aliases) {
+ base = __ALIGN_MASK(base, shm_align_mask);
+ base += ((unsigned long)&vdso_data - gic_size) & shm_align_mask;
+ }
+
data_addr = base + gic_size;
vdso_addr = data_addr + PAGE_SIZE;




2018-09-17 23:02:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 045/126] f2fs: fix defined but not used build warnings

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Randy Dunlap <[email protected]>

[ Upstream commit cb15d1e43db0a6341c1e26ac6a2c74e61b74f1aa ]

Fix build warnings in f2fs when CONFIG_PROC_FS is not enabled
by marking the unused functions as __maybe_unused.

../fs/f2fs/sysfs.c:519:12: warning: 'segment_info_seq_show' defined but not used [-Wunused-function]
../fs/f2fs/sysfs.c:546:12: warning: 'segment_bits_seq_show' defined but not used [-Wunused-function]
../fs/f2fs/sysfs.c:570:12: warning: 'iostat_info_seq_show' defined but not used [-Wunused-function]

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: [email protected]
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/sysfs.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -9,6 +9,7 @@
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
+#include <linux/compiler.h>
#include <linux/proc_fs.h>
#include <linux/f2fs_fs.h>
#include <linux/seq_file.h>
@@ -381,7 +382,8 @@ static struct kobject f2fs_feat = {
.kset = &f2fs_kset,
};

-static int segment_info_seq_show(struct seq_file *seq, void *offset)
+static int __maybe_unused segment_info_seq_show(struct seq_file *seq,
+ void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
@@ -408,7 +410,8 @@ static int segment_info_seq_show(struct
return 0;
}

-static int segment_bits_seq_show(struct seq_file *seq, void *offset)
+static int __maybe_unused segment_bits_seq_show(struct seq_file *seq,
+ void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
@@ -432,7 +435,8 @@ static int segment_bits_seq_show(struct
return 0;
}

-static int iostat_info_seq_show(struct seq_file *seq, void *offset)
+static int __maybe_unused iostat_info_seq_show(struct seq_file *seq,
+ void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);



2018-09-17 23:02:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 054/126] wlcore: Set rx_status boottime_ns field on rx

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Loic Poulain <[email protected]>

[ Upstream commit 37a634f60fd6dfbda2c312657eec7ef0750546e7 ]

When receiving a beacon or probe response, we should update the
boottime_ns field which is the timestamp the frame was received at.
(cf mac80211.h)

This fixes a scanning issue with Android since it relies on this
timestamp to determine when the AP has been seen for the last time
(via the nl80211 BSS_LAST_SEEN_BOOTTIME parameter).

Signed-off-by: Loic Poulain <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wireless/ti/wlcore/rx.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/net/wireless/ti/wlcore/rx.c
+++ b/drivers/net/wireless/ti/wlcore/rx.c
@@ -59,7 +59,7 @@ static u32 wlcore_rx_get_align_buf_size(
static void wl1271_rx_status(struct wl1271 *wl,
struct wl1271_rx_descriptor *desc,
struct ieee80211_rx_status *status,
- u8 beacon)
+ u8 beacon, u8 probe_rsp)
{
memset(status, 0, sizeof(struct ieee80211_rx_status));

@@ -106,6 +106,9 @@ static void wl1271_rx_status(struct wl12
}
}

+ if (beacon || probe_rsp)
+ status->boottime_ns = ktime_get_boot_ns();
+
if (beacon)
wlcore_set_pending_regdomain_ch(wl, (u16)desc->channel,
status->band);
@@ -191,7 +194,8 @@ static int wl1271_rx_handle_data(struct
if (ieee80211_is_data_present(hdr->frame_control))
is_data = 1;

- wl1271_rx_status(wl, desc, IEEE80211_SKB_RXCB(skb), beacon);
+ wl1271_rx_status(wl, desc, IEEE80211_SKB_RXCB(skb), beacon,
+ ieee80211_is_probe_resp(hdr->frame_control));
wlcore_hw_set_rx_csum(wl, desc, skb);

seq_num = (le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_SEQ) >> 4;



2018-09-17 23:02:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 030/126] media: dw2102: Fix memleak on sequence of probes

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 299c7007e93645067e1d2743f4e50156de78c4ff ]

Each call to dw2102_probe() allocates memory by kmemdup for structures
p1100, s660, p7500 and s421, but there is no their deallocation.
dvb_usb_device_init() copies the corresponding structure into
dvb_usb_device->props, so there is no use of original structure after
dvb_usb_device_init().

The patch moves structures from global scope to local and adds their
deallocation.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/media/usb/dvb-usb/dw2102.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

--- a/drivers/media/usb/dvb-usb/dw2102.c
+++ b/drivers/media/usb/dvb-usb/dw2102.c
@@ -2103,14 +2103,12 @@ static struct dvb_usb_device_properties
}
};

-static struct dvb_usb_device_properties *p1100;
static const struct dvb_usb_device_description d1100 = {
"Prof 1100 USB ",
{&dw2102_table[PROF_1100], NULL},
{NULL},
};

-static struct dvb_usb_device_properties *s660;
static const struct dvb_usb_device_description d660 = {
"TeVii S660 USB",
{&dw2102_table[TEVII_S660], NULL},
@@ -2129,14 +2127,12 @@ static const struct dvb_usb_device_descr
{NULL},
};

-static struct dvb_usb_device_properties *p7500;
static const struct dvb_usb_device_description d7500 = {
"Prof 7500 USB DVB-S2",
{&dw2102_table[PROF_7500], NULL},
{NULL},
};

-static struct dvb_usb_device_properties *s421;
static const struct dvb_usb_device_description d421 = {
"TeVii S421 PCI",
{&dw2102_table[TEVII_S421], NULL},
@@ -2336,6 +2332,11 @@ static int dw2102_probe(struct usb_inter
const struct usb_device_id *id)
{
int retval = -ENOMEM;
+ struct dvb_usb_device_properties *p1100;
+ struct dvb_usb_device_properties *s660;
+ struct dvb_usb_device_properties *p7500;
+ struct dvb_usb_device_properties *s421;
+
p1100 = kmemdup(&s6x0_properties,
sizeof(struct dvb_usb_device_properties), GFP_KERNEL);
if (!p1100)
@@ -2404,8 +2405,16 @@ static int dw2102_probe(struct usb_inter
0 == dvb_usb_device_init(intf, &t220_properties,
THIS_MODULE, NULL, adapter_nr) ||
0 == dvb_usb_device_init(intf, &tt_s2_4600_properties,
- THIS_MODULE, NULL, adapter_nr))
+ THIS_MODULE, NULL, adapter_nr)) {
+
+ /* clean up copied properties */
+ kfree(s421);
+ kfree(p7500);
+ kfree(s660);
+ kfree(p1100);
+
return 0;
+ }

retval = -ENODEV;
kfree(s421);



2018-09-17 23:02:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 055/126] rpmsg: core: add support to power domains for devices

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Srinivas Kandagatla <[email protected]>

[ Upstream commit fe782affd0f440a4e60e2cc81b8f2eccb2923113 ]

Some of the rpmsg devices need to switch on power domains to communicate
with remote processor. For example on Qualcomm DB820c platform LPASS
power domain needs to switched on for any kind of audio services.
This patch adds the missing power domain support in rpmsg core.

Without this patch attempting to play audio via QDSP on DB820c would
reboot the system.

Signed-off-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Bjorn Andersson <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/rpmsg/rpmsg_core.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/drivers/rpmsg/rpmsg_core.c
+++ b/drivers/rpmsg/rpmsg_core.c
@@ -23,6 +23,7 @@
#include <linux/module.h>
#include <linux/rpmsg.h>
#include <linux/of_device.h>
+#include <linux/pm_domain.h>
#include <linux/slab.h>

#include "rpmsg_internal.h"
@@ -418,6 +419,10 @@ static int rpmsg_dev_probe(struct device
struct rpmsg_endpoint *ept = NULL;
int err;

+ err = dev_pm_domain_attach(dev, true);
+ if (err)
+ goto out;
+
if (rpdrv->callback) {
strncpy(chinfo.name, rpdev->id.name, RPMSG_NAME_SIZE);
chinfo.src = rpdev->src;
@@ -459,6 +464,8 @@ static int rpmsg_dev_remove(struct devic

rpdrv->remove(rpdev);

+ dev_pm_domain_detach(dev, true);
+
if (rpdev->ept)
rpmsg_destroy_ept(rpdev->ept);




2018-09-17 23:02:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 032/126] blk-mq: fix updating tags depth

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ming Lei <[email protected]>

[ Upstream commit 75d6e175fc511e95ae3eb8f708680133bc211ed3 ]

The passed 'nr' from userspace represents the total depth, meantime
inside 'struct blk_mq_tags', 'nr_tags' stores the total tag depth,
and 'nr_reserved_tags' stores the reserved part.

There are two issues in blk_mq_tag_update_depth() now:

1) for growing tags, we should have used the passed 'nr', and keep the
number of reserved tags not changed.

2) the passed 'nr' should have been used for checking against
'tags->nr_tags', instead of number of the normal part.

This patch fixes the above two cases, and avoids kernel crash caused
by wrong resizing sbitmap queue.

Cc: "Ewan D. Milne" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Bart Van Assche <[email protected]>
Cc: Omar Sandoval <[email protected]>
Tested by: Marco Patalano <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
block/blk-mq-tag.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -416,8 +416,6 @@ int blk_mq_tag_update_depth(struct blk_m
if (tdepth <= tags->nr_reserved_tags)
return -EINVAL;

- tdepth -= tags->nr_reserved_tags;
-
/*
* If we are allowed to grow beyond the original size, allocate
* a new set of tags before freeing the old one.
@@ -437,7 +435,8 @@ int blk_mq_tag_update_depth(struct blk_m
if (tdepth > 16 * BLKDEV_MAX_RQ)
return -EINVAL;

- new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, 0);
+ new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
+ tags->nr_reserved_tags);
if (!new)
return -ENOMEM;
ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
@@ -454,7 +453,8 @@ int blk_mq_tag_update_depth(struct blk_m
* Don't need (or can't) update reserved tags here, they
* remain static and should never need resizing.
*/
- sbitmap_queue_resize(&tags->bitmap_tags, tdepth);
+ sbitmap_queue_resize(&tags->bitmap_tags,
+ tdepth - tags->nr_reserved_tags);
}

return 0;



2018-09-17 23:03:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 057/126] ata: libahci: Allow reconfigure of DEVSLP register

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Srinivas Pandruvada <[email protected]>

[ Upstream commit 11c291461b6ea8d1195a96d6bba6673a94aacebc ]

There are two modes in which DEVSLP can be entered. The OS initiated or
hardware autonomous.

In hardware autonomous mode, BIOS configures the AHCI controller and the
device to enable DEVSLP. But they may not be ideal for all cases. So in
this case, OS should be able to reconfigure DEVSLP register.

Currently if the DEVSLP is already enabled, we can't set again as it will
simply return. There are some systems where the firmware is setting high
DITO by default, in this case we can't modify here to correct settings.
With the default in several seconds, we are not able to transition to
DEVSLP.

This change will allow reconfiguration of devslp register if DITO is
different.

Signed-off-by: Srinivas Pandruvada <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/ata/libahci.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -2096,7 +2096,7 @@ static void ahci_set_aggressive_devslp(s
struct ahci_host_priv *hpriv = ap->host->private_data;
void __iomem *port_mmio = ahci_port_base(ap);
struct ata_device *dev = ap->link.device;
- u32 devslp, dm, dito, mdat, deto;
+ u32 devslp, dm, dito, mdat, deto, dito_conf;
int rc;
unsigned int err_mask;

@@ -2120,8 +2120,15 @@ static void ahci_set_aggressive_devslp(s
return;
}

- /* device sleep was already enabled */
- if (devslp & PORT_DEVSLP_ADSE)
+ dm = (devslp & PORT_DEVSLP_DM_MASK) >> PORT_DEVSLP_DM_OFFSET;
+ dito = devslp_idle_timeout / (dm + 1);
+ if (dito > 0x3ff)
+ dito = 0x3ff;
+
+ dito_conf = (devslp >> PORT_DEVSLP_DITO_OFFSET) & 0x3FF;
+
+ /* device sleep was already enabled and same dito */
+ if ((devslp & PORT_DEVSLP_ADSE) && (dito_conf == dito))
return;

/* set DITO, MDAT, DETO and enable DevSlp, need to stop engine first */
@@ -2129,11 +2136,6 @@ static void ahci_set_aggressive_devslp(s
if (rc)
return;

- dm = (devslp & PORT_DEVSLP_DM_MASK) >> PORT_DEVSLP_DM_OFFSET;
- dito = devslp_idle_timeout / (dm + 1);
- if (dito > 0x3ff)
- dito = 0x3ff;
-
/* Use the nominal value 10 ms if the read MDAT is zero,
* the nominal value of DETO is 20 ms.
*/



2018-09-17 23:03:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 059/126] scsi: 3ware: fix return 0 on the error path of probe

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 4dc98c1995482262e70e83ef029135247fafe0f2 ]

tw_probe() returns 0 in case of fail of tw_initialize_device_extension(),
pci_resource_start() or tw_reset_sequence() and releases resources.
twl_probe() returns 0 in case of fail of twl_initialize_device_extension(),
pci_iomap() and twl_reset_sequence(). twa_probe() returns 0 in case of
fail of tw_initialize_device_extension(), ioremap() and
twa_reset_sequence().

The patch adds retval initialization for these cases.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Acked-by: Adam Radford <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/3w-9xxx.c | 6 +++++-
drivers/scsi/3w-sas.c | 3 +++
drivers/scsi/3w-xxxx.c | 2 ++
3 files changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/scsi/3w-9xxx.c
+++ b/drivers/scsi/3w-9xxx.c
@@ -2042,6 +2042,7 @@ static int twa_probe(struct pci_dev *pde

if (twa_initialize_device_extension(tw_dev)) {
TW_PRINTK(tw_dev->host, TW_DRIVER, 0x25, "Failed to initialize device extension");
+ retval = -ENOMEM;
goto out_free_device_extension;
}

@@ -2064,6 +2065,7 @@ static int twa_probe(struct pci_dev *pde
tw_dev->base_addr = ioremap(mem_addr, mem_len);
if (!tw_dev->base_addr) {
TW_PRINTK(tw_dev->host, TW_DRIVER, 0x35, "Failed to ioremap");
+ retval = -ENOMEM;
goto out_release_mem_region;
}

@@ -2071,8 +2073,10 @@ static int twa_probe(struct pci_dev *pde
TW_DISABLE_INTERRUPTS(tw_dev);

/* Initialize the card */
- if (twa_reset_sequence(tw_dev, 0))
+ if (twa_reset_sequence(tw_dev, 0)) {
+ retval = -ENOMEM;
goto out_iounmap;
+ }

/* Set host specific parameters */
if ((pdev->device == PCI_DEVICE_ID_3WARE_9650SE) ||
--- a/drivers/scsi/3w-sas.c
+++ b/drivers/scsi/3w-sas.c
@@ -1597,6 +1597,7 @@ static int twl_probe(struct pci_dev *pde

if (twl_initialize_device_extension(tw_dev)) {
TW_PRINTK(tw_dev->host, TW_DRIVER, 0x1a, "Failed to initialize device extension");
+ retval = -ENOMEM;
goto out_free_device_extension;
}

@@ -1611,6 +1612,7 @@ static int twl_probe(struct pci_dev *pde
tw_dev->base_addr = pci_iomap(pdev, 1, 0);
if (!tw_dev->base_addr) {
TW_PRINTK(tw_dev->host, TW_DRIVER, 0x1c, "Failed to ioremap");
+ retval = -ENOMEM;
goto out_release_mem_region;
}

@@ -1620,6 +1622,7 @@ static int twl_probe(struct pci_dev *pde
/* Initialize the card */
if (twl_reset_sequence(tw_dev, 0)) {
TW_PRINTK(tw_dev->host, TW_DRIVER, 0x1d, "Controller reset failed during probe");
+ retval = -ENOMEM;
goto out_iounmap;
}

--- a/drivers/scsi/3w-xxxx.c
+++ b/drivers/scsi/3w-xxxx.c
@@ -2280,6 +2280,7 @@ static int tw_probe(struct pci_dev *pdev

if (tw_initialize_device_extension(tw_dev)) {
printk(KERN_WARNING "3w-xxxx: Failed to initialize device extension.");
+ retval = -ENOMEM;
goto out_free_device_extension;
}

@@ -2294,6 +2295,7 @@ static int tw_probe(struct pci_dev *pdev
tw_dev->base_addr = pci_resource_start(pdev, 0);
if (!tw_dev->base_addr) {
printk(KERN_WARNING "3w-xxxx: Failed to get io address.");
+ retval = -ENOMEM;
goto out_release_mem_region;
}




2018-09-17 23:03:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 060/126] tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Huaisheng Ye <[email protected]>

[ Upstream commit 45df5d3dc0c7289c1e67afe6d2ba806ad5174314 ]

The mock / test version of pmem_direct_access() needs to check the
validity of pointers kaddr and pfn for NULL assignment. If anyone
equals to NULL, it doesn't need to calculate the value.

If pointer equals to NULL, that is to say callers may have no need for
kaddr or pfn, so this patch is prepared for allowing them to pass in
NULL instead of having to pass in a local pointer or variable that
they then just throw away.

Suggested-by: Dan Williams <[email protected]>
Signed-off-by: Huaisheng Ye <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/testing/nvdimm/pmem-dax.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -31,17 +31,21 @@ long __pmem_direct_access(struct pmem_de
if (get_nfit_res(pmem->phys_addr + offset)) {
struct page *page;

- *kaddr = pmem->virt_addr + offset;
+ if (kaddr)
+ *kaddr = pmem->virt_addr + offset;
page = vmalloc_to_page(pmem->virt_addr + offset);
- *pfn = page_to_pfn_t(page);
+ if (pfn)
+ *pfn = page_to_pfn_t(page);
pr_debug_ratelimited("%s: pmem: %p pgoff: %#lx pfn: %#lx\n",
__func__, pmem, pgoff, page_to_pfn(page));

return 1;
}

- *kaddr = pmem->virt_addr + offset;
- *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ if (kaddr)
+ *kaddr = pmem->virt_addr + offset;
+ if (pfn)
+ *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);

/*
* If badblocks are present, limit known good range to the



2018-09-17 23:03:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 056/126] MIPS: Fix ISA virt/bus conversion for non-zero PHYS_OFFSET

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Burton <[email protected]>

[ Upstream commit 0494d7ffdcebc6935410ea0719b24ab626675351 ]

isa_virt_to_bus() & isa_bus_to_virt() claim to treat ISA bus addresses
as being identical to physical addresses, but they fail to do so in the
presence of a non-zero PHYS_OFFSET.

Correct this by having them use virt_to_phys() & phys_to_virt(), which
consolidates the calculations to one place & ensures that ISA bus
addresses do indeed match physical addresses.

Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20047/
Cc: James Hogan <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: Vladimir Kondratiev <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/mips/include/asm/io.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -141,14 +141,14 @@ static inline void * phys_to_virt(unsign
/*
* ISA I/O bus memory addresses are 1:1 with the physical address.
*/
-static inline unsigned long isa_virt_to_bus(volatile void * address)
+static inline unsigned long isa_virt_to_bus(volatile void *address)
{
- return (unsigned long)address - PAGE_OFFSET;
+ return virt_to_phys(address);
}

-static inline void * isa_bus_to_virt(unsigned long address)
+static inline void *isa_bus_to_virt(unsigned long address)
{
- return (void *)(address + PAGE_OFFSET);
+ return phys_to_virt(address);
}

#define isa_page_to_bus page_to_phys



2018-09-17 23:03:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 037/126] gpu: ipu-v3: default to id 0 on missing OF alias

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Philipp Zabel <[email protected]>

[ Upstream commit 2d87e6c1b99c402360fdfe19ce4f579ab2f96adf ]

This is better than storing -ENODEV in the id number. This fixes SoCs
with only one IPU that don't specify an IPU alias in the device tree.

Signed-off-by: Philipp Zabel <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/ipu-v3/ipu-common.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -1401,6 +1401,8 @@ static int ipu_probe(struct platform_dev
return -ENODEV;

ipu->id = of_alias_get_id(np, "ipu");
+ if (ipu->id < 0)
+ ipu->id = 0;

if (of_device_is_compatible(np, "fsl,imx6qp-ipu") &&
IS_ENABLED(CONFIG_DRM)) {



2018-09-17 23:03:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 038/126] misc: ti-st: Fix memory leak in the error path of probe()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 81ae962d7f180c0092859440c82996cccb254976 ]

Free resources instead of direct return of the error code if kim_probe
fails.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/misc/ti-st/st_kim.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/misc/ti-st/st_kim.c
+++ b/drivers/misc/ti-st/st_kim.c
@@ -756,14 +756,14 @@ static int kim_probe(struct platform_dev
err = gpio_request(kim_gdata->nshutdown, "kim");
if (unlikely(err)) {
pr_err(" gpio %d request failed ", kim_gdata->nshutdown);
- return err;
+ goto err_sysfs_group;
}

/* Configure nShutdown GPIO as output=0 */
err = gpio_direction_output(kim_gdata->nshutdown, 0);
if (unlikely(err)) {
pr_err(" unable to configure gpio %d", kim_gdata->nshutdown);
- return err;
+ goto err_sysfs_group;
}
/* get reference of pdev for request_firmware
*/



2018-09-17 23:03:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 061/126] ath10k: disable bundle mgmt tx completion event support

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Surabhi Vishnoi <[email protected]>

[ Upstream commit 673bc519c55843c68c3aecff71a4101e79d28d2b ]

The tx completion of multiple mgmt frames can be bundled
in a single event and sent by the firmware to host, if this
capability is not disabled explicitly by the host. If the host
cannot handle the bundled mgmt tx completion, this capability
support needs to be disabled in the wmi init cmd, sent to the firmware.

Add the host capability indication flag in the wmi ready command,
to let firmware know the features supported by the host driver.
This field is ignored if it is not supported by firmware.

Set the host capability indication flag(i.e. host_capab) to zero,
for disabling the support of bundle mgmt tx completion. This will
indicate the firmware to send completion event for every mgmt tx
completion, instead of bundling them together and sending in a single
event.

Tested HW: WCN3990
Tested FW: WLAN.HL.2.0-01188-QCAHLSWMTPLZ-1

Signed-off-by: Surabhi Vishnoi <[email protected]>
Signed-off-by: Rakesh Pillai <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wireless/ath/ath10k/wmi-tlv.c | 5 +++++
drivers/net/wireless/ath/ath10k/wmi-tlv.h | 5 +++++
2 files changed, 10 insertions(+)

--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -1451,6 +1451,11 @@ static struct sk_buff *ath10k_wmi_tlv_op
cfg->keep_alive_pattern_size = __cpu_to_le32(0);
cfg->max_tdls_concurrent_sleep_sta = __cpu_to_le32(1);
cfg->max_tdls_concurrent_buffer_sta = __cpu_to_le32(1);
+ cfg->wmi_send_separate = __cpu_to_le32(0);
+ cfg->num_ocb_vdevs = __cpu_to_le32(0);
+ cfg->num_ocb_channels = __cpu_to_le32(0);
+ cfg->num_ocb_schedules = __cpu_to_le32(0);
+ cfg->host_capab = __cpu_to_le32(0);

ath10k_wmi_put_host_mem_chunks(ar, chunks);

--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.h
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.h
@@ -1228,6 +1228,11 @@ struct wmi_tlv_resource_config {
__le32 keep_alive_pattern_size;
__le32 max_tdls_concurrent_sleep_sta;
__le32 max_tdls_concurrent_buffer_sta;
+ __le32 wmi_send_separate;
+ __le32 num_ocb_vdevs;
+ __le32 num_ocb_channels;
+ __le32 num_ocb_schedules;
+ __le32 host_capab;
} __packed;

struct wmi_tlv_init_cmd {



2018-09-17 23:03:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 062/126] Bluetooth: hidp: Fix handling of strncpy for hid->name information

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marcel Holtmann <[email protected]>

[ Upstream commit b3cadaa485f0c20add1644a5c877b0765b285c0c ]

This fixes two issues with setting hid->name information.

CC net/bluetooth/hidp/core.o
In function ‘hidp_setup_hid’,
inlined from ‘hidp_session_dev_init’ at net/bluetooth/hidp/core.c:815:9,
inlined from ‘hidp_session_new’ at net/bluetooth/hidp/core.c:953:8,
inlined from ‘hidp_connection_add’ at net/bluetooth/hidp/core.c:1366:8:
net/bluetooth/hidp/core.c:778:2: warning: ‘strncpy’ output may be truncated copying 127 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(hid->name, req->name, sizeof(req->name) - 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CC net/bluetooth/hidp/core.o
net/bluetooth/hidp/core.c: In function ‘hidp_setup_hid’:
net/bluetooth/hidp/core.c:778:38: warning: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Wsizeof-pointer-memaccess]
strncpy(hid->name, req->name, sizeof(req->name));
^

Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bluetooth/hidp/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -775,7 +775,7 @@ static int hidp_setup_hid(struct hidp_se
hid->version = req->version;
hid->country = req->country;

- strncpy(hid->name, req->name, sizeof(req->name) - 1);
+ strncpy(hid->name, req->name, sizeof(hid->name));

snprintf(hid->phys, sizeof(hid->phys), "%pMR",
&l2cap_pi(session->ctrl_sock->sk)->chan->src);



2018-09-17 23:03:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 063/126] x86/mm: Remove in_nmi() warning from vmalloc_fault()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joerg Roedel <[email protected]>

[ Upstream commit 6863ea0cda8725072522cd78bda332d9a0b73150 ]

It is perfectly okay to take page-faults, especially on the
vmalloc area while executing an NMI handler. Remove the
warning.

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: David H. Gutteridge <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/mm/fault.c | 2 --
1 file changed, 2 deletions(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -317,8 +317,6 @@ static noinline int vmalloc_fault(unsign
if (!(address >= VMALLOC_START && address < VMALLOC_END))
return -1;

- WARN_ON_ONCE(in_nmi());
-
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.



2018-09-17 23:03:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 041/126] Drivers: hv: vmbus: Cleanup synic memory free path

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Kelley <[email protected]>

[ Upstream commit 572086325ce9a9e348b8748e830653f3959e88b6 ]

clk_evt memory is not being freed when the synic is shutdown
or when there is an allocation error. Add the appropriate
kfree() call, along with a comment to clarify how the memory
gets freed after an allocation error. Make the free path
consistent by removing checks for NULL since kfree() and
free_page() already do the check.

Signed-off-by: Michael Kelley <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/hv/hv.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -196,6 +196,10 @@ int hv_synic_alloc(void)

return 0;
err:
+ /*
+ * Any memory allocations that succeeded will be freed when
+ * the caller cleans up by calling hv_synic_free()
+ */
return -ENOMEM;
}

@@ -208,12 +212,10 @@ void hv_synic_free(void)
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

- if (hv_cpu->synic_event_page)
- free_page((unsigned long)hv_cpu->synic_event_page);
- if (hv_cpu->synic_message_page)
- free_page((unsigned long)hv_cpu->synic_message_page);
- if (hv_cpu->post_msg_page)
- free_page((unsigned long)hv_cpu->post_msg_page);
+ kfree(hv_cpu->clk_evt);
+ free_page((unsigned long)hv_cpu->synic_event_page);
+ free_page((unsigned long)hv_cpu->synic_message_page);
+ free_page((unsigned long)hv_cpu->post_msg_page);
}

kfree(hv_context.hv_numa_map);



2018-09-17 23:03:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 046/126] perf tools: Allow overriding MAX_NR_CPUS at compile time

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christophe Leroy <[email protected]>

[ Upstream commit 21b8732eb4479b579bda9ee38e62b2c312c2a0e5 ]

After update of kernel, the perf tool doesn't run anymore on my 32MB RAM
powerpc board, but still runs on a 128MB RAM board:

~# strace perf
execve("/usr/sbin/perf", ["perf"], [/* 12 vars */]) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
Segmentation fault

objdump -x shows that .bss section has a huge size of 24Mbytes:

27 .bss 016baca8 101cebb8 101cebb8 001cd988 2**3

With especially the following objects having quite big size:

10205f80 l O .bss 00140000 runtime_cycles_stats
10345f80 l O .bss 00140000 runtime_stalled_cycles_front_stats
10485f80 l O .bss 00140000 runtime_stalled_cycles_back_stats
105c5f80 l O .bss 00140000 runtime_branches_stats
10705f80 l O .bss 00140000 runtime_cacherefs_stats
10845f80 l O .bss 00140000 runtime_l1_dcache_stats
10985f80 l O .bss 00140000 runtime_l1_icache_stats
10ac5f80 l O .bss 00140000 runtime_ll_cache_stats
10c05f80 l O .bss 00140000 runtime_itlb_cache_stats
10d45f80 l O .bss 00140000 runtime_dtlb_cache_stats
10e85f80 l O .bss 00140000 runtime_cycles_in_tx_stats
10fc5f80 l O .bss 00140000 runtime_transaction_stats
11105f80 l O .bss 00140000 runtime_elision_stats
11245f80 l O .bss 00140000 runtime_topdown_total_slots
11385f80 l O .bss 00140000 runtime_topdown_slots_retired
114c5f80 l O .bss 00140000 runtime_topdown_slots_issued
11605f80 l O .bss 00140000 runtime_topdown_fetch_bubbles
11745f80 l O .bss 00140000 runtime_topdown_recovery_bubbles

This is due to commit 4d255766d28b1 ("perf: Bump max number of cpus
to 1024"), because many tables are sized with MAX_NR_CPUS

This patch gives the opportunity to redefine MAX_NR_CPUS via

$ make EXTRA_CFLAGS=-DMAX_NR_CPUS=1

Signed-off-by: Christophe Leroy <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/perf/perf.h | 2 ++
1 file changed, 2 insertions(+)

--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -24,7 +24,9 @@ static inline unsigned long long rdclock
return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

+#ifndef MAX_NR_CPUS
#define MAX_NR_CPUS 1024
+#endif

extern const char *input_name;
extern bool perf_host, perf_guest;



2018-09-17 23:03:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 064/126] pinctrl: imx: off by one in imx_pinconf_group_dbg_show()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

[ Upstream commit b4859f3edb47825f62d1b2efdd75fe7945996f49 ]

The > should really be >= here. It's harmless because
pinctrl_generic_get_group() will return a NULL if group is invalid.

Fixes: ae75ff814538 ("pinctrl: pinctrl-imx: add imx pinctrl core driver")
Reported-by: Dong Aisheng <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/pinctrl/freescale/pinctrl-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/pinctrl/freescale/pinctrl-imx.c
+++ b/drivers/pinctrl/freescale/pinctrl-imx.c
@@ -389,7 +389,7 @@ static void imx_pinconf_group_dbg_show(s
const char *name;
int i, ret;

- if (group > pctldev->num_groups)
+ if (group >= pctldev->num_groups)
return;

seq_printf(s, "\n");



2018-09-17 23:03:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 043/126] f2fs: fix to active page in lru list for read path

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chao Yu <[email protected]>

[ Upstream commit 82cf4f132e6d16dca6fc3bd955019246141bc645 ]

If config CONFIG_F2FS_FAULT_INJECTION is on, for both read or write path
we will call find_lock_page() to get the page, but for read path, it
missed to passing FGP_ACCESSED to allocator to active the page in LRU
list, result in being reclaimed in advance incorrectly, fix it.

Reported-by: Xianrong Zhou <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/f2fs.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1766,8 +1766,13 @@ static inline struct page *f2fs_grab_cac
pgoff_t index, bool for_write)
{
#ifdef CONFIG_F2FS_FAULT_INJECTION
- struct page *page = find_lock_page(mapping, index);
+ struct page *page;

+ if (!for_write)
+ page = find_get_page_flags(mapping, index,
+ FGP_LOCK | FGP_ACCESSED);
+ else
+ page = find_lock_page(mapping, index);
if (page)
return page;




2018-09-17 23:03:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 065/126] gpio: ml-ioh: Fix buffer underwrite on probe error path

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Vasilyev <[email protected]>

[ Upstream commit 4bf4eed44bfe288f459496eaf38089502ef91a79 ]

If ioh_gpio_probe() fails on devm_irq_alloc_descs() then chip may point
to any element of chip_save array, so reverse iteration from pointer chip
may become chip_save[-1] and gpiochip_remove() will operate with wrong
memory.

The patch fix the error path of ioh_gpio_probe() to correctly bypass
chip_save array.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpio/gpio-ml-ioh.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpio/gpio-ml-ioh.c
+++ b/drivers/gpio/gpio-ml-ioh.c
@@ -497,9 +497,10 @@ static int ioh_gpio_probe(struct pci_dev
return 0;

err_gpiochip_add:
+ chip = chip_save;
while (--i >= 0) {
- chip--;
gpiochip_remove(&chip->gpio);
+ chip++;
}
kfree(chip_save);




2018-09-17 23:03:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 066/126] pinctrl/amd: only handle irq if it is pending and unmasked

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Kurtz <[email protected]>

[ Upstream commit 8bbed1eef001fdfc0ee9595f64cc4f769d265af4 ]

The AMD pinctrl driver demultiplexes GPIO interrupts and fires off their
individual handlers.

If one of these GPIO irqs is configured as a level interrupt, and its
downstream handler is a threaded ONESHOT interrupt, the GPIO interrupt
source is masked by handle_level_irq() until the eventual return of the
threaded irq handler. During this time the level GPIO interrupt status
will still report as high until the actual gpio source is cleared - both
in the individual GPIO interrupt status bit (INTERRUPT_STS_OFF) and in
its corresponding "WAKE_INT_STATUS_REG" bit.

Thus, if another GPIO interrupt occurs during this time,
amd_gpio_irq_handler() will see that the (masked-and-not-yet-cleared)
level irq is still pending and incorrectly call its handler again.

To fix this, have amd_gpio_irq_handler() check for both interrupts status
and mask before calling generic_handle_irq().

Note: Is it possible that this bug was the source of the interrupt storm
on Ryzen when using chained interrupts before commit ba714a9c1dea85
("pinctrl/amd: Use regular interrupt instead of chained")?

Signed-off-by: Daniel Kurtz <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/pinctrl/pinctrl-amd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -530,7 +530,8 @@ static irqreturn_t amd_gpio_irq_handler(
/* Each status bit covers four pins */
for (i = 0; i < 4; i++) {
regval = readl(regs + i);
- if (!(regval & PIN_IRQ_PENDING))
+ if (!(regval & PIN_IRQ_PENDING) ||
+ !(regval & BIT(INTERRUPT_MASK_OFF)))
continue;
irq = irq_find_mapping(gc->irqdomain, irqnr + i);
generic_handle_irq(irq);



2018-09-17 23:03:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 012/126] KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sean Christopherson <[email protected]>

commit c4409905cd6eb42cfd06126e9226b0150e05a715 upstream.

Re-execution after an emulation decode failure is only intended to
handle a case where two or vCPUs race to write a shadowed page, i.e.
we should never re-execute an instruction as part of MMIO emulation.
As handle_ept_misconfig() is only used for MMIO emulation, it should
pass EMULTYPE_NO_REEXECUTE when using the emulator to skip an instr
in the fast-MMIO case where VM_EXIT_INSTRUCTION_LEN is invalid.

And because the cr2 value passed to x86_emulate_instruction() is only
destined for use when retrying or reexecuting, we can simply call
emulate_instruction().

Fixes: d391f1207067 ("x86/kvm/vmx: do not use vm-exit instruction length
for fast MMIO when running nested")
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kvm/vmx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6965,8 +6965,8 @@ static int handle_ept_misconfig(struct k
if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
return kvm_skip_emulated_instruction(vcpu);
else
- return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
- NULL, 0) == EMULATE_DONE;
+ return emulate_instruction(vcpu, EMULTYPE_SKIP) ==
+ EMULATE_DONE;
}

ret = kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);



2018-09-17 23:04:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 070/126] f2fs: fix to skip GC if type in SSA and SIT is inconsistent

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chao Yu <[email protected]>

[ Upstream commit 10d255c3540239c7920f52d2eb223756e186af56 ]

If segment type in SSA and SIT is inconsistent, we will encounter below
BUG_ON during GC, to avoid this panic, let's just skip doing GC on such
segment.

The bug is triggered with image reported in below link:

https://bugzilla.kernel.org/show_bug.cgi?id=200223

[ 388.060262] ------------[ cut here ]------------
[ 388.060268] kernel BUG at /home/y00370721/git/devf2fs/gc.c:989!
[ 388.061172] invalid opcode: 0000 [#1] SMP
[ 388.061773] Modules linked in: f2fs(O) bluetooth ecdh_generic xt_tcpudp iptable_filter ip_tables x_tables lp ttm drm_kms_helper drm intel_rapl sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel fb_sys_fops ppdev aes_x86_64 syscopyarea crypto_simd sysfillrect parport_pc joydev sysimgblt glue_helper parport cryptd i2c_piix4 serio_raw mac_hid btrfs hid_generic usbhid hid raid6_pq psmouse pata_acpi floppy
[ 388.064247] CPU: 7 PID: 4151 Comm: f2fs_gc-7:0 Tainted: G O 4.13.0-rc1+ #26
[ 388.065306] Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015
[ 388.066058] task: ffff880201583b80 task.stack: ffffc90004d7c000
[ 388.069948] RIP: 0010:do_garbage_collect+0xcc8/0xcd0 [f2fs]
[ 388.070766] RSP: 0018:ffffc90004d7fc68 EFLAGS: 00010202
[ 388.071783] RAX: ffff8801ed227000 RBX: 0000000000000001 RCX: ffffea0007b489c0
[ 388.072700] RDX: ffff880000000000 RSI: 0000000000000001 RDI: ffffea0007b489c0
[ 388.073607] RBP: ffffc90004d7fd58 R08: 0000000000000003 R09: ffffea0007b489dc
[ 388.074619] R10: 0000000000000000 R11: 0052782ab317138d R12: 0000000000000018
[ 388.075625] R13: 0000000000000018 R14: ffff880211ceb000 R15: ffff880211ceb000
[ 388.076687] FS: 0000000000000000(0000) GS:ffff880214fc0000(0000) knlGS:0000000000000000
[ 388.083277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 388.084536] CR2: 0000000000e18c60 CR3: 00000001ecf2e000 CR4: 00000000001406e0
[ 388.085748] Call Trace:
[ 388.086690] ? find_next_bit+0xb/0x10
[ 388.088091] f2fs_gc+0x1a8/0x9d0 [f2fs]
[ 388.088888] ? lock_timer_base+0x7d/0xa0
[ 388.090213] ? try_to_del_timer_sync+0x44/0x60
[ 388.091698] gc_thread_func+0x342/0x4b0 [f2fs]
[ 388.092892] ? wait_woken+0x80/0x80
[ 388.094098] kthread+0x109/0x140
[ 388.095010] ? f2fs_gc+0x9d0/0x9d0 [f2fs]
[ 388.096043] ? kthread_park+0x60/0x60
[ 388.097281] ret_from_fork+0x25/0x30
[ 388.098401] Code: ff ff 48 83 e8 01 48 89 44 24 58 e9 27 f8 ff ff 48 83 e8 01 e9 78 fc ff ff 48 8d 78 ff e9 17 fb ff ff 48 83 ef 01 e9 4d f4 ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
[ 388.100864] RIP: do_garbage_collect+0xcc8/0xcd0 [f2fs] RSP: ffffc90004d7fc68
[ 388.101810] ---[ end trace 81c73d6e6b7da61d ]---

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/gc.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -958,7 +958,13 @@ static int do_garbage_collect(struct f2f
goto next;

sum = page_address(sum_page);
- f2fs_bug_on(sbi, type != GET_SUM_TYPE((&sum->footer)));
+ if (type != GET_SUM_TYPE((&sum->footer))) {
+ f2fs_msg(sbi->sb, KERN_ERR, "Inconsistent segment (%u) "
+ "type [%d, %d] in SSA and SIT",
+ segno, type, GET_SUM_TYPE((&sum->footer)));
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
+ goto next;
+ }

/*
* this is to avoid deadlock:



2018-09-17 23:04:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 073/126] f2fs: fix to do sanity check with reserved blkaddr of inline inode

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chao Yu <[email protected]>

[ Upstream commit 4dbe38dc386910c668c75ae616b99b823b59f3eb ]

As Wen Xu reported in bugzilla, after image was injected with random data
by fuzzing, inline inode would contain invalid reserved blkaddr, then
during inline conversion, we will encounter illegal memory accessing
reported by KASAN, the root cause of this is when writing out converted
inline page, we will use invalid reserved blkaddr to update sit bitmap,
result in accessing memory beyond sit bitmap boundary.

In order to fix this issue, let's do sanity check with reserved block
address of inline inode to avoid above condition.

https://bugzilla.kernel.org/show_bug.cgi?id=200179

[ 1428.846352] BUG: KASAN: use-after-free in update_sit_entry+0x80/0x7f0
[ 1428.846618] Read of size 4 at addr ffff880194483540 by task a.out/2741

[ 1428.846855] CPU: 0 PID: 2741 Comm: a.out Tainted: G W 4.17.0+ #1
[ 1428.846858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 1428.846860] Call Trace:
[ 1428.846868] dump_stack+0x71/0xab
[ 1428.846875] print_address_description+0x6b/0x290
[ 1428.846881] kasan_report+0x28e/0x390
[ 1428.846888] ? update_sit_entry+0x80/0x7f0
[ 1428.846898] update_sit_entry+0x80/0x7f0
[ 1428.846906] f2fs_allocate_data_block+0x6db/0xc70
[ 1428.846914] ? f2fs_get_node_info+0x14f/0x590
[ 1428.846920] do_write_page+0xc8/0x150
[ 1428.846928] f2fs_outplace_write_data+0xfe/0x210
[ 1428.846935] ? f2fs_do_write_node_page+0x170/0x170
[ 1428.846941] ? radix_tree_tag_clear+0xff/0x130
[ 1428.846946] ? __mod_node_page_state+0x22/0xa0
[ 1428.846951] ? inc_zone_page_state+0x54/0x100
[ 1428.846956] ? __test_set_page_writeback+0x336/0x5d0
[ 1428.846964] f2fs_convert_inline_page+0x407/0x6d0
[ 1428.846971] ? f2fs_read_inline_data+0x3b0/0x3b0
[ 1428.846978] ? __get_node_page+0x335/0x6b0
[ 1428.846987] f2fs_convert_inline_inode+0x41b/0x500
[ 1428.846994] ? f2fs_convert_inline_page+0x6d0/0x6d0
[ 1428.847000] ? kasan_unpoison_shadow+0x31/0x40
[ 1428.847005] ? kasan_kmalloc+0xa6/0xd0
[ 1428.847024] f2fs_file_mmap+0x79/0xc0
[ 1428.847029] mmap_region+0x58b/0x880
[ 1428.847037] ? arch_get_unmapped_area+0x370/0x370
[ 1428.847042] do_mmap+0x55b/0x7a0
[ 1428.847048] vm_mmap_pgoff+0x16f/0x1c0
[ 1428.847055] ? vma_is_stack_for_current+0x50/0x50
[ 1428.847062] ? __fsnotify_update_child_dentry_flags.part.1+0x160/0x160
[ 1428.847068] ? do_sys_open+0x206/0x2a0
[ 1428.847073] ? __fget+0xb4/0x100
[ 1428.847079] ksys_mmap_pgoff+0x278/0x360
[ 1428.847085] ? find_mergeable_anon_vma+0x50/0x50
[ 1428.847091] do_syscall_64+0x73/0x160
[ 1428.847098] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1428.847102] RIP: 0033:0x7fb1430766ba
[ 1428.847103] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d 89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00
[ 1428.847162] RSP: 002b:00007ffc651d9388 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 1428.847167] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fb1430766ba
[ 1428.847170] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[ 1428.847173] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000000
[ 1428.847176] R10: 0000000000008002 R11: 0000000000000246 R12: 0000000000000000
[ 1428.847179] R13: 0000000000001000 R14: 0000000000008002 R15: 0000000000000000

[ 1428.847252] Allocated by task 2683:
[ 1428.847372] kasan_kmalloc+0xa6/0xd0
[ 1428.847380] kmem_cache_alloc+0xc8/0x1e0
[ 1428.847385] getname_flags+0x73/0x2b0
[ 1428.847390] user_path_at_empty+0x1d/0x40
[ 1428.847395] vfs_statx+0xc1/0x150
[ 1428.847401] __do_sys_newlstat+0x7e/0xd0
[ 1428.847405] do_syscall_64+0x73/0x160
[ 1428.847411] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 1428.847466] Freed by task 2683:
[ 1428.847566] __kasan_slab_free+0x137/0x190
[ 1428.847571] kmem_cache_free+0x85/0x1e0
[ 1428.847575] filename_lookup+0x191/0x280
[ 1428.847580] vfs_statx+0xc1/0x150
[ 1428.847585] __do_sys_newlstat+0x7e/0xd0
[ 1428.847590] do_syscall_64+0x73/0x160
[ 1428.847596] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 1428.847648] The buggy address belongs to the object at ffff880194483300
which belongs to the cache names_cache of size 4096
[ 1428.847946] The buggy address is located 576 bytes inside of
4096-byte region [ffff880194483300, ffff880194484300)
[ 1428.848234] The buggy address belongs to the page:
[ 1428.848366] page:ffffea0006512000 count:1 mapcount:0 mapping:ffff8801f3586380 index:0x0 compound_mapcount: 0
[ 1428.848606] flags: 0x17fff8000008100(slab|head)
[ 1428.848737] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f3586380
[ 1428.848931] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 1428.849122] page dumped because: kasan: bad access detected

[ 1428.849305] Memory state around the buggy address:
[ 1428.849436] ffff880194483400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.849620] ffff880194483480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.849804] >ffff880194483500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.849985] ^
[ 1428.850120] ffff880194483580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.850303] ffff880194483600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1428.850498] ==================================================================

Reported-by: Wen Xu <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/inline.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -128,6 +128,16 @@ int f2fs_convert_inline_page(struct dnod
if (err)
return err;

+ if (unlikely(dn->data_blkaddr != NEW_ADDR)) {
+ f2fs_put_dnode(dn);
+ set_sbi_flag(fio.sbi, SBI_NEED_FSCK);
+ f2fs_msg(fio.sbi->sb, KERN_WARNING,
+ "%s: corrupted inline inode ino=%lx, i_addr[0]:0x%x, "
+ "run fsck to fix.",
+ __func__, dn->inode->i_ino, dn->data_blkaddr);
+ return -EINVAL;
+ }
+
f2fs_bug_on(F2FS_P_SB(page), PageWriteback(page));

read_inline_data(page, dn->inode_page);
@@ -365,6 +375,17 @@ static int f2fs_move_inline_dirents(stru
if (err)
goto out;

+ if (unlikely(dn.data_blkaddr != NEW_ADDR)) {
+ f2fs_put_dnode(&dn);
+ set_sbi_flag(F2FS_P_SB(page), SBI_NEED_FSCK);
+ f2fs_msg(F2FS_P_SB(page)->sb, KERN_WARNING,
+ "%s: corrupted inline inode ino=%lx, i_addr[0]:0x%x, "
+ "run fsck to fix.",
+ __func__, dir->i_ino, dn.data_blkaddr);
+ err = -EINVAL;
+ goto out;
+ }
+
f2fs_wait_on_page_writeback(page, DATA, true);
zero_user_segment(page, MAX_INLINE_DATA(dir), PAGE_SIZE);




2018-09-17 23:04:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 047/126] NFSv4.0 fix client reference leak in callback

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Olga Kornievskaia <[email protected]>

[ Upstream commit 32cd3ee511f4e07ca25d71163b50e704808d22f4 ]

If there is an error during processing of a callback message, it leads
to refrence leak on the client structure and eventually an unclean
superblock.

Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/nfs/callback_xdr.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -904,16 +904,21 @@ static __be32 nfs4_callback_compound(str

if (hdr_arg.minorversion == 0) {
cps.clp = nfs4_find_client_ident(SVC_NET(rqstp), hdr_arg.cb_ident);
- if (!cps.clp || !check_gss_callback_principal(cps.clp, rqstp))
+ if (!cps.clp || !check_gss_callback_principal(cps.clp, rqstp)) {
+ if (cps.clp)
+ nfs_put_client(cps.clp);
goto out_invalidcred;
+ }
}

cps.minorversion = hdr_arg.minorversion;
hdr_res.taglen = hdr_arg.taglen;
hdr_res.tag = hdr_arg.tag;
- if (encode_compound_hdr_res(&xdr_out, &hdr_res) != 0)
+ if (encode_compound_hdr_res(&xdr_out, &hdr_res) != 0) {
+ if (cps.clp)
+ nfs_put_client(cps.clp);
return rpc_system_err;
-
+ }
while (status == 0 && nops != hdr_arg.nops) {
status = process_op(nops, rqstp, &xdr_in,
rqstp->rq_argp, &xdr_out, rqstp->rq_resp,



2018-09-17 23:04:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 075/126] MIPS: generic: fix missing of_node_put()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Mc Guire <[email protected]>

[ Upstream commit 28ec2238f37e72a3a40a7eb46893e7651bcc40a6 ]

of_find_compatible_node() returns a device_node pointer with refcount
incremented and must be decremented explicitly.
As this code is using the result only to check presence of the interrupt
controller (!NULL) but not actually using the result otherwise the
refcount can be decremented here immediately again.

Signed-off-by: Nicholas Mc Guire <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/19820/
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/mips/generic/init.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/mips/generic/init.c
+++ b/arch/mips/generic/init.c
@@ -204,6 +204,7 @@ void __init arch_init_irq(void)
"mti,cpu-interrupt-controller");
if (!cpu_has_veic && !intc_node)
mips_cpu_irq_init();
+ of_node_put(intc_node);

irqchip_init();
}



2018-09-17 23:04:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 058/126] ata: libahci: Correct setting of DEVSLP register

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Srinivas Pandruvada <[email protected]>

[ Upstream commit 2dbb3ec29a6c069035857a2fc4c24e80e5dfe3cc ]

We have seen that on some platforms, SATA device never show any DEVSLP
residency. This prevent power gating of SATA IP, which prevent system
to transition to low power mode in systems with SLP_S0 aka modern
standby systems. The PHY logic is off only in DEVSLP not in slumber.
Reference:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets
/332995-skylake-i-o-platform-datasheet-volume-1.pdf
Section 28.7.6.1

Here driver is trying to do read-modify-write the devslp register. But
not resetting the bits for which this driver will modify values (DITO,
MDAT and DETO). So simply reset those bits before updating to new values.

Signed-off-by: Srinivas Pandruvada <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/ata/libahci.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -2153,6 +2153,8 @@ static void ahci_set_aggressive_devslp(s
deto = 20;
}

+ /* Make dito, mdat, deto bits to 0s */
+ devslp &= ~GENMASK_ULL(24, 2);
devslp |= ((dito << PORT_DEVSLP_DITO_OFFSET) |
(mdat << PORT_DEVSLP_MDAT_OFFSET) |
(deto << PORT_DEVSLP_DETO_OFFSET) |



2018-09-17 23:04:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 077/126] dm cache: only allow a single io_mode cache feature to be requested

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: John Pittman <[email protected]>

[ Upstream commit af9313c32c0fa2a0ac3b113669273833d60cc9de ]

More than one io_mode feature can be requested when creating a dm cache
device (as is: last one wins). The io_mode selections are incompatible
with one another, we should force them to be selected exclusively. Add
a counter to check for more than one io_mode selection.

Fixes: 629d0a8a1a10 ("dm cache metadata: add "metadata2" feature")
Signed-off-by: John Pittman <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/dm-cache-target.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2330,7 +2330,7 @@ static int parse_features(struct cache_a
{0, 2, "Invalid number of cache feature arguments"},
};

- int r;
+ int r, mode_ctr = 0;
unsigned argc;
const char *arg;
struct cache_features *cf = &ca->features;
@@ -2344,14 +2344,20 @@ static int parse_features(struct cache_a
while (argc--) {
arg = dm_shift_arg(as);

- if (!strcasecmp(arg, "writeback"))
+ if (!strcasecmp(arg, "writeback")) {
cf->io_mode = CM_IO_WRITEBACK;
+ mode_ctr++;
+ }

- else if (!strcasecmp(arg, "writethrough"))
+ else if (!strcasecmp(arg, "writethrough")) {
cf->io_mode = CM_IO_WRITETHROUGH;
+ mode_ctr++;
+ }

- else if (!strcasecmp(arg, "passthrough"))
+ else if (!strcasecmp(arg, "passthrough")) {
cf->io_mode = CM_IO_PASSTHROUGH;
+ mode_ctr++;
+ }

else if (!strcasecmp(arg, "metadata2"))
cf->metadata_version = 2;
@@ -2362,6 +2368,11 @@ static int parse_features(struct cache_a
}
}

+ if (mode_ctr > 1) {
+ *error = "Duplicate cache io_mode features requested";
+ return -EINVAL;
+ }
+
return 0;
}




2018-09-17 23:04:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 078/126] Input: atmel_mxt_ts - only use first T9 instance

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nick Dyer <[email protected]>

[ Upstream commit 36f5d9ef26e52edff046b4b097855db89bf0cd4a ]

The driver only registers one input device, which uses the screen
parameters from the first T9 instance. The first T63 instance also uses
those parameters.

It is incorrect to send input reports from the second instances of these
objects if they are enabled: the input scaling will be wrong and the
positions will be mashed together.

This also causes problems on Android if the number of slots exceeds 32.

In the future, this could be handled by looking for enabled touch object
instances and creating an input device for each one.

Signed-off-by: Nick Dyer <[email protected]>
Acked-by: Benson Leung <[email protected]>
Acked-by: Yufeng Shen <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/input/touchscreen/atmel_mxt_ts.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -1647,10 +1647,11 @@ static int mxt_parse_object_table(struct
break;
case MXT_TOUCH_MULTI_T9:
data->multitouch = MXT_TOUCH_MULTI_T9;
+ /* Only handle messages from first T9 instance */
data->T9_reportid_min = min_id;
- data->T9_reportid_max = max_id;
- data->num_touchids = object->num_report_ids
- * mxt_obj_instances(object);
+ data->T9_reportid_max = min_id +
+ object->num_report_ids - 1;
+ data->num_touchids = object->num_report_ids;
break;
case MXT_SPT_MESSAGECOUNT_T44:
data->T44_address = object->start_address;



2018-09-17 23:04:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 049/126] perf evlist: Fix error out while applying initial delay and LBR

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kan Liang <[email protected]>

[ Upstream commit 95035c5e167ae6e740b1ddd30210ae0eaf39a5db ]

'perf record' will error out if both --delay and LBR are applied.

For example:

# perf record -D 1000 -a -e cycles -j any -- sleep 2
Error:
dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
#

A dummy event is added implicitly for initial delay, which has the same
configurations as real sampling events. The dummy event is a software
event. If LBR is configured, perf must error out.

The dummy event will only be used to track PERF_RECORD_MMAP while perf
waits for the initial delay to enable the real events. The BRANCH_STACK
bit can be safely cleared for the dummy event.

After applying the patch:

# perf record -D 1000 -a -e cycles -j any -- sleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.054 MB perf.data (828 samples) ]
#

Reported-by: Sunil K Pandey <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/perf/util/evsel.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -824,6 +824,12 @@ static void apply_config_terms(struct pe
}
}

+static bool is_dummy_event(struct perf_evsel *evsel)
+{
+ return (evsel->attr.type == PERF_TYPE_SOFTWARE) &&
+ (evsel->attr.config == PERF_COUNT_SW_DUMMY);
+}
+
/*
* The enable_on_exec/disabled value strategy:
*
@@ -1054,6 +1060,14 @@ void perf_evsel__config(struct perf_evse
else
perf_evsel__reset_sample_bit(evsel, PERIOD);
}
+
+ /*
+ * For initial_delay, a dummy event is added implicitly.
+ * The software event will trigger -EOPNOTSUPP error out,
+ * if BRANCH_STACK bit is set.
+ */
+ if (opts->initial_delay && is_dummy_event(evsel))
+ perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
}

static int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)



2018-09-17 23:04:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 051/126] ath9k: report tx status on EOSP

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Felix Fietkau <[email protected]>

[ Upstream commit 36e14a787dd0b459760de3622e9709edb745a6af ]

Fixes missed indications of end of U-APSD service period to mac80211

Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wireless/ath/ath9k/xmit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -86,7 +86,8 @@ static void ath_tx_status(struct ieee802
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
struct ieee80211_sta *sta = info->status.status_driver_data[0];

- if (info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS) {
+ if (info->flags & (IEEE80211_TX_CTL_REQ_TX_STATUS |
+ IEEE80211_TX_STATUS_EOSP)) {
ieee80211_tx_status(hw, skb);
return;
}



2018-09-17 23:04:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 044/126] f2fs: do not set free of current section

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yunlong Song <[email protected]>

[ Upstream commit 3611ce9911267cb93d364bd71ddea6821278d11f ]

For the case when sbi->segs_per_sec > 1, take section:segment = 5 for
example, if segment 1 is just used and allocate new segment 2, and the
blocks of segment 1 is invalidated, at this time, the previous code will
use __set_test_and_free to free the free_secmap and free_sections++,
this is not correct since it is still a current section, so fix it.

Signed-off-by: Yunlong Song <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/segment.h | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -414,6 +414,8 @@ static inline void __set_test_and_free(s
if (test_and_clear_bit(segno, free_i->free_segmap)) {
free_i->free_segments++;

+ if (IS_CURSEC(sbi, secno))
+ goto skip_free;
next = find_next_bit(free_i->free_segmap,
start_segno + sbi->segs_per_sec, start_segno);
if (next >= start_segno + sbi->segs_per_sec) {
@@ -421,6 +423,7 @@ static inline void __set_test_and_free(s
free_i->free_sections++;
}
}
+skip_free:
spin_unlock(&free_i->segmap_lock);
}




2018-09-17 23:04:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 090/126] RDMA/cma: Do not ignore net namespace for unbound cm_id

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Parav Pandit <[email protected]>

[ Upstream commit 643d213a9a034fa04f5575a40dfc8548e33ce04f ]

Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.

Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Daniel Jurgens <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/infiniband/core/cma.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1459,9 +1459,16 @@ static bool cma_match_net_dev(const stru
(addr->src_addr.ss_family == AF_IB ||
cma_protocol_roce_dev_port(id->device, port_num));

- return !addr->dev_addr.bound_dev_if ||
- (net_eq(dev_net(net_dev), addr->dev_addr.net) &&
- addr->dev_addr.bound_dev_if == net_dev->ifindex);
+ /*
+ * Net namespaces must match, and if the listner is listening
+ * on a specific netdevice than netdevice must match as well.
+ */
+ if (net_eq(dev_net(net_dev), addr->dev_addr.net) &&
+ (!!addr->dev_addr.bound_dev_if ==
+ (addr->dev_addr.bound_dev_if == net_dev->ifindex)))
+ return true;
+ else
+ return false;
}

static struct rdma_id_private *cma_find_listener(



2018-09-17 23:04:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 052/126] ath9k_hw: fix channel maximum power level test

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Felix Fietkau <[email protected]>

[ Upstream commit 461d8a6bb9879b0e619752d040292e67aa06f1d2 ]

The tx power applied by set_txpower is limited by the CTL (conformance
test limit) entries in the EEPROM. These can change based on the user
configured regulatory domain.
Depending on the EEPROM data this can cause the tx power to become too
limited, if the original regdomain CTLs impose lower limits than the CTLs
of the user configured regdomain.

To fix this issue, set the initial channel limits without any CTL
restrictions and only apply the CTL at run time when setting the channel
and the real tx power.

Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wireless/ath/ath9k/hw.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -2915,16 +2915,19 @@ void ath9k_hw_apply_txpower(struct ath_h
struct ath_regulatory *reg = ath9k_hw_regulatory(ah);
struct ieee80211_channel *channel;
int chan_pwr, new_pwr;
+ u16 ctl = NO_CTL;

if (!chan)
return;

+ if (!test)
+ ctl = ath9k_regd_get_ctl(reg, chan);
+
channel = chan->chan;
chan_pwr = min_t(int, channel->max_power * 2, MAX_RATE_POWER);
new_pwr = min_t(int, chan_pwr, reg->power_limit);

- ah->eep_ops->set_txpower(ah, chan,
- ath9k_regd_get_ctl(reg, chan),
+ ah->eep_ops->set_txpower(ah, chan, ctl,
get_antenna_gain(ah, chan), new_pwr, test);
}




2018-09-17 23:04:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 092/126] inet: frags: change inet_frags_init_net() return value

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

We will soon initialize one rhashtable per struct netns_frags
in inet_frags_init_net().

This patch changes the return value to eventually propagate an
error.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 787bea7748a76130566f881c2342a0be4127d182)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 3 ++-
net/ieee802154/6lowpan/reassembly.c | 11 ++++++++---
net/ipv4/ip_fragment.c | 12 +++++++++---
net/ipv6/netfilter/nf_conntrack_reasm.c | 12 +++++++++---
net/ipv6/reassembly.c | 11 +++++++++--
5 files changed, 37 insertions(+), 12 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -104,9 +104,10 @@ struct inet_frags {
int inet_frags_init(struct inet_frags *);
void inet_frags_fini(struct inet_frags *);

-static inline void inet_frags_init_net(struct netns_frags *nf)
+static inline int inet_frags_init_net(struct netns_frags *nf)
{
atomic_set(&nf->mem, 0);
+ return 0;
}
void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);

--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -580,14 +580,19 @@ static int __net_init lowpan_frags_init_
{
struct netns_ieee802154_lowpan *ieee802154_lowpan =
net_ieee802154_lowpan(net);
+ int res;

ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;

- inet_frags_init_net(&ieee802154_lowpan->frags);
-
- return lowpan_frags_ns_sysctl_register(net);
+ res = inet_frags_init_net(&ieee802154_lowpan->frags);
+ if (res < 0)
+ return res;
+ res = lowpan_frags_ns_sysctl_register(net);
+ if (res < 0)
+ inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+ return res;
}

static void __net_exit lowpan_frags_exit_net(struct net *net)
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -850,6 +850,8 @@ static void __init ip4_frags_ctl_registe

static int __net_init ipv4_frags_init_net(struct net *net)
{
+ int res;
+
/* Fragment cache limits.
*
* The fragment memory accounting code, (tries to) account for
@@ -875,9 +877,13 @@ static int __net_init ipv4_frags_init_ne

net->ipv4.frags.max_dist = 64;

- inet_frags_init_net(&net->ipv4.frags);
-
- return ip4_frags_ns_ctl_register(net);
+ res = inet_frags_init_net(&net->ipv4.frags);
+ if (res < 0)
+ return res;
+ res = ip4_frags_ns_ctl_register(net);
+ if (res < 0)
+ inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+ return res;
}

static void __net_exit ipv4_frags_exit_net(struct net *net)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -630,12 +630,18 @@ EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);

static int nf_ct_net_init(struct net *net)
{
+ int res;
+
net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
- inet_frags_init_net(&net->nf_frag.frags);
-
- return nf_ct_frag6_sysctl_register(net);
+ res = inet_frags_init_net(&net->nf_frag.frags);
+ if (res < 0)
+ return res;
+ res = nf_ct_frag6_sysctl_register(net);
+ if (res < 0)
+ inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+ return res;
}

static void nf_ct_net_exit(struct net *net)
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -714,13 +714,20 @@ static void ip6_frags_sysctl_unregister(

static int __net_init ipv6_frags_init_net(struct net *net)
{
+ int res;
+
net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;

- inet_frags_init_net(&net->ipv6.frags);
+ res = inet_frags_init_net(&net->ipv6.frags);
+ if (res < 0)
+ return res;

- return ip6_frags_ns_sysctl_register(net);
+ res = ip6_frags_ns_sysctl_register(net);
+ if (res < 0)
+ inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+ return res;
}

static void __net_exit ipv6_frags_exit_net(struct net *net)



2018-09-17 23:05:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 093/126] inet: frags: add a pointer to struct netns_frags

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

In order to simplify the API, add a pointer to struct inet_frags.
This will allow us to make things less complex.

These functions no longer have a struct inet_frags parameter :

inet_frag_destroy(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 093ba72914b696521e4885756a68a3332782c8de)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 11 ++++++-----
include/net/ipv6.h | 3 +--
net/ieee802154/6lowpan/reassembly.c | 13 +++++++------
net/ipv4/inet_fragment.c | 17 ++++++++++-------
net/ipv4/ip_fragment.c | 9 +++++----
net/ipv6/netfilter/nf_conntrack_reasm.c | 16 +++++++++-------
net/ipv6/reassembly.c | 20 ++++++++++----------
7 files changed, 48 insertions(+), 41 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -10,6 +10,7 @@ struct netns_frags {
int high_thresh;
int low_thresh;
int max_dist;
+ struct inet_frags *f;
};

/**
@@ -109,20 +110,20 @@ static inline int inet_frags_init_net(st
atomic_set(&nf->mem, 0);
return 0;
}
-void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
+void inet_frags_exit_net(struct netns_frags *nf);

-void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
-void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f);
+void inet_frag_kill(struct inet_frag_queue *q);
+void inet_frag_destroy(struct inet_frag_queue *q);
struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
struct inet_frags *f, void *key, unsigned int hash);

void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
const char *prefix);

-static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f)
+static inline void inet_frag_put(struct inet_frag_queue *q)
{
if (refcount_dec_and_test(&q->refcnt))
- inet_frag_destroy(q, f);
+ inet_frag_destroy(q);
}

static inline bool inet_frag_evicting(struct inet_frag_queue *q)
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -560,8 +560,7 @@ struct frag_queue {
u8 ecn;
};

-void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
- struct inet_frags *frags);
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq);

static inline bool ipv6_addr_any(const struct in6_addr *a)
{
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -93,10 +93,10 @@ static void lowpan_frag_expire(unsigned
if (fq->q.flags & INET_FRAG_COMPLETE)
goto out;

- inet_frag_kill(&fq->q, &lowpan_frags);
+ inet_frag_kill(&fq->q);
out:
spin_unlock(&fq->q.lock);
- inet_frag_put(&fq->q, &lowpan_frags);
+ inet_frag_put(&fq->q);
}

static inline struct lowpan_frag_queue *
@@ -229,7 +229,7 @@ static int lowpan_frag_reasm(struct lowp
struct sk_buff *fp, *head = fq->q.fragments;
int sum_truesize;

- inet_frag_kill(&fq->q, &lowpan_frags);
+ inet_frag_kill(&fq->q);

/* Make the one we just received the head. */
if (prev) {
@@ -437,7 +437,7 @@ int lowpan_frag_rcv(struct sk_buff *skb,
ret = lowpan_frag_queue(fq, skb, frag_type);
spin_unlock(&fq->q.lock);

- inet_frag_put(&fq->q, &lowpan_frags);
+ inet_frag_put(&fq->q);
return ret;
}

@@ -585,13 +585,14 @@ static int __net_init lowpan_frags_init_
ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;
+ ieee802154_lowpan->frags.f = &lowpan_frags;

res = inet_frags_init_net(&ieee802154_lowpan->frags);
if (res < 0)
return res;
res = lowpan_frags_ns_sysctl_register(net);
if (res < 0)
- inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+ inet_frags_exit_net(&ieee802154_lowpan->frags);
return res;
}

@@ -601,7 +602,7 @@ static void __net_exit lowpan_frags_exit
net_ieee802154_lowpan(net);

lowpan_frags_ns_sysctl_unregister(net);
- inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+ inet_frags_exit_net(&ieee802154_lowpan->frags);
}

static struct pernet_operations lowpan_frags_ops = {
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -219,8 +219,9 @@ void inet_frags_fini(struct inet_frags *
}
EXPORT_SYMBOL(inet_frags_fini);

-void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
+void inet_frags_exit_net(struct netns_frags *nf)
{
+ struct inet_frags *f =nf->f;
unsigned int seq;
int i;

@@ -264,33 +265,34 @@ __acquires(hb->chain_lock)
return hb;
}

-static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
+static inline void fq_unlink(struct inet_frag_queue *fq)
{
struct inet_frag_bucket *hb;

- hb = get_frag_bucket_locked(fq, f);
+ hb = get_frag_bucket_locked(fq, fq->net->f);
hlist_del(&fq->list);
fq->flags |= INET_FRAG_COMPLETE;
spin_unlock(&hb->chain_lock);
}

-void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
+void inet_frag_kill(struct inet_frag_queue *fq)
{
if (del_timer(&fq->timer))
refcount_dec(&fq->refcnt);

if (!(fq->flags & INET_FRAG_COMPLETE)) {
- fq_unlink(fq, f);
+ fq_unlink(fq);
refcount_dec(&fq->refcnt);
}
}
EXPORT_SYMBOL(inet_frag_kill);

-void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f)
+void inet_frag_destroy(struct inet_frag_queue *q)
{
struct sk_buff *fp;
struct netns_frags *nf;
unsigned int sum, sum_truesize = 0;
+ struct inet_frags *f;

WARN_ON(!(q->flags & INET_FRAG_COMPLETE));
WARN_ON(del_timer(&q->timer) != 0);
@@ -298,6 +300,7 @@ void inet_frag_destroy(struct inet_frag_
/* Release all fragment data. */
fp = q->fragments;
nf = q->net;
+ f = nf->f;
while (fp) {
struct sk_buff *xp = fp->next;

@@ -333,7 +336,7 @@ static struct inet_frag_queue *inet_frag
refcount_inc(&qp->refcnt);
spin_unlock(&hb->chain_lock);
qp_in->flags |= INET_FRAG_COMPLETE;
- inet_frag_put(qp_in, f);
+ inet_frag_put(qp_in);
return qp;
}
}
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -168,7 +168,7 @@ static void ip4_frag_free(struct inet_fr

static void ipq_put(struct ipq *ipq)
{
- inet_frag_put(&ipq->q, &ip4_frags);
+ inet_frag_put(&ipq->q);
}

/* Kill ipq entry. It is not destroyed immediately,
@@ -176,7 +176,7 @@ static void ipq_put(struct ipq *ipq)
*/
static void ipq_kill(struct ipq *ipq)
{
- inet_frag_kill(&ipq->q, &ip4_frags);
+ inet_frag_kill(&ipq->q);
}

static bool frag_expire_skip_icmp(u32 user)
@@ -876,20 +876,21 @@ static int __net_init ipv4_frags_init_ne
net->ipv4.frags.timeout = IP_FRAG_TIME;

net->ipv4.frags.max_dist = 64;
+ net->ipv4.frags.f = &ip4_frags;

res = inet_frags_init_net(&net->ipv4.frags);
if (res < 0)
return res;
res = ip4_frags_ns_ctl_register(net);
if (res < 0)
- inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+ inet_frags_exit_net(&net->ipv4.frags);
return res;
}

static void __net_exit ipv4_frags_exit_net(struct net *net)
{
ip4_frags_ns_ctl_unregister(net);
- inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+ inet_frags_exit_net(&net->ipv4.frags);
}

static struct pernet_operations ip4_frags_ops = {
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -177,7 +177,7 @@ static void nf_ct_frag6_expire(unsigned
fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
net = container_of(fq->q.net, struct net, nf_frag.frags);

- ip6_expire_frag_queue(net, fq, &nf_frags);
+ ip6_expire_frag_queue(net, fq);
}

/* Creation primitives. */
@@ -263,7 +263,7 @@ static int nf_ct_frag6_queue(struct frag
* this case. -DaveM
*/
pr_debug("end of fragment not rounded to 8 bytes.\n");
- inet_frag_kill(&fq->q, &nf_frags);
+ inet_frag_kill(&fq->q);
return -EPROTO;
}
if (end > fq->q.len) {
@@ -356,7 +356,7 @@ found:
return 0;

discard_fq:
- inet_frag_kill(&fq->q, &nf_frags);
+ inet_frag_kill(&fq->q);
err:
return -EINVAL;
}
@@ -378,7 +378,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
int payload_len;
u8 ecn;

- inet_frag_kill(&fq->q, &nf_frags);
+ inet_frag_kill(&fq->q);

WARN_ON(head == NULL);
WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -623,7 +623,7 @@ int nf_ct_frag6_gather(struct net *net,

out_unlock:
spin_unlock_bh(&fq->q.lock);
- inet_frag_put(&fq->q, &nf_frags);
+ inet_frag_put(&fq->q);
return ret;
}
EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
@@ -635,19 +635,21 @@ static int nf_ct_net_init(struct net *ne
net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
+ net->nf_frag.frags.f = &nf_frags;
+
res = inet_frags_init_net(&net->nf_frag.frags);
if (res < 0)
return res;
res = nf_ct_frag6_sysctl_register(net);
if (res < 0)
- inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+ inet_frags_exit_net(&net->nf_frag.frags);
return res;
}

static void nf_ct_net_exit(struct net *net)
{
nf_ct_frags6_sysctl_unregister(net);
- inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+ inet_frags_exit_net(&net->nf_frag.frags);
}

static struct pernet_operations nf_ct_net_ops = {
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -128,8 +128,7 @@ void ip6_frag_init(struct inet_frag_queu
}
EXPORT_SYMBOL(ip6_frag_init);

-void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
- struct inet_frags *frags)
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
{
struct net_device *dev = NULL;

@@ -138,7 +137,7 @@ void ip6_expire_frag_queue(struct net *n
if (fq->q.flags & INET_FRAG_COMPLETE)
goto out;

- inet_frag_kill(&fq->q, frags);
+ inet_frag_kill(&fq->q);

rcu_read_lock();
dev = dev_get_by_index_rcu(net, fq->iif);
@@ -166,7 +165,7 @@ out_rcu_unlock:
rcu_read_unlock();
out:
spin_unlock(&fq->q.lock);
- inet_frag_put(&fq->q, frags);
+ inet_frag_put(&fq->q);
}
EXPORT_SYMBOL(ip6_expire_frag_queue);

@@ -178,7 +177,7 @@ static void ip6_frag_expire(unsigned lon
fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
net = container_of(fq->q.net, struct net, ipv6.frags);

- ip6_expire_frag_queue(net, fq, &ip6_frags);
+ ip6_expire_frag_queue(net, fq);
}

static struct frag_queue *
@@ -363,7 +362,7 @@ found:
return -1;

discard_fq:
- inet_frag_kill(&fq->q, &ip6_frags);
+ inet_frag_kill(&fq->q);
err:
__IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
IPSTATS_MIB_REASMFAILS);
@@ -390,7 +389,7 @@ static int ip6_frag_reasm(struct frag_qu
int sum_truesize;
u8 ecn;

- inet_frag_kill(&fq->q, &ip6_frags);
+ inet_frag_kill(&fq->q);

ecn = ip_frag_ecn_table[fq->ecn];
if (unlikely(ecn == 0xff))
@@ -568,7 +567,7 @@ static int ipv6_frag_rcv(struct sk_buff
ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);

spin_unlock(&fq->q.lock);
- inet_frag_put(&fq->q, &ip6_frags);
+ inet_frag_put(&fq->q);
return ret;
}

@@ -719,6 +718,7 @@ static int __net_init ipv6_frags_init_ne
net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;
+ net->ipv6.frags.f = &ip6_frags;

res = inet_frags_init_net(&net->ipv6.frags);
if (res < 0)
@@ -726,14 +726,14 @@ static int __net_init ipv6_frags_init_ne

res = ip6_frags_ns_sysctl_register(net);
if (res < 0)
- inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+ inet_frags_exit_net(&net->ipv6.frags);
return res;
}

static void __net_exit ipv6_frags_exit_net(struct net *net)
{
ip6_frags_ns_sysctl_unregister(net);
- inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+ inet_frags_exit_net(&net->ipv6.frags);
}

static struct pernet_operations ip6_frags_ops = {



2018-09-17 23:05:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 094/126] inet: frags: refactor ipfrag_init()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

We need to call inet_frags_init() before register_pernet_subsys(),
as a prereq for following patch ("inet: frags: use rhashtables for reassembly units")

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 483a6e4fa055123142d8956866fe2aa9c98d546d)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_fragment.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -900,8 +900,6 @@ static struct pernet_operations ip4_frag

void __init ipfrag_init(void)
{
- ip4_frags_ctl_register();
- register_pernet_subsys(&ip4_frags_ops);
ip4_frags.hashfn = ip4_hashfn;
ip4_frags.constructor = ip4_frag_init;
ip4_frags.destructor = ip4_frag_free;
@@ -911,4 +909,6 @@ void __init ipfrag_init(void)
ip4_frags.frags_cache_name = ip_frag_cache_name;
if (inet_frags_init(&ip4_frags))
panic("IP: failed to allocate ip4_frags cache\n");
+ ip4_frags_ctl_register();
+ register_pernet_subsys(&ip4_frags_ops);
}



2018-09-17 23:05:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 068/126] f2fs: try grabbing node page lock aggressively in sync scenario

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chao Yu <[email protected]>

[ Upstream commit 4b270a8cc5047682f0a3f3f9af3b498408dbd2bc ]

In synchronous scenario, like in checkpoint(), we are going to flush
dirty node pages to device synchronously, we can easily failed
writebacking node page due to trylock_page() failure, especially in
condition of intensive lock competition, which can cause long latency
of checkpoint(). So let's use lock_page() in synchronous scenario to
avoid this issue.

Signed-off-by: Yunlei He <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/node.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1610,7 +1610,9 @@ next_step:
!is_cold_node(page)))
continue;
lock_node:
- if (!trylock_page(page))
+ if (wbc->sync_mode == WB_SYNC_ALL)
+ lock_page(page);
+ else if (!trylock_page(page))
continue;

if (unlikely(page->mapping != NODE_MAPPING(sbi))) {



2018-09-17 23:05:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 069/126] pktcdvd: Fix possible Spectre-v1 for pkt_devs

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jinbum Park <[email protected]>

[ Upstream commit 55690c07b44a82cc3359ce0c233f4ba7d80ba145 ]

User controls @dev_minor which to be used as index of pkt_devs.
So, It can be exploited via Spectre-like attack. (speculative execution)

This kind of attack leaks address of pkt_devs, [1]
It leads an attacker to bypass security mechanism such as KASLR.

So sanitize @dev_minor before using it to prevent attack.

[1] https://github.com/jinb-park/linux-exploit/
tree/master/exploit-remaining-spectre-gadget/leak_pkt_devs.c

Signed-off-by: Jinbum Park <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/pktcdvd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -67,7 +67,7 @@
#include <scsi/scsi.h>
#include <linux/debugfs.h>
#include <linux/device.h>
-
+#include <linux/nospec.h>
#include <linux/uaccess.h>

#define DRIVER_NAME "pktcdvd"
@@ -2231,6 +2231,8 @@ static struct pktcdvd_device *pkt_find_d
{
if (dev_minor >= MAX_WRITERS)
return NULL;
+
+ dev_minor = array_index_nospec(dev_minor, MAX_WRITERS);
return pkt_devs[dev_minor];
}




2018-09-17 23:05:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 071/126] tpm_tis_spi: Pass the SPI IRQ down to the driver

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Walleij <[email protected]>

[ Upstream commit 1a339b658d9dbe1471f67b78237cf8fa08bbbeb5 ]

An SPI TPM device managed directly on an embedded board using
the SPI bus and some GPIO or similar line as IRQ handler will
pass the IRQn from the TPM device associated with the SPI
device. This is already handled by the SPI core, so make sure
to pass this down to the core as well.

(The TPM core habit of using -1 to signal no IRQ is dubious
(as IRQ 0 is NO_IRQ) but I do not want to mess with that
semantic in this patch.)

Cc: Mark Brown <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Tested-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/char/tpm/tpm_tis_spi.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/char/tpm/tpm_tis_spi.c
+++ b/drivers/char/tpm/tpm_tis_spi.c
@@ -188,6 +188,7 @@ static const struct tpm_tis_phy_ops tpm_
static int tpm_tis_spi_probe(struct spi_device *dev)
{
struct tpm_tis_spi_phy *phy;
+ int irq;

phy = devm_kzalloc(&dev->dev, sizeof(struct tpm_tis_spi_phy),
GFP_KERNEL);
@@ -200,7 +201,13 @@ static int tpm_tis_spi_probe(struct spi_
if (!phy->iobuf)
return -ENOMEM;

- return tpm_tis_core_init(&dev->dev, &phy->priv, -1, &tpm_spi_phy_ops,
+ /* If the SPI device has an IRQ then use that */
+ if (dev->irq > 0)
+ irq = dev->irq;
+ else
+ irq = -1;
+
+ return tpm_tis_core_init(&dev->dev, &phy->priv, irq, &tpm_spi_phy_ops,
NULL);
}




2018-09-17 23:05:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 081/126] partitions/aix: fix usage of uninitialized lv_info and lvname structures

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mauricio Faria de Oliveira <[email protected]>

[ Upstream commit 14cb2c8a6c5dae57ee3e2da10fa3db2b9087e39e ]

The if-block that sets a successful return value in aix_partition()
uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized.

For example, if 'numlvs' is zero or alloc_lvn() fails, neither is
initialized, but are used anyway if alloc_pvd() succeeds after it.

So, make the alloc_pvd() call conditional on their initialization.

This has been hit when attaching an apparently corrupted/stressed
AIX LUN, misleading the kernel to pr_warn() invalid data and hang.

[...] partition (null) (11 pp's found) is not contiguous
[...] partition (null) (2 pp's found) is not contiguous
[...] partition (null) (3 pp's found) is not contiguous
[...] partition (null) (64 pp's found) is not contiguous

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
block/partitions/aix.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/block/partitions/aix.c
+++ b/block/partitions/aix.c
@@ -178,7 +178,7 @@ int aix_partition(struct parsed_partitio
u32 vgda_sector = 0;
u32 vgda_len = 0;
int numlvs = 0;
- struct pvd *pvd;
+ struct pvd *pvd = NULL;
struct lv_info {
unsigned short pps_per_lv;
unsigned short pps_found;
@@ -232,10 +232,11 @@ int aix_partition(struct parsed_partitio
if (lvip[i].pps_per_lv)
foundlvs += 1;
}
+ /* pvd loops depend on n[].name and lvip[].pps_per_lv */
+ pvd = alloc_pvd(state, vgda_sector + 17);
}
put_dev_sector(sect);
}
- pvd = alloc_pvd(state, vgda_sector + 17);
if (pvd) {
int numpps = be16_to_cpu(pvd->pp_count);
int psn_part1 = be32_to_cpu(pvd->psn_part1);



2018-09-17 23:05:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 072/126] tpm/tpm_i2c_infineon: switch to i2c_lock_bus(..., I2C_LOCK_SEGMENT)

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Rosin <[email protected]>

[ Upstream commit bb853aac2c478ce78116128263801189408ad2a8 ]

Locking the root adapter for __i2c_transfer will deadlock if the
device sits behind a mux-locked I2C mux. Switch to the finer-grained
i2c_lock_bus with the I2C_LOCK_SEGMENT flag. If the device does not
sit behind a mux-locked mux, the two locking variants are equivalent.

Signed-off-by: Peter Rosin <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Tested-by: Alexander Steffen <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/char/tpm/tpm_i2c_infineon.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/char/tpm/tpm_i2c_infineon.c
+++ b/drivers/char/tpm/tpm_i2c_infineon.c
@@ -117,7 +117,7 @@ static int iic_tpm_read(u8 addr, u8 *buf
/* Lock the adapter for the duration of the whole sequence. */
if (!tpm_dev.client->adapter->algo->master_xfer)
return -EOPNOTSUPP;
- i2c_lock_adapter(tpm_dev.client->adapter);
+ i2c_lock_bus(tpm_dev.client->adapter, I2C_LOCK_SEGMENT);

if (tpm_dev.chip_type == SLB9645) {
/* use a combined read for newer chips
@@ -192,7 +192,7 @@ static int iic_tpm_read(u8 addr, u8 *buf
}

out:
- i2c_unlock_adapter(tpm_dev.client->adapter);
+ i2c_unlock_bus(tpm_dev.client->adapter, I2C_LOCK_SEGMENT);
/* take care of 'guard time' */
usleep_range(SLEEP_DURATION_LOW, SLEEP_DURATION_HI);

@@ -224,7 +224,7 @@ static int iic_tpm_write_generic(u8 addr

if (!tpm_dev.client->adapter->algo->master_xfer)
return -EOPNOTSUPP;
- i2c_lock_adapter(tpm_dev.client->adapter);
+ i2c_lock_bus(tpm_dev.client->adapter, I2C_LOCK_SEGMENT);

/* prepend the 'register address' to the buffer */
tpm_dev.buf[0] = addr;
@@ -243,7 +243,7 @@ static int iic_tpm_write_generic(u8 addr
usleep_range(sleep_low, sleep_hi);
}

- i2c_unlock_adapter(tpm_dev.client->adapter);
+ i2c_unlock_bus(tpm_dev.client->adapter, I2C_LOCK_SEGMENT);
/* take care of 'guard time' */
usleep_range(SLEEP_DURATION_LOW, SLEEP_DURATION_HI);




2018-09-17 23:05:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 076/126] net: dcb: For wild-card lookups, use priority -1, not 0

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Petr Machata <[email protected]>

[ Upstream commit 08193d1a893c802c4b807e4d522865061f4e9f4f ]

The function dcb_app_lookup walks the list of specified DCB APP entries,
looking for one that matches a given criteria: ifindex, selector,
protocol ID and optionally also priority. The "don't care" value for
priority is set to 0, because that priority has not been allowed under
CEE regime, which predates the IEEE standardization.

Under IEEE, 0 is a valid priority number. But because dcb_app_lookup
considers zero a wild card, attempts to add an APP entry with priority 0
fail when other entries exist for a given ifindex / selector / PID
triplet.

Fix by changing the wild-card value to -1.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dcb/dcbnl.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1765,7 +1765,7 @@ static struct dcb_app_type *dcb_app_look
if (itr->app.selector == app->selector &&
itr->app.protocol == app->protocol &&
itr->ifindex == ifindex &&
- (!prio || itr->app.priority == prio))
+ ((prio == -1) || itr->app.priority == prio))
return itr;
}

@@ -1800,7 +1800,8 @@ u8 dcb_getapp(struct net_device *dev, st
u8 prio = 0;

spin_lock_bh(&dcb_lock);
- if ((itr = dcb_app_lookup(app, dev->ifindex, 0)))
+ itr = dcb_app_lookup(app, dev->ifindex, -1);
+ if (itr)
prio = itr->app.priority;
spin_unlock_bh(&dcb_lock);

@@ -1828,7 +1829,8 @@ int dcb_setapp(struct net_device *dev, s

spin_lock_bh(&dcb_lock);
/* Search for existing match and replace */
- if ((itr = dcb_app_lookup(new, dev->ifindex, 0))) {
+ itr = dcb_app_lookup(new, dev->ifindex, -1);
+ if (itr) {
if (new->priority)
itr->app.priority = new->priority;
else {
@@ -1861,7 +1863,8 @@ u8 dcb_ieee_getapp_mask(struct net_devic
u8 prio = 0;

spin_lock_bh(&dcb_lock);
- if ((itr = dcb_app_lookup(app, dev->ifindex, 0)))
+ itr = dcb_app_lookup(app, dev->ifindex, -1);
+ if (itr)
prio |= 1 << itr->app.priority;
spin_unlock_bh(&dcb_lock);




2018-09-17 23:05:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 102/126] inet: frags: get rif of inet_frag_evicting()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

This refactors ip_expire() since one indentation level is removed.

Note: in the future, we should try hard to avoid the skb_clone()
since this is a serious performance cost.
Under DDOS, the ICMP message wont be sent because of rate limits.

Fact that ip6_expire_frag_queue() does not use skb_clone() is
disturbing too. Presumably IPv6 should have the same
issue than the one we fixed in commit ec4fbd64751d
("inet: frag: release spinlock before calling icmp_send()")

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 399d1404be660d355192ff4df5ccc3f4159ec1e4)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 5 ---
net/ipv4/ip_fragment.c | 65 +++++++++++++++++++++++-------------------------
net/ipv6/reassembly.c | 4 --
3 files changed, 32 insertions(+), 42 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -119,11 +119,6 @@ static inline void inet_frag_put(struct
inet_frag_destroy(q);
}

-static inline bool inet_frag_evicting(struct inet_frag_queue *q)
-{
- return false;
-}
-
/* Memory Tracking Functions. */

static inline int frag_mem_limit(struct netns_frags *nf)
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -143,8 +143,11 @@ static bool frag_expire_skip_icmp(u32 us
static void ip_expire(struct timer_list *t)
{
struct inet_frag_queue *frag = from_timer(frag, t, timer);
- struct ipq *qp;
+ struct sk_buff *clone, *head;
+ const struct iphdr *iph;
struct net *net;
+ struct ipq *qp;
+ int err;

qp = container_of(frag, struct ipq, q);
net = container_of(qp->q.net, struct net, ipv4.frags);
@@ -158,45 +161,41 @@ static void ip_expire(struct timer_list
ipq_kill(qp);
__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);

- if (!inet_frag_evicting(&qp->q)) {
- struct sk_buff *clone, *head = qp->q.fragments;
- const struct iphdr *iph;
- int err;
+ head = qp->q.fragments;

- __IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
+ __IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);

- if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !qp->q.fragments)
- goto out;
+ if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
+ goto out;

- head->dev = dev_get_by_index_rcu(net, qp->iif);
- if (!head->dev)
- goto out;
+ head->dev = dev_get_by_index_rcu(net, qp->iif);
+ if (!head->dev)
+ goto out;


- /* skb has no dst, perform route lookup again */
- iph = ip_hdr(head);
- err = ip_route_input_noref(head, iph->daddr, iph->saddr,
+ /* skb has no dst, perform route lookup again */
+ iph = ip_hdr(head);
+ err = ip_route_input_noref(head, iph->daddr, iph->saddr,
iph->tos, head->dev);
- if (err)
- goto out;
+ if (err)
+ goto out;
+
+ /* Only an end host needs to send an ICMP
+ * "Fragment Reassembly Timeout" message, per RFC792.
+ */
+ if (frag_expire_skip_icmp(qp->q.key.v4.user) &&
+ (skb_rtable(head)->rt_type != RTN_LOCAL))
+ goto out;
+
+ clone = skb_clone(head, GFP_ATOMIC);

- /* Only an end host needs to send an ICMP
- * "Fragment Reassembly Timeout" message, per RFC792.
- */
- if (frag_expire_skip_icmp(qp->q.key.v4.user) &&
- (skb_rtable(head)->rt_type != RTN_LOCAL))
- goto out;
-
- clone = skb_clone(head, GFP_ATOMIC);
-
- /* Send an ICMP "Fragment Reassembly Timeout" message. */
- if (clone) {
- spin_unlock(&qp->q.lock);
- icmp_send(clone, ICMP_TIME_EXCEEDED,
- ICMP_EXC_FRAGTIME, 0);
- consume_skb(clone);
- goto out_rcu_unlock;
- }
+ /* Send an ICMP "Fragment Reassembly Timeout" message. */
+ if (clone) {
+ spin_unlock(&qp->q.lock);
+ icmp_send(clone, ICMP_TIME_EXCEEDED,
+ ICMP_EXC_FRAGTIME, 0);
+ consume_skb(clone);
+ goto out_rcu_unlock;
}
out:
spin_unlock(&qp->q.lock);
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -106,10 +106,6 @@ void ip6_expire_frag_queue(struct net *n
goto out_rcu_unlock;

__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS);
-
- if (inet_frag_evicting(&fq->q))
- goto out_rcu_unlock;
-
__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);

/* Don't send error if the first segment did not arrive. */



2018-09-17 23:05:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 048/126] perf c2c report: Fix crash for empty browser

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Olsa <[email protected]>

[ Upstream commit 73978332572ccf5e364c31e9a70ba953f8202b46 ]

'perf c2c' scans read/write accesses and tries to find false sharing
cases, so when the events it wants were not asked for or ended up not
taking place, we get no histograms.

So do not try to display entry details if there's not any. Currently
this ends up in crash:

$ perf c2c report # then press 'd'
perf: Segmentation fault
$

Committer testing:

Before:

Record a perf.data file without events of interest to 'perf c2c report',
then call it and press 'd':

# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (6 samples) ]
# perf c2c report
perf: Segmentation fault
-------- backtrace --------
perf[0x5b1d2a]
/lib64/libc.so.6(+0x346df)[0x7fcb566e36df]
perf[0x46fcae]
perf[0x4a9f1e]
perf[0x4aa220]
perf(main+0x301)[0x42c561]
/lib64/libc.so.6(__libc_start_main+0xe9)[0x7fcb566cff29]
perf(_start+0x29)[0x42c999]
#

After the patch the segfault doesn't take place, a follow up patch to
tell the user why nothing changes when 'd' is pressed would be good.

Reported-by: [email protected]
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: f1c5fd4d0bb9 ("perf c2c report: Add TUI cacheline browser")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/perf/builtin-c2c.c | 3 +++
1 file changed, 3 insertions(+)

--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2229,6 +2229,9 @@ static int perf_c2c__browse_cacheline(st
" s Togle full lenght of symbol and source line columns \n"
" q Return back to cacheline list \n";

+ if (!he)
+ return 0;
+
/* Display compact version first. */
c2c.symbol_full = false;




2018-09-17 23:05:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 104/126] inet: frags: break the 2GB limit for frags storage

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonnably well under pressure.

Current memory tracking is using one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :

if (... || frag_mem_limit(nf) > nf->high_thresh)

Tested:

$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh

<frag DDOS>

$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds 3317150 0.0
IpReasmFails 3317112 0.0

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/networking/ip-sysctl.txt | 4 ++--
include/net/inet_frag.h | 20 ++++++++++----------
net/ieee802154/6lowpan/reassembly.c | 10 +++++-----
net/ipv4/ip_fragment.c | 10 +++++-----
net/ipv4/proc.c | 2 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 10 +++++-----
net/ipv6/proc.c | 2 +-
net/ipv6/reassembly.c | 6 +++---
8 files changed, 32 insertions(+), 32 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -133,10 +133,10 @@ min_adv_mss - INTEGER

IP Fragmentation:

-ipfrag_high_thresh - INTEGER
+ipfrag_high_thresh - LONG INTEGER
Maximum memory used to reassemble IP fragments.

-ipfrag_low_thresh - INTEGER
+ipfrag_low_thresh - LONG INTEGER
(Obsolete since linux-4.17)
Maximum memory used to reassemble IP fragments before the kernel
begins to remove incomplete fragment queues to free up resources.
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -8,11 +8,11 @@ struct netns_frags {
struct rhashtable rhashtable ____cacheline_aligned_in_smp;

/* Keep atomic mem on separate cachelines in structs that include it */
- atomic_t mem ____cacheline_aligned_in_smp;
+ atomic_long_t mem ____cacheline_aligned_in_smp;
/* sysctls */
+ long high_thresh;
+ long low_thresh;
int timeout;
- int high_thresh;
- int low_thresh;
int max_dist;
struct inet_frags *f;
};
@@ -102,7 +102,7 @@ void inet_frags_fini(struct inet_frags *

static inline int inet_frags_init_net(struct netns_frags *nf)
{
- atomic_set(&nf->mem, 0);
+ atomic_long_set(&nf->mem, 0);
return rhashtable_init(&nf->rhashtable, &nf->f->rhash_params);
}
void inet_frags_exit_net(struct netns_frags *nf);
@@ -119,19 +119,19 @@ static inline void inet_frag_put(struct

/* Memory Tracking Functions. */

-static inline int frag_mem_limit(struct netns_frags *nf)
+static inline long frag_mem_limit(const struct netns_frags *nf)
{
- return atomic_read(&nf->mem);
+ return atomic_long_read(&nf->mem);
}

-static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)
+static inline void sub_frag_mem_limit(struct netns_frags *nf, long val)
{
- atomic_sub(i, &nf->mem);
+ atomic_long_sub(val, &nf->mem);
}

-static inline void add_frag_mem_limit(struct netns_frags *nf, int i)
+static inline void add_frag_mem_limit(struct netns_frags *nf, long val)
{
- atomic_add(i, &nf->mem);
+ atomic_long_add(val, &nf->mem);
}

/* RFC 3168 support :
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -411,23 +411,23 @@ err:
}

#ifdef CONFIG_SYSCTL
-static int zero;
+static long zero;

static struct ctl_table lowpan_frags_ns_ctl_table[] = {
{
.procname = "6lowpanfrag_high_thresh",
.data = &init_net.ieee802154_lowpan.frags.high_thresh,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &init_net.ieee802154_lowpan.frags.low_thresh
},
{
.procname = "6lowpanfrag_low_thresh",
.data = &init_net.ieee802154_lowpan.frags.low_thresh,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &zero,
.extra2 = &init_net.ieee802154_lowpan.frags.high_thresh
},
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -683,23 +683,23 @@ struct sk_buff *ip_check_defrag(struct n
EXPORT_SYMBOL(ip_check_defrag);

#ifdef CONFIG_SYSCTL
-static int zero;
+static long zero;

static struct ctl_table ip4_frags_ns_ctl_table[] = {
{
.procname = "ipfrag_high_thresh",
.data = &init_net.ipv4.frags.high_thresh,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &init_net.ipv4.frags.low_thresh
},
{
.procname = "ipfrag_low_thresh",
.data = &init_net.ipv4.frags.low_thresh,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &zero,
.extra2 = &init_net.ipv4.frags.high_thresh
},
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -71,7 +71,7 @@ static int sockstat_seq_show(struct seq_
sock_prot_inuse_get(net, &udplite_prot));
seq_printf(seq, "RAW: inuse %d\n",
sock_prot_inuse_get(net, &raw_prot));
- seq_printf(seq, "FRAG: inuse %u memory %u\n",
+ seq_printf(seq, "FRAG: inuse %u memory %lu\n",
atomic_read(&net->ipv4.frags.rhashtable.nelems),
frag_mem_limit(&net->ipv4.frags));
return 0;
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -63,7 +63,7 @@ struct nf_ct_frag6_skb_cb
static struct inet_frags nf_frags;

#ifdef CONFIG_SYSCTL
-static int zero;
+static long zero;

static struct ctl_table nf_ct_frag6_sysctl_table[] = {
{
@@ -76,18 +76,18 @@ static struct ctl_table nf_ct_frag6_sysc
{
.procname = "nf_conntrack_frag6_low_thresh",
.data = &init_net.nf_frag.frags.low_thresh,
- .maxlen = sizeof(unsigned int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &zero,
.extra2 = &init_net.nf_frag.frags.high_thresh
},
{
.procname = "nf_conntrack_frag6_high_thresh",
.data = &init_net.nf_frag.frags.high_thresh,
- .maxlen = sizeof(unsigned int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &init_net.nf_frag.frags.low_thresh
},
{ }
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -47,7 +47,7 @@ static int sockstat6_seq_show(struct seq
sock_prot_inuse_get(net, &udplitev6_prot));
seq_printf(seq, "RAW6: inuse %d\n",
sock_prot_inuse_get(net, &rawv6_prot));
- seq_printf(seq, "FRAG6: inuse %u memory %u\n",
+ seq_printf(seq, "FRAG6: inuse %u memory %lu\n",
atomic_read(&net->ipv6.frags.rhashtable.nelems),
frag_mem_limit(&net->ipv6.frags));
return 0;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -552,15 +552,15 @@ static struct ctl_table ip6_frags_ns_ctl
{
.procname = "ip6frag_high_thresh",
.data = &init_net.ipv6.frags.high_thresh,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_doulongvec_minmax,
.extra1 = &init_net.ipv6.frags.low_thresh
},
{
.procname = "ip6frag_low_thresh",
.data = &init_net.ipv6.frags.low_thresh,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,



2018-09-17 23:05:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 050/126] macintosh/via-pmu: Add missing mmio accessors

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Finn Thain <[email protected]>

[ Upstream commit 576d5290d678a651b9f36050fc1717e0573aca13 ]

Add missing in_8() accessors to init_pmu() and pmu_sr_intr().

This fixes several sparse warnings:
drivers/macintosh/via-pmu.c:536:29: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:537:33: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1455:17: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1456:69: warning: dereference of noderef expression

Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/macintosh/via-pmu.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -532,8 +532,9 @@ init_pmu(void)
int timeout;
struct adb_request req;

- out_8(&via[B], via[B] | TREQ); /* negate TREQ */
- out_8(&via[DIRB], (via[DIRB] | TREQ) & ~TACK); /* TACK in, TREQ out */
+ /* Negate TREQ. Set TACK to input and TREQ to output. */
+ out_8(&via[B], in_8(&via[B]) | TREQ);
+ out_8(&via[DIRB], (in_8(&via[DIRB]) | TREQ) & ~TACK);

pmu_request(&req, NULL, 2, PMU_SET_INTR_MASK, pmu_intr_mask);
timeout = 100000;
@@ -1455,8 +1456,8 @@ pmu_sr_intr(void)
struct adb_request *req;
int bite = 0;

- if (via[B] & TREQ) {
- printk(KERN_ERR "PMU: spurious SR intr (%x)\n", via[B]);
+ if (in_8(&via[B]) & TREQ) {
+ printk(KERN_ERR "PMU: spurious SR intr (%x)\n", in_8(&via[B]));
out_8(&via[IFR], SR_INT);
return NULL;
}



2018-09-17 23:05:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 108/126] inet: frags: reorganize struct netns_frags

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Put the read-mostly fields in a separate cache line
at the beginning of struct netns_frags, to reduce
false sharing noticed in inet_frag_kill()

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit c2615cf5a761b32bf74e85bddc223dfff3d9b9f0)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -5,16 +5,17 @@
#include <linux/rhashtable.h>

struct netns_frags {
- struct rhashtable rhashtable ____cacheline_aligned_in_smp;
-
- /* Keep atomic mem on separate cachelines in structs that include it */
- atomic_long_t mem ____cacheline_aligned_in_smp;
/* sysctls */
long high_thresh;
long low_thresh;
int timeout;
int max_dist;
struct inet_frags *f;
+
+ struct rhashtable rhashtable ____cacheline_aligned_in_smp;
+
+ /* Keep atomic mem on separate cachelines in structs that include it */
+ atomic_long_t mem ____cacheline_aligned_in_smp;
};

/**



2018-09-17 23:05:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 053/126] ath10k: prevent active scans on potential unusable channels

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sven Eckelmann <[email protected]>

[ Upstream commit 3f259111583801013cb605bb4414aa529adccf1c ]

The QCA4019 hw1.0 firmware 10.4-3.2.1-00050 and 10.4-3.5.3-00053 (and most
likely all other) seem to ignore the WMI_CHAN_FLAG_DFS flag during the
scan. This results in transmission (probe requests) on channels which are
not "available" for transmissions.

Since the firmware is closed source and nothing can be done from our side
to fix the problem in it, the driver has to work around this problem. The
WMI_CHAN_FLAG_PASSIVE seems to be interpreted by the firmware to not
scan actively on a channel unless an AP was detected on it. Simple probe
requests will then be transmitted by the STA on the channel.

ath10k must therefore also use this flag when it queues a radar channel for
scanning. This should reduce the chance of an active scan when the channel
might be "unusable" for transmissions.

Fixes: e8a50f8ba44b ("ath10k: introduce DFS implementation")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wireless/ath/ath10k/mac.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3074,6 +3074,13 @@ static int ath10k_update_channel_list(st
passive = channel->flags & IEEE80211_CHAN_NO_IR;
ch->passive = passive;

+ /* the firmware is ignoring the "radar" flag of the
+ * channel and is scanning actively using Probe Requests
+ * on "Radar detection"/DFS channels which are not
+ * marked as "available"
+ */
+ ch->passive |= ch->chan_radar;
+
ch->freq = channel->center_freq;
ch->band_center_freq1 = channel->center_freq;
ch->min_power = 0;



2018-09-17 23:05:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 109/126] inet: frags: get rid of ipfrag_skb_cb/FRAG_CB

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb->next,
meaning that we use two cache lines per skb when finding the insertion point.

By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
in a single cache line and save precious memory bandwidth.

Note that after the fast path added by Changli Gao in commit
d6bebca92c66 ("fragment: add fast path for in-order fragments")
this change wont help the fast path, since we still need
to access prev->len (2nd cache line), but will show great
benefits when slow path is entered, since we perform
a linear scan of a potentially long list.

Also, note that this potential long list is an attack vector,
we might consider also using an rb-tree there eventually.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 1 +
net/ipv4/ip_fragment.c | 35 ++++++++++++++---------------------
2 files changed, 15 insertions(+), 21 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -678,6 +678,7 @@ struct sk_buff {
* UDP receive path is one user.
*/
unsigned long dev_scratch;
+ int ip_defrag_offset;
};
/*
* This is the control buffer. It is free to use for every
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -57,14 +57,6 @@
*/
static const char ip_frag_cache_name[] = "ip4-frags";

-struct ipfrag_skb_cb
-{
- struct inet_skb_parm h;
- int offset;
-};
-
-#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))
-
/* Describe an entry in the "incomplete datagrams" queue. */
struct ipq {
struct inet_frag_queue q;
@@ -353,13 +345,13 @@ static int ip_frag_queue(struct ipq *qp,
* this fragment, right?
*/
prev = qp->q.fragments_tail;
- if (!prev || FRAG_CB(prev)->offset < offset) {
+ if (!prev || prev->ip_defrag_offset < offset) {
next = NULL;
goto found;
}
prev = NULL;
for (next = qp->q.fragments; next != NULL; next = next->next) {
- if (FRAG_CB(next)->offset >= offset)
+ if (next->ip_defrag_offset >= offset)
break; /* bingo! */
prev = next;
}
@@ -370,7 +362,7 @@ found:
* any overlaps are eliminated.
*/
if (prev) {
- int i = (FRAG_CB(prev)->offset + prev->len) - offset;
+ int i = (prev->ip_defrag_offset + prev->len) - offset;

if (i > 0) {
offset += i;
@@ -387,8 +379,8 @@ found:

err = -ENOMEM;

- while (next && FRAG_CB(next)->offset < end) {
- int i = end - FRAG_CB(next)->offset; /* overlap is 'i' bytes */
+ while (next && next->ip_defrag_offset < end) {
+ int i = end - next->ip_defrag_offset; /* overlap is 'i' bytes */

if (i < next->len) {
int delta = -next->truesize;
@@ -401,7 +393,7 @@ found:
delta += next->truesize;
if (delta)
add_frag_mem_limit(qp->q.net, delta);
- FRAG_CB(next)->offset += i;
+ next->ip_defrag_offset += i;
qp->q.meat -= i;
if (next->ip_summed != CHECKSUM_UNNECESSARY)
next->ip_summed = CHECKSUM_NONE;
@@ -425,7 +417,13 @@ found:
}
}

- FRAG_CB(skb)->offset = offset;
+ /* Note : skb->ip_defrag_offset and skb->dev share the same location */
+ dev = skb->dev;
+ if (dev)
+ qp->iif = dev->ifindex;
+ /* Makes sure compiler wont do silly aliasing games */
+ barrier();
+ skb->ip_defrag_offset = offset;

/* Insert this fragment in the chain of fragments. */
skb->next = next;
@@ -436,11 +434,6 @@ found:
else
qp->q.fragments = skb;

- dev = skb->dev;
- if (dev) {
- qp->iif = dev->ifindex;
- skb->dev = NULL;
- }
qp->q.stamp = skb->tstamp;
qp->q.meat += skb->len;
qp->ecn |= ecn;
@@ -516,7 +509,7 @@ static int ip_frag_reasm(struct ipq *qp,
}

WARN_ON(!head);
- WARN_ON(FRAG_CB(head)->offset != 0);
+ WARN_ON(head->ip_defrag_offset != 0);

/* Allocate a new buffer for the datagram. */
ihlen = ip_hdrlen(head);



2018-09-17 23:05:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 105/126] inet: frags: do not clone skb in ip_expire()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

An skb_clone() was added in commit ec4fbd64751d ("inet: frag: release
spinlock before calling icmp_send()")

While fixing the bug at that time, it also added a very high cost
for DDOS frags, as the ICMP rate limit is applied after this
expensive operation (skb_clone() + consume_skb(), implying memory
allocations, copy, and freeing)

We can use skb_get(head) here, all we want is to make sure skb wont
be freed by another cpu.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 1eec5d5670084ee644597bd26c25e22c69b9f748)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_fragment.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -143,8 +143,8 @@ static bool frag_expire_skip_icmp(u32 us
static void ip_expire(struct timer_list *t)
{
struct inet_frag_queue *frag = from_timer(frag, t, timer);
- struct sk_buff *clone, *head;
const struct iphdr *iph;
+ struct sk_buff *head;
struct net *net;
struct ipq *qp;
int err;
@@ -187,16 +187,12 @@ static void ip_expire(struct timer_list
(skb_rtable(head)->rt_type != RTN_LOCAL))
goto out;

- clone = skb_clone(head, GFP_ATOMIC);
+ skb_get(head);
+ spin_unlock(&qp->q.lock);
+ icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
+ kfree_skb(head);
+ goto out_rcu_unlock;

- /* Send an ICMP "Fragment Reassembly Timeout" message. */
- if (clone) {
- spin_unlock(&qp->q.lock);
- icmp_send(clone, ICMP_TIME_EXCEEDED,
- ICMP_EXC_FRAGTIME, 0);
- consume_skb(clone);
- goto out_rcu_unlock;
- }
out:
spin_unlock(&qp->q.lock);
out_rcu_unlock:



2018-09-17 23:05:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 080/126] partitions/aix: append null character to print data from disk

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mauricio Faria de Oliveira <[email protected]>

[ Upstream commit d43fdae7bac2def8c4314b5a49822cb7f08a45f1 ]

Even if properly initialized, the lvname array (i.e., strings)
is read from disk, and might contain corrupt data (e.g., lack
the null terminating character for strings).

So, make sure the partition name string used in pr_warn() has
the null terminating character.

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Suggested-by: Daniel J. Axtens <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
block/partitions/aix.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/block/partitions/aix.c
+++ b/block/partitions/aix.c
@@ -282,10 +282,14 @@ int aix_partition(struct parsed_partitio
next_lp_ix += 1;
}
for (i = 0; i < state->limit; i += 1)
- if (lvip[i].pps_found && !lvip[i].lv_is_contiguous)
+ if (lvip[i].pps_found && !lvip[i].lv_is_contiguous) {
+ char tmp[sizeof(n[i].name) + 1]; // null char
+
+ snprintf(tmp, sizeof(tmp), "%s", n[i].name);
pr_warn("partition %s (%u pp's found) is "
"not contiguous\n",
- n[i].name, lvip[i].pps_found);
+ tmp, lvip[i].pps_found);
+ }
kfree(pvd);
}
kfree(n);



2018-09-17 23:06:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 107/126] rhashtable: reorganize struct rhashtable layout

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

While under frags DDOS I noticed unfortunate false sharing between
@nelems and @params.automatic_shrinking

Move @nelems at the end of struct rhashtable so that first cache line
is shared between all cpus, because almost never dirtied.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit e5d672a0780d9e7118caad4c171ec88b8299398d)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/rhashtable.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -152,25 +152,25 @@ struct rhashtable_params {
/**
* struct rhashtable - Hash table handle
* @tbl: Bucket table
- * @nelems: Number of elements in table
* @key_len: Key length for hashfn
- * @p: Configuration parameters
* @max_elems: Maximum number of elements in table
+ * @p: Configuration parameters
* @rhlist: True if this is an rhltable
* @run_work: Deferred worker to expand/shrink asynchronously
* @mutex: Mutex to protect current/future table swapping
* @lock: Spin lock to protect walker list
+ * @nelems: Number of elements in table
*/
struct rhashtable {
struct bucket_table __rcu *tbl;
- atomic_t nelems;
unsigned int key_len;
- struct rhashtable_params p;
unsigned int max_elems;
+ struct rhashtable_params p;
bool rhlist;
struct work_struct run_work;
struct mutex mutex;
spinlock_t lock;
+ atomic_t nelems;
};

/**



2018-09-17 23:06:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 089/126] MIPS: WARN_ON invalid DMA cache maintenance, not BUG_ON

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Burton <[email protected]>

[ Upstream commit d4da0e97baea8768b3d66ccef3967bebd50dfc3b ]

If a driver causes DMA cache maintenance with a zero length then we
currently BUG and kill the kernel. As this is a scenario that we may
well be able to recover from, WARN & return in the condition instead.

Signed-off-by: Paul Burton <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/14623/
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/mips/mm/c-r4k.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -835,7 +835,8 @@ static void r4k_flush_icache_user_range(
static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size)
{
/* Catch bad driver code */
- BUG_ON(size == 0);
+ if (WARN_ON(size == 0))
+ return;

preempt_disable();
if (cpu_has_inclusive_pcaches) {
@@ -871,7 +872,8 @@ static void r4k_dma_cache_wback_inv(unsi
static void r4k_dma_cache_inv(unsigned long addr, unsigned long size)
{
/* Catch bad driver code */
- BUG_ON(size == 0);
+ if (WARN_ON(size == 0))
+ return;

preempt_disable();
if (cpu_has_inclusive_pcaches) {



2018-09-17 23:06:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 111/126] ip: discard IPv4 datagrams with overlapping segments.

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <[email protected]>

This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.

Tested: ran ip_defrag selftest (not yet available uptream).

Suggested-by: David S. Miller <[email protected]>
Signed-off-by: Peter Oskolkov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Florian Westphal <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/uapi/linux/snmp.h | 1
net/ipv4/ip_fragment.c | 77 +++++++++++-----------------------------------
net/ipv4/proc.c | 1
3 files changed, 22 insertions(+), 57 deletions(-)

--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -56,6 +56,7 @@ enum
IPSTATS_MIB_ECT1PKTS, /* InECT1Pkts */
IPSTATS_MIB_ECT0PKTS, /* InECT0Pkts */
IPSTATS_MIB_CEPKTS, /* InCEPkts */
+ IPSTATS_MIB_REASM_OVERLAPS, /* ReasmOverlaps */
__IPSTATS_MIB_MAX
};

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -277,6 +277,7 @@ static int ip_frag_reinit(struct ipq *qp
/* Add new segment to existing queue. */
static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
{
+ struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
struct sk_buff *prev, *next;
struct net_device *dev;
unsigned int fragsize;
@@ -357,65 +358,23 @@ static int ip_frag_queue(struct ipq *qp,
}

found:
- /* We found where to put this one. Check for overlap with
- * preceding fragment, and, if needed, align things so that
- * any overlaps are eliminated.
+ /* RFC5722, Section 4, amended by Errata ID : 3089
+ * When reassembling an IPv6 datagram, if
+ * one or more its constituent fragments is determined to be an
+ * overlapping fragment, the entire datagram (and any constituent
+ * fragments) MUST be silently discarded.
+ *
+ * We do the same here for IPv4.
*/
- if (prev) {
- int i = (prev->ip_defrag_offset + prev->len) - offset;

- if (i > 0) {
- offset += i;
- err = -EINVAL;
- if (end <= offset)
- goto err;
- err = -ENOMEM;
- if (!pskb_pull(skb, i))
- goto err;
- if (skb->ip_summed != CHECKSUM_UNNECESSARY)
- skb->ip_summed = CHECKSUM_NONE;
- }
- }
-
- err = -ENOMEM;
-
- while (next && next->ip_defrag_offset < end) {
- int i = end - next->ip_defrag_offset; /* overlap is 'i' bytes */
-
- if (i < next->len) {
- int delta = -next->truesize;
-
- /* Eat head of the next overlapped fragment
- * and leave the loop. The next ones cannot overlap.
- */
- if (!pskb_pull(next, i))
- goto err;
- delta += next->truesize;
- if (delta)
- add_frag_mem_limit(qp->q.net, delta);
- next->ip_defrag_offset += i;
- qp->q.meat -= i;
- if (next->ip_summed != CHECKSUM_UNNECESSARY)
- next->ip_summed = CHECKSUM_NONE;
- break;
- } else {
- struct sk_buff *free_it = next;
-
- /* Old fragment is completely overridden with
- * new one drop it.
- */
- next = next->next;
-
- if (prev)
- prev->next = next;
- else
- qp->q.fragments = next;
-
- qp->q.meat -= free_it->len;
- sub_frag_mem_limit(qp->q.net, free_it->truesize);
- kfree_skb(free_it);
- }
- }
+ /* Is there an overlap with the previous fragment? */
+ if (prev &&
+ (prev->ip_defrag_offset + prev->len) > offset)
+ goto discard_qp;
+
+ /* Is there an overlap with the next fragment? */
+ if (next && next->ip_defrag_offset < end)
+ goto discard_qp;

/* Note : skb->ip_defrag_offset and skb->dev share the same location */
dev = skb->dev;
@@ -463,6 +422,10 @@ found:
skb_dst_drop(skb);
return -EINPROGRESS;

+discard_qp:
+ inet_frag_kill(&qp->q);
+ err = -EINVAL;
+ __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
err:
kfree_skb(skb);
return err;
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -132,6 +132,7 @@ static const struct snmp_mib snmp4_ipext
SNMP_MIB_ITEM("InECT1Pkts", IPSTATS_MIB_ECT1PKTS),
SNMP_MIB_ITEM("InECT0Pkts", IPSTATS_MIB_ECT0PKTS),
SNMP_MIB_ITEM("InCEPkts", IPSTATS_MIB_CEPKTS),
+ SNMP_MIB_ITEM("ReasmOverlaps", IPSTATS_MIB_REASM_OVERLAPS),
SNMP_MIB_SENTINEL
};




2018-09-17 23:06:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 112/126] net: speed up skb_rbtree_purge()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

As measured in my prior patch ("sch_netem: faster rb tree removal"),
rbtree_postorder_for_each_entry_safe() is nice looking but much slower
than using rb_next() directly, except when tree is small enough
to fit in CPU caches (then the cost is the same)

Also note that there is not even an increase of text size :
$ size net/core/skbuff.o.before net/core/skbuff.o
text data bss dec hex filename
40711 1298 0 42009 a419 net/core/skbuff.o.before
40711 1298 0 42009 a419 net/core/skbuff.o

From: Eric Dumazet <[email protected]>

Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 7c90584c66cc4b033a3b684b0e0950f79e7b7166)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2850,12 +2850,15 @@ EXPORT_SYMBOL(skb_queue_purge);
*/
void skb_rbtree_purge(struct rb_root *root)
{
- struct sk_buff *skb, *next;
+ struct rb_node *p = rb_first(root);

- rbtree_postorder_for_each_entry_safe(skb, next, root, rbnode)
- kfree_skb(skb);
+ while (p) {
+ struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);

- *root = RB_ROOT;
+ p = rb_next(p);
+ rb_erase(&skb->rbnode, root);
+ kfree_skb(skb);
+ }
}

/**



2018-09-17 23:06:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 113/126] net: modify skb_rbtree_purge to return the truesize of all purged skbs.

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <[email protected]>

Tested: see the next patch is the series.

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Peter Oskolkov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 2 +-
net/core/skbuff.c | 6 +++++-
2 files changed, 6 insertions(+), 2 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2581,7 +2581,7 @@ static inline void __skb_queue_purge(str
kfree_skb(skb);
}

-void skb_rbtree_purge(struct rb_root *root);
+unsigned int skb_rbtree_purge(struct rb_root *root);

void *netdev_alloc_frag(unsigned int fragsz);

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2842,23 +2842,27 @@ EXPORT_SYMBOL(skb_queue_purge);
/**
* skb_rbtree_purge - empty a skb rbtree
* @root: root of the rbtree to empty
+ * Return value: the sum of truesizes of all purged skbs.
*
* Delete all buffers on an &sk_buff rbtree. Each buffer is removed from
* the list and one reference dropped. This function does not take
* any lock. Synchronization should be handled by the caller (e.g., TCP
* out-of-order queue is protected by the socket lock).
*/
-void skb_rbtree_purge(struct rb_root *root)
+unsigned int skb_rbtree_purge(struct rb_root *root)
{
struct rb_node *p = rb_first(root);
+ unsigned int sum = 0;

while (p) {
struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);

p = rb_next(p);
rb_erase(&skb->rbnode, root);
+ sum += skb->truesize;
kfree_skb(skb);
}
+ return sum;
}

/**



2018-09-17 23:06:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 091/126] drm/i915: set DP Main Stream Attribute for color range on DDI platforms

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jani Nikula <[email protected]>

commit 6209c285e7a5e68dbcdf8fd2456c6dd68433806b upstream.

Since Haswell we have no color range indication either in the pipe or
port registers for DP. Instead, there's a separate register for setting
the DP Main Stream Attributes (MSA) directly. The MSA register
definition makes no references to colorimetry, just a vague reference to
the DP spec. The connection to the color range was lost.

Apparently we've failed to set the proper MSA bit for limited, or CEA,
range ever since the first DDI platforms. We've started setting other
MSA parameters since commit dae847991a43 ("drm/i915: add
intel_ddi_set_pipe_settings").

Without the crucial bit of information, the DP sink has no way of
knowing the source is actually transmitting limited range RGB, leading
to "washed out" colors. With the colorimetry information, compliant
sinks should be able to handle the limited range properly. Native
(i.e. non-LSPCON) HDMI was not affected because we do pass the color
range via AVI infoframes.

Though not the root cause, the problem was made worse for DDI platforms
with commit 55bc60db5988 ("drm/i915: Add "Automatic" mode for the
"Broadcast RGB" property"), which selects limited range RGB
automatically based on the mode, as per the DP, HDMI and CEA specs.

After all these years, the fix boils down to flipping one bit.

[Per testing reports, this fixes DP sinks, but not the LSPCON. My
educated guess is that the LSPCON fails to turn the CEA range MSA into
AVI infoframes for HDMI.]

Reported-by: Michał Kopeć <[email protected]>
Reported-by: N. W. <[email protected]>
Reported-by: Nicholas Stommel <[email protected]>
Reported-by: Tom Yan <[email protected]>
Tested-by: Nicholas Stommel <[email protected]>
References: https://bugs.freedesktop.org/show_bug.cgi?id=100023
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107476
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94921
Cc: Paulo Zanoni <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: <[email protected]> # v3.9+
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit dc5977da99ea28094b8fa4e9bacbd29bedc41de5)
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_ddi.c | 4 ++++
2 files changed, 5 insertions(+)

--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -8462,6 +8462,7 @@ enum skl_power_gate {
#define TRANS_MSA_10_BPC (2<<5)
#define TRANS_MSA_12_BPC (3<<5)
#define TRANS_MSA_16_BPC (4<<5)
+#define TRANS_MSA_CEA_RANGE (1<<3)

/* LCPLL Control */
#define LCPLL_CTL _MMIO(0x130040)
--- a/drivers/gpu/drm/i915/intel_ddi.c
+++ b/drivers/gpu/drm/i915/intel_ddi.c
@@ -1396,6 +1396,10 @@ void intel_ddi_set_pipe_settings(const s
WARN_ON(transcoder_is_dsi(cpu_transcoder));

temp = TRANS_MSA_SYNC_CLK;
+
+ if (crtc_state->limited_color_range)
+ temp |= TRANS_MSA_CEA_RANGE;
+
switch (crtc_state->pipe_bpp) {
case 18:
temp |= TRANS_MSA_6_BPC;



2018-09-17 23:06:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 114/126] ipv6: defrag: drop non-last frags smaller than min mtu

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Westphal <[email protected]>

don't bother with pathological cases, they only waste cycles.
IPv6 requires a minimum MTU of 1280 so we should never see fragments
smaller than this (except last frag).

v3: don't use awkward "-offset + len"
v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
There were concerns that there could be even smaller frags
generated by intermediate nodes, e.g. on radio networks.

Cc: Peter Oskolkov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++++
net/ipv6/reassembly.c | 4 ++++
2 files changed, 8 insertions(+)

--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -565,6 +565,10 @@ int nf_ct_frag6_gather(struct net *net,
hdr = ipv6_hdr(skb);
fhdr = (struct frag_hdr *)skb_transport_header(skb);

+ if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU &&
+ fhdr->frag_off & htons(IP6_MF))
+ return -EINVAL;
+
skb_orphan(skb);
fq = fq_find(net, fhdr->identification, user, hdr,
skb->dev ? skb->dev->ifindex : 0);
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -522,6 +522,10 @@ static int ipv6_frag_rcv(struct sk_buff
return 1;
}

+ if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU &&
+ fhdr->frag_off & htons(IP6_MF))
+ goto fail_hdr;
+
iif = skb->dev ? skb->dev->ifindex : 0;
fq = fq_find(net, fhdr->identification, hdr, iif);
if (fq) {



2018-09-17 23:06:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 115/126] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 5 ++---
net/core/skbuff.c | 14 ++++++++++++++
2 files changed, 16 insertions(+), 3 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3135,6 +3135,7 @@ static inline void *skb_push_rcsum(struc
return skb->data;
}

+int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
/**
* pskb_trim_rcsum - trim received skb and update checksum
* @skb: buffer to trim
@@ -3148,9 +3149,7 @@ static inline int pskb_trim_rcsum(struct
{
if (likely(len >= skb->len))
return 0;
- if (skb->ip_summed == CHECKSUM_COMPLETE)
- skb->ip_summed = CHECKSUM_NONE;
- return __pskb_trim(skb, len);
+ return pskb_trim_rcsum_slow(skb, len);
}

static inline int __skb_trim_rcsum(struct sk_buff *skb, unsigned int len)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1839,6 +1839,20 @@ done:
}
EXPORT_SYMBOL(___pskb_trim);

+/* Note : use pskb_trim_rcsum() instead of calling this directly
+ */
+int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
+{
+ if (skb->ip_summed == CHECKSUM_COMPLETE) {
+ int delta = skb->len - len;
+
+ skb->csum = csum_sub(skb->csum,
+ skb_checksum(skb, len, delta, 0));
+ }
+ return __pskb_trim(skb, len);
+}
+EXPORT_SYMBOL(pskb_trim_rcsum_slow);
+
/**
* __pskb_pull_tail - advance tail of skb header
* @skb: buffer to reallocate



2018-09-17 23:06:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 084/126] f2fs: Fix uninitialized return in f2fs_ioc_shutdown()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

[ Upstream commit 2a96d8ad94ce57cb0072f7a660b1039720c47716 ]

"ret" can be uninitialized on the success path when "in ==
F2FS_GOING_DOWN_FULLSYNC".

Fixes: 60b2b4ee2bc0 ("f2fs: Fix deadlock in shutdown ioctl")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1803,7 +1803,7 @@ static int f2fs_ioc_shutdown(struct file
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct super_block *sb = sbi->sb;
__u32 in;
- int ret;
+ int ret = 0;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;



2018-09-17 23:06:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 095/126] inet: frags: Convert timers to use timer_setup()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Alexander Aring <[email protected]>
Cc: Stefan Schmidt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Jozsef Kadlecsik <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Stefan Schmidt <[email protected]> # for ieee802154
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 78802011fbe34331bdef6f2dfb1634011f0e4c32)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 2 +-
net/ieee802154/6lowpan/reassembly.c | 5 +++--
net/ipv4/inet_fragment.c | 4 ++--
net/ipv4/ip_fragment.c | 5 +++--
net/ipv6/netfilter/nf_conntrack_reasm.c | 5 +++--
net/ipv6/reassembly.c | 5 +++--
6 files changed, 15 insertions(+), 11 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -97,7 +97,7 @@ struct inet_frags {
void (*constructor)(struct inet_frag_queue *q,
const void *arg);
void (*destructor)(struct inet_frag_queue *);
- void (*frag_expire)(unsigned long data);
+ void (*frag_expire)(struct timer_list *t);
struct kmem_cache *frags_cachep;
const char *frags_cache_name;
};
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -80,12 +80,13 @@ static void lowpan_frag_init(struct inet
fq->daddr = *arg->dst;
}

-static void lowpan_frag_expire(unsigned long data)
+static void lowpan_frag_expire(struct timer_list *t)
{
+ struct inet_frag_queue *frag = from_timer(frag, t, timer);
struct frag_queue *fq;
struct net *net;

- fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+ fq = container_of(frag, struct frag_queue, q);
net = container_of(fq->q.net, struct net, ieee802154_lowpan.frags);

spin_lock(&fq->q.lock);
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -150,7 +150,7 @@ inet_evict_bucket(struct inet_frags *f,
spin_unlock(&hb->chain_lock);

hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
- f->frag_expire((unsigned long) fq);
+ f->frag_expire(&fq->timer);

return evicted;
}
@@ -367,7 +367,7 @@ static struct inet_frag_queue *inet_frag
f->constructor(q, arg);
add_frag_mem_limit(nf, f->qsize);

- setup_timer(&q->timer, f->frag_expire, (unsigned long)q);
+ timer_setup(&q->timer, f->frag_expire, 0);
spin_lock_init(&q->lock);
refcount_set(&q->refcnt, 1);

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -191,12 +191,13 @@ static bool frag_expire_skip_icmp(u32 us
/*
* Oops, a fragment queue timed out. Kill it and send an ICMP reply.
*/
-static void ip_expire(unsigned long arg)
+static void ip_expire(struct timer_list *t)
{
+ struct inet_frag_queue *frag = from_timer(frag, t, timer);
struct ipq *qp;
struct net *net;

- qp = container_of((struct inet_frag_queue *) arg, struct ipq, q);
+ qp = container_of(frag, struct ipq, q);
net = container_of(qp->q.net, struct net, ipv4.frags);

rcu_read_lock();
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -169,12 +169,13 @@ static unsigned int nf_hashfn(const stru
return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
}

-static void nf_ct_frag6_expire(unsigned long data)
+static void nf_ct_frag6_expire(struct timer_list *t)
{
+ struct inet_frag_queue *frag = from_timer(frag, t, timer);
struct frag_queue *fq;
struct net *net;

- fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+ fq = container_of(frag, struct frag_queue, q);
net = container_of(fq->q.net, struct net, nf_frag.frags);

ip6_expire_frag_queue(net, fq);
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -169,12 +169,13 @@ out:
}
EXPORT_SYMBOL(ip6_expire_frag_queue);

-static void ip6_frag_expire(unsigned long data)
+static void ip6_frag_expire(struct timer_list *t)
{
+ struct inet_frag_queue *frag = from_timer(frag, t, timer);
struct frag_queue *fq;
struct net *net;

- fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+ fq = container_of(frag, struct frag_queue, q);
net = container_of(fq->q.net, struct net, ipv6.frags);

ip6_expire_frag_queue(net, fq);



2018-09-17 23:06:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 085/126] iommu/ipmmu-vmsa: Fix allocation in atomic context

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Geert Uytterhoeven <[email protected]>

[ Upstream commit 46583e8c48c5a094ba28060615b3a7c8c576690f ]

When attaching a device to an IOMMU group with
CONFIG_DEBUG_ATOMIC_SLEEP=y:

BUG: sleeping function called from invalid context at mm/slab.h:421
in_atomic(): 1, irqs_disabled(): 128, pid: 61, name: kworker/1:1
...
Call trace:
...
arm_lpae_alloc_pgtable+0x114/0x184
arm_64_lpae_alloc_pgtable_s1+0x2c/0x128
arm_32_lpae_alloc_pgtable_s1+0x40/0x6c
alloc_io_pgtable_ops+0x60/0x88
ipmmu_attach_device+0x140/0x334

ipmmu_attach_device() takes a spinlock, while arm_lpae_alloc_pgtable()
allocates memory using GFP_KERNEL. Originally, the ipmmu-vmsa driver
had its own custom page table allocation implementation using
GFP_ATOMIC, hence the spinlock was fine.

Fix this by replacing the spinlock by a mutex, like the arm-smmu driver
does.

Fixes: f20ed39f53145e45 ("iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/iommu/ipmmu-vmsa.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -54,7 +54,7 @@ struct ipmmu_vmsa_domain {
struct io_pgtable_ops *iop;

unsigned int context_id;
- spinlock_t lock; /* Protects mappings */
+ struct mutex mutex; /* Protects mappings */
};

struct ipmmu_vmsa_iommu_priv {
@@ -523,7 +523,7 @@ static struct iommu_domain *__ipmmu_doma
if (!domain)
return NULL;

- spin_lock_init(&domain->lock);
+ mutex_init(&domain->mutex);

return &domain->io_domain;
}
@@ -548,7 +548,6 @@ static int ipmmu_attach_device(struct io
struct iommu_fwspec *fwspec = dev->iommu_fwspec;
struct ipmmu_vmsa_device *mmu = priv->mmu;
struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
- unsigned long flags;
unsigned int i;
int ret = 0;

@@ -557,7 +556,7 @@ static int ipmmu_attach_device(struct io
return -ENXIO;
}

- spin_lock_irqsave(&domain->lock, flags);
+ mutex_lock(&domain->mutex);

if (!domain->mmu) {
/* The domain hasn't been used yet, initialize it. */
@@ -574,7 +573,7 @@ static int ipmmu_attach_device(struct io
} else
dev_info(dev, "Reusing IPMMU context %u\n", domain->context_id);

- spin_unlock_irqrestore(&domain->lock, flags);
+ mutex_unlock(&domain->mutex);

if (ret < 0)
return ret;



2018-09-17 23:06:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 097/126] inet: frags: refactor lowpan_net_frag_init()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

We want to call lowpan_net_frag_init() earlier.
Similar to commit "inet: frags: refactor ipv6_frag_init()"

This is a prereq to "inet: frags: use rhashtables for reassembly units"

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 807f1844df4ac23594268fa9f41902d0549e92aa)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ieee802154/6lowpan/reassembly.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -615,14 +615,6 @@ int __init lowpan_net_frag_init(void)
{
int ret;

- ret = lowpan_frags_sysctl_register();
- if (ret)
- return ret;
-
- ret = register_pernet_subsys(&lowpan_frags_ops);
- if (ret)
- goto err_pernet;
-
lowpan_frags.hashfn = lowpan_hashfn;
lowpan_frags.constructor = lowpan_frag_init;
lowpan_frags.destructor = NULL;
@@ -632,11 +624,21 @@ int __init lowpan_net_frag_init(void)
lowpan_frags.frags_cache_name = lowpan_frags_cache_name;
ret = inet_frags_init(&lowpan_frags);
if (ret)
- goto err_pernet;
+ goto out;
+
+ ret = lowpan_frags_sysctl_register();
+ if (ret)
+ goto err_sysctl;

+ ret = register_pernet_subsys(&lowpan_frags_ops);
+ if (ret)
+ goto err_pernet;
+out:
return ret;
err_pernet:
lowpan_frags_sysctl_unregister();
+err_sysctl:
+ inet_frags_fini(&lowpan_frags);
return ret;
}




2018-09-17 23:06:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 098/126] ipv6: export ip6 fragments sysctl to unprivileged users

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment
sysctl to unprivileged users")

The only sysctl that is not per-netns is not used :
ip6frag_secret_interval

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 18dcbe12fe9fca0ab825f7eff993060525ac2503)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/reassembly.c | 4 ----
1 file changed, 4 deletions(-)

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -649,10 +649,6 @@ static int __net_init ip6_frags_ns_sysct
table[1].data = &net->ipv6.frags.low_thresh;
table[1].extra2 = &net->ipv6.frags.high_thresh;
table[2].data = &net->ipv6.frags.timeout;
-
- /* Don't export sysctls to unprivileged users */
- if (net->user_ns != &init_user_ns)
- table[0].procname = NULL;
}

hdr = register_net_sysctl(net, "net/ipv6", table);



2018-09-17 23:06:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 099/126] rhashtable: add schedule points

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Rehashing and destroying large hash table takes a lot of time,
and happens in process context. It is safe to add cond_resched()
in rhashtable_rehash_table() and rhashtable_free_and_destroy()

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit ae6da1f503abb5a5081f9f6c4a6881de97830f3e)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
lib/rhashtable.c | 2 ++
1 file changed, 2 insertions(+)

--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -364,6 +364,7 @@ static int rhashtable_rehash_table(struc
err = rhashtable_rehash_chain(ht, old_hash);
if (err)
return err;
+ cond_resched();
}

/* Publish the new table pointer. */
@@ -1073,6 +1074,7 @@ void rhashtable_free_and_destroy(struct
for (i = 0; i < tbl->size; i++) {
struct rhash_head *pos, *next;

+ cond_resched();
for (pos = rht_dereference(*rht_bucket(tbl, i), ht),
next = !rht_is_a_nulls(pos) ?
rht_dereference(pos->next, ht) : NULL;



2018-09-17 23:06:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 079/126] media: s5p-mfc: Fix buffer look up in s5p_mfc_handle_frame_{new, copy_time} functions

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sylwester Nawrocki <[email protected]>

[ Upstream commit 4faeaf9c0f4581667ce5826f9c90c4fd463ef086 ]

Look up of buffers in s5p_mfc_handle_frame_new, s5p_mfc_handle_frame_copy_time
functions is not working properly for DMA addresses above 2 GiB. As a result
flags and timestamp of returned buffers are not set correctly and it breaks
operation of GStreamer/OMX plugins which rely on the CAPTURE buffer queue
flags.

Due to improper return type of the get_dec_y_adr, get_dspl_y_adr callbacks
and sign bit extension these callbacks return incorrect address values,
e.g. 0xfffffffffefc0000 instead of 0x00000000fefc0000. Then the statement:

"if (vb2_dma_contig_plane_dma_addr(&dst_buf->b->vb2_buf, 0) == dec_y_addr)"

is always false, which breaks looking up capture queue buffers.

To ensure proper matching by address u32 type is used for the DMA
addresses. This should work on all related SoCs, since the MFC DMA
address width is not larger than 32-bit.

Changes done in this patch are minimal as there is a larger patch series
pending refactoring the whole driver.

Signed-off-by: Sylwester Nawrocki <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/media/platform/s5p-mfc/s5p_mfc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -254,24 +254,24 @@ static void s5p_mfc_handle_frame_all_ext
static void s5p_mfc_handle_frame_copy_time(struct s5p_mfc_ctx *ctx)
{
struct s5p_mfc_dev *dev = ctx->dev;
- struct s5p_mfc_buf *dst_buf, *src_buf;
- size_t dec_y_addr;
+ struct s5p_mfc_buf *dst_buf, *src_buf;
+ u32 dec_y_addr;
unsigned int frame_type;

/* Make sure we actually have a new frame before continuing. */
frame_type = s5p_mfc_hw_call(dev->mfc_ops, get_dec_frame_type, dev);
if (frame_type == S5P_FIMV_DECODE_FRAME_SKIPPED)
return;
- dec_y_addr = s5p_mfc_hw_call(dev->mfc_ops, get_dec_y_adr, dev);
+ dec_y_addr = (u32)s5p_mfc_hw_call(dev->mfc_ops, get_dec_y_adr, dev);

/* Copy timestamp / timecode from decoded src to dst and set
appropriate flags. */
src_buf = list_entry(ctx->src_queue.next, struct s5p_mfc_buf, list);
list_for_each_entry(dst_buf, &ctx->dst_queue, list) {
- if (vb2_dma_contig_plane_dma_addr(&dst_buf->b->vb2_buf, 0)
- == dec_y_addr) {
- dst_buf->b->timecode =
- src_buf->b->timecode;
+ u32 addr = (u32)vb2_dma_contig_plane_dma_addr(&dst_buf->b->vb2_buf, 0);
+
+ if (addr == dec_y_addr) {
+ dst_buf->b->timecode = src_buf->b->timecode;
dst_buf->b->vb2_buf.timestamp =
src_buf->b->vb2_buf.timestamp;
dst_buf->b->flags &=
@@ -307,10 +307,10 @@ static void s5p_mfc_handle_frame_new(str
{
struct s5p_mfc_dev *dev = ctx->dev;
struct s5p_mfc_buf *dst_buf;
- size_t dspl_y_addr;
+ u32 dspl_y_addr;
unsigned int frame_type;

- dspl_y_addr = s5p_mfc_hw_call(dev->mfc_ops, get_dspl_y_adr, dev);
+ dspl_y_addr = (u32)s5p_mfc_hw_call(dev->mfc_ops, get_dspl_y_adr, dev);
if (IS_MFCV6_PLUS(dev))
frame_type = s5p_mfc_hw_call(dev->mfc_ops,
get_disp_frame_type, ctx);
@@ -329,9 +329,10 @@ static void s5p_mfc_handle_frame_new(str
/* The MFC returns address of the buffer, now we have to
* check which videobuf does it correspond to */
list_for_each_entry(dst_buf, &ctx->dst_queue, list) {
+ u32 addr = (u32)vb2_dma_contig_plane_dma_addr(&dst_buf->b->vb2_buf, 0);
+
/* Check if this is the buffer we're looking for */
- if (vb2_dma_contig_plane_dma_addr(&dst_buf->b->vb2_buf, 0)
- == dspl_y_addr) {
+ if (addr == dspl_y_addr) {
list_del(&dst_buf->list);
ctx->dst_queue_cnt--;
dst_buf->b->sequence = ctx->sequence;



2018-09-17 23:06:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 101/126] inet: frags: remove some helpers

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()

Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405ab ("inet: frag: don't account number
of fragment queues")

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 5 -----
include/net/ip.h | 1 -
include/net/ipv6.h | 7 -------
net/ipv4/ip_fragment.c | 5 -----
net/ipv4/proc.c | 6 +++---
net/ipv6/proc.c | 5 +++--
6 files changed, 6 insertions(+), 23 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -141,11 +141,6 @@ static inline void add_frag_mem_limit(st
atomic_add(i, &nf->mem);
}

-static inline int sum_frag_mem_limit(struct netns_frags *nf)
-{
- return atomic_read(&nf->mem);
-}
-
/* RFC 3168 support :
* We want to check ECN values of all fragments, do detect invalid combinations.
* In ipq->ecn, we store the OR value of each ip4_frag_ecn() fragment value.
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -570,7 +570,6 @@ static inline struct sk_buff *ip_check_d
return skb;
}
#endif
-int ip_frag_mem(struct net *net);

/*
* Functions provided by ip_forward.c
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -331,13 +331,6 @@ static inline bool ipv6_accept_ra(struct
idev->cnf.accept_ra;
}

-#if IS_ENABLED(CONFIG_IPV6)
-static inline int ip6_frag_mem(struct net *net)
-{
- return sum_frag_mem_limit(&net->ipv6.frags);
-}
-#endif
-
#define IPV6_FRAG_HIGH_THRESH (4 * 1024*1024) /* 4194304 */
#define IPV6_FRAG_LOW_THRESH (3 * 1024*1024) /* 3145728 */
#define IPV6_FRAG_TIMEOUT (60 * HZ) /* 60 seconds */
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -83,11 +83,6 @@ static u8 ip4_frag_ecn(u8 tos)

static struct inet_frags ip4_frags;

-int ip_frag_mem(struct net *net)
-{
- return sum_frag_mem_limit(&net->ipv4.frags);
-}
-
static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
struct net_device *dev);

--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -54,7 +54,6 @@
static int sockstat_seq_show(struct seq_file *seq, void *v)
{
struct net *net = seq->private;
- unsigned int frag_mem;
int orphans, sockets;

orphans = percpu_counter_sum_positive(&tcp_orphan_count);
@@ -72,8 +71,9 @@ static int sockstat_seq_show(struct seq_
sock_prot_inuse_get(net, &udplite_prot));
seq_printf(seq, "RAW: inuse %d\n",
sock_prot_inuse_get(net, &raw_prot));
- frag_mem = ip_frag_mem(net);
- seq_printf(seq, "FRAG: inuse %u memory %u\n", !!frag_mem, frag_mem);
+ seq_printf(seq, "FRAG: inuse %u memory %u\n",
+ atomic_read(&net->ipv4.frags.rhashtable.nelems),
+ frag_mem_limit(&net->ipv4.frags));
return 0;
}

--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -38,7 +38,6 @@
static int sockstat6_seq_show(struct seq_file *seq, void *v)
{
struct net *net = seq->private;
- unsigned int frag_mem = ip6_frag_mem(net);

seq_printf(seq, "TCP6: inuse %d\n",
sock_prot_inuse_get(net, &tcpv6_prot));
@@ -48,7 +47,9 @@ static int sockstat6_seq_show(struct seq
sock_prot_inuse_get(net, &udplitev6_prot));
seq_printf(seq, "RAW6: inuse %d\n",
sock_prot_inuse_get(net, &rawv6_prot));
- seq_printf(seq, "FRAG6: inuse %u memory %u\n", !!frag_mem, frag_mem);
+ seq_printf(seq, "FRAG6: inuse %u memory %u\n",
+ atomic_read(&net->ipv6.frags.rhashtable.nelems),
+ frag_mem_limit(&net->ipv6.frags));
return 0;
}




2018-09-17 23:06:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 087/126] f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chao Yu <[email protected]>

[ Upstream commit c77ec61ca0a49544ca81881cc5d5529858f7e196 ]

This patch adds to do sanity check with {sit,nat}_ver_bitmap_bytesize
during mount, in order to avoid accessing across cache boundary with
this abnormal bitmap size.

- Overview
buffer overrun in build_sit_info() when mounting a crafted f2fs image

- Reproduce

- Kernel message
[ 548.580867] F2FS-fs (loop0): Invalid log blocks per segment (8201)

[ 548.580877] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[ 548.584979] ==================================================================
[ 548.586568] BUG: KASAN: use-after-free in kmemdup+0x36/0x50
[ 548.587715] Read of size 64 at addr ffff8801e9c265ff by task mount/1295

[ 548.589428] CPU: 1 PID: 1295 Comm: mount Not tainted 4.18.0-rc1+ #4
[ 548.589432] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 548.589438] Call Trace:
[ 548.589474] dump_stack+0x7b/0xb5
[ 548.589487] print_address_description+0x70/0x290
[ 548.589492] kasan_report+0x291/0x390
[ 548.589496] ? kmemdup+0x36/0x50
[ 548.589509] check_memory_region+0x139/0x190
[ 548.589514] memcpy+0x23/0x50
[ 548.589518] kmemdup+0x36/0x50
[ 548.589545] f2fs_build_segment_manager+0x8fa/0x3410
[ 548.589551] ? __asan_loadN+0xf/0x20
[ 548.589560] ? f2fs_sanity_check_ckpt+0x1be/0x240
[ 548.589566] ? f2fs_flush_sit_entries+0x10c0/0x10c0
[ 548.589587] ? __put_user_ns+0x40/0x40
[ 548.589604] ? find_next_bit+0x57/0x90
[ 548.589610] f2fs_fill_super+0x194b/0x2b40
[ 548.589617] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.589637] ? set_blocksize+0x90/0x140
[ 548.589651] mount_bdev+0x1c5/0x210
[ 548.589655] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.589667] f2fs_mount+0x15/0x20
[ 548.589672] mount_fs+0x60/0x1a0
[ 548.589683] ? alloc_vfsmnt+0x309/0x360
[ 548.589688] vfs_kern_mount+0x6b/0x1a0
[ 548.589699] do_mount+0x34a/0x18c0
[ 548.589710] ? lockref_put_or_lock+0xcf/0x160
[ 548.589716] ? copy_mount_string+0x20/0x20
[ 548.589728] ? memcg_kmem_put_cache+0x1b/0xa0
[ 548.589734] ? kasan_check_write+0x14/0x20
[ 548.589740] ? _copy_from_user+0x6a/0x90
[ 548.589744] ? memdup_user+0x42/0x60
[ 548.589750] ksys_mount+0x83/0xd0
[ 548.589755] __x64_sys_mount+0x67/0x80
[ 548.589781] do_syscall_64+0x78/0x170
[ 548.589797] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.589820] RIP: 0033:0x7f76fc331b9a
[ 548.589821] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 548.589880] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 548.589890] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[ 548.589892] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[ 548.589895] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 548.589897] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[ 548.589900] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003

[ 548.590242] The buggy address belongs to the page:
[ 548.591243] page:ffffea0007a70980 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 548.592886] flags: 0x2ffff0000000000()
[ 548.593665] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[ 548.595258] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 548.603713] page dumped because: kasan: bad access detected

[ 548.605203] Memory state around the buggy address:
[ 548.606198] ffff8801e9c26480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.607676] ffff8801e9c26500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.609157] >ffff8801e9c26580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.610629] ^
[ 548.612088] ffff8801e9c26600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.613674] ffff8801e9c26680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.615141] ==================================================================
[ 548.616613] Disabling lock debugging due to kernel taint
[ 548.622871] WARNING: CPU: 1 PID: 1295 at mm/page_alloc.c:4065 __alloc_pages_slowpath+0xe4a/0x1420
[ 548.622878] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 548.623217] CPU: 1 PID: 1295 Comm: mount Tainted: G B 4.18.0-rc1+ #4
[ 548.623219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 548.623226] RIP: 0010:__alloc_pages_slowpath+0xe4a/0x1420
[ 548.623227] Code: ff ff 01 89 85 c8 fe ff ff e9 91 fc ff ff 41 89 c5 e9 5c fc ff ff 0f 0b 89 f8 25 ff ff f7 ff 89 85 8c fe ff ff e9 d5 f2 ff ff <0f> 0b e9 65 f2 ff ff 65 8b 05 38 81 d2 47 f6 c4 01 74 1c 65 48 8b
[ 548.623281] RSP: 0018:ffff8801f28c7678 EFLAGS: 00010246
[ 548.623284] RAX: 0000000000000000 RBX: 00000000006040c0 RCX: ffffffffb82f73b7
[ 548.623287] RDX: 1ffff1003e518eeb RSI: 000000000000000c RDI: 0000000000000000
[ 548.623290] RBP: ffff8801f28c7880 R08: 0000000000000000 R09: ffffed0047fff2c5
[ 548.623292] R10: 0000000000000001 R11: ffffed0047fff2c4 R12: ffff8801e88de040
[ 548.623295] R13: 00000000006040c0 R14: 000000000000000c R15: ffff8801f28c7938
[ 548.623299] FS: 00007f76fca51840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 548.623302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 548.623304] CR2: 00007f19b9171760 CR3: 00000001ed952000 CR4: 00000000000006e0
[ 548.623317] Call Trace:
[ 548.623325] ? kasan_check_read+0x11/0x20
[ 548.623330] ? __zone_watermark_ok+0x92/0x240
[ 548.623336] ? get_page_from_freelist+0x1c3/0x1d90
[ 548.623347] ? _raw_spin_lock_irqsave+0x2a/0x60
[ 548.623353] ? warn_alloc+0x250/0x250
[ 548.623358] ? save_stack+0x46/0xd0
[ 548.623361] ? kasan_kmalloc+0xad/0xe0
[ 548.623366] ? __isolate_free_page+0x2a0/0x2a0
[ 548.623370] ? mount_fs+0x60/0x1a0
[ 548.623374] ? vfs_kern_mount+0x6b/0x1a0
[ 548.623378] ? do_mount+0x34a/0x18c0
[ 548.623383] ? ksys_mount+0x83/0xd0
[ 548.623387] ? __x64_sys_mount+0x67/0x80
[ 548.623391] ? do_syscall_64+0x78/0x170
[ 548.623396] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.623401] __alloc_pages_nodemask+0x3c5/0x400
[ 548.623407] ? __alloc_pages_slowpath+0x1420/0x1420
[ 548.623412] ? __mutex_lock_slowpath+0x20/0x20
[ 548.623417] ? kvmalloc_node+0x31/0x80
[ 548.623424] alloc_pages_current+0x75/0x110
[ 548.623436] kmalloc_order+0x24/0x60
[ 548.623442] kmalloc_order_trace+0x24/0xb0
[ 548.623448] __kmalloc_track_caller+0x207/0x220
[ 548.623455] ? f2fs_build_node_manager+0x399/0xbb0
[ 548.623460] kmemdup+0x20/0x50
[ 548.623465] f2fs_build_node_manager+0x399/0xbb0
[ 548.623470] f2fs_fill_super+0x195e/0x2b40
[ 548.623477] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.623481] ? set_blocksize+0x90/0x140
[ 548.623486] mount_bdev+0x1c5/0x210
[ 548.623489] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.623495] f2fs_mount+0x15/0x20
[ 548.623498] mount_fs+0x60/0x1a0
[ 548.623503] ? alloc_vfsmnt+0x309/0x360
[ 548.623508] vfs_kern_mount+0x6b/0x1a0
[ 548.623513] do_mount+0x34a/0x18c0
[ 548.623518] ? lockref_put_or_lock+0xcf/0x160
[ 548.623523] ? copy_mount_string+0x20/0x20
[ 548.623528] ? memcg_kmem_put_cache+0x1b/0xa0
[ 548.623533] ? kasan_check_write+0x14/0x20
[ 548.623537] ? _copy_from_user+0x6a/0x90
[ 548.623542] ? memdup_user+0x42/0x60
[ 548.623547] ksys_mount+0x83/0xd0
[ 548.623552] __x64_sys_mount+0x67/0x80
[ 548.623557] do_syscall_64+0x78/0x170
[ 548.623562] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.623566] RIP: 0033:0x7f76fc331b9a
[ 548.623567] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 548.623632] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 548.623636] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[ 548.623639] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[ 548.623641] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 548.623643] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[ 548.623646] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
[ 548.623650] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 548.623656] F2FS-fs (loop0): Failed to initialize F2FS node manager
[ 548.627936] F2FS-fs (loop0): Invalid log blocks per segment (8201)

[ 548.627940] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[ 548.635835] F2FS-fs (loop0): Failed to initialize F2FS node manager

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.c#L3578

sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);

Buffer overrun happens when doing memcpy. I suspect there is missing (inconsistent) checks on bitmap_size.

Reported by Wen Xu ([email protected]) from SSLab, Gatech.

Reported-by: Wen Xu <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/super.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1883,12 +1883,17 @@ int sanity_check_ckpt(struct f2fs_sb_inf
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
unsigned int ovp_segments, reserved_segments;
unsigned int main_segs, blocks_per_seg;
+ unsigned int sit_segs, nat_segs;
+ unsigned int sit_bitmap_size, nat_bitmap_size;
+ unsigned int log_blocks_per_seg;
int i;

total = le32_to_cpu(raw_super->segment_count);
fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
- fsmeta += le32_to_cpu(raw_super->segment_count_sit);
- fsmeta += le32_to_cpu(raw_super->segment_count_nat);
+ sit_segs = le32_to_cpu(raw_super->segment_count_sit);
+ fsmeta += sit_segs;
+ nat_segs = le32_to_cpu(raw_super->segment_count_nat);
+ fsmeta += nat_segs;
fsmeta += le32_to_cpu(ckpt->rsvd_segment_count);
fsmeta += le32_to_cpu(raw_super->segment_count_ssa);

@@ -1919,6 +1924,18 @@ int sanity_check_ckpt(struct f2fs_sb_inf
return 1;
}

+ sit_bitmap_size = le32_to_cpu(ckpt->sit_ver_bitmap_bytesize);
+ nat_bitmap_size = le32_to_cpu(ckpt->nat_ver_bitmap_bytesize);
+ log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
+
+ if (sit_bitmap_size != ((sit_segs / 2) << log_blocks_per_seg) / 8 ||
+ nat_bitmap_size != ((nat_segs / 2) << log_blocks_per_seg) / 8) {
+ f2fs_msg(sbi->sb, KERN_ERR,
+ "Wrong bitmap size: sit: %u, nat:%u",
+ sit_bitmap_size, nat_bitmap_size);
+ return 1;
+ }
+
if (unlikely(f2fs_cp_error(sbi))) {
f2fs_msg(sbi->sb, KERN_ERR, "A bug case: need to run fsck");
return 1;



2018-09-17 23:06:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 119/126] ip: add helpers to process in-order fragments faster.

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <[email protected]>

This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
maintained/tracked. Fragments that arrive in-order, adjacent
to the previous tail fragment, are added to this tail run without
triggering the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail run,
it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
it starts a new "tail" run, as is inserted into the rb-tree
at the end as the head of the new run.

skb->cb is used to store additional information
needed here (suggested by Eric Dumazet).

Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: Peter Oskolkov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 353c9cb360874e737fb000545f783df756c06f9a)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 6 +++
net/ipv4/ip_fragment.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -57,7 +57,9 @@ struct frag_v6_compare_key {
* @lock: spinlock protecting this frag
* @refcnt: reference count of the queue
* @fragments: received fragments head
+ * @rb_fragments: received fragments rb-tree root
* @fragments_tail: received fragments tail
+ * @last_run_head: the head of the last "run". see ip_fragment.c
* @stamp: timestamp of the last received fragment
* @len: total length of the original datagram
* @meat: length of received fragments so far
@@ -78,6 +80,7 @@ struct inet_frag_queue {
struct sk_buff *fragments; /* Used in IPv6. */
struct rb_root rb_fragments; /* Used in IPv4. */
struct sk_buff *fragments_tail;
+ struct sk_buff *last_run_head;
ktime_t stamp;
int len;
int meat;
@@ -113,6 +116,9 @@ void inet_frag_kill(struct inet_frag_que
void inet_frag_destroy(struct inet_frag_queue *q);
struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);

+/* Free all skbs in the queue; return the sum of their truesizes. */
+unsigned int inet_frag_rbtree_purge(struct rb_root *root);
+
static inline void inet_frag_put(struct inet_frag_queue *q)
{
if (refcount_dec_and_test(&q->refcnt))
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -57,6 +57,57 @@
*/
static const char ip_frag_cache_name[] = "ip4-frags";

+/* Use skb->cb to track consecutive/adjacent fragments coming at
+ * the end of the queue. Nodes in the rb-tree queue will
+ * contain "runs" of one or more adjacent fragments.
+ *
+ * Invariants:
+ * - next_frag is NULL at the tail of a "run";
+ * - the head of a "run" has the sum of all fragment lengths in frag_run_len.
+ */
+struct ipfrag_skb_cb {
+ struct inet_skb_parm h;
+ struct sk_buff *next_frag;
+ int frag_run_len;
+};
+
+#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))
+
+static void ip4_frag_init_run(struct sk_buff *skb)
+{
+ BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb));
+
+ FRAG_CB(skb)->next_frag = NULL;
+ FRAG_CB(skb)->frag_run_len = skb->len;
+}
+
+/* Append skb to the last "run". */
+static void ip4_frag_append_to_last_run(struct inet_frag_queue *q,
+ struct sk_buff *skb)
+{
+ RB_CLEAR_NODE(&skb->rbnode);
+ FRAG_CB(skb)->next_frag = NULL;
+
+ FRAG_CB(q->last_run_head)->frag_run_len += skb->len;
+ FRAG_CB(q->fragments_tail)->next_frag = skb;
+ q->fragments_tail = skb;
+}
+
+/* Create a new "run" with the skb. */
+static void ip4_frag_create_run(struct inet_frag_queue *q, struct sk_buff *skb)
+{
+ if (q->last_run_head)
+ rb_link_node(&skb->rbnode, &q->last_run_head->rbnode,
+ &q->last_run_head->rbnode.rb_right);
+ else
+ rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node);
+ rb_insert_color(&skb->rbnode, &q->rb_fragments);
+
+ ip4_frag_init_run(skb);
+ q->fragments_tail = skb;
+ q->last_run_head = skb;
+}
+
/* Describe an entry in the "incomplete datagrams" queue. */
struct ipq {
struct inet_frag_queue q;
@@ -654,6 +705,28 @@ struct sk_buff *ip_check_defrag(struct n
}
EXPORT_SYMBOL(ip_check_defrag);

+unsigned int inet_frag_rbtree_purge(struct rb_root *root)
+{
+ struct rb_node *p = rb_first(root);
+ unsigned int sum = 0;
+
+ while (p) {
+ struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
+
+ p = rb_next(p);
+ rb_erase(&skb->rbnode, root);
+ while (skb) {
+ struct sk_buff *next = FRAG_CB(skb)->next_frag;
+
+ sum += skb->truesize;
+ kfree_skb(skb);
+ skb = next;
+ }
+ }
+ return sum;
+}
+EXPORT_SYMBOL(inet_frag_rbtree_purge);
+
#ifdef CONFIG_SYSCTL
static int dist_min;




2018-09-17 23:07:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 122/126] mtd: ubi: wl: Fix error return code in ubi_wl_init()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Yongjun <[email protected]>

commit 7233982ade15eeac05c6f351e8d347406e6bcd2f upstream.

Fix to return error code -ENOMEM from the kmem_cache_alloc() error
handling case instead of 0, as done elsewhere in this function.

Fixes: f78e5623f45b ("ubi: fastmap: Erase outdated anchor PEBs during
attach")
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/mtd/ubi/wl.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -1615,8 +1615,10 @@ int ubi_wl_init(struct ubi_device *ubi,
cond_resched();

e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_KERNEL);
- if (!e)
+ if (!e) {
+ err = -ENOMEM;
goto out_free;
+ }

e->pnum = aeb->pnum;
e->ec = aeb->ec;
@@ -1635,8 +1637,10 @@ int ubi_wl_init(struct ubi_device *ubi,
cond_resched();

e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_KERNEL);
- if (!e)
+ if (!e) {
+ err = -ENOMEM;
goto out_free;
+ }

e->pnum = aeb->pnum;
e->ec = aeb->ec;



2018-09-17 23:07:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 117/126] net: sk_buff rbnode reorg

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

commit bffa72cf7f9df842f0016ba03586039296b4caaf upstream

skb->rbnode shares space with skb->next, skb->prev and skb->tstamp

Current uses (TCP receive ofo queue and netem) need to save/restore
tstamp, while skb->dev is either NULL (TCP) or a constant for a given
queue (netem).

Since we plan using an RB tree for TCP retransmit queue to speedup SACK
processing with large BDP, this patch exchanges skb->dev and
skb->tstamp.

This saves some overhead in both TCP and netem.

v2: removes the swtstamp field from struct tcp_skb_cb

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 24 ++--
include/net/inet_frag.h | 3
net/ipv4/inet_fragment.c | 14 +-
net/ipv4/ip_fragment.c | 182 +++++++++++++++++---------------
net/ipv6/netfilter/nf_conntrack_reasm.c | 1
net/ipv6/reassembly.c | 1
6 files changed, 128 insertions(+), 97 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -663,23 +663,27 @@ struct sk_buff {
struct sk_buff *prev;

union {
- ktime_t tstamp;
- u64 skb_mstamp;
+ struct net_device *dev;
+ /* Some protocols might use this space to store information,
+ * while device pointer would be NULL.
+ * UDP receive path is one user.
+ */
+ unsigned long dev_scratch;
};
};
- struct rb_node rbnode; /* used in netem & tcp stack */
+ struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */
+ struct list_head list;
};
- struct sock *sk;

union {
- struct net_device *dev;
- /* Some protocols might use this space to store information,
- * while device pointer would be NULL.
- * UDP receive path is one user.
- */
- unsigned long dev_scratch;
+ struct sock *sk;
int ip_defrag_offset;
};
+
+ union {
+ ktime_t tstamp;
+ u64 skb_mstamp;
+ };
/*
* This is the control buffer. It is free to use for every
* layer. Please put your private variables there. If you
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -75,7 +75,8 @@ struct inet_frag_queue {
struct timer_list timer;
spinlock_t lock;
refcount_t refcnt;
- struct sk_buff *fragments;
+ struct sk_buff *fragments; /* Used in IPv6. */
+ struct rb_root rb_fragments; /* Used in IPv4. */
struct sk_buff *fragments_tail;
ktime_t stamp;
int len;
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -136,12 +136,16 @@ void inet_frag_destroy(struct inet_frag_
fp = q->fragments;
nf = q->net;
f = nf->f;
- while (fp) {
- struct sk_buff *xp = fp->next;
+ if (fp) {
+ do {
+ struct sk_buff *xp = fp->next;

- sum_truesize += fp->truesize;
- kfree_skb(fp);
- fp = xp;
+ sum_truesize += fp->truesize;
+ kfree_skb(fp);
+ fp = xp;
+ } while (fp);
+ } else {
+ sum_truesize = skb_rbtree_purge(&q->rb_fragments);
}
sum = sum_truesize + f->qsize;

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -136,7 +136,7 @@ static void ip_expire(struct timer_list
{
struct inet_frag_queue *frag = from_timer(frag, t, timer);
const struct iphdr *iph;
- struct sk_buff *head;
+ struct sk_buff *head = NULL;
struct net *net;
struct ipq *qp;
int err;
@@ -152,14 +152,31 @@ static void ip_expire(struct timer_list

ipq_kill(qp);
__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
-
- head = qp->q.fragments;
-
__IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);

- if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
+ if (!qp->q.flags & INET_FRAG_FIRST_IN)
goto out;

+ /* sk_buff::dev and sk_buff::rbnode are unionized. So we
+ * pull the head out of the tree in order to be able to
+ * deal with head->dev.
+ */
+ if (qp->q.fragments) {
+ head = qp->q.fragments;
+ qp->q.fragments = head->next;
+ } else {
+ head = skb_rb_first(&qp->q.rb_fragments);
+ if (!head)
+ goto out;
+ rb_erase(&head->rbnode, &qp->q.rb_fragments);
+ memset(&head->rbnode, 0, sizeof(head->rbnode));
+ barrier();
+ }
+ if (head == qp->q.fragments_tail)
+ qp->q.fragments_tail = NULL;
+
+ sub_frag_mem_limit(qp->q.net, head->truesize);
+
head->dev = dev_get_by_index_rcu(net, qp->iif);
if (!head->dev)
goto out;
@@ -179,16 +196,16 @@ static void ip_expire(struct timer_list
(skb_rtable(head)->rt_type != RTN_LOCAL))
goto out;

- skb_get(head);
spin_unlock(&qp->q.lock);
icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
- kfree_skb(head);
goto out_rcu_unlock;

out:
spin_unlock(&qp->q.lock);
out_rcu_unlock:
rcu_read_unlock();
+ if (head)
+ kfree_skb(head);
ipq_put(qp);
}

@@ -231,7 +248,7 @@ static int ip_frag_too_far(struct ipq *q
end = atomic_inc_return(&peer->rid);
qp->rid = end;

- rc = qp->q.fragments && (end - start) > max;
+ rc = qp->q.fragments_tail && (end - start) > max;

if (rc) {
struct net *net;
@@ -245,7 +262,6 @@ static int ip_frag_too_far(struct ipq *q

static int ip_frag_reinit(struct ipq *qp)
{
- struct sk_buff *fp;
unsigned int sum_truesize = 0;

if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
@@ -253,20 +269,14 @@ static int ip_frag_reinit(struct ipq *qp
return -ETIMEDOUT;
}

- fp = qp->q.fragments;
- do {
- struct sk_buff *xp = fp->next;
-
- sum_truesize += fp->truesize;
- kfree_skb(fp);
- fp = xp;
- } while (fp);
+ sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
sub_frag_mem_limit(qp->q.net, sum_truesize);

qp->q.flags = 0;
qp->q.len = 0;
qp->q.meat = 0;
qp->q.fragments = NULL;
+ qp->q.rb_fragments = RB_ROOT;
qp->q.fragments_tail = NULL;
qp->iif = 0;
qp->ecn = 0;
@@ -278,7 +288,8 @@ static int ip_frag_reinit(struct ipq *qp
static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
{
struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
- struct sk_buff *prev, *next;
+ struct rb_node **rbn, *parent;
+ struct sk_buff *skb1;
struct net_device *dev;
unsigned int fragsize;
int flags, offset;
@@ -341,58 +352,58 @@ static int ip_frag_queue(struct ipq *qp,
if (err)
goto err;

- /* Find out which fragments are in front and at the back of us
- * in the chain of fragments so far. We must know where to put
- * this fragment, right?
- */
- prev = qp->q.fragments_tail;
- if (!prev || prev->ip_defrag_offset < offset) {
- next = NULL;
- goto found;
- }
- prev = NULL;
- for (next = qp->q.fragments; next != NULL; next = next->next) {
- if (next->ip_defrag_offset >= offset)
- break; /* bingo! */
- prev = next;
- }
+ /* Note : skb->rbnode and skb->dev share the same location. */
+ dev = skb->dev;
+ /* Makes sure compiler wont do silly aliasing games */
+ barrier();

-found:
/* RFC5722, Section 4, amended by Errata ID : 3089
* When reassembling an IPv6 datagram, if
* one or more its constituent fragments is determined to be an
* overlapping fragment, the entire datagram (and any constituent
* fragments) MUST be silently discarded.
*
- * We do the same here for IPv4.
+ * We do the same here for IPv4 (and increment an snmp counter).
*/

- /* Is there an overlap with the previous fragment? */
- if (prev &&
- (prev->ip_defrag_offset + prev->len) > offset)
- goto discard_qp;
-
- /* Is there an overlap with the next fragment? */
- if (next && next->ip_defrag_offset < end)
- goto discard_qp;
+ /* Find out where to put this fragment. */
+ skb1 = qp->q.fragments_tail;
+ if (!skb1) {
+ /* This is the first fragment we've received. */
+ rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
+ qp->q.fragments_tail = skb;
+ } else if ((skb1->ip_defrag_offset + skb1->len) < end) {
+ /* This is the common/special case: skb goes to the end. */
+ /* Detect and discard overlaps. */
+ if (offset < (skb1->ip_defrag_offset + skb1->len))
+ goto discard_qp;
+ /* Insert after skb1. */
+ rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
+ qp->q.fragments_tail = skb;
+ } else {
+ /* Binary search. Note that skb can become the first fragment, but
+ * not the last (covered above). */
+ rbn = &qp->q.rb_fragments.rb_node;
+ do {
+ parent = *rbn;
+ skb1 = rb_to_skb(parent);
+ if (end <= skb1->ip_defrag_offset)
+ rbn = &parent->rb_left;
+ else if (offset >= skb1->ip_defrag_offset + skb1->len)
+ rbn = &parent->rb_right;
+ else /* Found an overlap with skb1. */
+ goto discard_qp;
+ } while (*rbn);
+ /* Here we have parent properly set, and rbn pointing to
+ * one of its NULL left/right children. Insert skb. */
+ rb_link_node(&skb->rbnode, parent, rbn);
+ }
+ rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);

- /* Note : skb->ip_defrag_offset and skb->dev share the same location */
- dev = skb->dev;
if (dev)
qp->iif = dev->ifindex;
- /* Makes sure compiler wont do silly aliasing games */
- barrier();
skb->ip_defrag_offset = offset;

- /* Insert this fragment in the chain of fragments. */
- skb->next = next;
- if (!next)
- qp->q.fragments_tail = skb;
- if (prev)
- prev->next = skb;
- else
- qp->q.fragments = skb;
-
qp->q.stamp = skb->tstamp;
qp->q.meat += skb->len;
qp->ecn |= ecn;
@@ -414,7 +425,7 @@ found:
unsigned long orefdst = skb->_skb_refdst;

skb->_skb_refdst = 0UL;
- err = ip_frag_reasm(qp, prev, dev);
+ err = ip_frag_reasm(qp, skb, dev);
skb->_skb_refdst = orefdst;
return err;
}
@@ -431,15 +442,15 @@ err:
return err;
}

-
/* Build a new IP datagram from all its fragments. */
-
-static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
+static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
struct net_device *dev)
{
struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
struct iphdr *iph;
- struct sk_buff *fp, *head = qp->q.fragments;
+ struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments);
+ struct sk_buff **nextp; /* To build frag_list. */
+ struct rb_node *rbn;
int len;
int ihlen;
int err;
@@ -453,25 +464,20 @@ static int ip_frag_reasm(struct ipq *qp,
goto out_fail;
}
/* Make the one we just received the head. */
- if (prev) {
- head = prev->next;
- fp = skb_clone(head, GFP_ATOMIC);
+ if (head != skb) {
+ fp = skb_clone(skb, GFP_ATOMIC);
if (!fp)
goto out_nomem;
-
- fp->next = head->next;
- if (!fp->next)
+ rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
+ if (qp->q.fragments_tail == skb)
qp->q.fragments_tail = fp;
- prev->next = fp;
-
- skb_morph(head, qp->q.fragments);
- head->next = qp->q.fragments->next;
-
- consume_skb(qp->q.fragments);
- qp->q.fragments = head;
+ skb_morph(skb, head);
+ rb_replace_node(&head->rbnode, &skb->rbnode,
+ &qp->q.rb_fragments);
+ consume_skb(head);
+ head = skb;
}

- WARN_ON(!head);
WARN_ON(head->ip_defrag_offset != 0);

/* Allocate a new buffer for the datagram. */
@@ -496,24 +502,35 @@ static int ip_frag_reasm(struct ipq *qp,
clone = alloc_skb(0, GFP_ATOMIC);
if (!clone)
goto out_nomem;
- clone->next = head->next;
- head->next = clone;
skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
skb_frag_list_init(head);
for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
clone->len = clone->data_len = head->data_len - plen;
- head->data_len -= clone->len;
- head->len -= clone->len;
+ skb->truesize += clone->truesize;
clone->csum = 0;
clone->ip_summed = head->ip_summed;
add_frag_mem_limit(qp->q.net, clone->truesize);
+ skb_shinfo(head)->frag_list = clone;
+ nextp = &clone->next;
+ } else {
+ nextp = &skb_shinfo(head)->frag_list;
}

- skb_shinfo(head)->frag_list = head->next;
skb_push(head, head->data - skb_network_header(head));

- for (fp=head->next; fp; fp = fp->next) {
+ /* Traverse the tree in order, to build frag_list. */
+ rbn = rb_next(&head->rbnode);
+ rb_erase(&head->rbnode, &qp->q.rb_fragments);
+ while (rbn) {
+ struct rb_node *rbnext = rb_next(rbn);
+ fp = rb_to_skb(rbn);
+ rb_erase(rbn, &qp->q.rb_fragments);
+ rbn = rbnext;
+ *nextp = fp;
+ nextp = &fp->next;
+ fp->prev = NULL;
+ memset(&fp->rbnode, 0, sizeof(fp->rbnode));
head->data_len += fp->len;
head->len += fp->len;
if (head->ip_summed != fp->ip_summed)
@@ -524,7 +541,9 @@ static int ip_frag_reasm(struct ipq *qp,
}
sub_frag_mem_limit(qp->q.net, head->truesize);

+ *nextp = NULL;
head->next = NULL;
+ head->prev = NULL;
head->dev = dev;
head->tstamp = qp->q.stamp;
IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
@@ -552,6 +571,7 @@ static int ip_frag_reasm(struct ipq *qp,

__IP_INC_STATS(net, IPSTATS_MIB_REASMOKS);
qp->q.fragments = NULL;
+ qp->q.rb_fragments = RB_ROOT;
qp->q.fragments_tail = NULL;
return 0;

--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -471,6 +471,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
head->csum);

fq->q.fragments = NULL;
+ fq->q.rb_fragments = RB_ROOT;
fq->q.fragments_tail = NULL;

return true;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -472,6 +472,7 @@ static int ip6_frag_reasm(struct frag_qu
__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
rcu_read_unlock();
fq->q.fragments = NULL;
+ fq->q.rb_fragments = RB_ROOT;
fq->q.fragments_tail = NULL;
return 1;




2018-09-17 23:07:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 082/126] media: helene: fix xtal frequency setting at power on

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Katsuhiro Suzuki <[email protected]>

[ Upstream commit a00e5f074b3f3cd39d1ccdc53d4d805b014df3f3 ]

This patch fixes crystal frequency setting when power on this device.

Signed-off-by: Katsuhiro Suzuki <[email protected]>
Acked-by: Abylay Ospan <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/media/dvb-frontends/helene.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/media/dvb-frontends/helene.c
+++ b/drivers/media/dvb-frontends/helene.c
@@ -897,7 +897,10 @@ static int helene_x_pon(struct helene_pr
helene_write_regs(priv, 0x99, cdata, sizeof(cdata));

/* 0x81 - 0x94 */
- data[0] = 0x18; /* xtal 24 MHz */
+ if (priv->xtal == SONY_HELENE_XTAL_16000)
+ data[0] = 0x10; /* xtal 16 MHz */
+ else
+ data[0] = 0x18; /* xtal 24 MHz */
data[1] = (uint8_t)(0x80 | (0x04 & 0x1F)); /* 4 x 25 = 100uA */
data[2] = (uint8_t)(0x80 | (0x26 & 0x7F)); /* 38 x 0.25 = 9.5pF */
data[3] = 0x80; /* REFOUT signal output 500mVpp */



2018-09-17 23:07:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 125/126] autofs: fix autofs_sbi() does not check super block type

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ian Kent <[email protected]>

commit 0633da48f0793aeba27f82d30605624416723a91 upstream.

autofs_sbi() does not check the superblock magic number to verify it has
been given an autofs super block.

Backport Note: autofs4 has been renamed to autofs upstream. As a result
the upstream patch does not apply cleanly onto 4.14.y.

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Zubin Mithra <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/autofs4/autofs_i.h | 4 +++-
fs/autofs4/inode.c | 1 -
2 files changed, 3 insertions(+), 2 deletions(-)

--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -26,6 +26,7 @@
#include <linux/list.h>
#include <linux/completion.h>
#include <asm/current.h>
+#include <linux/magic.h>

/* This is the range of ioctl() numbers we claim as ours */
#define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
@@ -124,7 +125,8 @@ struct autofs_sb_info {

static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
{
- return (struct autofs_sb_info *)(sb->s_fs_info);
+ return sb->s_magic != AUTOFS_SUPER_MAGIC ?
+ NULL : (struct autofs_sb_info *)(sb->s_fs_info);
}

static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry)
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -14,7 +14,6 @@
#include <linux/pagemap.h>
#include <linux/parser.h>
#include <linux/bitops.h>
-#include <linux/magic.h>
#include "autofs_i.h"
#include <linux/module.h>




2018-09-17 23:07:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 124/126] tuntap: fix use after free during release

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Wang <[email protected]>

commit 7063efd33bb15abc0160347f89eb5aba6b7d000e upstream.

After commit b196d88aba8a ("tun: fix use after free for ptr_ring") we
need clean up tx ring during release(). But unfortunately, it tries to
do the cleanup blindly after socket were destroyed which will lead
another use-after-free. Fix this by doing the cleanup before dropping
the last reference of the socket in __tun_detach().

Backport Note :-
Upstream commit moves the ptr_ring_cleanup call from tun_chr_close to
__tun_detach. Upstream applied that patch after replacing skb_array with
ptr_ring. This patch moves the skb_array_cleanup call from
tun_chr_close to __tun_detach.

Reported-by: Andrei Vagin <[email protected]>
Acked-by: Andrei Vagin <[email protected]>
Fixes: b196d88aba8a ("tun: fix use after free for ptr_ring")
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Zubin Mithra <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/tun.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -575,6 +575,7 @@ static void __tun_detach(struct tun_file
tun->dev->reg_state == NETREG_REGISTERED)
unregister_netdevice(tun->dev);
}
+ skb_array_cleanup(&tfile->tx_array);
sock_put(&tfile->sk);
}
}
@@ -2646,7 +2647,6 @@ static int tun_chr_close(struct inode *i
struct tun_file *tfile = file->private_data;

tun_detach(tfile, true);
- skb_array_cleanup(&tfile->tx_array);

return 0;
}



2018-09-17 23:07:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 083/126] f2fs: fix to wait on page writeback before updating page

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chao Yu <[email protected]>

[ Upstream commit 6aead1617b3adf2b7e2c56f0f13e4e0ee42ebb4a ]

In error path of f2fs_move_rehashed_dirents, inode page could be writeback
state, so we should wait on inode page writeback before updating it.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/inline.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -502,6 +502,7 @@ static int f2fs_move_rehashed_dirents(st
return 0;
recover:
lock_page(ipage);
+ f2fs_wait_on_page_writeback(ipage, NODE, true);
memcpy(inline_dentry, backup_dentry, MAX_INLINE_DATA(dir));
f2fs_i_depth_write(dir, 0);
f2fs_i_size_write(dir, MAX_INLINE_DATA(dir));



2018-09-17 23:07:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 086/126] mfd: ti_am335x_tscadc: Fix struct clk memory leak

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zumeng Chen <[email protected]>

[ Upstream commit c2b1509c77a99a0dcea0a9051ca743cb88385f50 ]

Use devm_elk_get() to let Linux manage struct clk memory to avoid the following
memory leakage report:

unreferenced object 0xdd75efc0 (size 64):
comm "systemd-udevd", pid 186, jiffies 4294945126 (age 1195.750s)
hex dump (first 32 bytes):
61 64 63 5f 74 73 63 5f 66 63 6b 00 00 00 00 00 adc_tsc_fck.....
00 00 00 00 92 03 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<c0a15260>] kmemleak_alloc+0x40/0x74
[<c0287a10>] __kmalloc_track_caller+0x198/0x388
[<c0255610>] kstrdup+0x40/0x5c
[<c025565c>] kstrdup_const+0x30/0x3c
[<c0636630>] __clk_create_clk+0x60/0xac
[<c0630918>] clk_get_sys+0x74/0x144
[<c0630cdc>] clk_get+0x5c/0x68
[<bf0ac540>] ti_tscadc_probe+0x260/0x468 [ti_am335x_tscadc]
[<c06f3c0c>] platform_drv_probe+0x60/0xac
[<c06f1abc>] driver_probe_device+0x214/0x2dc
[<c06f1c18>] __driver_attach+0x94/0xc0
[<c06efe2c>] bus_for_each_dev+0x90/0xa0
[<c06f1470>] driver_attach+0x28/0x30
[<c06f1030>] bus_add_driver+0x184/0x1ec
[<c06f2b74>] driver_register+0xb0/0xf0
[<c06f3b4c>] __platform_driver_register+0x40/0x54

Signed-off-by: Zumeng Chen <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/mfd/ti_am335x_tscadc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/mfd/ti_am335x_tscadc.c
+++ b/drivers/mfd/ti_am335x_tscadc.c
@@ -210,14 +210,13 @@ static int ti_tscadc_probe(struct platfo
* The TSC_ADC_SS controller design assumes the OCP clock is
* at least 6x faster than the ADC clock.
*/
- clk = clk_get(&pdev->dev, "adc_tsc_fck");
+ clk = devm_clk_get(&pdev->dev, "adc_tsc_fck");
if (IS_ERR(clk)) {
dev_err(&pdev->dev, "failed to get TSC fck\n");
err = PTR_ERR(clk);
goto err_disable_clk;
}
clock_rate = clk_get_rate(clk);
- clk_put(clk);
tscadc->clk_div = clock_rate / ADC_CLK;

/* TSCADC_CLKDIV needs to be configured to the value minus 1 */



2018-09-17 23:07:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 088/126] NFSv4.1: Fix a potential layoutget/layoutrecall deadlock

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <[email protected]>

[ Upstream commit bd3d16a887b0c19a2a20d35ffed499e3a3637feb ]

If the client is sending a layoutget, but the server issues a callback
to recall what it thinks may be an outstanding layout, then we may find
an uninitialised layout attached to the inode due to the layoutget.
In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT
rather than NFS4ERR_DELAY, as the latter can end up deadlocking.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/nfs/callback_proc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -213,9 +213,9 @@ static u32 pnfs_check_callback_stateid(s
{
u32 oldseq, newseq;

- /* Is the stateid still not initialised? */
+ /* Is the stateid not initialised? */
if (!pnfs_layout_is_valid(lo))
- return NFS4ERR_DELAY;
+ return NFS4ERR_NOMATCHING_LAYOUT;

/* Mismatched stateid? */
if (!nfs4_stateid_match_other(&lo->plh_stateid, new))



2018-09-17 23:08:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 126/126] mm: get rid of vmacache_flush_all() entirely

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream.

Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too. It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit. That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.

So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too. Win-win.

[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
also just goes away entirely with this ]

Reported-by: Jann Horn <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/mm_types.h | 2 +-
include/linux/mm_types_task.h | 2 +-
include/linux/vm_event_item.h | 1 -
include/linux/vmacache.h | 5 -----
mm/debug.c | 4 ++--
mm/vmacache.c | 38 --------------------------------------
6 files changed, 4 insertions(+), 48 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -354,7 +354,7 @@ struct kioctx_table;
struct mm_struct {
struct vm_area_struct *mmap; /* list of VMAs */
struct rb_root mm_rb;
- u32 vmacache_seqnum; /* per-thread vmacache */
+ u64 vmacache_seqnum; /* per-thread vmacache */
#ifdef CONFIG_MMU
unsigned long (*get_unmapped_area) (struct file *filp,
unsigned long addr, unsigned long len,
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -32,7 +32,7 @@
#define VMACACHE_MASK (VMACACHE_SIZE - 1)

struct vmacache {
- u32 seqnum;
+ u64 seqnum;
struct vm_area_struct *vmas[VMACACHE_SIZE];
};

--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -105,7 +105,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
#ifdef CONFIG_DEBUG_VM_VMACACHE
VMACACHE_FIND_CALLS,
VMACACHE_FIND_HITS,
- VMACACHE_FULL_FLUSHES,
#endif
#ifdef CONFIG_SWAP
SWAP_RA,
--- a/include/linux/vmacache.h
+++ b/include/linux/vmacache.h
@@ -16,7 +16,6 @@ static inline void vmacache_flush(struct
memset(tsk->vmacache.vmas, 0, sizeof(tsk->vmacache.vmas));
}

-extern void vmacache_flush_all(struct mm_struct *mm);
extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma);
extern struct vm_area_struct *vmacache_find(struct mm_struct *mm,
unsigned long addr);
@@ -30,10 +29,6 @@ extern struct vm_area_struct *vmacache_f
static inline void vmacache_invalidate(struct mm_struct *mm)
{
mm->vmacache_seqnum++;
-
- /* deal with overflows */
- if (unlikely(mm->vmacache_seqnum == 0))
- vmacache_flush_all(mm);
}

#endif /* __LINUX_VMACACHE_H */
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -100,7 +100,7 @@ EXPORT_SYMBOL(dump_vma);

void dump_mm(const struct mm_struct *mm)
{
- pr_emerg("mm %p mmap %p seqnum %d task_size %lu\n"
+ pr_emerg("mm %p mmap %p seqnum %llu task_size %lu\n"
#ifdef CONFIG_MMU
"get_unmapped_area %p\n"
#endif
@@ -128,7 +128,7 @@ void dump_mm(const struct mm_struct *mm)
"tlb_flush_pending %d\n"
"def_flags: %#lx(%pGv)\n",

- mm, mm->mmap, mm->vmacache_seqnum, mm->task_size,
+ mm, mm->mmap, (long long) mm->vmacache_seqnum, mm->task_size,
#ifdef CONFIG_MMU
mm->get_unmapped_area,
#endif
--- a/mm/vmacache.c
+++ b/mm/vmacache.c
@@ -8,44 +8,6 @@
#include <linux/vmacache.h>

/*
- * Flush vma caches for threads that share a given mm.
- *
- * The operation is safe because the caller holds the mmap_sem
- * exclusively and other threads accessing the vma cache will
- * have mmap_sem held at least for read, so no extra locking
- * is required to maintain the vma cache.
- */
-void vmacache_flush_all(struct mm_struct *mm)
-{
- struct task_struct *g, *p;
-
- count_vm_vmacache_event(VMACACHE_FULL_FLUSHES);
-
- /*
- * Single threaded tasks need not iterate the entire
- * list of process. We can avoid the flushing as well
- * since the mm's seqnum was increased and don't have
- * to worry about other threads' seqnum. Current's
- * flush will occur upon the next lookup.
- */
- if (atomic_read(&mm->mm_users) == 1)
- return;
-
- rcu_read_lock();
- for_each_process_thread(g, p) {
- /*
- * Only flush the vmacache pointers as the
- * mm seqnum is already set and curr's will
- * be set upon invalidation when the next
- * lookup is done.
- */
- if (mm == p->mm)
- vmacache_flush(p);
- }
- rcu_read_unlock();
-}
-
-/*
* This task may be accessing a foreign mm via (for example)
* get_user_pages()->find_vma(). The vmacache is task-local and this
* task's vmacache pertains to a different mm (ie, its own). There is



2018-09-17 23:08:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 118/126] ipv4: frags: precedence bug in ip_expire()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

We accidentally removed the parentheses here, but they are required
because '!' has higher precedence than '&'.

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 70837ffe3085c9a91488b52ca13ac84424da1042)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_fragment.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -154,7 +154,7 @@ static void ip_expire(struct timer_list
__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
__IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);

- if (!qp->q.flags & INET_FRAG_FIRST_IN)
+ if (!(qp->q.flags & INET_FRAG_FIRST_IN))
goto out;

/* sk_buff::dev and sk_buff::rbnode are unionized. So we



2018-09-17 23:08:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 120/126] ip: process in-order fragments efficiently

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <[email protected]>

This patch changes the runtime behavior of IP defrag queue:
incoming in-order fragments are added to the end of the current
list/"run" of in-order fragments at the tail.

On some workloads, UDP stream performance is substantially improved:

RX: ./udp_stream -F 10 -T 2 -l 60
TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60

with this patchset applied on a 10Gbps receiver:

throughput=9524.18
throughput_units=Mbit/s

upstream (net-next):

throughput=4608.93
throughput_units=Mbit/s

Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: Peter Oskolkov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit a4fd284a1f8fd4b6c59aa59db2185b1e17c5c11c)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/inet_fragment.c | 2
net/ipv4/ip_fragment.c | 110 +++++++++++++++++++++++++++++------------------
2 files changed, 70 insertions(+), 42 deletions(-)

--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -145,7 +145,7 @@ void inet_frag_destroy(struct inet_frag_
fp = xp;
} while (fp);
} else {
- sum_truesize = skb_rbtree_purge(&q->rb_fragments);
+ sum_truesize = inet_frag_rbtree_purge(&q->rb_fragments);
}
sum = sum_truesize + f->qsize;

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -126,8 +126,8 @@ static u8 ip4_frag_ecn(u8 tos)

static struct inet_frags ip4_frags;

-static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
- struct net_device *dev);
+static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
+ struct sk_buff *prev_tail, struct net_device *dev);


static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
@@ -219,7 +219,12 @@ static void ip_expire(struct timer_list
head = skb_rb_first(&qp->q.rb_fragments);
if (!head)
goto out;
- rb_erase(&head->rbnode, &qp->q.rb_fragments);
+ if (FRAG_CB(head)->next_frag)
+ rb_replace_node(&head->rbnode,
+ &FRAG_CB(head)->next_frag->rbnode,
+ &qp->q.rb_fragments);
+ else
+ rb_erase(&head->rbnode, &qp->q.rb_fragments);
memset(&head->rbnode, 0, sizeof(head->rbnode));
barrier();
}
@@ -320,7 +325,7 @@ static int ip_frag_reinit(struct ipq *qp
return -ETIMEDOUT;
}

- sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
+ sum_truesize = inet_frag_rbtree_purge(&qp->q.rb_fragments);
sub_frag_mem_limit(qp->q.net, sum_truesize);

qp->q.flags = 0;
@@ -329,6 +334,7 @@ static int ip_frag_reinit(struct ipq *qp
qp->q.fragments = NULL;
qp->q.rb_fragments = RB_ROOT;
qp->q.fragments_tail = NULL;
+ qp->q.last_run_head = NULL;
qp->iif = 0;
qp->ecn = 0;

@@ -340,7 +346,7 @@ static int ip_frag_queue(struct ipq *qp,
{
struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
struct rb_node **rbn, *parent;
- struct sk_buff *skb1;
+ struct sk_buff *skb1, *prev_tail;
struct net_device *dev;
unsigned int fragsize;
int flags, offset;
@@ -418,38 +424,41 @@ static int ip_frag_queue(struct ipq *qp,
*/

/* Find out where to put this fragment. */
- skb1 = qp->q.fragments_tail;
- if (!skb1) {
- /* This is the first fragment we've received. */
- rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
- qp->q.fragments_tail = skb;
- } else if ((skb1->ip_defrag_offset + skb1->len) < end) {
- /* This is the common/special case: skb goes to the end. */
+ prev_tail = qp->q.fragments_tail;
+ if (!prev_tail)
+ ip4_frag_create_run(&qp->q, skb); /* First fragment. */
+ else if (prev_tail->ip_defrag_offset + prev_tail->len < end) {
+ /* This is the common case: skb goes to the end. */
/* Detect and discard overlaps. */
- if (offset < (skb1->ip_defrag_offset + skb1->len))
+ if (offset < prev_tail->ip_defrag_offset + prev_tail->len)
goto discard_qp;
- /* Insert after skb1. */
- rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
- qp->q.fragments_tail = skb;
+ if (offset == prev_tail->ip_defrag_offset + prev_tail->len)
+ ip4_frag_append_to_last_run(&qp->q, skb);
+ else
+ ip4_frag_create_run(&qp->q, skb);
} else {
- /* Binary search. Note that skb can become the first fragment, but
- * not the last (covered above). */
+ /* Binary search. Note that skb can become the first fragment,
+ * but not the last (covered above).
+ */
rbn = &qp->q.rb_fragments.rb_node;
do {
parent = *rbn;
skb1 = rb_to_skb(parent);
if (end <= skb1->ip_defrag_offset)
rbn = &parent->rb_left;
- else if (offset >= skb1->ip_defrag_offset + skb1->len)
+ else if (offset >= skb1->ip_defrag_offset +
+ FRAG_CB(skb1)->frag_run_len)
rbn = &parent->rb_right;
else /* Found an overlap with skb1. */
goto discard_qp;
} while (*rbn);
/* Here we have parent properly set, and rbn pointing to
- * one of its NULL left/right children. Insert skb. */
+ * one of its NULL left/right children. Insert skb.
+ */
+ ip4_frag_init_run(skb);
rb_link_node(&skb->rbnode, parent, rbn);
+ rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
}
- rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);

if (dev)
qp->iif = dev->ifindex;
@@ -476,7 +485,7 @@ static int ip_frag_queue(struct ipq *qp,
unsigned long orefdst = skb->_skb_refdst;

skb->_skb_refdst = 0UL;
- err = ip_frag_reasm(qp, skb, dev);
+ err = ip_frag_reasm(qp, skb, prev_tail, dev);
skb->_skb_refdst = orefdst;
return err;
}
@@ -495,7 +504,7 @@ err:

/* Build a new IP datagram from all its fragments. */
static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
- struct net_device *dev)
+ struct sk_buff *prev_tail, struct net_device *dev)
{
struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
struct iphdr *iph;
@@ -519,10 +528,16 @@ static int ip_frag_reasm(struct ipq *qp,
fp = skb_clone(skb, GFP_ATOMIC);
if (!fp)
goto out_nomem;
- rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
+ FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag;
+ if (RB_EMPTY_NODE(&skb->rbnode))
+ FRAG_CB(prev_tail)->next_frag = fp;
+ else
+ rb_replace_node(&skb->rbnode, &fp->rbnode,
+ &qp->q.rb_fragments);
if (qp->q.fragments_tail == skb)
qp->q.fragments_tail = fp;
skb_morph(skb, head);
+ FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag;
rb_replace_node(&head->rbnode, &skb->rbnode,
&qp->q.rb_fragments);
consume_skb(head);
@@ -558,7 +573,7 @@ static int ip_frag_reasm(struct ipq *qp,
for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
clone->len = clone->data_len = head->data_len - plen;
- skb->truesize += clone->truesize;
+ head->truesize += clone->truesize;
clone->csum = 0;
clone->ip_summed = head->ip_summed;
add_frag_mem_limit(qp->q.net, clone->truesize);
@@ -571,24 +586,36 @@ static int ip_frag_reasm(struct ipq *qp,
skb_push(head, head->data - skb_network_header(head));

/* Traverse the tree in order, to build frag_list. */
+ fp = FRAG_CB(head)->next_frag;
rbn = rb_next(&head->rbnode);
rb_erase(&head->rbnode, &qp->q.rb_fragments);
- while (rbn) {
- struct rb_node *rbnext = rb_next(rbn);
- fp = rb_to_skb(rbn);
- rb_erase(rbn, &qp->q.rb_fragments);
- rbn = rbnext;
- *nextp = fp;
- nextp = &fp->next;
- fp->prev = NULL;
- memset(&fp->rbnode, 0, sizeof(fp->rbnode));
- head->data_len += fp->len;
- head->len += fp->len;
- if (head->ip_summed != fp->ip_summed)
- head->ip_summed = CHECKSUM_NONE;
- else if (head->ip_summed == CHECKSUM_COMPLETE)
- head->csum = csum_add(head->csum, fp->csum);
- head->truesize += fp->truesize;
+ while (rbn || fp) {
+ /* fp points to the next sk_buff in the current run;
+ * rbn points to the next run.
+ */
+ /* Go through the current run. */
+ while (fp) {
+ *nextp = fp;
+ nextp = &fp->next;
+ fp->prev = NULL;
+ memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+ head->data_len += fp->len;
+ head->len += fp->len;
+ if (head->ip_summed != fp->ip_summed)
+ head->ip_summed = CHECKSUM_NONE;
+ else if (head->ip_summed == CHECKSUM_COMPLETE)
+ head->csum = csum_add(head->csum, fp->csum);
+ head->truesize += fp->truesize;
+ fp = FRAG_CB(fp)->next_frag;
+ }
+ /* Move to the next run. */
+ if (rbn) {
+ struct rb_node *rbnext = rb_next(rbn);
+
+ fp = rb_to_skb(rbn);
+ rb_erase(rbn, &qp->q.rb_fragments);
+ rbn = rbnext;
+ }
}
sub_frag_mem_limit(qp->q.net, head->truesize);

@@ -624,6 +651,7 @@ static int ip_frag_reasm(struct ipq *qp,
qp->q.fragments = NULL;
qp->q.rb_fragments = RB_ROOT;
qp->q.fragments_tail = NULL;
+ qp->q.last_run_head = NULL;
return 0;

out_nomem:



2018-09-17 23:08:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 116/126] net: add rb_to_skb() and other rb tree helpers

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Geeralize private netem_rb_to_skb()

TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 18 ++++++++++++++++++
net/ipv4/tcp_fastopen.c | 8 +++-----
net/ipv4/tcp_input.c | 33 ++++++++++++---------------------
net/sched/sch_netem.c | 14 ++++----------
4 files changed, 37 insertions(+), 36 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3169,6 +3169,12 @@ static inline int __skb_grow_rcsum(struc

#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)

+#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
+#define skb_rb_first(root) rb_to_skb(rb_first(root))
+#define skb_rb_last(root) rb_to_skb(rb_last(root))
+#define skb_rb_next(skb) rb_to_skb(rb_next(&(skb)->rbnode))
+#define skb_rb_prev(skb) rb_to_skb(rb_prev(&(skb)->rbnode))
+
#define skb_queue_walk(queue, skb) \
for (skb = (queue)->next; \
skb != (struct sk_buff *)(queue); \
@@ -3183,6 +3189,18 @@ static inline int __skb_grow_rcsum(struc
for (; skb != (struct sk_buff *)(queue); \
skb = skb->next)

+#define skb_rbtree_walk(skb, root) \
+ for (skb = skb_rb_first(root); skb != NULL; \
+ skb = skb_rb_next(skb))
+
+#define skb_rbtree_walk_from(skb) \
+ for (; skb != NULL; \
+ skb = skb_rb_next(skb))
+
+#define skb_rbtree_walk_from_safe(skb, tmp) \
+ for (; tmp = skb ? skb_rb_next(skb) : NULL, (skb != NULL); \
+ skb = tmp)
+
#define skb_queue_walk_from_safe(queue, skb, tmp) \
for (tmp = skb->next; \
skb != (struct sk_buff *)(queue); \
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -458,17 +458,15 @@ bool tcp_fastopen_active_should_disable(
void tcp_fastopen_active_disable_ofo_check(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
- struct rb_node *p;
- struct sk_buff *skb;
struct dst_entry *dst;
+ struct sk_buff *skb;

if (!tp->syn_fastopen)
return;

if (!tp->data_segs_in) {
- p = rb_first(&tp->out_of_order_queue);
- if (p && !rb_next(p)) {
- skb = rb_entry(p, struct sk_buff, rbnode);
+ skb = skb_rb_first(&tp->out_of_order_queue);
+ if (skb && !skb_rb_next(skb)) {
if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
tcp_fastopen_active_disable(sk);
return;
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4372,7 +4372,7 @@ static void tcp_ofo_queue(struct sock *s

p = rb_first(&tp->out_of_order_queue);
while (p) {
- skb = rb_entry(p, struct sk_buff, rbnode);
+ skb = rb_to_skb(p);
if (after(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
break;

@@ -4440,7 +4440,7 @@ static int tcp_try_rmem_schedule(struct
static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
- struct rb_node **p, *q, *parent;
+ struct rb_node **p, *parent;
struct sk_buff *skb1;
u32 seq, end_seq;
bool fragstolen;
@@ -4503,7 +4503,7 @@ coalesce_done:
parent = NULL;
while (*p) {
parent = *p;
- skb1 = rb_entry(parent, struct sk_buff, rbnode);
+ skb1 = rb_to_skb(parent);
if (before(seq, TCP_SKB_CB(skb1)->seq)) {
p = &parent->rb_left;
continue;
@@ -4548,9 +4548,7 @@ insert:

merge_right:
/* Remove other segments covered by skb. */
- while ((q = rb_next(&skb->rbnode)) != NULL) {
- skb1 = rb_entry(q, struct sk_buff, rbnode);
-
+ while ((skb1 = skb_rb_next(skb)) != NULL) {
if (!after(end_seq, TCP_SKB_CB(skb1)->seq))
break;
if (before(end_seq, TCP_SKB_CB(skb1)->end_seq)) {
@@ -4565,7 +4563,7 @@ merge_right:
tcp_drop(sk, skb1);
}
/* If there is no skb after us, we are the last_skb ! */
- if (!q)
+ if (!skb1)
tp->ooo_last_skb = skb;

add_sack:
@@ -4749,7 +4747,7 @@ static struct sk_buff *tcp_skb_next(stru
if (list)
return !skb_queue_is_last(list, skb) ? skb->next : NULL;

- return rb_entry_safe(rb_next(&skb->rbnode), struct sk_buff, rbnode);
+ return skb_rb_next(skb);
}

static struct sk_buff *tcp_collapse_one(struct sock *sk, struct sk_buff *skb,
@@ -4778,7 +4776,7 @@ static void tcp_rbtree_insert(struct rb_

while (*p) {
parent = *p;
- skb1 = rb_entry(parent, struct sk_buff, rbnode);
+ skb1 = rb_to_skb(parent);
if (before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb1)->seq))
p = &parent->rb_left;
else
@@ -4898,19 +4896,12 @@ static void tcp_collapse_ofo_queue(struc
struct tcp_sock *tp = tcp_sk(sk);
u32 range_truesize, sum_tiny = 0;
struct sk_buff *skb, *head;
- struct rb_node *p;
u32 start, end;

- p = rb_first(&tp->out_of_order_queue);
- skb = rb_entry_safe(p, struct sk_buff, rbnode);
+ skb = skb_rb_first(&tp->out_of_order_queue);
new_range:
if (!skb) {
- p = rb_last(&tp->out_of_order_queue);
- /* Note: This is possible p is NULL here. We do not
- * use rb_entry_safe(), as ooo_last_skb is valid only
- * if rbtree is not empty.
- */
- tp->ooo_last_skb = rb_entry(p, struct sk_buff, rbnode);
+ tp->ooo_last_skb = skb_rb_last(&tp->out_of_order_queue);
return;
}
start = TCP_SKB_CB(skb)->seq;
@@ -4918,7 +4909,7 @@ new_range:
range_truesize = skb->truesize;

for (head = skb;;) {
- skb = tcp_skb_next(skb, NULL);
+ skb = skb_rb_next(skb);

/* Range is terminated when we see a gap or when
* we are at the queue end.
@@ -4974,7 +4965,7 @@ static bool tcp_prune_ofo_queue(struct s
prev = rb_prev(node);
rb_erase(node, &tp->out_of_order_queue);
goal -= rb_to_skb(node)->truesize;
- tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode));
+ tcp_drop(sk, rb_to_skb(node));
if (!prev || goal <= 0) {
sk_mem_reclaim(sk);
if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
@@ -4984,7 +4975,7 @@ static bool tcp_prune_ofo_queue(struct s
}
node = prev;
} while (node);
- tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);
+ tp->ooo_last_skb = rb_to_skb(prev);

/* Reset SACK state. A conforming SACK implementation will
* do the same at a timeout based retransmit. When a connection
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -149,12 +149,6 @@ struct netem_skb_cb {
ktime_t tstamp_save;
};

-
-static struct sk_buff *netem_rb_to_skb(struct rb_node *rb)
-{
- return rb_entry(rb, struct sk_buff, rbnode);
-}
-
static inline struct netem_skb_cb *netem_skb_cb(struct sk_buff *skb)
{
/* we assume we can use skb next/prev/tstamp as storage for rb_node */
@@ -365,7 +359,7 @@ static void tfifo_reset(struct Qdisc *sc
struct rb_node *p;

while ((p = rb_first(&q->t_root))) {
- struct sk_buff *skb = netem_rb_to_skb(p);
+ struct sk_buff *skb = rb_to_skb(p);

rb_erase(p, &q->t_root);
rtnl_kfree_skbs(skb, skb);
@@ -382,7 +376,7 @@ static void tfifo_enqueue(struct sk_buff
struct sk_buff *skb;

parent = *p;
- skb = netem_rb_to_skb(parent);
+ skb = rb_to_skb(parent);
if (tnext >= netem_skb_cb(skb)->time_to_send)
p = &parent->rb_right;
else
@@ -538,7 +532,7 @@ static int netem_enqueue(struct sk_buff
struct sk_buff *t_skb;
struct netem_skb_cb *t_last;

- t_skb = netem_rb_to_skb(rb_last(&q->t_root));
+ t_skb = skb_rb_last(&q->t_root);
t_last = netem_skb_cb(t_skb);
if (!last ||
t_last->time_to_send > last->time_to_send) {
@@ -618,7 +612,7 @@ deliver:
if (p) {
psched_time_t time_to_send;

- skb = netem_rb_to_skb(p);
+ skb = rb_to_skb(p);

/* if more time remaining? */
time_to_send = netem_skb_cb(skb)->time_to_send;



2018-09-17 23:24:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 123/126] tun: fix use after free for ptr_ring

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Wang <[email protected]>

commit b196d88aba8ac72b775137854121097f4c4c6862 upstream.

We used to initialize ptr_ring during TUNSETIFF, this is because its
size depends on the tx_queue_len of netdevice. And we try to clean it
up when socket were detached from netdevice. A race were spotted when
trying to do uninit during a read which will lead a use after free for
pointer ring. Solving this by always initialize a zero size ptr_ring
in open() and do resizing during TUNSETIFF, and then we can safely do
cleanup during close(). With this, there's no need for the workaround
that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
for tfile->tx_array").

Backport Note :-
Comparison with the upstream patch:
[1] A "semantic revert" of the changes made in
4df0bfc799("tun: fix a memory leak for tfile->tx_array").
4df0bfc799 was applied upstream, and then skb array was changed
to use ptr_ring. The upstream patch then removes the changes introduced
by 4df0bfc799. This backport does the same; "revert" the changes
made by 4df0bfc799.
[2] xdp_rxq_info_unreg() being called in relevant locations
As xdp_rxq_info related patches are not present in 4.14, these
changes are not needed in the backport.
[3] An instance of ptr_ring_init needs to be replaced by skb_array_init
Inside tun_attach()
[4] ptr_ring_cleanup needs to be replaced by skb_array_cleanup
Inside tun_chr_close()

Note that the backport for 7063efd33b ("tuntap: fix use after free during release")
needs to be applied on top of this patch.

Reported-by: [email protected]
Cc: Eric Dumazet <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Signed-off-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Zubin Mithra <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/tun.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -534,14 +534,6 @@ static void tun_queue_purge(struct tun_f
skb_queue_purge(&tfile->sk.sk_error_queue);
}

-static void tun_cleanup_tx_array(struct tun_file *tfile)
-{
- if (tfile->tx_array.ring.queue) {
- skb_array_cleanup(&tfile->tx_array);
- memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
- }
-}
-
static void __tun_detach(struct tun_file *tfile, bool clean)
{
struct tun_file *ntfile;
@@ -583,7 +575,6 @@ static void __tun_detach(struct tun_file
tun->dev->reg_state == NETREG_REGISTERED)
unregister_netdevice(tun->dev);
}
- tun_cleanup_tx_array(tfile);
sock_put(&tfile->sk);
}
}
@@ -623,13 +614,11 @@ static void tun_detach_all(struct net_de
/* Drop read queue */
tun_queue_purge(tfile);
sock_put(&tfile->sk);
- tun_cleanup_tx_array(tfile);
}
list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
tun_enable_queue(tfile);
tun_queue_purge(tfile);
sock_put(&tfile->sk);
- tun_cleanup_tx_array(tfile);
}
BUG_ON(tun->numdisabled != 0);

@@ -675,7 +664,7 @@ static int tun_attach(struct tun_struct
}

if (!tfile->detached &&
- skb_array_init(&tfile->tx_array, dev->tx_queue_len, GFP_KERNEL)) {
+ skb_array_resize(&tfile->tx_array, dev->tx_queue_len, GFP_KERNEL)) {
err = -ENOMEM;
goto out;
}
@@ -2624,6 +2613,11 @@ static int tun_chr_open(struct inode *in
&tun_proto, 0);
if (!tfile)
return -ENOMEM;
+ if (skb_array_init(&tfile->tx_array, 0, GFP_KERNEL)) {
+ sk_free(&tfile->sk);
+ return -ENOMEM;
+ }
+
RCU_INIT_POINTER(tfile->tun, NULL);
tfile->flags = 0;
tfile->ifindex = 0;
@@ -2644,8 +2638,6 @@ static int tun_chr_open(struct inode *in

sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);

- memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
-
return 0;
}

@@ -2654,6 +2646,7 @@ static int tun_chr_close(struct inode *i
struct tun_file *tfile = file->private_data;

tun_detach(tfile, true);
+ skb_array_cleanup(&tfile->tx_array);

return 0;
}



2018-09-17 23:24:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 121/126] ip: frags: fix crash in ip_do_fragment()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Taehee Yoo <[email protected]>

commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.

test commands:
%iptables -t nat -I POSTROUTING -j MASQUERADE
%hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[ 261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[ 261.198158] Call Trace:
[ 261.199018] ? dst_output+0x180/0x180
[ 261.205011] ? save_trace+0x300/0x300
[ 261.209018] ? ip_copy_metadata+0xb00/0xb00
[ 261.213034] ? sched_clock_local+0xd4/0x140
[ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack]
[ 261.223014] ? rt_cpu_seq_stop+0x10/0x10
[ 261.227014] ? find_held_lock+0x39/0x1c0
[ 261.233008] ip_finish_output+0x51d/0xb50
[ 261.237006] ? ip_fragment.constprop.56+0x220/0x220
[ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[ 261.250152] ? rcu_is_watching+0x77/0x120
[ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[ 261.261033] ? nf_hook_slow+0xb1/0x160
[ 261.265007] ip_output+0x1c7/0x710
[ 261.269005] ? ip_mc_output+0x13f0/0x13f0
[ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0
[ 261.278152] ? ip_fragment.constprop.56+0x220/0x220
[ 261.282996] ? nf_hook_slow+0xb1/0x160
[ 261.287007] raw_sendmsg+0x21f9/0x4420
[ 261.291008] ? dst_output+0x180/0x180
[ 261.297003] ? sched_clock_cpu+0x126/0x170
[ 261.301003] ? find_held_lock+0x39/0x1c0
[ 261.306155] ? stop_critical_timings+0x420/0x420
[ 261.311004] ? check_flags.part.36+0x450/0x450
[ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40
[ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40
[ 261.326142] ? cyc2ns_read_end+0x10/0x10
[ 261.330139] ? raw_bind+0x280/0x280
[ 261.334138] ? sched_clock_cpu+0x126/0x170
[ 261.338995] ? check_flags.part.36+0x450/0x450
[ 261.342991] ? __lock_acquire+0x4500/0x4500
[ 261.348994] ? inet_sendmsg+0x11c/0x500
[ 261.352989] ? dst_output+0x180/0x180
[ 261.357012] inet_sendmsg+0x11c/0x500
[ ... ]

v2:
- clear skb->sk at reassembly routine.(Eric Dumarzet)

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Taehee Yoo <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_fragment.c | 1 +
net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
2 files changed, 2 insertions(+)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -599,6 +599,7 @@ static int ip_frag_reasm(struct ipq *qp,
nextp = &fp->next;
fp->prev = NULL;
memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+ fp->sk = NULL;
head->data_len += fp->len;
head->len += fp->len;
if (head->ip_summed != fp->ip_summed)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -453,6 +453,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
else if (head->ip_summed == CHECKSUM_COMPLETE)
head->csum = csum_add(head->csum, fp->csum);
head->truesize += fp->truesize;
+ fp->sk = NULL;
}
sub_frag_mem_limit(fq->q.net, head->truesize);




2018-09-17 23:25:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 110/126] inet: frags: fix ip6frag_low_thresh boundary

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.

ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.

Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.

Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Maciej Żenczykowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 3d23401283e80ceb03f765842787e0e79ff598b7)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ieee802154/6lowpan/reassembly.c | 2 --
net/ipv4/ip_fragment.c | 5 ++---
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 --
net/ipv6/reassembly.c | 4 +---
4 files changed, 3 insertions(+), 10 deletions(-)

--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -411,7 +411,6 @@ err:
}

#ifdef CONFIG_SYSCTL
-static long zero;

static struct ctl_table lowpan_frags_ns_ctl_table[] = {
{
@@ -428,7 +427,6 @@ static struct ctl_table lowpan_frags_ns_
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- .extra1 = &zero,
.extra2 = &init_net.ieee802154_lowpan.frags.high_thresh
},
{
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -672,7 +672,7 @@ struct sk_buff *ip_check_defrag(struct n
EXPORT_SYMBOL(ip_check_defrag);

#ifdef CONFIG_SYSCTL
-static long zero;
+static int dist_min;

static struct ctl_table ip4_frags_ns_ctl_table[] = {
{
@@ -689,7 +689,6 @@ static struct ctl_table ip4_frags_ns_ctl
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- .extra1 = &zero,
.extra2 = &init_net.ipv4.frags.high_thresh
},
{
@@ -705,7 +704,7 @@ static struct ctl_table ip4_frags_ns_ctl
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
- .extra1 = &zero
+ .extra1 = &dist_min,
},
{ }
};
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -63,7 +63,6 @@ struct nf_ct_frag6_skb_cb
static struct inet_frags nf_frags;

#ifdef CONFIG_SYSCTL
-static long zero;

static struct ctl_table nf_ct_frag6_sysctl_table[] = {
{
@@ -79,7 +78,6 @@ static struct ctl_table nf_ct_frag6_sysc
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- .extra1 = &zero,
.extra2 = &init_net.nf_frag.frags.high_thresh
},
{
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -554,7 +554,6 @@ static const struct inet6_protocol frag_
};

#ifdef CONFIG_SYSCTL
-static int zero;

static struct ctl_table ip6_frags_ns_ctl_table[] = {
{
@@ -570,8 +569,7 @@ static struct ctl_table ip6_frags_ns_ctl
.data = &init_net.ipv6.frags.low_thresh,
.maxlen = sizeof(unsigned long),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &zero,
+ .proc_handler = proc_doulongvec_minmax,
.extra2 = &init_net.ipv6.frags.high_thresh
},
{



2018-09-17 23:26:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 106/126] ipv6: frags: rewrite ip6_expire_frag_queue()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Make it similar to IPv4 ip_expire(), and release the lock
before calling icmp functions.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 05c0b86b9696802fd0ce5676a92a63f1b455bdf3)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/reassembly.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -92,7 +92,9 @@ EXPORT_SYMBOL(ip6_frag_init);
void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
{
struct net_device *dev = NULL;
+ struct sk_buff *head;

+ rcu_read_lock();
spin_lock(&fq->q.lock);

if (fq->q.flags & INET_FRAG_COMPLETE)
@@ -100,28 +102,34 @@ void ip6_expire_frag_queue(struct net *n

inet_frag_kill(&fq->q);

- rcu_read_lock();
dev = dev_get_by_index_rcu(net, fq->iif);
if (!dev)
- goto out_rcu_unlock;
+ goto out;

__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS);
__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);

/* Don't send error if the first segment did not arrive. */
- if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !fq->q.fragments)
- goto out_rcu_unlock;
+ head = fq->q.fragments;
+ if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
+ goto out;

/* But use as source device on which LAST ARRIVED
* segment was received. And do not use fq->dev
* pointer directly, device might already disappeared.
*/
- fq->q.fragments->dev = dev;
- icmpv6_send(fq->q.fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
-out_rcu_unlock:
- rcu_read_unlock();
+ head->dev = dev;
+ skb_get(head);
+ spin_unlock(&fq->q.lock);
+
+ icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
+ kfree_skb(head);
+ goto out_rcu_unlock;
+
out:
spin_unlock(&fq->q.lock);
+out_rcu_unlock:
+ rcu_read_unlock();
inet_frag_put(&fq->q);
}
EXPORT_SYMBOL(ip6_expire_frag_queue);



2018-09-17 23:26:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 103/126] inet: frags: remove inet_frag_maybe_warn_overflow()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

This function is obsolete, after rhashtable addition to inet defrag.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 2 --
net/ieee802154/6lowpan/reassembly.c | 5 ++---
net/ipv4/inet_fragment.c | 11 -----------
net/ipv4/ip_fragment.c | 5 ++---
net/ipv6/netfilter/nf_conntrack_reasm.c | 5 ++---
net/ipv6/reassembly.c | 5 ++---
6 files changed, 8 insertions(+), 25 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -110,8 +110,6 @@ void inet_frags_exit_net(struct netns_fr
void inet_frag_kill(struct inet_frag_queue *q);
void inet_frag_destroy(struct inet_frag_queue *q);
struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
- const char *prefix);

static inline void inet_frag_put(struct inet_frag_queue *q)
{
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -84,10 +84,9 @@ fq_find(struct net *net, const struct lo
struct inet_frag_queue *q;

q = inet_frag_find(&ieee802154_lowpan->frags, &key);
- if (IS_ERR_OR_NULL(q)) {
- inet_frag_maybe_warn_overflow(q, pr_fmt());
+ if (!q)
return NULL;
- }
+
return container_of(q, struct lowpan_frag_queue, q);
}

--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -218,14 +218,3 @@ struct inet_frag_queue *inet_frag_find(s
return inet_frag_create(nf, key);
}
EXPORT_SYMBOL(inet_frag_find);
-
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
- const char *prefix)
-{
- static const char msg[] = "inet_frag_find: Fragment hash bucket"
- " list length grew over limit. Dropping fragment.\n";
-
- if (PTR_ERR(q) == -ENOBUFS)
- net_dbg_ratelimited("%s%s", prefix, msg);
-}
-EXPORT_SYMBOL(inet_frag_maybe_warn_overflow);
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -221,10 +221,9 @@ static struct ipq *ip_find(struct net *n
struct inet_frag_queue *q;

q = inet_frag_find(&net->ipv4.frags, &key);
- if (IS_ERR_OR_NULL(q)) {
- inet_frag_maybe_warn_overflow(q, pr_fmt());
+ if (!q)
return NULL;
- }
+
return container_of(q, struct ipq, q);
}

--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -178,10 +178,9 @@ static struct frag_queue *fq_find(struct
struct inet_frag_queue *q;

q = inet_frag_find(&net->nf_frag.frags, &key);
- if (IS_ERR_OR_NULL(q)) {
- inet_frag_maybe_warn_overflow(q, pr_fmt());
+ if (!q)
return NULL;
- }
+
return container_of(q, struct frag_queue, q);
}

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -155,10 +155,9 @@ fq_find(struct net *net, __be32 id, cons
key.iif = 0;

q = inet_frag_find(&net->ipv6.frags, &key);
- if (IS_ERR_OR_NULL(q)) {
- inet_frag_maybe_warn_overflow(q, pr_fmt());
+ if (!q)
return NULL;
- }
+
return container_of(q, struct frag_queue, q);
}




2018-09-17 23:26:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 100/126] inet: frags: use rhashtables for reassembly units

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

Some applications still rely on IP fragmentation, and to be fair linux
reassembly unit is not working under any serious load.

It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)

A work queue is supposed to garbage collect items when host is under memory
pressure, and doing a hash rebuild, changing seed used in hash computations.

This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
occurring every 5 seconds if host is under fire.

Then there is the problem of sharing this hash table for all netns.

It is time to switch to rhashtables, and allocate one of them per netns
to speedup netns dismantle, since this is a critical metric these days.

Lookup is now using RCU. A followup patch will even remove
the refcount hold/release left from prior implementation and save
a couple of atomic operations.

Before this patch, 16 cpus (16 RX queue NIC) could not handle more
than 1 Mpps frags DDOS.

After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (exact number depends on frags being evicted
after timeout)

$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608

A followup patch will change the limits for 64bit arches.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: Alexander Aring <[email protected]>
Cc: Stefan Schmidt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/networking/ip-sysctl.txt | 7
include/net/inet_frag.h | 81 +++----
include/net/ipv6.h | 16 -
net/ieee802154/6lowpan/6lowpan_i.h | 26 --
net/ieee802154/6lowpan/reassembly.c | 91 +++-----
net/ipv4/inet_fragment.c | 346 ++++++--------------------------
net/ipv4/ip_fragment.c | 112 ++++------
net/ipv6/netfilter/nf_conntrack_reasm.c | 51 +---
net/ipv6/reassembly.c | 110 ++++------
9 files changed, 266 insertions(+), 574 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -134,13 +134,10 @@ min_adv_mss - INTEGER
IP Fragmentation:

ipfrag_high_thresh - INTEGER
- Maximum memory used to reassemble IP fragments. When
- ipfrag_high_thresh bytes of memory is allocated for this purpose,
- the fragment handler will toss packets until ipfrag_low_thresh
- is reached. This also serves as a maximum limit to namespaces
- different from the initial one.
+ Maximum memory used to reassemble IP fragments.

ipfrag_low_thresh - INTEGER
+ (Obsolete since linux-4.17)
Maximum memory used to reassemble IP fragments before the kernel
begins to remove incomplete fragment queues to free up resources.
The kernel still accepts new fragments for defragmentation.
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -2,7 +2,11 @@
#ifndef __NET_FRAG_H__
#define __NET_FRAG_H__

+#include <linux/rhashtable.h>
+
struct netns_frags {
+ struct rhashtable rhashtable ____cacheline_aligned_in_smp;
+
/* Keep atomic mem on separate cachelines in structs that include it */
atomic_t mem ____cacheline_aligned_in_smp;
/* sysctls */
@@ -26,12 +30,30 @@ enum {
INET_FRAG_COMPLETE = BIT(2),
};

+struct frag_v4_compare_key {
+ __be32 saddr;
+ __be32 daddr;
+ u32 user;
+ u32 vif;
+ __be16 id;
+ u16 protocol;
+};
+
+struct frag_v6_compare_key {
+ struct in6_addr saddr;
+ struct in6_addr daddr;
+ u32 user;
+ __be32 id;
+ u32 iif;
+};
+
/**
* struct inet_frag_queue - fragment queue
*
- * @lock: spinlock protecting the queue
+ * @node: rhash node
+ * @key: keys identifying this frag.
* @timer: queue expiration timer
- * @list: hash bucket list
+ * @lock: spinlock protecting this frag
* @refcnt: reference count of the queue
* @fragments: received fragments head
* @fragments_tail: received fragments tail
@@ -41,12 +63,16 @@ enum {
* @flags: fragment queue flags
* @max_size: maximum received fragment size
* @net: namespace that this frag belongs to
- * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
+ * @rcu: rcu head for freeing deferall
*/
struct inet_frag_queue {
- spinlock_t lock;
+ struct rhash_head node;
+ union {
+ struct frag_v4_compare_key v4;
+ struct frag_v6_compare_key v6;
+ } key;
struct timer_list timer;
- struct hlist_node list;
+ spinlock_t lock;
refcount_t refcnt;
struct sk_buff *fragments;
struct sk_buff *fragments_tail;
@@ -55,51 +81,20 @@ struct inet_frag_queue {
int meat;
__u8 flags;
u16 max_size;
- struct netns_frags *net;
- struct hlist_node list_evictor;
-};
-
-#define INETFRAGS_HASHSZ 1024
-
-/* averaged:
- * max_depth = default ipfrag_high_thresh / INETFRAGS_HASHSZ /
- * rounded up (SKB_TRUELEN(0) + sizeof(struct ipq or
- * struct frag_queue))
- */
-#define INETFRAGS_MAXDEPTH 128
-
-struct inet_frag_bucket {
- struct hlist_head chain;
- spinlock_t chain_lock;
+ struct netns_frags *net;
+ struct rcu_head rcu;
};

struct inet_frags {
- struct inet_frag_bucket hash[INETFRAGS_HASHSZ];
-
- struct work_struct frags_work;
- unsigned int next_bucket;
- unsigned long last_rebuild_jiffies;
- bool rebuild;
-
- /* The first call to hashfn is responsible to initialize
- * rnd. This is best done with net_get_random_once.
- *
- * rnd_seqlock is used to let hash insertion detect
- * when it needs to re-lookup the hash chain to use.
- */
- u32 rnd;
- seqlock_t rnd_seqlock;
unsigned int qsize;

- unsigned int (*hashfn)(const struct inet_frag_queue *);
- bool (*match)(const struct inet_frag_queue *q,
- const void *arg);
void (*constructor)(struct inet_frag_queue *q,
const void *arg);
void (*destructor)(struct inet_frag_queue *);
void (*frag_expire)(struct timer_list *t);
struct kmem_cache *frags_cachep;
const char *frags_cache_name;
+ struct rhashtable_params rhash_params;
};

int inet_frags_init(struct inet_frags *);
@@ -108,15 +103,13 @@ void inet_frags_fini(struct inet_frags *
static inline int inet_frags_init_net(struct netns_frags *nf)
{
atomic_set(&nf->mem, 0);
- return 0;
+ return rhashtable_init(&nf->rhashtable, &nf->f->rhash_params);
}
void inet_frags_exit_net(struct netns_frags *nf);

void inet_frag_kill(struct inet_frag_queue *q);
void inet_frag_destroy(struct inet_frag_queue *q);
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
- struct inet_frags *f, void *key, unsigned int hash);
-
+struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
const char *prefix);

@@ -128,7 +121,7 @@ static inline void inet_frag_put(struct

static inline bool inet_frag_evicting(struct inet_frag_queue *q)
{
- return !hlist_unhashed(&q->list_evictor);
+ return false;
}

/* Memory Tracking Functions. */
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -531,17 +531,8 @@ enum ip6_defrag_users {
__IP6_DEFRAG_CONNTRACK_BRIDGE_IN = IP6_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX,
};

-struct ip6_create_arg {
- __be32 id;
- u32 user;
- const struct in6_addr *src;
- const struct in6_addr *dst;
- int iif;
- u8 ecn;
-};
-
void ip6_frag_init(struct inet_frag_queue *q, const void *a);
-bool ip6_frag_match(const struct inet_frag_queue *q, const void *a);
+extern const struct rhashtable_params ip6_rhash_params;

/*
* Equivalent of ipv4 struct ip
@@ -549,11 +540,6 @@ bool ip6_frag_match(const struct inet_fr
struct frag_queue {
struct inet_frag_queue q;

- __be32 id; /* fragment id */
- u32 user;
- struct in6_addr saddr;
- struct in6_addr daddr;
-
int iif;
unsigned int csum;
__u16 nhoffset;
--- a/net/ieee802154/6lowpan/6lowpan_i.h
+++ b/net/ieee802154/6lowpan/6lowpan_i.h
@@ -17,37 +17,19 @@ typedef unsigned __bitwise lowpan_rx_res
#define LOWPAN_DISPATCH_FRAG1 0xc0
#define LOWPAN_DISPATCH_FRAGN 0xe0

-struct lowpan_create_arg {
+struct frag_lowpan_compare_key {
u16 tag;
u16 d_size;
- const struct ieee802154_addr *src;
- const struct ieee802154_addr *dst;
+ const struct ieee802154_addr src;
+ const struct ieee802154_addr dst;
};

-/* Equivalent of ipv4 struct ip
+/* Equivalent of ipv4 struct ipq
*/
struct lowpan_frag_queue {
struct inet_frag_queue q;
-
- u16 tag;
- u16 d_size;
- struct ieee802154_addr saddr;
- struct ieee802154_addr daddr;
};

-static inline u32 ieee802154_addr_hash(const struct ieee802154_addr *a)
-{
- switch (a->mode) {
- case IEEE802154_ADDR_LONG:
- return (((__force u64)a->extended_addr) >> 32) ^
- (((__force u64)a->extended_addr) & 0xffffffff);
- case IEEE802154_ADDR_SHORT:
- return (__force u32)(a->short_addr + (a->pan_id << 16));
- default:
- return 0;
- }
-}
-
int lowpan_frag_rcv(struct sk_buff *skb, const u8 frag_type);
void lowpan_net_frag_exit(void);
int lowpan_net_frag_init(void);
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -37,47 +37,15 @@ static struct inet_frags lowpan_frags;
static int lowpan_frag_reasm(struct lowpan_frag_queue *fq,
struct sk_buff *prev, struct net_device *ldev);

-static unsigned int lowpan_hash_frag(u16 tag, u16 d_size,
- const struct ieee802154_addr *saddr,
- const struct ieee802154_addr *daddr)
-{
- net_get_random_once(&lowpan_frags.rnd, sizeof(lowpan_frags.rnd));
- return jhash_3words(ieee802154_addr_hash(saddr),
- ieee802154_addr_hash(daddr),
- (__force u32)(tag + (d_size << 16)),
- lowpan_frags.rnd);
-}
-
-static unsigned int lowpan_hashfn(const struct inet_frag_queue *q)
-{
- const struct lowpan_frag_queue *fq;
-
- fq = container_of(q, struct lowpan_frag_queue, q);
- return lowpan_hash_frag(fq->tag, fq->d_size, &fq->saddr, &fq->daddr);
-}
-
-static bool lowpan_frag_match(const struct inet_frag_queue *q, const void *a)
-{
- const struct lowpan_frag_queue *fq;
- const struct lowpan_create_arg *arg = a;
-
- fq = container_of(q, struct lowpan_frag_queue, q);
- return fq->tag == arg->tag && fq->d_size == arg->d_size &&
- ieee802154_addr_equal(&fq->saddr, arg->src) &&
- ieee802154_addr_equal(&fq->daddr, arg->dst);
-}
-
static void lowpan_frag_init(struct inet_frag_queue *q, const void *a)
{
- const struct lowpan_create_arg *arg = a;
+ const struct frag_lowpan_compare_key *key = a;
struct lowpan_frag_queue *fq;

fq = container_of(q, struct lowpan_frag_queue, q);

- fq->tag = arg->tag;
- fq->d_size = arg->d_size;
- fq->saddr = *arg->src;
- fq->daddr = *arg->dst;
+ BUILD_BUG_ON(sizeof(*key) > sizeof(q->key));
+ memcpy(&q->key, key, sizeof(*key));
}

static void lowpan_frag_expire(struct timer_list *t)
@@ -105,21 +73,17 @@ fq_find(struct net *net, const struct lo
const struct ieee802154_addr *src,
const struct ieee802154_addr *dst)
{
- struct inet_frag_queue *q;
- struct lowpan_create_arg arg;
- unsigned int hash;
struct netns_ieee802154_lowpan *ieee802154_lowpan =
net_ieee802154_lowpan(net);
+ struct frag_lowpan_compare_key key = {
+ .tag = cb->d_tag,
+ .d_size = cb->d_size,
+ .src = *src,
+ .dst = *dst,
+ };
+ struct inet_frag_queue *q;

- arg.tag = cb->d_tag;
- arg.d_size = cb->d_size;
- arg.src = src;
- arg.dst = dst;
-
- hash = lowpan_hash_frag(cb->d_tag, cb->d_size, src, dst);
-
- q = inet_frag_find(&ieee802154_lowpan->frags,
- &lowpan_frags, &arg, hash);
+ q = inet_frag_find(&ieee802154_lowpan->frags, &key);
if (IS_ERR_OR_NULL(q)) {
inet_frag_maybe_warn_overflow(q, pr_fmt());
return NULL;
@@ -611,17 +575,46 @@ static struct pernet_operations lowpan_f
.exit = lowpan_frags_exit_net,
};

+static u32 lowpan_key_hashfn(const void *data, u32 len, u32 seed)
+{
+ return jhash2(data,
+ sizeof(struct frag_lowpan_compare_key) / sizeof(u32), seed);
+}
+
+static u32 lowpan_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+ const struct inet_frag_queue *fq = data;
+
+ return jhash2((const u32 *)&fq->key,
+ sizeof(struct frag_lowpan_compare_key) / sizeof(u32), seed);
+}
+
+static int lowpan_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr)
+{
+ const struct frag_lowpan_compare_key *key = arg->key;
+ const struct inet_frag_queue *fq = ptr;
+
+ return !!memcmp(&fq->key, key, sizeof(*key));
+}
+
+static const struct rhashtable_params lowpan_rhash_params = {
+ .head_offset = offsetof(struct inet_frag_queue, node),
+ .hashfn = lowpan_key_hashfn,
+ .obj_hashfn = lowpan_obj_hashfn,
+ .obj_cmpfn = lowpan_obj_cmpfn,
+ .automatic_shrinking = true,
+};
+
int __init lowpan_net_frag_init(void)
{
int ret;

- lowpan_frags.hashfn = lowpan_hashfn;
lowpan_frags.constructor = lowpan_frag_init;
lowpan_frags.destructor = NULL;
lowpan_frags.qsize = sizeof(struct frag_queue);
- lowpan_frags.match = lowpan_frag_match;
lowpan_frags.frag_expire = lowpan_frag_expire;
lowpan_frags.frags_cache_name = lowpan_frags_cache_name;
+ lowpan_frags.rhash_params = lowpan_rhash_params;
ret = inet_frags_init(&lowpan_frags);
if (ret)
goto out;
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -25,12 +25,6 @@
#include <net/inet_frag.h>
#include <net/inet_ecn.h>

-#define INETFRAGS_EVICT_BUCKETS 128
-#define INETFRAGS_EVICT_MAX 512
-
-/* don't rebuild inetfrag table with new secret more often than this */
-#define INETFRAGS_MIN_REBUILD_INTERVAL (5 * HZ)
-
/* Given the OR values of all fragments, apply RFC 3168 5.3 requirements
* Value : 0xff if frame should be dropped.
* 0 or INET_ECN_CE value, to be ORed in to final iph->tos field
@@ -52,157 +46,8 @@ const u8 ip_frag_ecn_table[16] = {
};
EXPORT_SYMBOL(ip_frag_ecn_table);

-static unsigned int
-inet_frag_hashfn(const struct inet_frags *f, const struct inet_frag_queue *q)
-{
- return f->hashfn(q) & (INETFRAGS_HASHSZ - 1);
-}
-
-static bool inet_frag_may_rebuild(struct inet_frags *f)
-{
- return time_after(jiffies,
- f->last_rebuild_jiffies + INETFRAGS_MIN_REBUILD_INTERVAL);
-}
-
-static void inet_frag_secret_rebuild(struct inet_frags *f)
-{
- int i;
-
- write_seqlock_bh(&f->rnd_seqlock);
-
- if (!inet_frag_may_rebuild(f))
- goto out;
-
- get_random_bytes(&f->rnd, sizeof(u32));
-
- for (i = 0; i < INETFRAGS_HASHSZ; i++) {
- struct inet_frag_bucket *hb;
- struct inet_frag_queue *q;
- struct hlist_node *n;
-
- hb = &f->hash[i];
- spin_lock(&hb->chain_lock);
-
- hlist_for_each_entry_safe(q, n, &hb->chain, list) {
- unsigned int hval = inet_frag_hashfn(f, q);
-
- if (hval != i) {
- struct inet_frag_bucket *hb_dest;
-
- hlist_del(&q->list);
-
- /* Relink to new hash chain. */
- hb_dest = &f->hash[hval];
-
- /* This is the only place where we take
- * another chain_lock while already holding
- * one. As this will not run concurrently,
- * we cannot deadlock on hb_dest lock below, if its
- * already locked it will be released soon since
- * other caller cannot be waiting for hb lock
- * that we've taken above.
- */
- spin_lock_nested(&hb_dest->chain_lock,
- SINGLE_DEPTH_NESTING);
- hlist_add_head(&q->list, &hb_dest->chain);
- spin_unlock(&hb_dest->chain_lock);
- }
- }
- spin_unlock(&hb->chain_lock);
- }
-
- f->rebuild = false;
- f->last_rebuild_jiffies = jiffies;
-out:
- write_sequnlock_bh(&f->rnd_seqlock);
-}
-
-static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
-{
- if (!hlist_unhashed(&q->list_evictor))
- return false;
-
- return q->net->low_thresh == 0 ||
- frag_mem_limit(q->net) >= q->net->low_thresh;
-}
-
-static unsigned int
-inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
-{
- struct inet_frag_queue *fq;
- struct hlist_node *n;
- unsigned int evicted = 0;
- HLIST_HEAD(expired);
-
- spin_lock(&hb->chain_lock);
-
- hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
- if (!inet_fragq_should_evict(fq))
- continue;
-
- if (!del_timer(&fq->timer))
- continue;
-
- hlist_add_head(&fq->list_evictor, &expired);
- ++evicted;
- }
-
- spin_unlock(&hb->chain_lock);
-
- hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
- f->frag_expire(&fq->timer);
-
- return evicted;
-}
-
-static void inet_frag_worker(struct work_struct *work)
-{
- unsigned int budget = INETFRAGS_EVICT_BUCKETS;
- unsigned int i, evicted = 0;
- struct inet_frags *f;
-
- f = container_of(work, struct inet_frags, frags_work);
-
- BUILD_BUG_ON(INETFRAGS_EVICT_BUCKETS >= INETFRAGS_HASHSZ);
-
- local_bh_disable();
-
- for (i = ACCESS_ONCE(f->next_bucket); budget; --budget) {
- evicted += inet_evict_bucket(f, &f->hash[i]);
- i = (i + 1) & (INETFRAGS_HASHSZ - 1);
- if (evicted > INETFRAGS_EVICT_MAX)
- break;
- }
-
- f->next_bucket = i;
-
- local_bh_enable();
-
- if (f->rebuild && inet_frag_may_rebuild(f))
- inet_frag_secret_rebuild(f);
-}
-
-static void inet_frag_schedule_worker(struct inet_frags *f)
-{
- if (unlikely(!work_pending(&f->frags_work)))
- schedule_work(&f->frags_work);
-}
-
int inet_frags_init(struct inet_frags *f)
{
- int i;
-
- INIT_WORK(&f->frags_work, inet_frag_worker);
-
- for (i = 0; i < INETFRAGS_HASHSZ; i++) {
- struct inet_frag_bucket *hb = &f->hash[i];
-
- spin_lock_init(&hb->chain_lock);
- INIT_HLIST_HEAD(&hb->chain);
- }
-
- seqlock_init(&f->rnd_seqlock);
- f->last_rebuild_jiffies = 0;
f->frags_cachep = kmem_cache_create(f->frags_cache_name, f->qsize, 0, 0,
NULL);
if (!f->frags_cachep)
@@ -214,66 +59,42 @@ EXPORT_SYMBOL(inet_frags_init);

void inet_frags_fini(struct inet_frags *f)
{
- cancel_work_sync(&f->frags_work);
+ /* We must wait that all inet_frag_destroy_rcu() have completed. */
+ rcu_barrier();
+
kmem_cache_destroy(f->frags_cachep);
+ f->frags_cachep = NULL;
}
EXPORT_SYMBOL(inet_frags_fini);

-void inet_frags_exit_net(struct netns_frags *nf)
+static void inet_frags_free_cb(void *ptr, void *arg)
{
- struct inet_frags *f =nf->f;
- unsigned int seq;
- int i;
-
- nf->low_thresh = 0;
-
-evict_again:
- local_bh_disable();
- seq = read_seqbegin(&f->rnd_seqlock);
-
- for (i = 0; i < INETFRAGS_HASHSZ ; i++)
- inet_evict_bucket(f, &f->hash[i]);
+ struct inet_frag_queue *fq = ptr;

- local_bh_enable();
- cond_resched();
-
- if (read_seqretry(&f->rnd_seqlock, seq) ||
- sum_frag_mem_limit(nf))
- goto evict_again;
-}
-EXPORT_SYMBOL(inet_frags_exit_net);
+ /* If we can not cancel the timer, it means this frag_queue
+ * is already disappearing, we have nothing to do.
+ * Otherwise, we own a refcount until the end of this function.
+ */
+ if (!del_timer(&fq->timer))
+ return;

-static struct inet_frag_bucket *
-get_frag_bucket_locked(struct inet_frag_queue *fq, struct inet_frags *f)
-__acquires(hb->chain_lock)
-{
- struct inet_frag_bucket *hb;
- unsigned int seq, hash;
-
- restart:
- seq = read_seqbegin(&f->rnd_seqlock);
-
- hash = inet_frag_hashfn(f, fq);
- hb = &f->hash[hash];
-
- spin_lock(&hb->chain_lock);
- if (read_seqretry(&f->rnd_seqlock, seq)) {
- spin_unlock(&hb->chain_lock);
- goto restart;
+ spin_lock_bh(&fq->lock);
+ if (!(fq->flags & INET_FRAG_COMPLETE)) {
+ fq->flags |= INET_FRAG_COMPLETE;
+ refcount_dec(&fq->refcnt);
}
+ spin_unlock_bh(&fq->lock);

- return hb;
+ inet_frag_put(fq);
}

-static inline void fq_unlink(struct inet_frag_queue *fq)
+void inet_frags_exit_net(struct netns_frags *nf)
{
- struct inet_frag_bucket *hb;
+ nf->low_thresh = 0; /* prevent creation of new frags */

- hb = get_frag_bucket_locked(fq, fq->net->f);
- hlist_del(&fq->list);
- fq->flags |= INET_FRAG_COMPLETE;
- spin_unlock(&hb->chain_lock);
+ rhashtable_free_and_destroy(&nf->rhashtable, inet_frags_free_cb, NULL);
}
+EXPORT_SYMBOL(inet_frags_exit_net);

void inet_frag_kill(struct inet_frag_queue *fq)
{
@@ -281,12 +102,26 @@ void inet_frag_kill(struct inet_frag_que
refcount_dec(&fq->refcnt);

if (!(fq->flags & INET_FRAG_COMPLETE)) {
- fq_unlink(fq);
+ struct netns_frags *nf = fq->net;
+
+ fq->flags |= INET_FRAG_COMPLETE;
+ rhashtable_remove_fast(&nf->rhashtable, &fq->node, nf->f->rhash_params);
refcount_dec(&fq->refcnt);
}
}
EXPORT_SYMBOL(inet_frag_kill);

+static void inet_frag_destroy_rcu(struct rcu_head *head)
+{
+ struct inet_frag_queue *q = container_of(head, struct inet_frag_queue,
+ rcu);
+ struct inet_frags *f = q->net->f;
+
+ if (f->destructor)
+ f->destructor(q);
+ kmem_cache_free(f->frags_cachep, q);
+}
+
void inet_frag_destroy(struct inet_frag_queue *q)
{
struct sk_buff *fp;
@@ -310,55 +145,21 @@ void inet_frag_destroy(struct inet_frag_
}
sum = sum_truesize + f->qsize;

- if (f->destructor)
- f->destructor(q);
- kmem_cache_free(f->frags_cachep, q);
+ call_rcu(&q->rcu, inet_frag_destroy_rcu);

sub_frag_mem_limit(nf, sum);
}
EXPORT_SYMBOL(inet_frag_destroy);

-static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
- struct inet_frag_queue *qp_in,
- struct inet_frags *f,
- void *arg)
-{
- struct inet_frag_bucket *hb = get_frag_bucket_locked(qp_in, f);
- struct inet_frag_queue *qp;
-
-#ifdef CONFIG_SMP
- /* With SMP race we have to recheck hash table, because
- * such entry could have been created on other cpu before
- * we acquired hash bucket lock.
- */
- hlist_for_each_entry(qp, &hb->chain, list) {
- if (qp->net == nf && f->match(qp, arg)) {
- refcount_inc(&qp->refcnt);
- spin_unlock(&hb->chain_lock);
- qp_in->flags |= INET_FRAG_COMPLETE;
- inet_frag_put(qp_in);
- return qp;
- }
- }
-#endif
- qp = qp_in;
- if (!mod_timer(&qp->timer, jiffies + nf->timeout))
- refcount_inc(&qp->refcnt);
-
- refcount_inc(&qp->refcnt);
- hlist_add_head(&qp->list, &hb->chain);
-
- spin_unlock(&hb->chain_lock);
-
- return qp;
-}
-
static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
struct inet_frags *f,
void *arg)
{
struct inet_frag_queue *q;

+ if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh)
+ return NULL;
+
q = kmem_cache_zalloc(f->frags_cachep, GFP_ATOMIC);
if (!q)
return NULL;
@@ -369,64 +170,52 @@ static struct inet_frag_queue *inet_frag

timer_setup(&q->timer, f->frag_expire, 0);
spin_lock_init(&q->lock);
- refcount_set(&q->refcnt, 1);
+ refcount_set(&q->refcnt, 3);

return q;
}

static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
- struct inet_frags *f,
void *arg)
{
+ struct inet_frags *f = nf->f;
struct inet_frag_queue *q;
+ int err;

q = inet_frag_alloc(nf, f, arg);
if (!q)
return NULL;

- return inet_frag_intern(nf, q, f, arg);
-}
-
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
- struct inet_frags *f, void *key,
- unsigned int hash)
-{
- struct inet_frag_bucket *hb;
- struct inet_frag_queue *q;
- int depth = 0;
+ mod_timer(&q->timer, jiffies + nf->timeout);

- if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
- inet_frag_schedule_worker(f);
+ err = rhashtable_insert_fast(&nf->rhashtable, &q->node,
+ f->rhash_params);
+ if (err < 0) {
+ q->flags |= INET_FRAG_COMPLETE;
+ inet_frag_kill(q);
+ inet_frag_destroy(q);
return NULL;
}
+ return q;
+}

- if (frag_mem_limit(nf) > nf->low_thresh)
- inet_frag_schedule_worker(f);
-
- hash &= (INETFRAGS_HASHSZ - 1);
- hb = &f->hash[hash];
-
- spin_lock(&hb->chain_lock);
- hlist_for_each_entry(q, &hb->chain, list) {
- if (q->net == nf && f->match(q, key)) {
- refcount_inc(&q->refcnt);
- spin_unlock(&hb->chain_lock);
- return q;
- }
- depth++;
- }
- spin_unlock(&hb->chain_lock);
+/* TODO : call from rcu_read_lock() and no longer use refcount_inc_not_zero() */
+struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key)
+{
+ struct inet_frag_queue *fq;

- if (depth <= INETFRAGS_MAXDEPTH)
- return inet_frag_create(nf, f, key);
+ rcu_read_lock();

- if (inet_frag_may_rebuild(f)) {
- if (!f->rebuild)
- f->rebuild = true;
- inet_frag_schedule_worker(f);
+ fq = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
+ if (fq) {
+ if (!refcount_inc_not_zero(&fq->refcnt))
+ fq = NULL;
+ rcu_read_unlock();
+ return fq;
}
+ rcu_read_unlock();

- return ERR_PTR(-ENOBUFS);
+ return inet_frag_create(nf, key);
}
EXPORT_SYMBOL(inet_frag_find);

@@ -434,8 +223,7 @@ void inet_frag_maybe_warn_overflow(struc
const char *prefix)
{
static const char msg[] = "inet_frag_find: Fragment hash bucket"
- " list length grew over limit " __stringify(INETFRAGS_MAXDEPTH)
- ". Dropping fragment.\n";
+ " list length grew over limit. Dropping fragment.\n";

if (PTR_ERR(q) == -ENOBUFS)
net_dbg_ratelimited("%s%s", prefix, msg);
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -69,15 +69,9 @@ struct ipfrag_skb_cb
struct ipq {
struct inet_frag_queue q;

- u32 user;
- __be32 saddr;
- __be32 daddr;
- __be16 id;
- u8 protocol;
u8 ecn; /* RFC3168 support */
u16 max_df_size; /* largest frag with DF set seen */
int iif;
- int vif; /* L3 master device index */
unsigned int rid;
struct inet_peer *peer;
};
@@ -97,41 +91,6 @@ int ip_frag_mem(struct net *net)
static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
struct net_device *dev);

-struct ip4_create_arg {
- struct iphdr *iph;
- u32 user;
- int vif;
-};
-
-static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr, u8 prot)
-{
- net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd));
- return jhash_3words((__force u32)id << 16 | prot,
- (__force u32)saddr, (__force u32)daddr,
- ip4_frags.rnd);
-}
-
-static unsigned int ip4_hashfn(const struct inet_frag_queue *q)
-{
- const struct ipq *ipq;
-
- ipq = container_of(q, struct ipq, q);
- return ipqhashfn(ipq->id, ipq->saddr, ipq->daddr, ipq->protocol);
-}
-
-static bool ip4_frag_match(const struct inet_frag_queue *q, const void *a)
-{
- const struct ipq *qp;
- const struct ip4_create_arg *arg = a;
-
- qp = container_of(q, struct ipq, q);
- return qp->id == arg->iph->id &&
- qp->saddr == arg->iph->saddr &&
- qp->daddr == arg->iph->daddr &&
- qp->protocol == arg->iph->protocol &&
- qp->user == arg->user &&
- qp->vif == arg->vif;
-}

static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
{
@@ -140,17 +99,12 @@ static void ip4_frag_init(struct inet_fr
frags);
struct net *net = container_of(ipv4, struct net, ipv4);

- const struct ip4_create_arg *arg = a;
+ const struct frag_v4_compare_key *key = a;

- qp->protocol = arg->iph->protocol;
- qp->id = arg->iph->id;
- qp->ecn = ip4_frag_ecn(arg->iph->tos);
- qp->saddr = arg->iph->saddr;
- qp->daddr = arg->iph->daddr;
- qp->vif = arg->vif;
- qp->user = arg->user;
+ q->key.v4 = *key;
+ qp->ecn = 0;
qp->peer = q->net->max_dist ?
- inet_getpeer_v4(net->ipv4.peers, arg->iph->saddr, arg->vif, 1) :
+ inet_getpeer_v4(net->ipv4.peers, key->saddr, key->vif, 1) :
NULL;
}

@@ -234,7 +188,7 @@ static void ip_expire(struct timer_list
/* Only an end host needs to send an ICMP
* "Fragment Reassembly Timeout" message, per RFC792.
*/
- if (frag_expire_skip_icmp(qp->user) &&
+ if (frag_expire_skip_icmp(qp->q.key.v4.user) &&
(skb_rtable(head)->rt_type != RTN_LOCAL))
goto out;

@@ -262,17 +216,17 @@ out_rcu_unlock:
static struct ipq *ip_find(struct net *net, struct iphdr *iph,
u32 user, int vif)
{
+ struct frag_v4_compare_key key = {
+ .saddr = iph->saddr,
+ .daddr = iph->daddr,
+ .user = user,
+ .vif = vif,
+ .id = iph->id,
+ .protocol = iph->protocol,
+ };
struct inet_frag_queue *q;
- struct ip4_create_arg arg;
- unsigned int hash;
-
- arg.iph = iph;
- arg.user = user;
- arg.vif = vif;

- hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
-
- q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
+ q = inet_frag_find(&net->ipv4.frags, &key);
if (IS_ERR_OR_NULL(q)) {
inet_frag_maybe_warn_overflow(q, pr_fmt());
return NULL;
@@ -661,7 +615,7 @@ out_nomem:
err = -ENOMEM;
goto out_fail;
out_oversize:
- net_info_ratelimited("Oversized IP packet from %pI4\n", &qp->saddr);
+ net_info_ratelimited("Oversized IP packet from %pI4\n", &qp->q.key.v4.saddr);
out_fail:
__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
return err;
@@ -899,15 +853,47 @@ static struct pernet_operations ip4_frag
.exit = ipv4_frags_exit_net,
};

+
+static u32 ip4_key_hashfn(const void *data, u32 len, u32 seed)
+{
+ return jhash2(data,
+ sizeof(struct frag_v4_compare_key) / sizeof(u32), seed);
+}
+
+static u32 ip4_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+ const struct inet_frag_queue *fq = data;
+
+ return jhash2((const u32 *)&fq->key.v4,
+ sizeof(struct frag_v4_compare_key) / sizeof(u32), seed);
+}
+
+static int ip4_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr)
+{
+ const struct frag_v4_compare_key *key = arg->key;
+ const struct inet_frag_queue *fq = ptr;
+
+ return !!memcmp(&fq->key, key, sizeof(*key));
+}
+
+static const struct rhashtable_params ip4_rhash_params = {
+ .head_offset = offsetof(struct inet_frag_queue, node),
+ .key_offset = offsetof(struct inet_frag_queue, key),
+ .key_len = sizeof(struct frag_v4_compare_key),
+ .hashfn = ip4_key_hashfn,
+ .obj_hashfn = ip4_obj_hashfn,
+ .obj_cmpfn = ip4_obj_cmpfn,
+ .automatic_shrinking = true,
+};
+
void __init ipfrag_init(void)
{
- ip4_frags.hashfn = ip4_hashfn;
ip4_frags.constructor = ip4_frag_init;
ip4_frags.destructor = ip4_frag_free;
ip4_frags.qsize = sizeof(struct ipq);
- ip4_frags.match = ip4_frag_match;
ip4_frags.frag_expire = ip_expire;
ip4_frags.frags_cache_name = ip_frag_cache_name;
+ ip4_frags.rhash_params = ip4_rhash_params;
if (inet_frags_init(&ip4_frags))
panic("IP: failed to allocate ip4_frags cache\n");
ip4_frags_ctl_register();
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -152,23 +152,6 @@ static inline u8 ip6_frag_ecn(const stru
return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK);
}

-static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr,
- const struct in6_addr *daddr)
-{
- net_get_random_once(&nf_frags.rnd, sizeof(nf_frags.rnd));
- return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
- (__force u32)id, nf_frags.rnd);
-}
-
-
-static unsigned int nf_hashfn(const struct inet_frag_queue *q)
-{
- const struct frag_queue *nq;
-
- nq = container_of(q, struct frag_queue, q);
- return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
-}
-
static void nf_ct_frag6_expire(struct timer_list *t)
{
struct inet_frag_queue *frag = from_timer(frag, t, timer);
@@ -182,26 +165,19 @@ static void nf_ct_frag6_expire(struct ti
}

/* Creation primitives. */
-static inline struct frag_queue *fq_find(struct net *net, __be32 id,
- u32 user, struct in6_addr *src,
- struct in6_addr *dst, int iif, u8 ecn)
+static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user,
+ const struct ipv6hdr *hdr, int iif)
{
+ struct frag_v6_compare_key key = {
+ .id = id,
+ .saddr = hdr->saddr,
+ .daddr = hdr->daddr,
+ .user = user,
+ .iif = iif,
+ };
struct inet_frag_queue *q;
- struct ip6_create_arg arg;
- unsigned int hash;
-
- arg.id = id;
- arg.user = user;
- arg.src = src;
- arg.dst = dst;
- arg.iif = iif;
- arg.ecn = ecn;
-
- local_bh_disable();
- hash = nf_hash_frag(id, src, dst);

- q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
- local_bh_enable();
+ q = inet_frag_find(&net->nf_frag.frags, &key);
if (IS_ERR_OR_NULL(q)) {
inet_frag_maybe_warn_overflow(q, pr_fmt());
return NULL;
@@ -593,8 +569,8 @@ int nf_ct_frag6_gather(struct net *net,
fhdr = (struct frag_hdr *)skb_transport_header(skb);

skb_orphan(skb);
- fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
- skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+ fq = fq_find(net, fhdr->identification, user, hdr,
+ skb->dev ? skb->dev->ifindex : 0);
if (fq == NULL) {
pr_debug("Can't find and can't create new queue\n");
return -ENOMEM;
@@ -662,13 +638,12 @@ int nf_ct_frag6_init(void)
{
int ret = 0;

- nf_frags.hashfn = nf_hashfn;
nf_frags.constructor = ip6_frag_init;
nf_frags.destructor = NULL;
nf_frags.qsize = sizeof(struct frag_queue);
- nf_frags.match = ip6_frag_match;
nf_frags.frag_expire = nf_ct_frag6_expire;
nf_frags.frags_cache_name = nf_frags_cache_name;
+ nf_frags.rhash_params = ip6_rhash_params;
ret = inet_frags_init(&nf_frags);
if (ret)
goto out;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -79,52 +79,13 @@ static struct inet_frags ip6_frags;
static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
struct net_device *dev);

-/*
- * callers should be careful not to use the hash value outside the ipfrag_lock
- * as doing so could race with ipfrag_hash_rnd being recalculated.
- */
-static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
- const struct in6_addr *daddr)
-{
- net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
- return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
- (__force u32)id, ip6_frags.rnd);
-}
-
-static unsigned int ip6_hashfn(const struct inet_frag_queue *q)
-{
- const struct frag_queue *fq;
-
- fq = container_of(q, struct frag_queue, q);
- return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr);
-}
-
-bool ip6_frag_match(const struct inet_frag_queue *q, const void *a)
-{
- const struct frag_queue *fq;
- const struct ip6_create_arg *arg = a;
-
- fq = container_of(q, struct frag_queue, q);
- return fq->id == arg->id &&
- fq->user == arg->user &&
- ipv6_addr_equal(&fq->saddr, arg->src) &&
- ipv6_addr_equal(&fq->daddr, arg->dst) &&
- (arg->iif == fq->iif ||
- !(ipv6_addr_type(arg->dst) & (IPV6_ADDR_MULTICAST |
- IPV6_ADDR_LINKLOCAL)));
-}
-EXPORT_SYMBOL(ip6_frag_match);
-
void ip6_frag_init(struct inet_frag_queue *q, const void *a)
{
struct frag_queue *fq = container_of(q, struct frag_queue, q);
- const struct ip6_create_arg *arg = a;
+ const struct frag_v6_compare_key *key = a;

- fq->id = arg->id;
- fq->user = arg->user;
- fq->saddr = *arg->src;
- fq->daddr = *arg->dst;
- fq->ecn = arg->ecn;
+ q->key.v6 = *key;
+ fq->ecn = 0;
}
EXPORT_SYMBOL(ip6_frag_init);

@@ -182,23 +143,22 @@ static void ip6_frag_expire(struct timer
}

static struct frag_queue *
-fq_find(struct net *net, __be32 id, const struct in6_addr *src,
- const struct in6_addr *dst, int iif, u8 ecn)
+fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
{
+ struct frag_v6_compare_key key = {
+ .id = id,
+ .saddr = hdr->saddr,
+ .daddr = hdr->daddr,
+ .user = IP6_DEFRAG_LOCAL_DELIVER,
+ .iif = iif,
+ };
struct inet_frag_queue *q;
- struct ip6_create_arg arg;
- unsigned int hash;
-
- arg.id = id;
- arg.user = IP6_DEFRAG_LOCAL_DELIVER;
- arg.src = src;
- arg.dst = dst;
- arg.iif = iif;
- arg.ecn = ecn;

- hash = inet6_hash_frag(id, src, dst);
+ if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
+ IPV6_ADDR_LINKLOCAL)))
+ key.iif = 0;

- q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
+ q = inet_frag_find(&net->ipv6.frags, &key);
if (IS_ERR_OR_NULL(q)) {
inet_frag_maybe_warn_overflow(q, pr_fmt());
return NULL;
@@ -530,6 +490,7 @@ static int ipv6_frag_rcv(struct sk_buff
struct frag_queue *fq;
const struct ipv6hdr *hdr = ipv6_hdr(skb);
struct net *net = dev_net(skb_dst(skb)->dev);
+ int iif;

if (IP6CB(skb)->flags & IP6SKB_FRAGMENTED)
goto fail_hdr;
@@ -558,13 +519,14 @@ static int ipv6_frag_rcv(struct sk_buff
return 1;
}

- fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr,
- skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+ iif = skb->dev ? skb->dev->ifindex : 0;
+ fq = fq_find(net, fhdr->identification, hdr, iif);
if (fq) {
int ret;

spin_lock(&fq->q.lock);

+ fq->iif = iif;
ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);

spin_unlock(&fq->q.lock);
@@ -738,17 +700,47 @@ static struct pernet_operations ip6_frag
.exit = ipv6_frags_exit_net,
};

+static u32 ip6_key_hashfn(const void *data, u32 len, u32 seed)
+{
+ return jhash2(data,
+ sizeof(struct frag_v6_compare_key) / sizeof(u32), seed);
+}
+
+static u32 ip6_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+ const struct inet_frag_queue *fq = data;
+
+ return jhash2((const u32 *)&fq->key.v6,
+ sizeof(struct frag_v6_compare_key) / sizeof(u32), seed);
+}
+
+static int ip6_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr)
+{
+ const struct frag_v6_compare_key *key = arg->key;
+ const struct inet_frag_queue *fq = ptr;
+
+ return !!memcmp(&fq->key, key, sizeof(*key));
+}
+
+const struct rhashtable_params ip6_rhash_params = {
+ .head_offset = offsetof(struct inet_frag_queue, node),
+ .hashfn = ip6_key_hashfn,
+ .obj_hashfn = ip6_obj_hashfn,
+ .obj_cmpfn = ip6_obj_cmpfn,
+ .automatic_shrinking = true,
+};
+EXPORT_SYMBOL(ip6_rhash_params);
+
int __init ipv6_frag_init(void)
{
int ret;

- ip6_frags.hashfn = ip6_hashfn;
ip6_frags.constructor = ip6_frag_init;
ip6_frags.destructor = NULL;
ip6_frags.qsize = sizeof(struct frag_queue);
- ip6_frags.match = ip6_frag_match;
ip6_frags.frag_expire = ip6_frag_expire;
ip6_frags.frags_cache_name = ip6_frag_cache_name;
+ ip6_frags.rhash_params = ip6_rhash_params;
ret = inet_frags_init(&ip6_frags);
if (ret)
goto out;



2018-09-17 23:27:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 096/126] inet: frags: refactor ipv6_frag_init()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

We want to call inet_frags_init() earlier.

This is a prereq to "inet: frags: use rhashtables for reassembly units"

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 5b975bab23615cd0fdf67af6c9298eb01c4b9f61)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/reassembly.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -746,10 +746,21 @@ int __init ipv6_frag_init(void)
{
int ret;

- ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+ ip6_frags.hashfn = ip6_hashfn;
+ ip6_frags.constructor = ip6_frag_init;
+ ip6_frags.destructor = NULL;
+ ip6_frags.qsize = sizeof(struct frag_queue);
+ ip6_frags.match = ip6_frag_match;
+ ip6_frags.frag_expire = ip6_frag_expire;
+ ip6_frags.frags_cache_name = ip6_frag_cache_name;
+ ret = inet_frags_init(&ip6_frags);
if (ret)
goto out;

+ ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+ if (ret)
+ goto err_protocol;
+
ret = ip6_frags_sysctl_register();
if (ret)
goto err_sysctl;
@@ -758,16 +769,6 @@ int __init ipv6_frag_init(void)
if (ret)
goto err_pernet;

- ip6_frags.hashfn = ip6_hashfn;
- ip6_frags.constructor = ip6_frag_init;
- ip6_frags.destructor = NULL;
- ip6_frags.qsize = sizeof(struct frag_queue);
- ip6_frags.match = ip6_frag_match;
- ip6_frags.frag_expire = ip6_frag_expire;
- ip6_frags.frags_cache_name = ip6_frag_cache_name;
- ret = inet_frags_init(&ip6_frags);
- if (ret)
- goto err_pernet;
out:
return ret;

@@ -775,6 +776,8 @@ err_pernet:
ip6_frags_sysctl_unregister();
err_sysctl:
inet6_del_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+err_protocol:
+ inet_frags_fini(&ip6_frags);
goto out;
}




2018-09-17 23:28:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 074/126] MIPS: Octeon: add missing of_node_put()

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Mc Guire <[email protected]>

[ Upstream commit b1259519e618d479ede8a0db5474b3aff99f5056 ]

The call to of_find_node_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.

Signed-off-by: Nicholas Mc Guire <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/19558/
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/mips/cavium-octeon/octeon-platform.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/mips/cavium-octeon/octeon-platform.c
+++ b/arch/mips/cavium-octeon/octeon-platform.c
@@ -322,6 +322,7 @@ static int __init octeon_ehci_device_ini
return 0;

pd = of_find_device_by_node(ehci_node);
+ of_node_put(ehci_node);
if (!pd)
return 0;

@@ -384,6 +385,7 @@ static int __init octeon_ohci_device_ini
return 0;

pd = of_find_device_by_node(ohci_node);
+ of_node_put(ohci_node);
if (!pd)
return 0;




2018-09-17 23:28:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 067/126] net: mvneta: fix mtu change on port without link

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yelena Krivosheev <[email protected]>

[ Upstream commit 8466baf788ec3e18836bd9c91ba0b1a07af25878 ]

It is incorrect to enable TX/RX queues (call by mvneta_port_up()) for
port without link. Indeed MTU change for interface without link causes TX
queues to stuck.

Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP
network unit")
Signed-off-by: Yelena Krivosheev <[email protected]>
[gregory.clement: adding Fixes tags and rewording commit log]
Signed-off-by: Gregory CLEMENT <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 1 -
1 file changed, 1 deletion(-)

--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3195,7 +3195,6 @@ static int mvneta_change_mtu(struct net_

on_each_cpu(mvneta_percpu_enable, pp, true);
mvneta_start_dev(pp);
- mvneta_port_up(pp);

netdev_update_features(dev);




2018-09-17 23:29:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 016/126] x86/microcode: Make sure boot_cpu_data.microcode is up-to-date

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Prarit Bhargava <[email protected]>

commit 370a132bb2227ff76278f98370e0e701d86ff752 upstream.

When preparing an MCE record for logging, boot_cpu_data.microcode is used
to read out the microcode revision on the box.

However, on systems where late microcode update has happened, the microcode
revision output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated when the microcode gets updated.

But, the microcode revision saved in boot_cpu_data's microcode member
should be kept up-to-date, regardless, for consistency.

Make it so.

Fixes: fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Prarit Bhargava <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/kernel/cpu/microcode/intel.c | 4 ++++
2 files changed, 8 insertions(+)

--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -537,6 +537,10 @@ static enum ucode_state apply_microcode_
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;

+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = mc_amd->hdr.patch_id;
+
return UCODE_UPDATED;
}

--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -851,6 +851,10 @@ static enum ucode_state apply_microcode_
uci->cpu_sig.rev = rev;
c->microcode = rev;

+ /* Update boot_cpu_data's revision too, if we're on the BSP: */
+ if (c->cpu_index == boot_cpu_data.cpu_index)
+ boot_cpu_data.microcode = rev;
+
return UCODE_UPDATED;
}




2018-09-17 23:30:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 020/126] tpm: separate cmd_ready/go_idle from runtime_pm

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tomas Winkler <[email protected]>

commit 627448e85c766587f6fdde1ea3886d6615081c77 upstream.

Fix tpm ptt initialization error:
tpm tpm0: A TPM error (378) occurred get tpm pcr allocation.

We cannot use go_idle cmd_ready commands via runtime_pm handles
as with the introduction of localities this is no longer an optional
feature, while runtime pm can be not enabled.
Though cmd_ready/go_idle provides a power saving, it's also a part of
TPM2 protocol and should be called explicitly.
This patch exposes cmd_read/go_idle via tpm class ops and removes
runtime pm support as it is not used by any driver.

When calling from nested context always use both flags:
TPM_TRANSMIT_UNLOCKED and TPM_TRANSMIT_RAW. Both are needed to resolve
tpm spaces and locality request recursive calls to tpm_transmit().
TPM_TRANSMIT_RAW should never be used standalone as it will fail
on double locking. While TPM_TRANSMIT_UNLOCKED standalone should be
called from non-recursive locked contexts.

New wrappers are added tpm_cmd_ready() and tpm_go_idle() to
streamline tpm_try_transmit code.

tpm_crb no longer needs own power saving functions and can drop using
tpm_pm_suspend/resume.

This patch cannot be really separated from the locality fix.
Fixes: 888d867df441 (tpm: cmd_ready command can be issued only after granting locality)

Cc: [email protected]
Fixes: 888d867df441 (tpm: cmd_ready command can be issued only after granting locality)
Signed-off-by: Tomas Winkler <[email protected]>
Tested-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/char/tpm/tpm-interface.c | 50 +++++++++++++++----
drivers/char/tpm/tpm.h | 12 +++-
drivers/char/tpm/tpm2-space.c | 16 +++---
drivers/char/tpm/tpm_crb.c | 101 ++++++++++-----------------------------
include/linux/tpm.h | 2
5 files changed, 90 insertions(+), 91 deletions(-)

--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -369,10 +369,13 @@ err_len:
return -EINVAL;
}

-static int tpm_request_locality(struct tpm_chip *chip)
+static int tpm_request_locality(struct tpm_chip *chip, unsigned int flags)
{
int rc;

+ if (flags & TPM_TRANSMIT_RAW)
+ return 0;
+
if (!chip->ops->request_locality)
return 0;

@@ -385,10 +388,13 @@ static int tpm_request_locality(struct t
return 0;
}

-static void tpm_relinquish_locality(struct tpm_chip *chip)
+static void tpm_relinquish_locality(struct tpm_chip *chip, unsigned int flags)
{
int rc;

+ if (flags & TPM_TRANSMIT_RAW)
+ return;
+
if (!chip->ops->relinquish_locality)
return;

@@ -399,6 +405,28 @@ static void tpm_relinquish_locality(stru
chip->locality = -1;
}

+static int tpm_cmd_ready(struct tpm_chip *chip, unsigned int flags)
+{
+ if (flags & TPM_TRANSMIT_RAW)
+ return 0;
+
+ if (!chip->ops->cmd_ready)
+ return 0;
+
+ return chip->ops->cmd_ready(chip);
+}
+
+static int tpm_go_idle(struct tpm_chip *chip, unsigned int flags)
+{
+ if (flags & TPM_TRANSMIT_RAW)
+ return 0;
+
+ if (!chip->ops->go_idle)
+ return 0;
+
+ return chip->ops->go_idle(chip);
+}
+
static ssize_t tpm_try_transmit(struct tpm_chip *chip,
struct tpm_space *space,
u8 *buf, size_t bufsiz,
@@ -449,14 +477,15 @@ static ssize_t tpm_try_transmit(struct t
/* Store the decision as chip->locality will be changed. */
need_locality = chip->locality == -1;

- if (!(flags & TPM_TRANSMIT_RAW) && need_locality) {
- rc = tpm_request_locality(chip);
+ if (need_locality) {
+ rc = tpm_request_locality(chip, flags);
if (rc < 0)
goto out_no_locality;
}

- if (chip->dev.parent)
- pm_runtime_get_sync(chip->dev.parent);
+ rc = tpm_cmd_ready(chip, flags);
+ if (rc)
+ goto out;

rc = tpm2_prepare_space(chip, space, ordinal, buf);
if (rc)
@@ -516,13 +545,16 @@ out_recv:
}

rc = tpm2_commit_space(chip, space, ordinal, buf, &len);
+ if (rc)
+ dev_err(&chip->dev, "tpm2_commit_space: error %d\n", rc);

out:
- if (chip->dev.parent)
- pm_runtime_put_sync(chip->dev.parent);
+ rc = tpm_go_idle(chip, flags);
+ if (rc)
+ goto out;

if (need_locality)
- tpm_relinquish_locality(chip);
+ tpm_relinquish_locality(chip, flags);

out_no_locality:
if (chip->ops->clk_enable != NULL)
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -511,9 +511,17 @@ extern const struct file_operations tpm_
extern const struct file_operations tpmrm_fops;
extern struct idr dev_nums_idr;

+/**
+ * enum tpm_transmit_flags
+ *
+ * @TPM_TRANSMIT_UNLOCKED: used to lock sequence of tpm_transmit calls.
+ * @TPM_TRANSMIT_RAW: prevent recursive calls into setup steps
+ * (go idle, locality,..). Always use with UNLOCKED
+ * as it will fail on double locking.
+ */
enum tpm_transmit_flags {
- TPM_TRANSMIT_UNLOCKED = BIT(0),
- TPM_TRANSMIT_RAW = BIT(1),
+ TPM_TRANSMIT_UNLOCKED = BIT(0),
+ TPM_TRANSMIT_RAW = BIT(1),
};

ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
--- a/drivers/char/tpm/tpm2-space.c
+++ b/drivers/char/tpm/tpm2-space.c
@@ -39,7 +39,8 @@ static void tpm2_flush_sessions(struct t
for (i = 0; i < ARRAY_SIZE(space->session_tbl); i++) {
if (space->session_tbl[i])
tpm2_flush_context_cmd(chip, space->session_tbl[i],
- TPM_TRANSMIT_UNLOCKED);
+ TPM_TRANSMIT_UNLOCKED |
+ TPM_TRANSMIT_RAW);
}
}

@@ -84,7 +85,7 @@ static int tpm2_load_context(struct tpm_
tpm_buf_append(&tbuf, &buf[*offset], body_size);

rc = tpm_transmit_cmd(chip, NULL, tbuf.data, PAGE_SIZE, 4,
- TPM_TRANSMIT_UNLOCKED, NULL);
+ TPM_TRANSMIT_UNLOCKED | TPM_TRANSMIT_RAW, NULL);
if (rc < 0) {
dev_warn(&chip->dev, "%s: failed with a system error %d\n",
__func__, rc);
@@ -133,7 +134,7 @@ static int tpm2_save_context(struct tpm_
tpm_buf_append_u32(&tbuf, handle);

rc = tpm_transmit_cmd(chip, NULL, tbuf.data, PAGE_SIZE, 0,
- TPM_TRANSMIT_UNLOCKED, NULL);
+ TPM_TRANSMIT_UNLOCKED | TPM_TRANSMIT_RAW, NULL);
if (rc < 0) {
dev_warn(&chip->dev, "%s: failed with a system error %d\n",
__func__, rc);
@@ -170,7 +171,8 @@ static void tpm2_flush_space(struct tpm_
for (i = 0; i < ARRAY_SIZE(space->context_tbl); i++)
if (space->context_tbl[i] && ~space->context_tbl[i])
tpm2_flush_context_cmd(chip, space->context_tbl[i],
- TPM_TRANSMIT_UNLOCKED);
+ TPM_TRANSMIT_UNLOCKED |
+ TPM_TRANSMIT_RAW);

tpm2_flush_sessions(chip, space);
}
@@ -377,7 +379,8 @@ static int tpm2_map_response_header(stru

return 0;
out_no_slots:
- tpm2_flush_context_cmd(chip, phandle, TPM_TRANSMIT_UNLOCKED);
+ tpm2_flush_context_cmd(chip, phandle,
+ TPM_TRANSMIT_UNLOCKED | TPM_TRANSMIT_RAW);
dev_warn(&chip->dev, "%s: out of slots for 0x%08X\n", __func__,
phandle);
return -ENOMEM;
@@ -465,7 +468,8 @@ static int tpm2_save_space(struct tpm_ch
return rc;

tpm2_flush_context_cmd(chip, space->context_tbl[i],
- TPM_TRANSMIT_UNLOCKED);
+ TPM_TRANSMIT_UNLOCKED |
+ TPM_TRANSMIT_RAW);
space->context_tbl[i] = ~0;
}

--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -137,7 +137,7 @@ static bool crb_wait_for_reg_32(u32 __io
}

/**
- * crb_go_idle - request tpm crb device to go the idle state
+ * __crb_go_idle - request tpm crb device to go the idle state
*
* @dev: crb device
* @priv: crb private data
@@ -151,7 +151,7 @@ static bool crb_wait_for_reg_32(u32 __io
*
* Return: 0 always
*/
-static int crb_go_idle(struct device *dev, struct crb_priv *priv)
+static int __crb_go_idle(struct device *dev, struct crb_priv *priv)
{
if ((priv->flags & CRB_FL_ACPI_START) ||
(priv->flags & CRB_FL_CRB_SMC_START))
@@ -166,11 +166,20 @@ static int crb_go_idle(struct device *de
dev_warn(dev, "goIdle timed out\n");
return -ETIME;
}
+
return 0;
}

+static int crb_go_idle(struct tpm_chip *chip)
+{
+ struct device *dev = &chip->dev;
+ struct crb_priv *priv = dev_get_drvdata(dev);
+
+ return __crb_go_idle(dev, priv);
+}
+
/**
- * crb_cmd_ready - request tpm crb device to enter ready state
+ * __crb_cmd_ready - request tpm crb device to enter ready state
*
* @dev: crb device
* @priv: crb private data
@@ -183,7 +192,7 @@ static int crb_go_idle(struct device *de
*
* Return: 0 on success -ETIME on timeout;
*/
-static int crb_cmd_ready(struct device *dev, struct crb_priv *priv)
+static int __crb_cmd_ready(struct device *dev, struct crb_priv *priv)
{
if ((priv->flags & CRB_FL_ACPI_START) ||
(priv->flags & CRB_FL_CRB_SMC_START))
@@ -201,6 +210,14 @@ static int crb_cmd_ready(struct device *
return 0;
}

+static int crb_cmd_ready(struct tpm_chip *chip)
+{
+ struct device *dev = &chip->dev;
+ struct crb_priv *priv = dev_get_drvdata(dev);
+
+ return __crb_cmd_ready(dev, priv);
+}
+
static int __crb_request_locality(struct device *dev,
struct crb_priv *priv, int loc)
{
@@ -393,6 +410,8 @@ static const struct tpm_class_ops tpm_cr
.send = crb_send,
.cancel = crb_cancel,
.req_canceled = crb_req_canceled,
+ .go_idle = crb_go_idle,
+ .cmd_ready = crb_cmd_ready,
.request_locality = crb_request_locality,
.relinquish_locality = crb_relinquish_locality,
.req_complete_mask = CRB_DRV_STS_COMPLETE,
@@ -508,7 +527,7 @@ static int crb_map_io(struct acpi_device
* PTT HW bug w/a: wake up the device to access
* possibly not retained registers.
*/
- ret = crb_cmd_ready(dev, priv);
+ ret = __crb_cmd_ready(dev, priv);
if (ret)
return ret;

@@ -553,7 +572,7 @@ out:
if (!ret)
priv->cmd_size = cmd_size;

- crb_go_idle(dev, priv);
+ __crb_go_idle(dev, priv);

__crb_relinquish_locality(dev, priv, 0);

@@ -624,32 +643,7 @@ static int crb_acpi_add(struct acpi_devi
chip->acpi_dev_handle = device->handle;
chip->flags = TPM_CHIP_FLAG_TPM2;

- rc = __crb_request_locality(dev, priv, 0);
- if (rc)
- return rc;
-
- rc = crb_cmd_ready(dev, priv);
- if (rc)
- goto out;
-
- pm_runtime_get_noresume(dev);
- pm_runtime_set_active(dev);
- pm_runtime_enable(dev);
-
- rc = tpm_chip_register(chip);
- if (rc) {
- crb_go_idle(dev, priv);
- pm_runtime_put_noidle(dev);
- pm_runtime_disable(dev);
- goto out;
- }
-
- pm_runtime_put_sync(dev);
-
-out:
- __crb_relinquish_locality(dev, priv, 0);
-
- return rc;
+ return tpm_chip_register(chip);
}

static int crb_acpi_remove(struct acpi_device *device)
@@ -659,52 +653,11 @@ static int crb_acpi_remove(struct acpi_d

tpm_chip_unregister(chip);

- pm_runtime_disable(dev);
-
return 0;
}

-static int __maybe_unused crb_pm_runtime_suspend(struct device *dev)
-{
- struct tpm_chip *chip = dev_get_drvdata(dev);
- struct crb_priv *priv = dev_get_drvdata(&chip->dev);
-
- return crb_go_idle(dev, priv);
-}
-
-static int __maybe_unused crb_pm_runtime_resume(struct device *dev)
-{
- struct tpm_chip *chip = dev_get_drvdata(dev);
- struct crb_priv *priv = dev_get_drvdata(&chip->dev);
-
- return crb_cmd_ready(dev, priv);
-}
-
-static int __maybe_unused crb_pm_suspend(struct device *dev)
-{
- int ret;
-
- ret = tpm_pm_suspend(dev);
- if (ret)
- return ret;
-
- return crb_pm_runtime_suspend(dev);
-}
-
-static int __maybe_unused crb_pm_resume(struct device *dev)
-{
- int ret;
-
- ret = crb_pm_runtime_resume(dev);
- if (ret)
- return ret;
-
- return tpm_pm_resume(dev);
-}
-
static const struct dev_pm_ops crb_pm = {
- SET_SYSTEM_SLEEP_PM_OPS(crb_pm_suspend, crb_pm_resume)
- SET_RUNTIME_PM_OPS(crb_pm_runtime_suspend, crb_pm_runtime_resume, NULL)
+ SET_SYSTEM_SLEEP_PM_OPS(tpm_pm_suspend, tpm_pm_resume)
};

static const struct acpi_device_id crb_device_ids[] = {
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -48,6 +48,8 @@ struct tpm_class_ops {
u8 (*status) (struct tpm_chip *chip);
bool (*update_timeouts)(struct tpm_chip *chip,
unsigned long *timeout_cap);
+ int (*go_idle)(struct tpm_chip *chip);
+ int (*cmd_ready)(struct tpm_chip *chip);
int (*request_locality)(struct tpm_chip *chip, int loc);
int (*relinquish_locality)(struct tpm_chip *chip, int loc);
void (*clk_enable)(struct tpm_chip *chip, bool value);



2018-09-17 23:31:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 002/126] i2c: i801: fix DNVs SMBCTRL register offset

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit 851a15114895c5bce163a6f2d57e0aa4658a1be4 upstream.

DNV's iTCO is slightly different with SMBCTRL sitting at a different
offset when compared to all other devices. Let's fix so that we can
properly use iTCO watchdog.

Fixes: 84d7f2ebd70d ("i2c: i801: Add support for Intel DNV")
Cc: <[email protected]> # v4.4+
Signed-off-by: Felipe Balbi <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/i2c/busses/i2c-i801.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -138,6 +138,7 @@

#define SBREG_BAR 0x10
#define SBREG_SMBCTRL 0xc6000c
+#define SBREG_SMBCTRL_DNV 0xcf000c

/* Host status bits for SMBPCISTS */
#define SMBPCISTS_INTS BIT(3)
@@ -1395,7 +1396,11 @@ static void i801_add_tco(struct i801_pri
spin_unlock(&p2sb_spinlock);

res = &tco_res[ICH_RES_MEM_OFF];
- res->start = (resource_size_t)base64_addr + SBREG_SMBCTRL;
+ if (pci_dev->device == PCI_DEVICE_ID_INTEL_DNV_SMBUS)
+ res->start = (resource_size_t)base64_addr + SBREG_SMBCTRL_DNV;
+ else
+ res->start = (resource_size_t)base64_addr + SBREG_SMBCTRL;
+
res->end = res->start + 3;
res->flags = IORESOURCE_MEM;




2018-09-17 23:31:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.14 017/126] x86/microcode: Update the new microcode revision unconditionally

4.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Filippo Sironi <[email protected]>

commit 8da38ebaad23fe1b0c4a205438676f6356607cfc upstream.

Handle the case where microcode gets loaded on the BSP's hyperthread
sibling first and the boot_cpu_data's microcode revision doesn't get
updated because of early exit due to the siblings sharing a microcode
engine.

For that, simply write the updated revision on all CPUs unconditionally.

Signed-off-by: Filippo Sironi <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/cpu/microcode/amd.c | 22 +++++++++++++---------
arch/x86/kernel/cpu/microcode/intel.c | 13 ++++++++-----
2 files changed, 21 insertions(+), 14 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -504,6 +504,7 @@ static enum ucode_state apply_microcode_
struct microcode_amd *mc_amd;
struct ucode_cpu_info *uci;
struct ucode_patch *p;
+ enum ucode_state ret;
u32 rev, dummy;

BUG_ON(raw_smp_processor_id() != cpu);
@@ -521,9 +522,8 @@ static enum ucode_state apply_microcode_

/* need to apply patch? */
if (rev >= mc_amd->hdr.patch_id) {
- c->microcode = rev;
- uci->cpu_sig.rev = rev;
- return UCODE_OK;
+ ret = UCODE_OK;
+ goto out;
}

if (__apply_microcode_amd(mc_amd)) {
@@ -531,17 +531,21 @@ static enum ucode_state apply_microcode_
cpu, mc_amd->hdr.patch_id);
return UCODE_ERROR;
}
- pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
- mc_amd->hdr.patch_id);

- uci->cpu_sig.rev = mc_amd->hdr.patch_id;
- c->microcode = mc_amd->hdr.patch_id;
+ rev = mc_amd->hdr.patch_id;
+ ret = UCODE_UPDATED;
+
+ pr_info("CPU%d: new patch_level=0x%08x\n", cpu, rev);
+
+out:
+ uci->cpu_sig.rev = rev;
+ c->microcode = rev;

/* Update boot_cpu_data's revision too, if we're on the BSP: */
if (c->cpu_index == boot_cpu_data.cpu_index)
- boot_cpu_data.microcode = mc_amd->hdr.patch_id;
+ boot_cpu_data.microcode = rev;

- return UCODE_UPDATED;
+ return ret;
}

static int install_equiv_cpu_table(const u8 *buf)
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -795,6 +795,7 @@ static enum ucode_state apply_microcode_
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
struct cpuinfo_x86 *c = &cpu_data(cpu);
struct microcode_intel *mc;
+ enum ucode_state ret;
static int prev_rev;
u32 rev;

@@ -817,9 +818,8 @@ static enum ucode_state apply_microcode_
*/
rev = intel_get_microcode_revision();
if (rev >= mc->hdr.rev) {
- uci->cpu_sig.rev = rev;
- c->microcode = rev;
- return UCODE_OK;
+ ret = UCODE_OK;
+ goto out;
}

/*
@@ -848,14 +848,17 @@ static enum ucode_state apply_microcode_
prev_rev = rev;
}

+ ret = UCODE_UPDATED;
+
+out:
uci->cpu_sig.rev = rev;
- c->microcode = rev;
+ c->microcode = rev;

/* Update boot_cpu_data's revision too, if we're on the BSP: */
if (c->cpu_index == boot_cpu_data.cpu_index)
boot_cpu_data.microcode = rev;

- return UCODE_UPDATED;
+ return ret;
}

static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,



2018-09-18 00:00:49

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/126] 4.14.71-stable review

On Tue, Sep 18, 2018 at 12:40:48AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.71 release.
> There are 126 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Sep 19 21:16:12 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.71-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Merged, compiled, and installed onto my Raspberry Pi.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan

2018-09-18 07:44:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/126] 4.14.71-stable review

On Mon, Sep 17, 2018 at 04:59:49PM -0700, Nathan Chancellor wrote:
> On Tue, Sep 18, 2018 at 12:40:48AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.71 release.
> > There are 126 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed Sep 19 21:16:12 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.71-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Merged, compiled, and installed onto my Raspberry Pi.
>
> No initial issues noticed in dmesg or general usage.

Thanks for testing 3 of these and letting me know.

greg k-h

2018-09-18 16:23:52

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/126] 4.14.71-stable review

On Tue, Sep 18, 2018 at 12:40:48AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.71 release.
> There are 126 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Sep 19 21:16:12 UTC 2018.
> Anything received after that time might be too late.
>

Build results:
total: 151 pass: 151 fail: 0
Qemu test results:
total: 315 pass: 315 fail: 0

Details are available at https://kerneltests.org/builders/.

Guenter

2018-09-18 16:56:38

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.14 000/126] 4.14.71-stable review

On 18 September 2018 at 04:10, Greg Kroah-Hartman
<[email protected]> wrote:
> This is the start of the stable review cycle for the 4.14.71 release.
> There are 126 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Sep 19 21:16:12 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.71-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary
------------------------------------------------------------------------

kernel: 4.14.71-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 117adc51b16e0068c963f8c1339a0360ccf29f17
git describe: v4.14.70-127-g117adc51b16e
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.70-127-g117adc51b16e

No regressions (compared to build v4.14.70)


Ran 19763 total tests in the following environments and test suites.

Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

--
Linaro LKFT
https://lkft.linaro.org

2018-10-04 20:14:21

by Mitch Harder

[permalink] [raw]
Subject: Re: [PATCH 4.14 117/126] net: sk_buff rbnode reorg

On Mon, Sep 17, 2018 at 5:42 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> 4.14-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Eric Dumazet <[email protected]>
>
> commit bffa72cf7f9df842f0016ba03586039296b4caaf upstream
>
> skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
>
> Current uses (TCP receive ofo queue and netem) need to save/restore
> tstamp, while skb->dev is either NULL (TCP) or a constant for a given
> queue (netem).
>
> Since we plan using an RB tree for TCP retransmit queue to speedup SACK
> processing with large BDP, this patch exchanges skb->dev and
> skb->tstamp.
>
> This saves some overhead in both TCP and netem.
>
> v2: removes the swtstamp field from struct tcp_skb_cb
>
> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Soheil Hassas Yeganeh <[email protected]>
> Cc: Wei Wang <[email protected]>
> Cc: Willem de Bruijn <[email protected]>
> Acked-by: Soheil Hassas Yeganeh <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> include/linux/skbuff.h | 24 ++--
> include/net/inet_frag.h | 3
> net/ipv4/inet_fragment.c | 14 +-
> net/ipv4/ip_fragment.c | 182 +++++++++++++++++---------------
> net/ipv6/netfilter/nf_conntrack_reasm.c | 1
> net/ipv6/reassembly.c | 1
> 6 files changed, 128 insertions(+), 97 deletions(-)
>
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -663,23 +663,27 @@ struct sk_buff {
> struct sk_buff *prev;
>
> union {
> - ktime_t tstamp;
> - u64 skb_mstamp;
> + struct net_device *dev;
> + /* Some protocols might use this space to store information,
> + * while device pointer would be NULL.
> + * UDP receive path is one user.
> + */
> + unsigned long dev_scratch;
> };
> };
> - struct rb_node rbnode; /* used in netem & tcp stack */
> + struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */
> + struct list_head list;
> };
> - struct sock *sk;
>
> union {
> - struct net_device *dev;
> - /* Some protocols might use this space to store information,
> - * while device pointer would be NULL.
> - * UDP receive path is one user.
> - */
> - unsigned long dev_scratch;
> + struct sock *sk;
> int ip_defrag_offset;
> };
> +
> + union {
> + ktime_t tstamp;
> + u64 skb_mstamp;
> + };
> /*
> * This is the control buffer. It is free to use for every
> * layer. Please put your private variables there. If you
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -75,7 +75,8 @@ struct inet_frag_queue {
> struct timer_list timer;
> spinlock_t lock;
> refcount_t refcnt;
> - struct sk_buff *fragments;
> + struct sk_buff *fragments; /* Used in IPv6. */
> + struct rb_root rb_fragments; /* Used in IPv4. */
> struct sk_buff *fragments_tail;
> ktime_t stamp;
> int len;
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -136,12 +136,16 @@ void inet_frag_destroy(struct inet_frag_
> fp = q->fragments;
> nf = q->net;
> f = nf->f;
> - while (fp) {
> - struct sk_buff *xp = fp->next;
> + if (fp) {
> + do {
> + struct sk_buff *xp = fp->next;
>
> - sum_truesize += fp->truesize;
> - kfree_skb(fp);
> - fp = xp;
> + sum_truesize += fp->truesize;
> + kfree_skb(fp);
> + fp = xp;
> + } while (fp);
> + } else {
> + sum_truesize = skb_rbtree_purge(&q->rb_fragments);
> }
> sum = sum_truesize + f->qsize;
>
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -136,7 +136,7 @@ static void ip_expire(struct timer_list
> {
> struct inet_frag_queue *frag = from_timer(frag, t, timer);
> const struct iphdr *iph;
> - struct sk_buff *head;
> + struct sk_buff *head = NULL;
> struct net *net;
> struct ipq *qp;
> int err;
> @@ -152,14 +152,31 @@ static void ip_expire(struct timer_list
>
> ipq_kill(qp);
> __IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
> -
> - head = qp->q.fragments;
> -
> __IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
>
> - if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
> + if (!qp->q.flags & INET_FRAG_FIRST_IN)
> goto out;
>
> + /* sk_buff::dev and sk_buff::rbnode are unionized. So we
> + * pull the head out of the tree in order to be able to
> + * deal with head->dev.
> + */
> + if (qp->q.fragments) {
> + head = qp->q.fragments;
> + qp->q.fragments = head->next;
> + } else {
> + head = skb_rb_first(&qp->q.rb_fragments);
> + if (!head)
> + goto out;
> + rb_erase(&head->rbnode, &qp->q.rb_fragments);
> + memset(&head->rbnode, 0, sizeof(head->rbnode));
> + barrier();
> + }
> + if (head == qp->q.fragments_tail)
> + qp->q.fragments_tail = NULL;
> +
> + sub_frag_mem_limit(qp->q.net, head->truesize);
> +
> head->dev = dev_get_by_index_rcu(net, qp->iif);
> if (!head->dev)
> goto out;
> @@ -179,16 +196,16 @@ static void ip_expire(struct timer_list
> (skb_rtable(head)->rt_type != RTN_LOCAL))
> goto out;
>
> - skb_get(head);
> spin_unlock(&qp->q.lock);
> icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
> - kfree_skb(head);
> goto out_rcu_unlock;
>
> out:
> spin_unlock(&qp->q.lock);
> out_rcu_unlock:
> rcu_read_unlock();
> + if (head)
> + kfree_skb(head);
> ipq_put(qp);
> }
>
> @@ -231,7 +248,7 @@ static int ip_frag_too_far(struct ipq *q
> end = atomic_inc_return(&peer->rid);
> qp->rid = end;
>
> - rc = qp->q.fragments && (end - start) > max;
> + rc = qp->q.fragments_tail && (end - start) > max;
>
> if (rc) {
> struct net *net;
> @@ -245,7 +262,6 @@ static int ip_frag_too_far(struct ipq *q
>
> static int ip_frag_reinit(struct ipq *qp)
> {
> - struct sk_buff *fp;
> unsigned int sum_truesize = 0;
>
> if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
> @@ -253,20 +269,14 @@ static int ip_frag_reinit(struct ipq *qp
> return -ETIMEDOUT;
> }
>
> - fp = qp->q.fragments;
> - do {
> - struct sk_buff *xp = fp->next;
> -
> - sum_truesize += fp->truesize;
> - kfree_skb(fp);
> - fp = xp;
> - } while (fp);
> + sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
> sub_frag_mem_limit(qp->q.net, sum_truesize);
>
> qp->q.flags = 0;
> qp->q.len = 0;
> qp->q.meat = 0;
> qp->q.fragments = NULL;
> + qp->q.rb_fragments = RB_ROOT;
> qp->q.fragments_tail = NULL;
> qp->iif = 0;
> qp->ecn = 0;
> @@ -278,7 +288,8 @@ static int ip_frag_reinit(struct ipq *qp
> static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
> {
> struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
> - struct sk_buff *prev, *next;
> + struct rb_node **rbn, *parent;
> + struct sk_buff *skb1;
> struct net_device *dev;
> unsigned int fragsize;
> int flags, offset;
> @@ -341,58 +352,58 @@ static int ip_frag_queue(struct ipq *qp,
> if (err)
> goto err;
>
> - /* Find out which fragments are in front and at the back of us
> - * in the chain of fragments so far. We must know where to put
> - * this fragment, right?
> - */
> - prev = qp->q.fragments_tail;
> - if (!prev || prev->ip_defrag_offset < offset) {
> - next = NULL;
> - goto found;
> - }
> - prev = NULL;
> - for (next = qp->q.fragments; next != NULL; next = next->next) {
> - if (next->ip_defrag_offset >= offset)
> - break; /* bingo! */
> - prev = next;
> - }
> + /* Note : skb->rbnode and skb->dev share the same location. */
> + dev = skb->dev;
> + /* Makes sure compiler wont do silly aliasing games */
> + barrier();
>
> -found:
> /* RFC5722, Section 4, amended by Errata ID : 3089
> * When reassembling an IPv6 datagram, if
> * one or more its constituent fragments is determined to be an
> * overlapping fragment, the entire datagram (and any constituent
> * fragments) MUST be silently discarded.
> *
> - * We do the same here for IPv4.
> + * We do the same here for IPv4 (and increment an snmp counter).
> */
>
> - /* Is there an overlap with the previous fragment? */
> - if (prev &&
> - (prev->ip_defrag_offset + prev->len) > offset)
> - goto discard_qp;
> -
> - /* Is there an overlap with the next fragment? */
> - if (next && next->ip_defrag_offset < end)
> - goto discard_qp;
> + /* Find out where to put this fragment. */
> + skb1 = qp->q.fragments_tail;
> + if (!skb1) {
> + /* This is the first fragment we've received. */
> + rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
> + qp->q.fragments_tail = skb;
> + } else if ((skb1->ip_defrag_offset + skb1->len) < end) {
> + /* This is the common/special case: skb goes to the end. */
> + /* Detect and discard overlaps. */
> + if (offset < (skb1->ip_defrag_offset + skb1->len))
> + goto discard_qp;
> + /* Insert after skb1. */
> + rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
> + qp->q.fragments_tail = skb;
> + } else {
> + /* Binary search. Note that skb can become the first fragment, but
> + * not the last (covered above). */
> + rbn = &qp->q.rb_fragments.rb_node;
> + do {
> + parent = *rbn;
> + skb1 = rb_to_skb(parent);
> + if (end <= skb1->ip_defrag_offset)
> + rbn = &parent->rb_left;
> + else if (offset >= skb1->ip_defrag_offset + skb1->len)
> + rbn = &parent->rb_right;
> + else /* Found an overlap with skb1. */
> + goto discard_qp;
> + } while (*rbn);
> + /* Here we have parent properly set, and rbn pointing to
> + * one of its NULL left/right children. Insert skb. */
> + rb_link_node(&skb->rbnode, parent, rbn);
> + }
> + rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
>
> - /* Note : skb->ip_defrag_offset and skb->dev share the same location */
> - dev = skb->dev;
> if (dev)
> qp->iif = dev->ifindex;
> - /* Makes sure compiler wont do silly aliasing games */
> - barrier();
> skb->ip_defrag_offset = offset;
>
> - /* Insert this fragment in the chain of fragments. */
> - skb->next = next;
> - if (!next)
> - qp->q.fragments_tail = skb;
> - if (prev)
> - prev->next = skb;
> - else
> - qp->q.fragments = skb;
> -
> qp->q.stamp = skb->tstamp;
> qp->q.meat += skb->len;
> qp->ecn |= ecn;
> @@ -414,7 +425,7 @@ found:
> unsigned long orefdst = skb->_skb_refdst;
>
> skb->_skb_refdst = 0UL;
> - err = ip_frag_reasm(qp, prev, dev);
> + err = ip_frag_reasm(qp, skb, dev);
> skb->_skb_refdst = orefdst;
> return err;
> }
> @@ -431,15 +442,15 @@ err:
> return err;
> }
>
> -
> /* Build a new IP datagram from all its fragments. */
> -
> -static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
> +static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
> struct net_device *dev)
> {
> struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
> struct iphdr *iph;
> - struct sk_buff *fp, *head = qp->q.fragments;
> + struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments);
> + struct sk_buff **nextp; /* To build frag_list. */
> + struct rb_node *rbn;
> int len;
> int ihlen;
> int err;
> @@ -453,25 +464,20 @@ static int ip_frag_reasm(struct ipq *qp,
> goto out_fail;
> }
> /* Make the one we just received the head. */
> - if (prev) {
> - head = prev->next;
> - fp = skb_clone(head, GFP_ATOMIC);
> + if (head != skb) {
> + fp = skb_clone(skb, GFP_ATOMIC);
> if (!fp)
> goto out_nomem;
> -
> - fp->next = head->next;
> - if (!fp->next)
> + rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
> + if (qp->q.fragments_tail == skb)
> qp->q.fragments_tail = fp;
> - prev->next = fp;
> -
> - skb_morph(head, qp->q.fragments);
> - head->next = qp->q.fragments->next;
> -
> - consume_skb(qp->q.fragments);
> - qp->q.fragments = head;
> + skb_morph(skb, head);
> + rb_replace_node(&head->rbnode, &skb->rbnode,
> + &qp->q.rb_fragments);
> + consume_skb(head);
> + head = skb;
> }
>
> - WARN_ON(!head);
> WARN_ON(head->ip_defrag_offset != 0);
>
> /* Allocate a new buffer for the datagram. */
> @@ -496,24 +502,35 @@ static int ip_frag_reasm(struct ipq *qp,
> clone = alloc_skb(0, GFP_ATOMIC);
> if (!clone)
> goto out_nomem;
> - clone->next = head->next;
> - head->next = clone;
> skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
> skb_frag_list_init(head);
> for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
> plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
> clone->len = clone->data_len = head->data_len - plen;
> - head->data_len -= clone->len;
> - head->len -= clone->len;
> + skb->truesize += clone->truesize;
> clone->csum = 0;bffa72cf7f9df
> clone->ip_summed = head->ip_summed;
> add_frag_mem_limit(qp->q.net, clone->truesize);
> + skb_shinfo(head)->frag_list = clone;
> + nextp = &clone->next;
> + } else {
> + nextp = &skb_shinfo(head)->frag_list;
> }
>
> - skb_shinfo(head)->frag_list = head->next;
> skb_push(head, head->data - skb_network_header(head));
>
> - for (fp=head->next; fp; fp = fp->next) {
> + /* Traverse the tree in order, to build frag_list. */
> + rbn = rb_next(&head->rbnode);
> + rb_erase(&head->rbnode, &qp->q.rb_fragments);
> + while (rbn) {
> + struct rb_node *rbnext = rb_next(rbn);
> + fp = rb_to_skb(rbn);
> + rb_erase(rbn, &qp->q.rb_fragments);
> + rbn = rbnext;
> + *nextp = fp;
> + nextp = &fp->next;
> + fp->prev = NULL;
> + memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> head->data_len += fp->len;
> head->len += fp->len;
> if (head->ip_summed != fp->ip_summed)
> @@ -524,7 +541,9 @@ static int ip_frag_reasm(struct ipq *qp,
> }
> sub_frag_mem_limit(qp->q.net, head->truesize);
>
> + *nextp = NULL;
> head->next = NULL;
> + head->prev = NULL;
> head->dev = dev;
> head->tstamp = qp->q.stamp;
> IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
> @@ -552,6 +571,7 @@ static int ip_frag_reasm(struct ipq *qp,
>
> __IP_INC_STATS(net, IPSTATS_MIB_REASMOKS);
> qp->q.fragments = NULL;
> + qp->q.rb_fragments = RB_ROOT;
> qp->q.fragments_tail = NULL;
> return 0;
>
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -471,6 +471,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
> head->csum);
>
> fq->q.fragments = NULL;
> + fq->q.rb_fragments = RB_ROOT;
> fq->q.fragments_tail = NULL;
>
> return true;
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -472,6 +472,7 @@ static int ip6_frag_reasm(struct frag_qu
> __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
> rcu_read_unlock();
> fq->q.fragments = NULL;
> + fq->q.rb_fragments = RB_ROOT;
> fq->q.fragments_tail = NULL;
> return 1;
>
>
>

I'm getting a kernel panic on the >=4.14.71 stable kernels, and I've
isolated the problem back to this patch.

My 4.18.11 kernel seems to be OK.

Whenever I inject a delay into the interface with iproute2 tools, I get a panic.

Example command:
tc qdisc add dev eth0 root netem delay 35ms

The RIP is pointing at netif_skb_features+0x31/0x230

My efforts to get a transmittable copy of the panic have been thwarted.

There's some confusion between this patch and the upstream patch
refered to in the commit message

The upstream commit patches net/sched/sch_netem.c which isn't even
touched in this commit.

Althought the commit messages are the same, the two patches seem to
have a different purpose.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sched?id=bffa72cf7f9df842f0016ba03586039296b4caaf

The commit message seems more relavant to this patch.

The upstream commit bffa72cf7f9df842f0016ba03586039296b4caaf has not
yet been applied to the stable tree.

I decided to roll the dice, and apply the upstream patch
bffa72cf7f9df842f0016ba03586039296b4caaf (it's been in the main kernel
tree just over a year).

When I manually patch my 4.14.74 kernel with
bffa72cf7f9df842f0016ba03586039296b4caaf, my panic seems to be solved.

I'm uncertain if this is the proper solution, but I hope this points
in the direction of the issue.

2018-11-29 10:35:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.14 117/126] net: sk_buff rbnode reorg

On Thu, Oct 04, 2018 at 03:13:56PM -0500, Mitch Harder wrote:
> On Mon, Sep 17, 2018 at 5:42 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > 4.14-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Eric Dumazet <[email protected]>
> >
> > commit bffa72cf7f9df842f0016ba03586039296b4caaf upstream
> >
> > skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
> >
> > Current uses (TCP receive ofo queue and netem) need to save/restore
> > tstamp, while skb->dev is either NULL (TCP) or a constant for a given
> > queue (netem).
> >
> > Since we plan using an RB tree for TCP retransmit queue to speedup SACK
> > processing with large BDP, this patch exchanges skb->dev and
> > skb->tstamp.
> >
> > This saves some overhead in both TCP and netem.
> >
> > v2: removes the swtstamp field from struct tcp_skb_cb
> >
> > Signed-off-by: Eric Dumazet <[email protected]>
> > Cc: Soheil Hassas Yeganeh <[email protected]>
> > Cc: Wei Wang <[email protected]>
> > Cc: Willem de Bruijn <[email protected]>
> > Acked-by: Soheil Hassas Yeganeh <[email protected]>
> > Signed-off-by: David S. Miller <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > ---
> > include/linux/skbuff.h | 24 ++--
> > include/net/inet_frag.h | 3
> > net/ipv4/inet_fragment.c | 14 +-
> > net/ipv4/ip_fragment.c | 182 +++++++++++++++++---------------
> > net/ipv6/netfilter/nf_conntrack_reasm.c | 1
> > net/ipv6/reassembly.c | 1
> > 6 files changed, 128 insertions(+), 97 deletions(-)
> >
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -663,23 +663,27 @@ struct sk_buff {
> > struct sk_buff *prev;
> >
> > union {
> > - ktime_t tstamp;
> > - u64 skb_mstamp;
> > + struct net_device *dev;
> > + /* Some protocols might use this space to store information,
> > + * while device pointer would be NULL.
> > + * UDP receive path is one user.
> > + */
> > + unsigned long dev_scratch;
> > };
> > };
> > - struct rb_node rbnode; /* used in netem & tcp stack */
> > + struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */
> > + struct list_head list;
> > };
> > - struct sock *sk;
> >
> > union {
> > - struct net_device *dev;
> > - /* Some protocols might use this space to store information,
> > - * while device pointer would be NULL.
> > - * UDP receive path is one user.
> > - */
> > - unsigned long dev_scratch;
> > + struct sock *sk;
> > int ip_defrag_offset;
> > };
> > +
> > + union {
> > + ktime_t tstamp;
> > + u64 skb_mstamp;
> > + };
> > /*
> > * This is the control buffer. It is free to use for every
> > * layer. Please put your private variables there. If you
> > --- a/include/net/inet_frag.h
> > +++ b/include/net/inet_frag.h
> > @@ -75,7 +75,8 @@ struct inet_frag_queue {
> > struct timer_list timer;
> > spinlock_t lock;
> > refcount_t refcnt;
> > - struct sk_buff *fragments;
> > + struct sk_buff *fragments; /* Used in IPv6. */
> > + struct rb_root rb_fragments; /* Used in IPv4. */
> > struct sk_buff *fragments_tail;
> > ktime_t stamp;
> > int len;
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -136,12 +136,16 @@ void inet_frag_destroy(struct inet_frag_
> > fp = q->fragments;
> > nf = q->net;
> > f = nf->f;
> > - while (fp) {
> > - struct sk_buff *xp = fp->next;
> > + if (fp) {
> > + do {
> > + struct sk_buff *xp = fp->next;
> >
> > - sum_truesize += fp->truesize;
> > - kfree_skb(fp);
> > - fp = xp;
> > + sum_truesize += fp->truesize;
> > + kfree_skb(fp);
> > + fp = xp;
> > + } while (fp);
> > + } else {
> > + sum_truesize = skb_rbtree_purge(&q->rb_fragments);
> > }
> > sum = sum_truesize + f->qsize;
> >
> > --- a/net/ipv4/ip_fragment.c
> > +++ b/net/ipv4/ip_fragment.c
> > @@ -136,7 +136,7 @@ static void ip_expire(struct timer_list
> > {
> > struct inet_frag_queue *frag = from_timer(frag, t, timer);
> > const struct iphdr *iph;
> > - struct sk_buff *head;
> > + struct sk_buff *head = NULL;
> > struct net *net;
> > struct ipq *qp;
> > int err;
> > @@ -152,14 +152,31 @@ static void ip_expire(struct timer_list
> >
> > ipq_kill(qp);
> > __IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
> > -
> > - head = qp->q.fragments;
> > -
> > __IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
> >
> > - if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
> > + if (!qp->q.flags & INET_FRAG_FIRST_IN)
> > goto out;
> >
> > + /* sk_buff::dev and sk_buff::rbnode are unionized. So we
> > + * pull the head out of the tree in order to be able to
> > + * deal with head->dev.
> > + */
> > + if (qp->q.fragments) {
> > + head = qp->q.fragments;
> > + qp->q.fragments = head->next;
> > + } else {
> > + head = skb_rb_first(&qp->q.rb_fragments);
> > + if (!head)
> > + goto out;
> > + rb_erase(&head->rbnode, &qp->q.rb_fragments);
> > + memset(&head->rbnode, 0, sizeof(head->rbnode));
> > + barrier();
> > + }
> > + if (head == qp->q.fragments_tail)
> > + qp->q.fragments_tail = NULL;
> > +
> > + sub_frag_mem_limit(qp->q.net, head->truesize);
> > +
> > head->dev = dev_get_by_index_rcu(net, qp->iif);
> > if (!head->dev)
> > goto out;
> > @@ -179,16 +196,16 @@ static void ip_expire(struct timer_list
> > (skb_rtable(head)->rt_type != RTN_LOCAL))
> > goto out;
> >
> > - skb_get(head);
> > spin_unlock(&qp->q.lock);
> > icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
> > - kfree_skb(head);
> > goto out_rcu_unlock;
> >
> > out:
> > spin_unlock(&qp->q.lock);
> > out_rcu_unlock:
> > rcu_read_unlock();
> > + if (head)
> > + kfree_skb(head);
> > ipq_put(qp);
> > }
> >
> > @@ -231,7 +248,7 @@ static int ip_frag_too_far(struct ipq *q
> > end = atomic_inc_return(&peer->rid);
> > qp->rid = end;
> >
> > - rc = qp->q.fragments && (end - start) > max;
> > + rc = qp->q.fragments_tail && (end - start) > max;
> >
> > if (rc) {
> > struct net *net;
> > @@ -245,7 +262,6 @@ static int ip_frag_too_far(struct ipq *q
> >
> > static int ip_frag_reinit(struct ipq *qp)
> > {
> > - struct sk_buff *fp;
> > unsigned int sum_truesize = 0;
> >
> > if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
> > @@ -253,20 +269,14 @@ static int ip_frag_reinit(struct ipq *qp
> > return -ETIMEDOUT;
> > }
> >
> > - fp = qp->q.fragments;
> > - do {
> > - struct sk_buff *xp = fp->next;
> > -
> > - sum_truesize += fp->truesize;
> > - kfree_skb(fp);
> > - fp = xp;
> > - } while (fp);
> > + sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
> > sub_frag_mem_limit(qp->q.net, sum_truesize);
> >
> > qp->q.flags = 0;
> > qp->q.len = 0;
> > qp->q.meat = 0;
> > qp->q.fragments = NULL;
> > + qp->q.rb_fragments = RB_ROOT;
> > qp->q.fragments_tail = NULL;
> > qp->iif = 0;
> > qp->ecn = 0;
> > @@ -278,7 +288,8 @@ static int ip_frag_reinit(struct ipq *qp
> > static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
> > {
> > struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
> > - struct sk_buff *prev, *next;
> > + struct rb_node **rbn, *parent;
> > + struct sk_buff *skb1;
> > struct net_device *dev;
> > unsigned int fragsize;
> > int flags, offset;
> > @@ -341,58 +352,58 @@ static int ip_frag_queue(struct ipq *qp,
> > if (err)
> > goto err;
> >
> > - /* Find out which fragments are in front and at the back of us
> > - * in the chain of fragments so far. We must know where to put
> > - * this fragment, right?
> > - */
> > - prev = qp->q.fragments_tail;
> > - if (!prev || prev->ip_defrag_offset < offset) {
> > - next = NULL;
> > - goto found;
> > - }
> > - prev = NULL;
> > - for (next = qp->q.fragments; next != NULL; next = next->next) {
> > - if (next->ip_defrag_offset >= offset)
> > - break; /* bingo! */
> > - prev = next;
> > - }
> > + /* Note : skb->rbnode and skb->dev share the same location. */
> > + dev = skb->dev;
> > + /* Makes sure compiler wont do silly aliasing games */
> > + barrier();
> >
> > -found:
> > /* RFC5722, Section 4, amended by Errata ID : 3089
> > * When reassembling an IPv6 datagram, if
> > * one or more its constituent fragments is determined to be an
> > * overlapping fragment, the entire datagram (and any constituent
> > * fragments) MUST be silently discarded.
> > *
> > - * We do the same here for IPv4.
> > + * We do the same here for IPv4 (and increment an snmp counter).
> > */
> >
> > - /* Is there an overlap with the previous fragment? */
> > - if (prev &&
> > - (prev->ip_defrag_offset + prev->len) > offset)
> > - goto discard_qp;
> > -
> > - /* Is there an overlap with the next fragment? */
> > - if (next && next->ip_defrag_offset < end)
> > - goto discard_qp;
> > + /* Find out where to put this fragment. */
> > + skb1 = qp->q.fragments_tail;
> > + if (!skb1) {
> > + /* This is the first fragment we've received. */
> > + rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
> > + qp->q.fragments_tail = skb;
> > + } else if ((skb1->ip_defrag_offset + skb1->len) < end) {
> > + /* This is the common/special case: skb goes to the end. */
> > + /* Detect and discard overlaps. */
> > + if (offset < (skb1->ip_defrag_offset + skb1->len))
> > + goto discard_qp;
> > + /* Insert after skb1. */
> > + rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
> > + qp->q.fragments_tail = skb;
> > + } else {
> > + /* Binary search. Note that skb can become the first fragment, but
> > + * not the last (covered above). */
> > + rbn = &qp->q.rb_fragments.rb_node;
> > + do {
> > + parent = *rbn;
> > + skb1 = rb_to_skb(parent);
> > + if (end <= skb1->ip_defrag_offset)
> > + rbn = &parent->rb_left;
> > + else if (offset >= skb1->ip_defrag_offset + skb1->len)
> > + rbn = &parent->rb_right;
> > + else /* Found an overlap with skb1. */
> > + goto discard_qp;
> > + } while (*rbn);
> > + /* Here we have parent properly set, and rbn pointing to
> > + * one of its NULL left/right children. Insert skb. */
> > + rb_link_node(&skb->rbnode, parent, rbn);
> > + }
> > + rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
> >
> > - /* Note : skb->ip_defrag_offset and skb->dev share the same location */
> > - dev = skb->dev;
> > if (dev)
> > qp->iif = dev->ifindex;
> > - /* Makes sure compiler wont do silly aliasing games */
> > - barrier();
> > skb->ip_defrag_offset = offset;
> >
> > - /* Insert this fragment in the chain of fragments. */
> > - skb->next = next;
> > - if (!next)
> > - qp->q.fragments_tail = skb;
> > - if (prev)
> > - prev->next = skb;
> > - else
> > - qp->q.fragments = skb;
> > -
> > qp->q.stamp = skb->tstamp;
> > qp->q.meat += skb->len;
> > qp->ecn |= ecn;
> > @@ -414,7 +425,7 @@ found:
> > unsigned long orefdst = skb->_skb_refdst;
> >
> > skb->_skb_refdst = 0UL;
> > - err = ip_frag_reasm(qp, prev, dev);
> > + err = ip_frag_reasm(qp, skb, dev);
> > skb->_skb_refdst = orefdst;
> > return err;
> > }
> > @@ -431,15 +442,15 @@ err:
> > return err;
> > }
> >
> > -
> > /* Build a new IP datagram from all its fragments. */
> > -
> > -static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
> > +static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
> > struct net_device *dev)
> > {
> > struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
> > struct iphdr *iph;
> > - struct sk_buff *fp, *head = qp->q.fragments;
> > + struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments);
> > + struct sk_buff **nextp; /* To build frag_list. */
> > + struct rb_node *rbn;
> > int len;
> > int ihlen;
> > int err;
> > @@ -453,25 +464,20 @@ static int ip_frag_reasm(struct ipq *qp,
> > goto out_fail;
> > }
> > /* Make the one we just received the head. */
> > - if (prev) {
> > - head = prev->next;
> > - fp = skb_clone(head, GFP_ATOMIC);
> > + if (head != skb) {
> > + fp = skb_clone(skb, GFP_ATOMIC);
> > if (!fp)
> > goto out_nomem;
> > -
> > - fp->next = head->next;
> > - if (!fp->next)
> > + rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
> > + if (qp->q.fragments_tail == skb)
> > qp->q.fragments_tail = fp;
> > - prev->next = fp;
> > -
> > - skb_morph(head, qp->q.fragments);
> > - head->next = qp->q.fragments->next;
> > -
> > - consume_skb(qp->q.fragments);
> > - qp->q.fragments = head;
> > + skb_morph(skb, head);
> > + rb_replace_node(&head->rbnode, &skb->rbnode,
> > + &qp->q.rb_fragments);
> > + consume_skb(head);
> > + head = skb;
> > }
> >
> > - WARN_ON(!head);
> > WARN_ON(head->ip_defrag_offset != 0);
> >
> > /* Allocate a new buffer for the datagram. */
> > @@ -496,24 +502,35 @@ static int ip_frag_reasm(struct ipq *qp,
> > clone = alloc_skb(0, GFP_ATOMIC);
> > if (!clone)
> > goto out_nomem;
> > - clone->next = head->next;
> > - head->next = clone;
> > skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
> > skb_frag_list_init(head);
> > for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
> > plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
> > clone->len = clone->data_len = head->data_len - plen;
> > - head->data_len -= clone->len;
> > - head->len -= clone->len;
> > + skb->truesize += clone->truesize;
> > clone->csum = 0;bffa72cf7f9df
> > clone->ip_summed = head->ip_summed;
> > add_frag_mem_limit(qp->q.net, clone->truesize);
> > + skb_shinfo(head)->frag_list = clone;
> > + nextp = &clone->next;
> > + } else {
> > + nextp = &skb_shinfo(head)->frag_list;
> > }
> >
> > - skb_shinfo(head)->frag_list = head->next;
> > skb_push(head, head->data - skb_network_header(head));
> >
> > - for (fp=head->next; fp; fp = fp->next) {
> > + /* Traverse the tree in order, to build frag_list. */
> > + rbn = rb_next(&head->rbnode);
> > + rb_erase(&head->rbnode, &qp->q.rb_fragments);
> > + while (rbn) {
> > + struct rb_node *rbnext = rb_next(rbn);
> > + fp = rb_to_skb(rbn);
> > + rb_erase(rbn, &qp->q.rb_fragments);
> > + rbn = rbnext;
> > + *nextp = fp;
> > + nextp = &fp->next;
> > + fp->prev = NULL;
> > + memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> > head->data_len += fp->len;
> > head->len += fp->len;
> > if (head->ip_summed != fp->ip_summed)
> > @@ -524,7 +541,9 @@ static int ip_frag_reasm(struct ipq *qp,
> > }
> > sub_frag_mem_limit(qp->q.net, head->truesize);
> >
> > + *nextp = NULL;
> > head->next = NULL;
> > + head->prev = NULL;
> > head->dev = dev;
> > head->tstamp = qp->q.stamp;
> > IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
> > @@ -552,6 +571,7 @@ static int ip_frag_reasm(struct ipq *qp,
> >
> > __IP_INC_STATS(net, IPSTATS_MIB_REASMOKS);
> > qp->q.fragments = NULL;
> > + qp->q.rb_fragments = RB_ROOT;
> > qp->q.fragments_tail = NULL;
> > return 0;
> >
> > --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> > +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> > @@ -471,6 +471,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
> > head->csum);
> >
> > fq->q.fragments = NULL;
> > + fq->q.rb_fragments = RB_ROOT;
> > fq->q.fragments_tail = NULL;
> >
> > return true;
> > --- a/net/ipv6/reassembly.c
> > +++ b/net/ipv6/reassembly.c
> > @@ -472,6 +472,7 @@ static int ip6_frag_reasm(struct frag_qu
> > __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
> > rcu_read_unlock();
> > fq->q.fragments = NULL;
> > + fq->q.rb_fragments = RB_ROOT;
> > fq->q.fragments_tail = NULL;
> > return 1;
> >
> >
> >
>
> I'm getting a kernel panic on the >=4.14.71 stable kernels, and I've
> isolated the problem back to this patch.
>
> My 4.18.11 kernel seems to be OK.
>
> Whenever I inject a delay into the interface with iproute2 tools, I get a panic.
>
> Example command:
> tc qdisc add dev eth0 root netem delay 35ms
>
> The RIP is pointing at netif_skb_features+0x31/0x230
>
> My efforts to get a transmittable copy of the panic have been thwarted.
>
> There's some confusion between this patch and the upstream patch
> refered to in the commit message
>
> The upstream commit patches net/sched/sch_netem.c which isn't even
> touched in this commit.
>
> Althought the commit messages are the same, the two patches seem to
> have a different purpose.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sched?id=bffa72cf7f9df842f0016ba03586039296b4caaf
>
> The commit message seems more relavant to this patch.
>
> The upstream commit bffa72cf7f9df842f0016ba03586039296b4caaf has not
> yet been applied to the stable tree.
>
> I decided to roll the dice, and apply the upstream patch
> bffa72cf7f9df842f0016ba03586039296b4caaf (it's been in the main kernel
> tree just over a year).
>
> When I manually patch my 4.14.74 kernel with
> bffa72cf7f9df842f0016ba03586039296b4caaf, my panic seems to be solved.

That is odd, as this commit is in the 4.14.71 kernel release, so it
should not be able to be applied to 4.14.74.

If something still needs to be done here for the 4.14.y kernel tree,
please let me know.

thanks,

greg k-h

2018-11-29 15:10:20

by Mitch Harder

[permalink] [raw]
Subject: Re: [PATCH 4.14 117/126] net: sk_buff rbnode reorg

On Thu, Nov 29, 2018 at 4:33 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Thu, Oct 04, 2018 at 03:13:56PM -0500, Mitch Harder wrote:
> > On Mon, Sep 17, 2018 at 5:42 PM, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > > 4.14-stable review patch. If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Eric Dumazet <[email protected]>
> > >
> > > commit bffa72cf7f9df842f0016ba03586039296b4caaf upstream
> > >
> > > skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
> > >
> > > Current uses (TCP receive ofo queue and netem) need to save/restore
> > > tstamp, while skb->dev is either NULL (TCP) or a constant for a given
> > > queue (netem).
> > >
> > > Since we plan using an RB tree for TCP retransmit queue to speedup SACK
> > > processing with large BDP, this patch exchanges skb->dev and
> > > skb->tstamp.
> > >
> > > This saves some overhead in both TCP and netem.
> > >
> > > v2: removes the swtstamp field from struct tcp_skb_cb
> > >
> > > Signed-off-by: Eric Dumazet <[email protected]>
> > > Cc: Soheil Hassas Yeganeh <[email protected]>
> > > Cc: Wei Wang <[email protected]>
> > > Cc: Willem de Bruijn <[email protected]>
> > > Acked-by: Soheil Hassas Yeganeh <[email protected]>
> > > Signed-off-by: David S. Miller <[email protected]>
> > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > > ---
> > > include/linux/skbuff.h | 24 ++--
> > > include/net/inet_frag.h | 3
> > > net/ipv4/inet_fragment.c | 14 +-
> > > net/ipv4/ip_fragment.c | 182 +++++++++++++++++---------------
> > > net/ipv6/netfilter/nf_conntrack_reasm.c | 1
> > > net/ipv6/reassembly.c | 1
> > > 6 files changed, 128 insertions(+), 97 deletions(-)
> > >
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -663,23 +663,27 @@ struct sk_buff {
> > > struct sk_buff *prev;
> > >
> > > union {
> > > - ktime_t tstamp;
> > > - u64 skb_mstamp;
> > > + struct net_device *dev;
> > > + /* Some protocols might use this space to store information,
> > > + * while device pointer would be NULL.
> > > + * UDP receive path is one user.
> > > + */
> > > + unsigned long dev_scratch;
> > > };
> > > };
> > > - struct rb_node rbnode; /* used in netem & tcp stack */
> > > + struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */
> > > + struct list_head list;
> > > };
> > > - struct sock *sk;
> > >
> > > union {
> > > - struct net_device *dev;
> > > - /* Some protocols might use this space to store information,
> > > - * while device pointer would be NULL.
> > > - * UDP receive path is one user.
> > > - */
> > > - unsigned long dev_scratch;
> > > + struct sock *sk;
> > > int ip_defrag_offset;
> > > };
> > > +
> > > + union {
> > > + ktime_t tstamp;
> > > + u64 skb_mstamp;
> > > + };
> > > /*
> > > * This is the control buffer. It is free to use for every
> > > * layer. Please put your private variables there. If you
> > > --- a/include/net/inet_frag.h
> > > +++ b/include/net/inet_frag.h
> > > @@ -75,7 +75,8 @@ struct inet_frag_queue {
> > > struct timer_list timer;
> > > spinlock_t lock;
> > > refcount_t refcnt;
> > > - struct sk_buff *fragments;
> > > + struct sk_buff *fragments; /* Used in IPv6. */
> > > + struct rb_root rb_fragments; /* Used in IPv4. */
> > > struct sk_buff *fragments_tail;
> > > ktime_t stamp;
> > > int len;
> > > --- a/net/ipv4/inet_fragment.c
> > > +++ b/net/ipv4/inet_fragment.c
> > > @@ -136,12 +136,16 @@ void inet_frag_destroy(struct inet_frag_
> > > fp = q->fragments;
> > > nf = q->net;
> > > f = nf->f;
> > > - while (fp) {
> > > - struct sk_buff *xp = fp->next;
> > > + if (fp) {
> > > + do {
> > > + struct sk_buff *xp = fp->next;
> > >
> > > - sum_truesize += fp->truesize;
> > > - kfree_skb(fp);
> > > - fp = xp;
> > > + sum_truesize += fp->truesize;
> > > + kfree_skb(fp);
> > > + fp = xp;
> > > + } while (fp);
> > > + } else {
> > > + sum_truesize = skb_rbtree_purge(&q->rb_fragments);
> > > }
> > > sum = sum_truesize + f->qsize;
> > >
> > > --- a/net/ipv4/ip_fragment.c
> > > +++ b/net/ipv4/ip_fragment.c
> > > @@ -136,7 +136,7 @@ static void ip_expire(struct timer_list
> > > {
> > > struct inet_frag_queue *frag = from_timer(frag, t, timer);
> > > const struct iphdr *iph;
> > > - struct sk_buff *head;
> > > + struct sk_buff *head = NULL;
> > > struct net *net;
> > > struct ipq *qp;
> > > int err;
> > > @@ -152,14 +152,31 @@ static void ip_expire(struct timer_list
> > >
> > > ipq_kill(qp);
> > > __IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
> > > -
> > > - head = qp->q.fragments;
> > > -
> > > __IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
> > >
> > > - if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
> > > + if (!qp->q.flags & INET_FRAG_FIRST_IN)
> > > goto out;
> > >
> > > + /* sk_buff::dev and sk_buff::rbnode are unionized. So we
> > > + * pull the head out of the tree in order to be able to
> > > + * deal with head->dev.
> > > + */
> > > + if (qp->q.fragments) {
> > > + head = qp->q.fragments;
> > > + qp->q.fragments = head->next;
> > > + } else {
> > > + head = skb_rb_first(&qp->q.rb_fragments);
> > > + if (!head)
> > > + goto out;
> > > + rb_erase(&head->rbnode, &qp->q.rb_fragments);
> > > + memset(&head->rbnode, 0, sizeof(head->rbnode));
> > > + barrier();
> > > + }
> > > + if (head == qp->q.fragments_tail)
> > > + qp->q.fragments_tail = NULL;
> > > +
> > > + sub_frag_mem_limit(qp->q.net, head->truesize);
> > > +
> > > head->dev = dev_get_by_index_rcu(net, qp->iif);
> > > if (!head->dev)
> > > goto out;
> > > @@ -179,16 +196,16 @@ static void ip_expire(struct timer_list
> > > (skb_rtable(head)->rt_type != RTN_LOCAL))
> > > goto out;
> > >
> > > - skb_get(head);
> > > spin_unlock(&qp->q.lock);
> > > icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
> > > - kfree_skb(head);
> > > goto out_rcu_unlock;
> > >
> > > out:
> > > spin_unlock(&qp->q.lock);
> > > out_rcu_unlock:
> > > rcu_read_unlock();
> > > + if (head)
> > > + kfree_skb(head);
> > > ipq_put(qp);
> > > }
> > >
> > > @@ -231,7 +248,7 @@ static int ip_frag_too_far(struct ipq *q
> > > end = atomic_inc_return(&peer->rid);
> > > qp->rid = end;
> > >
> > > - rc = qp->q.fragments && (end - start) > max;
> > > + rc = qp->q.fragments_tail && (end - start) > max;
> > >
> > > if (rc) {
> > > struct net *net;
> > > @@ -245,7 +262,6 @@ static int ip_frag_too_far(struct ipq *q
> > >
> > > static int ip_frag_reinit(struct ipq *qp)
> > > {
> > > - struct sk_buff *fp;
> > > unsigned int sum_truesize = 0;
> > >
> > > if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
> > > @@ -253,20 +269,14 @@ static int ip_frag_reinit(struct ipq *qp
> > > return -ETIMEDOUT;
> > > }
> > >
> > > - fp = qp->q.fragments;
> > > - do {
> > > - struct sk_buff *xp = fp->next;
> > > -
> > > - sum_truesize += fp->truesize;
> > > - kfree_skb(fp);
> > > - fp = xp;
> > > - } while (fp);
> > > + sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
> > > sub_frag_mem_limit(qp->q.net, sum_truesize);
> > >
> > > qp->q.flags = 0;
> > > qp->q.len = 0;
> > > qp->q.meat = 0;
> > > qp->q.fragments = NULL;
> > > + qp->q.rb_fragments = RB_ROOT;
> > > qp->q.fragments_tail = NULL;
> > > qp->iif = 0;
> > > qp->ecn = 0;
> > > @@ -278,7 +288,8 @@ static int ip_frag_reinit(struct ipq *qp
> > > static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
> > > {
> > > struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
> > > - struct sk_buff *prev, *next;
> > > + struct rb_node **rbn, *parent;
> > > + struct sk_buff *skb1;
> > > struct net_device *dev;
> > > unsigned int fragsize;
> > > int flags, offset;
> > > @@ -341,58 +352,58 @@ static int ip_frag_queue(struct ipq *qp,
> > > if (err)
> > > goto err;
> > >
> > > - /* Find out which fragments are in front and at the back of us
> > > - * in the chain of fragments so far. We must know where to put
> > > - * this fragment, right?
> > > - */
> > > - prev = qp->q.fragments_tail;
> > > - if (!prev || prev->ip_defrag_offset < offset) {
> > > - next = NULL;
> > > - goto found;
> > > - }
> > > - prev = NULL;
> > > - for (next = qp->q.fragments; next != NULL; next = next->next) {
> > > - if (next->ip_defrag_offset >= offset)
> > > - break; /* bingo! */
> > > - prev = next;
> > > - }
> > > + /* Note : skb->rbnode and skb->dev share the same location. */
> > > + dev = skb->dev;
> > > + /* Makes sure compiler wont do silly aliasing games */
> > > + barrier();
> > >
> > > -found:
> > > /* RFC5722, Section 4, amended by Errata ID : 3089
> > > * When reassembling an IPv6 datagram, if
> > > * one or more its constituent fragments is determined to be an
> > > * overlapping fragment, the entire datagram (and any constituent
> > > * fragments) MUST be silently discarded.
> > > *
> > > - * We do the same here for IPv4.
> > > + * We do the same here for IPv4 (and increment an snmp counter).
> > > */
> > >
> > > - /* Is there an overlap with the previous fragment? */
> > > - if (prev &&
> > > - (prev->ip_defrag_offset + prev->len) > offset)
> > > - goto discard_qp;
> > > -
> > > - /* Is there an overlap with the next fragment? */
> > > - if (next && next->ip_defrag_offset < end)
> > > - goto discard_qp;
> > > + /* Find out where to put this fragment. */
> > > + skb1 = qp->q.fragments_tail;
> > > + if (!skb1) {
> > > + /* This is the first fragment we've received. */
> > > + rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
> > > + qp->q.fragments_tail = skb;
> > > + } else if ((skb1->ip_defrag_offset + skb1->len) < end) {
> > > + /* This is the common/special case: skb goes to the end. */
> > > + /* Detect and discard overlaps. */
> > > + if (offset < (skb1->ip_defrag_offset + skb1->len))
> > > + goto discard_qp;
> > > + /* Insert after skb1. */
> > > + rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
> > > + qp->q.fragments_tail = skb;
> > > + } else {
> > > + /* Binary search. Note that skb can become the first fragment, but
> > > + * not the last (covered above). */
> > > + rbn = &qp->q.rb_fragments.rb_node;
> > > + do {
> > > + parent = *rbn;
> > > + skb1 = rb_to_skb(parent);
> > > + if (end <= skb1->ip_defrag_offset)
> > > + rbn = &parent->rb_left;
> > > + else if (offset >= skb1->ip_defrag_offset + skb1->len)
> > > + rbn = &parent->rb_right;
> > > + else /* Found an overlap with skb1. */
> > > + goto discard_qp;
> > > + } while (*rbn);
> > > + /* Here we have parent properly set, and rbn pointing to
> > > + * one of its NULL left/right children. Insert skb. */
> > > + rb_link_node(&skb->rbnode, parent, rbn);
> > > + }
> > > + rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
> > >
> > > - /* Note : skb->ip_defrag_offset and skb->dev share the same location */
> > > - dev = skb->dev;
> > > if (dev)
> > > qp->iif = dev->ifindex;
> > > - /* Makes sure compiler wont do silly aliasing games */
> > > - barrier();
> > > skb->ip_defrag_offset = offset;
> > >
> > > - /* Insert this fragment in the chain of fragments. */
> > > - skb->next = next;
> > > - if (!next)
> > > - qp->q.fragments_tail = skb;
> > > - if (prev)
> > > - prev->next = skb;
> > > - else
> > > - qp->q.fragments = skb;
> > > -
> > > qp->q.stamp = skb->tstamp;
> > > qp->q.meat += skb->len;
> > > qp->ecn |= ecn;
> > > @@ -414,7 +425,7 @@ found:
> > > unsigned long orefdst = skb->_skb_refdst;
> > >
> > > skb->_skb_refdst = 0UL;
> > > - err = ip_frag_reasm(qp, prev, dev);
> > > + err = ip_frag_reasm(qp, skb, dev);
> > > skb->_skb_refdst = orefdst;
> > > return err;
> > > }
> > > @@ -431,15 +442,15 @@ err:
> > > return err;
> > > }
> > >
> > > -
> > > /* Build a new IP datagram from all its fragments. */
> > > -
> > > -static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
> > > +static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
> > > struct net_device *dev)
> > > {
> > > struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
> > > struct iphdr *iph;
> > > - struct sk_buff *fp, *head = qp->q.fragments;
> > > + struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments);
> > > + struct sk_buff **nextp; /* To build frag_list. */
> > > + struct rb_node *rbn;
> > > int len;
> > > int ihlen;
> > > int err;
> > > @@ -453,25 +464,20 @@ static int ip_frag_reasm(struct ipq *qp,
> > > goto out_fail;
> > > }
> > > /* Make the one we just received the head. */
> > > - if (prev) {
> > > - head = prev->next;
> > > - fp = skb_clone(head, GFP_ATOMIC);
> > > + if (head != skb) {
> > > + fp = skb_clone(skb, GFP_ATOMIC);
> > > if (!fp)
> > > goto out_nomem;
> > > -
> > > - fp->next = head->next;
> > > - if (!fp->next)
> > > + rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
> > > + if (qp->q.fragments_tail == skb)
> > > qp->q.fragments_tail = fp;
> > > - prev->next = fp;
> > > -
> > > - skb_morph(head, qp->q.fragments);
> > > - head->next = qp->q.fragments->next;
> > > -
> > > - consume_skb(qp->q.fragments);
> > > - qp->q.fragments = head;
> > > + skb_morph(skb, head);
> > > + rb_replace_node(&head->rbnode, &skb->rbnode,
> > > + &qp->q.rb_fragments);
> > > + consume_skb(head);
> > > + head = skb;
> > > }
> > >
> > > - WARN_ON(!head);
> > > WARN_ON(head->ip_defrag_offset != 0);
> > >
> > > /* Allocate a new buffer for the datagram. */
> > > @@ -496,24 +502,35 @@ static int ip_frag_reasm(struct ipq *qp,
> > > clone = alloc_skb(0, GFP_ATOMIC);
> > > if (!clone)
> > > goto out_nomem;
> > > - clone->next = head->next;
> > > - head->next = clone;
> > > skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
> > > skb_frag_list_init(head);
> > > for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
> > > plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
> > > clone->len = clone->data_len = head->data_len - plen;
> > > - head->data_len -= clone->len;
> > > - head->len -= clone->len;
> > > + skb->truesize += clone->truesize;
> > > clone->csum = 0;bffa72cf7f9df
> > > clone->ip_summed = head->ip_summed;
> > > add_frag_mem_limit(qp->q.net, clone->truesize);
> > > + skb_shinfo(head)->frag_list = clone;
> > > + nextp = &clone->next;
> > > + } else {
> > > + nextp = &skb_shinfo(head)->frag_list;
> > > }
> > >
> > > - skb_shinfo(head)->frag_list = head->next;
> > > skb_push(head, head->data - skb_network_header(head));
> > >
> > > - for (fp=head->next; fp; fp = fp->next) {
> > > + /* Traverse the tree in order, to build frag_list. */
> > > + rbn = rb_next(&head->rbnode);
> > > + rb_erase(&head->rbnode, &qp->q.rb_fragments);
> > > + while (rbn) {
> > > + struct rb_node *rbnext = rb_next(rbn);
> > > + fp = rb_to_skb(rbn);
> > > + rb_erase(rbn, &qp->q.rb_fragments);
> > > + rbn = rbnext;
> > > + *nextp = fp;
> > > + nextp = &fp->next;
> > > + fp->prev = NULL;
> > > + memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> > > head->data_len += fp->len;
> > > head->len += fp->len;
> > > if (head->ip_summed != fp->ip_summed)
> > > @@ -524,7 +541,9 @@ static int ip_frag_reasm(struct ipq *qp,
> > > }
> > > sub_frag_mem_limit(qp->q.net, head->truesize);
> > >
> > > + *nextp = NULL;
> > > head->next = NULL;
> > > + head->prev = NULL;
> > > head->dev = dev;
> > > head->tstamp = qp->q.stamp;
> > > IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
> > > @@ -552,6 +571,7 @@ static int ip_frag_reasm(struct ipq *qp,
> > >
> > > __IP_INC_STATS(net, IPSTATS_MIB_REASMOKS);
> > > qp->q.fragments = NULL;
> > > + qp->q.rb_fragments = RB_ROOT;
> > > qp->q.fragments_tail = NULL;
> > > return 0;
> > >
> > > --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> > > +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> > > @@ -471,6 +471,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
> > > head->csum);
> > >
> > > fq->q.fragments = NULL;
> > > + fq->q.rb_fragments = RB_ROOT;
> > > fq->q.fragments_tail = NULL;
> > >
> > > return true;
> > > --- a/net/ipv6/reassembly.c
> > > +++ b/net/ipv6/reassembly.c
> > > @@ -472,6 +472,7 @@ static int ip6_frag_reasm(struct frag_qu
> > > __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
> > > rcu_read_unlock();
> > > fq->q.fragments = NULL;
> > > + fq->q.rb_fragments = RB_ROOT;
> > > fq->q.fragments_tail = NULL;
> > > return 1;
> > >
> > >
> > >
> >
> > I'm getting a kernel panic on the >=4.14.71 stable kernels, and I've
> > isolated the problem back to this patch.
> >
> > My 4.18.11 kernel seems to be OK.
> >
> > Whenever I inject a delay into the interface with iproute2 tools, I get a panic.
> >
> > Example command:
> > tc qdisc add dev eth0 root netem delay 35ms
> >
> > The RIP is pointing at netif_skb_features+0x31/0x230
> >
> > My efforts to get a transmittable copy of the panic have been thwarted.
> >
> > There's some confusion between this patch and the upstream patch
> > refered to in the commit message
> >
> > The upstream commit patches net/sched/sch_netem.c which isn't even
> > touched in this commit.
> >
> > Althought the commit messages are the same, the two patches seem to
> > have a different purpose.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sched?id=bffa72cf7f9df842f0016ba03586039296b4caaf
> >
> > The commit message seems more relavant to this patch.
> >
> > The upstream commit bffa72cf7f9df842f0016ba03586039296b4caaf has not
> > yet been applied to the stable tree.
> >
> > I decided to roll the dice, and apply the upstream patch
> > bffa72cf7f9df842f0016ba03586039296b4caaf (it's been in the main kernel
> > tree just over a year).
> >
> > When I manually patch my 4.14.74 kernel with
> > bffa72cf7f9df842f0016ba03586039296b4caaf, my panic seems to be solved.
>
> That is odd, as this commit is in the 4.14.71 kernel release, so it
> should not be able to be applied to 4.14.74.
>
> If something still needs to be done here for the 4.14.y kernel tree,
> please let me know.
>
> thanks,
>
> greg k-h

Thanks for the response.

This issue was fixed in the 4.14.79 stable release with this patch:

sch_netem: restore skb->dev after dequeuing from the rbtree
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/net/sched/sch_netem.c?h=linux-4.14.y&id=bd6df7a19559f9b52049f97c3188a7d1544e16df