2017-08-09 18:14:31

by Greg Kroah-Hartman

Subject: [PATCH 4.9 00/93] 4.9.42-stable review

This is the start of the stable review cycle for the 4.9.42 release.
There are 93 patches in this series, all of which will be posted as responses
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Aug 11 18:13:03 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.42-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.9.42-rc1

Michal Kubeček <[email protected]>
net: account for current skb length when deciding about UFO

zheng li <[email protected]>
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

Or Gerlitz <[email protected]>
net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy

Ard Biesheuvel <[email protected]>
mm: don't dereference struct page fields of invalid pages

Jamie Iles <[email protected]>
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.

Sudip Mukherjee <[email protected]>
lib/Kconfig.debug: fix frv build failure

Michal Hocko <[email protected]>
mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER

Rabin Vincent <[email protected]>
ARM: 8632/1: ftrace: fix syscall name matching

Omar Sandoval <[email protected]>
virtio_blk: fix panic in initialization error path

Jeff Moyer <[email protected]>
nbd: blk_mq_init_queue returns an error code on failure, not NULL

Steve Wise <[email protected]>
iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort

Emmanuel Vadot <[email protected]>
ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc

Milo Kim <[email protected]>
ARM: dts: sun8i: Support DTB build for NanoPi M1

Gerd Hoffmann <[email protected]>
drm/virtio: fix framebuffer sparse warning

Milan P. Gandhi <[email protected]>
scsi: qla2xxx: Get mutex lock before checking optrom_state

Marek Szyprowski <[email protected]>
clk/samsung: exynos542x: mark some clocks as critical

Pavel Tikhomirov <[email protected]>
ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int

Zefir Kurtisi <[email protected]>
phy state machine: failsafe leave invalid RUNNING state

Pau Espin Pedrol <[email protected]>
netfilter: use fwmark_reflect in nf_send_reset

Bard Liao <[email protected]>
ASoC: rt5645: set sel_i2s_pre_div1 to 2

Christophe JAILLET <[email protected]>
spi: spi-axi: Free resources on error path

Nicholas Mc Guire <[email protected]>
x86/boot: Add missing declaration of string functions

Michael Chan <[email protected]>
tg3: Fix race condition in tg3_get_stats64().

Grygorii Strashko <[email protected]>
net: phy: dp83867: fix irq generation

Sergei Shtylyov <[email protected]>
sh_eth: R8A7740 supports packet shecksumming

Sergei Shtylyov <[email protected]>
sh_eth: fix EESIPR values for SH77{34|63}

Arnd Bergmann <[email protected]>
wext: handle NULL extra data in iwe_stream_add_point better

David S. Miller <[email protected]>
sparc64: Fix exception handling in UltraSPARC-III memcpy.

Rob Gardner <[email protected]>
sparc64: Prevent perf from running during super critical sections

Jane Chu <[email protected]>
sparc64: Measure receiver forward progress to avoid send mondo timeout

Wei Liu <[email protected]>
xen-netback: correctly schedule rate-limited queues

Florian Fainelli <[email protected]>
net: phy: Correctly process PHY_HALTED in phy_stop_machine()

Eugenia Emantayev <[email protected]>
net/mlx5e: Schedule overflow check work to mlx5e workqueue

Eugenia Emantayev <[email protected]>
net/mlx5e: Fix wrong delay calculation for overflow check scheduling

Ilan Tayari <[email protected]>
net/mlx5e: Fix outer_header_zero() check size

Moshe Shemesh <[email protected]>
net/mlx5: Fix command bad flow on command entry allocation failure

Aviv Heller <[email protected]>
net/mlx5: Consider tx_enabled in all modes on remap

Xin Long <[email protected]>
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors

Alexander Potapenko <[email protected]>
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()

Xin Long <[email protected]>
dccp: fix a memleak for dccp_feat_init err process

Xin Long <[email protected]>
dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly

Xin Long <[email protected]>
dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly

Marc Gonzalez <[email protected]>
net: ethernet: nb8800: Handle all 4 RGMII modes identically

Stefano Brivio <[email protected]>
ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()

WANG Cong <[email protected]>
packet: fix use-after-free in prb_retire_rx_blk_timer_expired()

Liping Zhang <[email protected]>
openvswitch: fix potential out of bound access in parse_ct

Thomas Jarosch <[email protected]>
mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled

WANG Cong <[email protected]>
rtnetlink: allocate more memory for dev_set_mac_address()

Mahesh Bandewar <[email protected]>
ipv4: initialize fib_trie prior to register_netdev_notifier call.

Florian Fainelli <[email protected]>
net: dsa: b53: Add missing ARL entries for BCM53125

Sabrina Dubroca <[email protected]>
ipv6: avoid overflow of offset in ip6_find_1stfragopt

David S. Miller <[email protected]>
net: Zero terminate ifr_name in dev_ifname().

Alexander Potapenko <[email protected]>
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()

Neal Cardwell <[email protected]>
tcp_bbr: init pacing rate on first RTT sample

Neal Cardwell <[email protected]>
tcp_bbr: remove sk_pacing_rate=0 transient during init

Neal Cardwell <[email protected]>
tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper

Neal Cardwell <[email protected]>
tcp_bbr: introduce bbr_bw_to_pacing_rate() helper

Neal Cardwell <[email protected]>
tcp_bbr: cut pacing rate only if filled pipe

Steven Toth <[email protected]>
saa7164: fix double fetch PCIe access condition

Omar Sandoval <[email protected]>
Btrfs: fix early ENOSPC due to delalloc

Jin Qian <[email protected]>
f2fs: sanity check checkpoint segno and blkoff

Sean Young <[email protected]>
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds

David Woods <[email protected]>
mmc: core: Use device_property_read instead of of_property_read

David Woods <[email protected]>
mmc: dw_mmc: Use device_property_read instead of of_property_read

Nicholas Bellinger <[email protected]>
iscsi-target: Fix initial login PDU asynchronous socket close OOPs

Prabhakar Lad <[email protected]>
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl

Marc Gonzalez <[email protected]>
ARM: dts: tango4: Request RGMII RX and TX clock delays

Gregory CLEMENT <[email protected]>
ARM: dts: armada-38x: Fix irq type for pca955

Jerry Lee <[email protected]>
ext4: fix overflow caused by missing cast in ext4_resize_fs()

Jan Kara <[email protected]>
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize

Bartosz Golaszewski <[email protected]>
gpiolib: skip unwanted events, don't convert them to opposite edge

Suravee Suthikulpanit <[email protected]>
iommu/amd: Enable ga_log_intr when enabling guest_mode

Nicholas Piggin <[email protected]>
powerpc/64: Fix __check_irq_replay missing decrementer interrupt

Gustavo Romero <[email protected]>
powerpc/tm: Fix saving of TM SPRs in core dump

Matija Glavinic Pecotic <[email protected]>
timers: Fix overflow in get_next_timer_interrupt

Josh Poimboeuf <[email protected]>
mm/page_alloc: Remove kernel address exposure in free_reserved_area()

Wanpeng Li <[email protected]>
KVM: async_pf: make rcu irq exit if not triggered from idle task

Banajit Goswami <[email protected]>
ASoC: do not close shared backend dailink

Jean Delvare <[email protected]>
drm/amdgpu: Fix undue fallthroughs in golden registers initialization

Sergei A. Trusov <[email protected]>
ALSA: hda - Fix speaker output from VAIO VPCL14M1R

Dima Zavin <[email protected]>
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()

Mel Gorman <[email protected]>
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

Guenter Roeck <[email protected]>
mmc: core: Fix access to HS400-ES devices

Sakari Ailus <[email protected]>
device property: Make dev_fwnode() public

Ludovic Desroches <[email protected]>
mmc: sdhci-of-at91: force card detect value for non removable devices

Trond Myklebust <[email protected]>
NFSv4: Fix EXCHANGE_ID corrupt verifier issue

Arend Van Spriel <[email protected]>
brcmfmac: fix memleak due to calling brcmf_sdiod_sgtable_alloc() twice

Emmanuel Grumbach <[email protected]>
iwlwifi: dvm: prevent an out of bounds access

Tejun Heo <[email protected]>
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered

Dan Carpenter <[email protected]>
libata: array underflow in ata_find_dev()

Tejun Heo <[email protected]>
cgroup: fix error return value from cgroup_subtree_control()

Tejun Heo <[email protected]>
cgroup: create dfl_root files on subsys registration

John David Anglin <[email protected]>
parisc: Handle vma's whose context is not current in flush_cache_range


-------------

Diffstat:

Makefile | 4 +-
arch/arm/boot/dts/Makefile | 1 +
arch/arm/boot/dts/armada-388-gp.dts | 4 +-
.../boot/dts/sun7i-a20-olinuxino-lime2-emmc.dts | 2 +-
arch/arm/boot/dts/tango4-vantage-1172.dts | 2 +-
arch/arm/include/asm/ftrace.h | 18 ++
arch/parisc/kernel/cache.c | 5 +-
arch/powerpc/kernel/irq.c | 15 +-
arch/powerpc/kernel/ptrace.c | 13 +-
arch/sparc/include/asm/mmu_context_64.h | 12 +-
arch/sparc/include/asm/trap_block.h | 1 +
arch/sparc/kernel/smp_64.c | 185 ++++++++++++-------
arch/sparc/kernel/sun4v_ivec.S | 15 ++
arch/sparc/kernel/traps_64.c | 1 +
arch/sparc/kernel/tsb.S | 12 ++
arch/sparc/lib/U3memcpy.S | 4 +-
arch/sparc/power/hibernate.c | 3 +-
arch/x86/boot/string.c | 1 +
arch/x86/boot/string.h | 9 +
arch/x86/kernel/kvm.c | 6 +-
drivers/ata/libata-scsi.c | 6 +-
drivers/base/property.c | 3 +-
drivers/block/nbd.c | 6 +-
drivers/block/virtio_blk.c | 3 +-
drivers/clk/samsung/clk-exynos5420.c | 14 +-
drivers/gpio/gpiolib.c | 9 +-
drivers/gpu/drm/amd/amdgpu/si.c | 2 +
drivers/gpu/drm/virtio/virtgpu_fb.c | 2 +-
drivers/infiniband/hw/cxgb4/cm.c | 7 +-
drivers/iommu/amd_iommu.c | 1 +
drivers/media/pci/saa7164/saa7164-bus.c | 13 +-
drivers/media/platform/davinci/vpfe_capture.c | 22 +--
drivers/media/rc/ir-lirc-codec.c | 2 +-
drivers/mmc/core/host.c | 70 ++++---
drivers/mmc/core/mmc.c | 2 +-
drivers/mmc/host/dw_mmc.c | 20 +-
drivers/mmc/host/sdhci-of-at91.c | 35 +++-
drivers/net/dsa/b53/b53_common.c | 1 +
drivers/net/ethernet/aurora/nb8800.c | 9 +-
drivers/net/ethernet/broadcom/tg3.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 19 +-
drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 6 +-
.../ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +-
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 29 +--
drivers/net/ethernet/mellanox/mlx5/core/lag.c | 25 +--
drivers/net/ethernet/renesas/sh_eth.c | 5 +-
drivers/net/irda/mcs7780.c | 16 +-
drivers/net/phy/dp83867.c | 10 +
drivers/net/phy/phy.c | 12 ++
.../wireless/broadcom/brcm80211/brcmfmac/sdio.c | 5 -
drivers/net/wireless/intel/iwlwifi/dvm/tx.c | 2 +-
drivers/net/xen-netback/common.h | 1 +
drivers/net/xen-netback/interface.c | 6 +-
drivers/net/xen-netback/netback.c | 6 +-
drivers/scsi/qla2xxx/qla_attr.c | 18 +-
drivers/spi/spi-axi-spi-engine.c | 3 +-
drivers/target/iscsi/iscsi_target_nego.c | 204 ++++++++++++++-------
fs/btrfs/extent-tree.c | 4 -
fs/ext4/file.c | 3 +
fs/ext4/resize.c | 3 +-
fs/f2fs/super.c | 16 ++
fs/nfs/nfs4proc.c | 11 +-
fs/nfs/nfs4xdr.c | 2 +-
include/linux/cpuset.h | 19 +-
include/linux/mm_types.h | 4 +
include/linux/nfs_xdr.h | 2 +-
include/linux/property.h | 2 +
include/linux/sched.h | 10 +
include/linux/slab.h | 4 +-
include/net/iw_handler.h | 3 +-
include/net/sctp/sctp.h | 4 +
include/target/iscsi/iscsi_target_core.h | 1 +
kernel/cgroup.c | 8 +-
kernel/cpuset.c | 1 +
kernel/signal.c | 4 +-
kernel/time/timer.c | 2 +-
kernel/workqueue.c | 10 +
lib/Kconfig.debug | 2 +-
mm/internal.h | 5 +-
mm/madvise.c | 2 +
mm/memory.c | 1 +
mm/mprotect.c | 1 +
mm/mremap.c | 1 +
mm/page_alloc.c | 10 +-
mm/rmap.c | 36 ++++
net/core/dev_ioctl.c | 1 +
net/core/rtnetlink.c | 3 +-
net/dccp/feat.c | 7 +-
net/dccp/ipv4.c | 1 +
net/dccp/ipv6.c | 1 +
net/ipv4/fib_frontend.c | 9 +-
net/ipv4/ip_output.c | 3 +-
net/ipv4/netfilter/nf_reject_ipv4.c | 2 +
net/ipv4/syncookies.c | 1 +
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp_bbr.c | 49 +++--
net/ipv6/ip6_output.c | 6 +-
net/ipv6/netfilter/nf_reject_ipv6.c | 3 +
net/ipv6/output_core.c | 8 +-
net/ipv6/syncookies.c | 1 +
net/openvswitch/conntrack.c | 7 +-
net/packet/af_packet.c | 2 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/soc/codecs/rt5645.c | 3 +
sound/soc/soc-pcm.c | 4 +
105 files changed, 817 insertions(+), 380 deletions(-)



2017-08-09 18:14:34

by Greg Kroah-Hartman

Subject: [PATCH 4.9 03/93] cgroup: fix error return value from cgroup_subtree_control()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit 3c74541777302eec43a0d1327c4d58b8659a776b upstream.

While refactoring, f7b2814bb9b6 ("cgroup: factor out
cgroup_{apply|finalize}_control() from
cgroup_subtree_control_write()") broke error return value from the
function. The return value from the last operation is always
overridden to zero. Fix it.

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/cgroup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3487,11 +3487,11 @@ static ssize_t cgroup_subtree_control_wr
cgrp->subtree_control &= ~disable;

ret = cgroup_apply_control(cgrp);
-
cgroup_finalize_control(cgrp, ret);
+ if (ret)
+ goto out_unlock;

kernfs_activate(cgrp->kn);
- ret = 0;
out_unlock:
cgroup_kn_unlock(of->kn);
return ret ?: nbytes;


2017-08-09 18:14:37

by Greg Kroah-Hartman

Subject: [PATCH 4.9 05/93] workqueue: restore WQ_UNBOUND/max_active==1 to be ordered

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit 5c0338c68706be53b3dc472e4308961c36e4ece1 upstream.

The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution. After NUMA affinity 4c16bd327c74 ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.

While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.

It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues. Let's
automatically set __WQ_ORDERED for those workqueues.
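
As an aside (not part of this patch; the workqueue names below are made up),
a minimal kernel-style sketch of the two forms this change affects:

	/* Preferred way to ask for ordered execution. */
	struct workqueue_struct *ordered_wq =
		alloc_ordered_workqueue("my_ordered_wq", 0);

	/* Legacy form: with this patch, __WQ_ORDERED is now set implicitly,
	 * so this behaves as an ordered workqueue again, as it did before
	 * the NUMA affinity change.
	 */
	struct workqueue_struct *legacy_wq =
		alloc_workqueue("my_legacy_wq", WQ_UNBOUND, 1);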

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Reported-by: Alexei Potashnik <[email protected]>
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/workqueue.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3915,6 +3915,16 @@ struct workqueue_struct *__alloc_workque
struct workqueue_struct *wq;
struct pool_workqueue *pwq;

+ /*
+ * Unbound && max_active == 1 used to imply ordered, which is no
+ * longer the case on NUMA machines due to per-node pools. While
+ * alloc_ordered_workqueue() is the right way to create an ordered
+ * workqueue, keep the previous behavior to avoid subtle breakages
+ * on NUMA.
+ */
+ if ((flags & WQ_UNBOUND) && max_active == 1)
+ flags |= __WQ_ORDERED;
+
/* see the comment above the definition of WQ_POWER_EFFICIENT */
if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
flags |= WQ_UNBOUND;


2017-08-09 18:14:42

by Greg Kroah-Hartman

Subject: [PATCH 4.9 16/93] ASoC: do not close shared backend dailink

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Banajit Goswami <[email protected]>

commit b1cd2e34c69a2f3988786af451b6e17967c293a0 upstream.

Multiple frontend dailinks may be connected to a backend
dailink at the same time. When one of the frontend dailinks is
closed, the associated backend dailink should not be closed
if it is connected to other active frontend dailinks. This change
ensures that the backend dailink is closed only after all
connected frontend dailinks are closed.

Signed-off-by: Gopikrishnaiah Anandan <[email protected]>
Signed-off-by: Banajit Goswami <[email protected]>
Signed-off-by: Patrick Lai <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/soc-pcm.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -181,6 +181,10 @@ int dpcm_dapm_stream_event(struct snd_so
dev_dbg(be->dev, "ASoC: BE %s event %d dir %d\n",
be->dai_link->name, event, dir);

+ if ((event == SND_SOC_DAPM_STREAM_STOP) &&
+ (be->dpcm[dir].users >= 1))
+ continue;
+
snd_soc_dapm_stream_event(be, dir, event);
}



2017-08-09 18:14:47

by Greg Kroah-Hartman

Subject: [PATCH 4.9 07/93] brcmfmac: fix memleak due to calling brcmf_sdiod_sgtable_alloc() twice

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Arend Van Spriel <[email protected]>

commit 5f5d03143de5e0c593da4ab18fc6393c2815e108 upstream.

Due to a bugfix in the wireless tree and the commit mentioned below, a
merge was needed, which went haywire. As a result, the submitted change
caused the function brcmf_sdiod_sgtable_alloc() to be called twice during
probe, thus leaking the memory of the first call.

Fixes: 4d7928959832 ("brcmfmac: switch to new platform data")
Reported-by: Stefan Wahren <[email protected]>
Tested-by: Stefan Wahren <[email protected]>
Reviewed-by: Hante Meuleman <[email protected]>
Signed-off-by: Arend van Spriel <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 5 -----
1 file changed, 5 deletions(-)

--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4161,11 +4161,6 @@ struct brcmf_sdio *brcmf_sdio_probe(stru
goto fail;
}

- /* allocate scatter-gather table. sg support
- * will be disabled upon allocation failure.
- */
- brcmf_sdiod_sgtable_alloc(bus->sdiodev);
-
/* Query the F2 block size, set roundup accordingly */
bus->blocksize = bus->sdiodev->func[2]->cur_blksize;
bus->roundup = min(max_roundup, bus->blocksize);


2017-08-09 18:14:54

by Greg Kroah-Hartman

Subject: [PATCH 4.9 25/93] ext4: fix overflow caused by missing cast in ext4_resize_fs()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jerry Lee <[email protected]>

commit aec51758ce10a9c847a62a48a168f8c804c6e053 upstream.

On a 32-bit platform, the value of n_blocks_count may be wrong while
the file system is being resized to a size larger than 2^32 blocks. This
may cause the superblock to be corrupted with a zero blocks count.
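
For illustration only (a standalone userspace sketch with made-up numbers,
not the ext4 code itself), the 32-bit multiplication wraps before the result
is ever widened, which is exactly what the cast in the fix prevents:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Hypothetical resize: 131072 groups of 32768 blocks = 2^32 blocks. */
	uint32_t n_group = 131072;
	uint32_t blocks_per_group = 32768;

	uint32_t wrong = n_group * blocks_per_group;            /* 32-bit multiply wraps to 0 */
	uint64_t right = (uint64_t)n_group * blocks_per_group;  /* widen first, as in the fix */

	printf("without cast: %u\n", wrong);                        /* prints 0 */
	printf("with cast:    %llu\n", (unsigned long long)right); /* prints 4294967296 */
	return 0;
}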

Fixes: 1c6bd7173d66
Signed-off-by: Jerry Lee <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/resize.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1926,7 +1926,8 @@ retry:
n_desc_blocks = o_desc_blocks +
le16_to_cpu(es->s_reserved_gdt_blocks);
n_group = n_desc_blocks * EXT4_DESC_PER_BLOCK(sb);
- n_blocks_count = n_group * EXT4_BLOCKS_PER_GROUP(sb);
+ n_blocks_count = (ext4_fsblk_t)n_group *
+ EXT4_BLOCKS_PER_GROUP(sb);
n_group--; /* set to last group number */
}



2017-08-09 18:15:04

by Greg Kroah-Hartman

Subject: [PATCH 4.9 30/93] mmc: dw_mmc: Use device_property_read instead of of_property_read

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Woods <[email protected]>

commit 852ff5fea9eb6a9799f1881d6df2cd69a9e6eed5 upstream.

Using the device_property interfaces allows the dw_mmc driver to work
on platforms which run on either device tree or ACPI.

Signed-off-by: David Woods <[email protected]>
Reviewed-by: Chris Metcalf <[email protected]>
Acked-by: Jaehoon Chung <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/mmc/host/dw_mmc.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2610,8 +2610,8 @@ static int dw_mci_init_slot(struct dw_mc
host->slot[id] = slot;

mmc->ops = &dw_mci_ops;
- if (of_property_read_u32_array(host->dev->of_node,
- "clock-freq-min-max", freq, 2)) {
+ if (device_property_read_u32_array(host->dev, "clock-freq-min-max",
+ freq, 2)) {
mmc->f_min = DW_MCI_FREQ_MIN;
mmc->f_max = DW_MCI_FREQ_MAX;
} else {
@@ -2709,7 +2709,6 @@ static void dw_mci_init_dma(struct dw_mc
{
int addr_config;
struct device *dev = host->dev;
- struct device_node *np = dev->of_node;

/*
* Check tansfer mode from HCON[17:16]
@@ -2770,8 +2769,9 @@ static void dw_mci_init_dma(struct dw_mc
dev_info(host->dev, "Using internal DMA controller.\n");
} else {
/* TRANS_MODE_EDMAC: check dma bindings again */
- if ((of_property_count_strings(np, "dma-names") < 0) ||
- (!of_find_property(np, "dmas", NULL))) {
+ if ((device_property_read_string_array(dev, "dma-names",
+ NULL, 0) < 0) ||
+ !device_property_present(dev, "dmas")) {
goto no_dma;
}
host->dma_ops = &dw_mci_edmac_ops;
@@ -2931,7 +2931,6 @@ static struct dw_mci_board *dw_mci_parse
{
struct dw_mci_board *pdata;
struct device *dev = host->dev;
- struct device_node *np = dev->of_node;
const struct dw_mci_drv_data *drv_data = host->drv_data;
int ret;
u32 clock_frequency;
@@ -2948,15 +2947,16 @@ static struct dw_mci_board *dw_mci_parse
}

/* find out number of slots supported */
- of_property_read_u32(np, "num-slots", &pdata->num_slots);
+ device_property_read_u32(dev, "num-slots", &pdata->num_slots);

- if (of_property_read_u32(np, "fifo-depth", &pdata->fifo_depth))
+ if (device_property_read_u32(dev, "fifo-depth", &pdata->fifo_depth))
dev_info(dev,
"fifo-depth property not found, using value of FIFOTH register as default\n");

- of_property_read_u32(np, "card-detect-delay", &pdata->detect_delay_ms);
+ device_property_read_u32(dev, "card-detect-delay",
+ &pdata->detect_delay_ms);

- if (!of_property_read_u32(np, "clock-frequency", &clock_frequency))
+ if (!device_property_read_u32(dev, "clock-frequency", &clock_frequency))
pdata->bus_hz = clock_frequency;

if (drv_data && drv_data->parse_dt) {


2017-08-09 18:15:12

by Greg Kroah-Hartman

Subject: [PATCH 4.9 14/93] ALSA: hda - Fix speaker output from VAIO VPCL14M1R

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sergei A. Trusov <[email protected]>

commit 3f3c371421e601fa93b6cb7fb52da9ad59ec90b4 upstream.

Sony VAIO VPCL14M1R needs the quirk to make the speaker work properly.

Tested-by: Dmitriy <[email protected]>
Signed-off-by: Sergei A. Trusov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -2233,6 +2233,7 @@ static const struct snd_pci_quirk alc882
SND_PCI_QUIRK(0x1043, 0x8691, "ASUS ROG Ranger VIII", ALC882_FIXUP_GPIO3),
SND_PCI_QUIRK(0x104d, 0x9047, "Sony Vaio TT", ALC889_FIXUP_VAIO_TT),
SND_PCI_QUIRK(0x104d, 0x905a, "Sony Vaio Z", ALC882_FIXUP_NO_PRIMARY_HP),
+ SND_PCI_QUIRK(0x104d, 0x9060, "Sony Vaio VPCL14M1R", ALC882_FIXUP_NO_PRIMARY_HP),
SND_PCI_QUIRK(0x104d, 0x9043, "Sony Vaio VGC-LN51JGB", ALC882_FIXUP_NO_PRIMARY_HP),
SND_PCI_QUIRK(0x104d, 0x9044, "Sony VAIO AiO", ALC882_FIXUP_NO_PRIMARY_HP),



2017-08-09 18:15:21

by Greg Kroah-Hartman

Subject: [PATCH 4.9 38/93] tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit 79135b89b8af304456bd67916b80116ddf03d7b6 ]

Introduce a helper to initialize the BBR pacing rate unconditionally,
based on the current cwnd and RTT estimate. This is a pure refactor,
but is needed for two following fixes.
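
As a back-of-the-envelope illustration (assuming the default initial cwnd of
10 packets, a typical 1448-byte MSS and bbr_high_gain of roughly 2.885): with
a 10 ms smoothed RTT the helper yields about 2.885 * 10 * 1448 B / 10 ms,
i.e. roughly 4.2 MB/s, and with no RTT sample yet it falls back to the
nominal 1 ms default, giving roughly 42 MB/s.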

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -192,6 +192,23 @@ static u32 bbr_bw_to_pacing_rate(struct
return rate;
}

+/* Initialize pacing rate to: high_gain * init_cwnd / RTT. */
+static void bbr_init_pacing_rate_from_rtt(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ u64 bw;
+ u32 rtt_us;
+
+ if (tp->srtt_us) { /* any RTT sample yet? */
+ rtt_us = max(tp->srtt_us >> 3, 1U);
+ } else { /* no RTT sample yet */
+ rtt_us = USEC_PER_MSEC; /* use nominal default RTT */
+ }
+ bw = (u64)tp->snd_cwnd * BW_UNIT;
+ do_div(bw, rtt_us);
+ sk->sk_pacing_rate = bbr_bw_to_pacing_rate(sk, bw, bbr_high_gain);
+}
+
/* Pace using current bw estimate and a gain factor. In order to help drive the
* network toward lower queues while maintaining high utilization and low
* latency, the average pacing rate aims to be slightly (~1%) lower than the
@@ -776,7 +793,6 @@ static void bbr_init(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
struct bbr *bbr = inet_csk_ca(sk);
- u64 bw;

bbr->prior_cwnd = 0;
bbr->tso_segs_goal = 0; /* default segs per skb until first ACK */
@@ -792,11 +808,8 @@ static void bbr_init(struct sock *sk)

minmax_reset(&bbr->bw, bbr->rtt_cnt, 0); /* init max bw to 0 */

- /* Initialize pacing rate to: high_gain * init_cwnd / RTT. */
- bw = (u64)tp->snd_cwnd * BW_UNIT;
- do_div(bw, (tp->srtt_us >> 3) ? : USEC_PER_MSEC);
sk->sk_pacing_rate = 0; /* force an update of sk_pacing_rate */
- bbr_set_pacing_rate(sk, bw, bbr_high_gain);
+ bbr_init_pacing_rate_from_rtt(sk);

bbr->restore_cwnd = 0;
bbr->round_start = 0;


2017-08-09 18:15:35

by Greg Kroah-Hartman

Subject: [PATCH 4.9 66/93] sparc64: Fix exception handling in UltraSPARC-III memcpy.

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>


[ Upstream commit 0ede1c401332173ab0693121dc6cde04a4dbf131 ]

Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.

After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.

The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.

Fixes: ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: Mikael Pettersson <[email protected]>
Tested-by: Mikael Pettersson <[email protected]>
Reviewed-by: Sam Ravnborg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/lib/U3memcpy.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/sparc/lib/U3memcpy.S
+++ b/arch/sparc/lib/U3memcpy.S
@@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
ENTRY(U3_retl_o2_and_7_plus_GS)
and %o2, 7, %o2
retl
- add %o2, GLOBAL_SPARE, %o2
+ add %o2, GLOBAL_SPARE, %o0
ENDPROC(U3_retl_o2_and_7_plus_GS)
ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
add GLOBAL_SPARE, 8, GLOBAL_SPARE
and %o2, 7, %o2
retl
- add %o2, GLOBAL_SPARE, %o2
+ add %o2, GLOBAL_SPARE, %o0
ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
#endif



2017-08-09 18:15:43

by Greg Kroah-Hartman

Subject: [PATCH 4.9 78/93] clk/samsung: exynos542x: mark some clocks as critical

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marek Szyprowski <[email protected]>


[ Upstream commit 318fa46cc60d37fec1e87dbf03a82aca0f5ce695 ]

Some parent clocks of the Exynos542x clock blocks, which have separate
power domains (like DISP, MFC, MSC, GSC, FSYS and G2D) must be always
enabled to access any register related to power management unit or devices
connected to it. For the time being, until a proper solution based on
runtime PM is applied, mark those clocks as critical (instead of ignore
unused or even no flags) to prevent disabling them.

Signed-off-by: Marek Szyprowski <[email protected]>
Acked-by: Sylwester Nawrocki <[email protected]>
Reviewed-by: Chanwoo Choi <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Tested-by: Javier Martinez Canillas <[email protected]> [Exynos5800 Peach Pi Chromebook]
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/clk/samsung/clk-exynos5420.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

--- a/drivers/clk/samsung/clk-exynos5420.c
+++ b/drivers/clk/samsung/clk-exynos5420.c
@@ -586,7 +586,7 @@ static const struct samsung_gate_clock e
GATE(CLK_ACLK550_CAM, "aclk550_cam", "mout_user_aclk550_cam",
GATE_BUS_TOP, 24, 0, 0),
GATE(CLK_ACLK432_SCALER, "aclk432_scaler", "mout_user_aclk432_scaler",
- GATE_BUS_TOP, 27, 0, 0),
+ GATE_BUS_TOP, 27, CLK_IS_CRITICAL, 0),
};

static const struct samsung_mux_clock exynos5420_mux_clks[] __initconst = {
@@ -956,20 +956,20 @@ static const struct samsung_gate_clock e
GATE(CLK_SMMU_G2D, "smmu_g2d", "aclk333_g2d", GATE_IP_G2D, 7, 0, 0),

GATE(0, "aclk200_fsys", "mout_user_aclk200_fsys",
- GATE_BUS_FSYS0, 9, CLK_IGNORE_UNUSED, 0),
+ GATE_BUS_FSYS0, 9, CLK_IS_CRITICAL, 0),
GATE(0, "aclk200_fsys2", "mout_user_aclk200_fsys2",
GATE_BUS_FSYS0, 10, CLK_IGNORE_UNUSED, 0),

GATE(0, "aclk333_g2d", "mout_user_aclk333_g2d",
GATE_BUS_TOP, 0, CLK_IGNORE_UNUSED, 0),
GATE(0, "aclk266_g2d", "mout_user_aclk266_g2d",
- GATE_BUS_TOP, 1, CLK_IGNORE_UNUSED, 0),
+ GATE_BUS_TOP, 1, CLK_IS_CRITICAL, 0),
GATE(0, "aclk300_jpeg", "mout_user_aclk300_jpeg",
GATE_BUS_TOP, 4, CLK_IGNORE_UNUSED, 0),
GATE(0, "aclk333_432_isp0", "mout_user_aclk333_432_isp0",
GATE_BUS_TOP, 5, 0, 0),
GATE(0, "aclk300_gscl", "mout_user_aclk300_gscl",
- GATE_BUS_TOP, 6, CLK_IGNORE_UNUSED, 0),
+ GATE_BUS_TOP, 6, CLK_IS_CRITICAL, 0),
GATE(0, "aclk333_432_gscl", "mout_user_aclk333_432_gscl",
GATE_BUS_TOP, 7, CLK_IGNORE_UNUSED, 0),
GATE(0, "aclk333_432_isp", "mout_user_aclk333_432_isp",
@@ -983,20 +983,20 @@ static const struct samsung_gate_clock e
GATE(0, "aclk166", "mout_user_aclk166",
GATE_BUS_TOP, 14, CLK_IGNORE_UNUSED, 0),
GATE(CLK_ACLK333, "aclk333", "mout_user_aclk333",
- GATE_BUS_TOP, 15, CLK_IGNORE_UNUSED, 0),
+ GATE_BUS_TOP, 15, CLK_IS_CRITICAL, 0),
GATE(0, "aclk400_isp", "mout_user_aclk400_isp",
GATE_BUS_TOP, 16, 0, 0),
GATE(0, "aclk400_mscl", "mout_user_aclk400_mscl",
GATE_BUS_TOP, 17, 0, 0),
GATE(0, "aclk200_disp1", "mout_user_aclk200_disp1",
- GATE_BUS_TOP, 18, 0, 0),
+ GATE_BUS_TOP, 18, CLK_IS_CRITICAL, 0),
GATE(CLK_SCLK_MPHY_IXTAL24, "sclk_mphy_ixtal24", "mphy_refclk_ixtal24",
GATE_BUS_TOP, 28, 0, 0),
GATE(CLK_SCLK_HSIC_12M, "sclk_hsic_12m", "ff_hsic_12m",
GATE_BUS_TOP, 29, 0, 0),

GATE(0, "aclk300_disp1", "mout_user_aclk300_disp1",
- SRC_MASK_TOP2, 24, 0, 0),
+ SRC_MASK_TOP2, 24, CLK_IS_CRITICAL, 0),

GATE(CLK_MAU_EPLL, "mau_epll", "mout_mau_epll_clk",
SRC_MASK_TOP7, 20, 0, 0),


2017-08-09 18:15:45

by Greg Kroah-Hartman

Subject: [PATCH 4.9 49/93] packet: fix use-after-free in prb_retire_rx_blk_timer_expired()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: WANG Cong <[email protected]>


[ Upstream commit c800aaf8d869f2b9b47b10c5c312fe19f0a94042 ]

There are multiple reports showing we have a use-after-free in
the timer prb_retire_rx_blk_timer_expired(), where we use struct
tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
free_pg_vec().

The interesting part is it is not freed via packet_release() but
via packet_setsockopt(), which means we are not closing the socket.
Looking into the big and fat function packet_set_ring(), this could
happen if we satisfy the following conditions:

1. closing == 0, not on packet_release() path
2. req->tp_block_nr == 0, we don't allocate a new pg_vec
3. rx_ring->pg_vec is already set as V3, which means we already called
packet_set_ring() with req->tp_block_nr > 0 previously
4. req->tp_frame_nr == 0, pass sanity check
5. po->mapped == 0, never called mmap()

In this scenario we are clearing the old rx_ring->pg_vec, so we need
to free this pg_vec, but we don't stop the timer on this path because
of closing==0.

The timer has to be stopped as long as we need to free pg_vec, therefore
the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.

Thanks to liujian for testing different fixes.

Reported-by: [email protected]
Reported-by: Dave Jones <[email protected]>
Reported-by: liujian (CE) <[email protected]>
Tested-by: liujian (CE) <[email protected]>
Cc: Ding Tianhong <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4322,7 +4322,7 @@ static int packet_set_ring(struct sock *
register_prot_hook(sk);
}
spin_unlock(&po->bind_lock);
- if (closing && (po->tp_version > TPACKET_V2)) {
+ if (pg_vec && (po->tp_version > TPACKET_V2)) {
/* Because we don't support block-based V3 on tx-ring */
if (!tx_ring)
prb_shutdown_retire_blk_timer(po, rb_queue);


2017-08-09 18:15:57

by Greg Kroah-Hartman

Subject: [PATCH 4.9 51/93] net: ethernet: nb8800: Handle all 4 RGMII modes identically

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Gonzalez <[email protected]>


[ Upstream commit 4813497b537c6208c90d6cbecac5072d347de900 ]

Before commit bf8f6952a233 ("Add blurb about RGMII") it was unclear
whose responsibility it was to insert the required clock skew, and
in hindsight, some PHY drivers got it wrong. The solution forward
is to introduce a new property, explicitly requiring skew from the
node to which it is attached. In the interim, this driver will handle
all 4 RGMII modes identically (no skew).

Fixes: 52dfc8301248 ("net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller")
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/aurora/nb8800.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -609,7 +609,7 @@ static void nb8800_mac_config(struct net
mac_mode |= HALF_DUPLEX;

if (gigabit) {
- if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+ if (phy_interface_is_rgmii(dev->phydev))
mac_mode |= RGMII_MODE;

mac_mode |= GMAC_MODE;
@@ -1277,11 +1277,10 @@ static int nb8800_tangox_init(struct net
break;

case PHY_INTERFACE_MODE_RGMII:
- pad_mode = PAD_MODE_RGMII;
- break;
-
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
- pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+ pad_mode = PAD_MODE_RGMII;
break;

default:


2017-08-09 18:16:14

by Greg Kroah-Hartman

Subject: [PATCH 4.9 50/93] ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stefano Brivio <[email protected]>


[ Upstream commit afce615aaabfbaad02550e75c0bec106dafa1adf ]

RFC 2465 defines ipv6IfStatsOutFragFails as:

"The number of IPv6 datagrams that have been discarded
because they needed to be fragmented at this output
interface but could not be."

The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.

This didn't look intentional though. In one of the two affected
failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.

Reported-by: Sabrina Dubroca <[email protected]>
Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_output.c | 4 ----
1 file changed, 4 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -662,8 +662,6 @@ int ip6_fragment(struct net *net, struct
*prevhdr = NEXTHDR_FRAGMENT;
tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC);
if (!tmp_hdr) {
- IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_FRAGFAILS);
err = -ENOMEM;
goto fail;
}
@@ -782,8 +780,6 @@ slow_path:
frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) +
hroom + troom, GFP_ATOMIC);
if (!frag) {
- IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_FRAGFAILS);
err = -ENOMEM;
goto fail;
}


2017-08-09 18:16:31

by Greg Kroah-Hartman

Subject: [PATCH 4.9 82/93] ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Vadot <[email protected]>


[ Upstream commit 3116d37651d77125bf50f81f859b1278e02ccce6 ]

The node name for the power seq pin is mmc2@0, the same as the mmc2_pins_a one.
This makes the original node (mmc2_pins_a) get scrapped out of the dtb and
results in an unusable eMMC if U-Boot didn't configure the pins to the
correct functions.

Signed-off-by: Emmanuel Vadot <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/boot/dts/sun7i-a20-olinuxino-lime2-emmc.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/boot/dts/sun7i-a20-olinuxino-lime2-emmc.dts
+++ b/arch/arm/boot/dts/sun7i-a20-olinuxino-lime2-emmc.dts
@@ -56,7 +56,7 @@
};

&pio {
- mmc2_pins_nrst: mmc2@0 {
+ mmc2_pins_nrst: mmc2-rst-pin {
allwinner,pins = "PC16";
allwinner,function = "gpio_out";
allwinner,drive = <SUN4I_PINCTRL_10_MA>;


2017-08-09 18:16:51

by Greg Kroah-Hartman

Subject: [PATCH 4.9 80/93] drm/virtio: fix framebuffer sparse warning

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gerd Hoffmann <[email protected]>


[ Upstream commit 71d3f6ef7f5af38dea2975ec5715c88bae92e92d ]

virtio uses normal ram as backing storage for the framebuffer, so we
should assign the address to new screen_buffer (added by commit
17a7b0b4d9749f80d365d7baff5dec2f54b0e992) instead of screen_base.

Reported-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/virtio/virtgpu_fb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/virtio/virtgpu_fb.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fb.c
@@ -337,7 +337,7 @@ static int virtio_gpufb_create(struct dr
info->fbops = &virtio_gpufb_ops;
info->pixmap.flags = FB_PIXMAP_SYSTEM;

- info->screen_base = obj->vmap;
+ info->screen_buffer = obj->vmap;
info->screen_size = obj->gem_base.size;
drm_fb_helper_fill_fix(info, fb->pitches[0], fb->depth);
drm_fb_helper_fill_var(info, &vfbdev->helper,


2017-08-09 18:15:41

by Greg Kroah-Hartman

Subject: [PATCH 4.9 72/93] x86/boot: Add missing declaration of string functions

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Mc Guire <[email protected]>


[ Upstream commit fac69d0efad08fc15e4dbfc116830782acc0dc9a ]

Add the missing declarations of basic string functions to string.h to allow
a clean build.

Fixes: 5be865661516 ("String-handling functions for the new x86 setup code.")
Signed-off-by: Nicholas Mc Guire <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/boot/string.c | 1 +
arch/x86/boot/string.h | 9 +++++++++
2 files changed, 10 insertions(+)

--- a/arch/x86/boot/string.c
+++ b/arch/x86/boot/string.c
@@ -14,6 +14,7 @@

#include <linux/types.h>
#include "ctype.h"
+#include "string.h"

int memcmp(const void *s1, const void *s2, size_t len)
{
--- a/arch/x86/boot/string.h
+++ b/arch/x86/boot/string.h
@@ -18,4 +18,13 @@ int memcmp(const void *s1, const void *s
#define memset(d,c,l) __builtin_memset(d,c,l)
#define memcmp __builtin_memcmp

+extern int strcmp(const char *str1, const char *str2);
+extern int strncmp(const char *cs, const char *ct, size_t count);
+extern size_t strlen(const char *s);
+extern char *strstr(const char *s1, const char *s2);
+extern size_t strnlen(const char *s, size_t maxlen);
+extern unsigned int atou(const char *s);
+extern unsigned long long simple_strtoull(const char *cp, char **endp,
+ unsigned int base);
+
#endif /* BOOT_STRING_H */


2017-08-09 18:16:50

by Greg Kroah-Hartman

Subject: [PATCH 4.9 81/93] ARM: dts: sun8i: Support DTB build for NanoPi M1

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Milo Kim <[email protected]>


[ Upstream commit 661ccdc1a95f18ab6c1373322fde09afd5b90a1f ]

The commit 10efbf5f1633 ("ARM: dts: sun8i: Add dts file for NanoPi M1 SBC")
introduced the NanoPi M1 board, but it's missing from the Allwinner H3 DTB build.

Signed-off-by: Milo Kim <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/boot/dts/Makefile | 1 +
1 file changed, 1 insertion(+)

--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -820,6 +820,7 @@ dtb-$(CONFIG_MACH_SUN8I) += \
sun8i-a83t-allwinner-h8homlet-v2.dtb \
sun8i-a83t-cubietruck-plus.dtb \
sun8i-h3-bananapi-m2-plus.dtb \
+ sun8i-h3-nanopi-m1.dtb \
sun8i-h3-nanopi-neo.dtb \
sun8i-h3-orangepi-2.dtb \
sun8i-h3-orangepi-lite.dtb \


2017-08-09 18:17:24

by Greg Kroah-Hartman

Subject: [PATCH 4.9 79/93] scsi: qla2xxx: Get mutex lock before checking optrom_state

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Milan P. Gandhi" <[email protected]>


[ Upstream commit c7702b8c22712a06080e10f1d2dee1a133ec8809 ]

There is a race condition in the qla2xxx optrom functions where one thread
might modify the optrom buffer and optrom_state while another thread is still
reading from them.

In a couple of crashes, it was found that we had successfully passed the
following 'if' check where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired the mutex lock to proceed with
the memory_read_from_buffer function, some other thread/process had already
modified that option rom buffer and optrom_state from QLA_SREADING to
QLA_SWAITING. Then we got ha->optrom_buffer == 0x0 and crashed the system:

if (ha->optrom_state != QLA_SREADING)
return 0;

mutex_lock(&ha->optrom_mutex);
rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
ha->optrom_region_size);
mutex_unlock(&ha->optrom_mutex);

With the current optrom functions we get the following crash due to this
race condition:

[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1479.466707] IP: [<ffffffff81326756>] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296] [<ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941] [<ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571] [<ffffffff8127e76b>] read+0xdb/0x1f0
[ 1479.476206] [<ffffffff811fdf9e>] vfs_read+0x9e/0x170
[ 1479.476839] [<ffffffff811feb6f>] SyS_read+0x7f/0xe0
[ 1479.477466] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

The patch below modifies the qla2x00_sysfs_read_optrom and
qla2x00_sysfs_write_optrom functions to take the mutex lock before
checking ha->optrom_state, to avoid similar crashes.

The patch was applied and tested, and the same crashes were no longer
observed.
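
For illustration only, a minimal userspace sketch of the lock-then-check
ordering applied by the fix, written with pthreads and hypothetical names
(this is not the driver code): the state is only examined while the mutex
protecting the buffer is held, so nothing can change the state or free the
buffer between the check and the read.

#include <pthread.h>
#include <stdio.h>

enum optrom_state { QLA_SWAITING, QLA_SREADING };

static pthread_mutex_t optrom_mutex = PTHREAD_MUTEX_INITIALIZER;
static enum optrom_state state = QLA_SREADING;
static char optrom_buffer[] = "option rom contents";

/* Take the mutex first, then check the state under it. */
static int read_optrom(char *buf, size_t count)
{
	int rval = 0;

	pthread_mutex_lock(&optrom_mutex);
	if (state != QLA_SREADING)
		goto out;
	/* Stands in for memory_read_from_buffer(). */
	rval = snprintf(buf, count, "%s", optrom_buffer);
out:
	pthread_mutex_unlock(&optrom_mutex);
	return rval;
}

int main(void)
{
	char buf[64];

	return read_optrom(buf, sizeof(buf)) > 0 ? 0 : 1;
}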

Tested-by: Milan P. Gandhi <[email protected]>
Signed-off-by: Milan P. Gandhi <[email protected]>
Reviewed-by: Laurence Oberman <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/qla2xxx/qla_attr.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -243,12 +243,15 @@ qla2x00_sysfs_read_optrom(struct file *f
struct qla_hw_data *ha = vha->hw;
ssize_t rval = 0;

+ mutex_lock(&ha->optrom_mutex);
+
if (ha->optrom_state != QLA_SREADING)
- return 0;
+ goto out;

- mutex_lock(&ha->optrom_mutex);
rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
ha->optrom_region_size);
+
+out:
mutex_unlock(&ha->optrom_mutex);

return rval;
@@ -263,14 +266,19 @@ qla2x00_sysfs_write_optrom(struct file *
struct device, kobj)));
struct qla_hw_data *ha = vha->hw;

- if (ha->optrom_state != QLA_SWRITING)
+ mutex_lock(&ha->optrom_mutex);
+
+ if (ha->optrom_state != QLA_SWRITING) {
+ mutex_unlock(&ha->optrom_mutex);
return -EINVAL;
- if (off > ha->optrom_region_size)
+ }
+ if (off > ha->optrom_region_size) {
+ mutex_unlock(&ha->optrom_mutex);
return -ERANGE;
+ }
if (off + count > ha->optrom_region_size)
count = ha->optrom_region_size - off;

- mutex_lock(&ha->optrom_mutex);
memcpy(&ha->optrom_buffer[off], buf, count);
mutex_unlock(&ha->optrom_mutex);



2017-08-09 18:15:40

by Greg Kroah-Hartman

Subject: [PATCH 4.9 74/93] ASoC: rt5645: set sel_i2s_pre_div1 to 2

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bard Liao <[email protected]>


[ Upstream commit 02c5c03283c52157d336abf5e44ffcda10579fbf ]

The i2s clock pre-divider 1 is used for both i2s1 and sysclk.
The i2s1 is usually used for the main i2s, and the pre-divider
will be set in the hw_params function.

However, if i2s2 is used, the pre-divider is not set in the hw_params
function and the default value of i2s clock pre-divider 1 is too high
for sysclk and DMIC usage. Fix this by overriding the default divider value to 2.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=95681
Tested-by: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Bard Liao <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
sound/soc/codecs/rt5645.c | 3 +++
1 file changed, 3 insertions(+)

--- a/sound/soc/codecs/rt5645.c
+++ b/sound/soc/codecs/rt5645.c
@@ -3833,6 +3833,9 @@ static int rt5645_i2c_probe(struct i2c_c
}
}

+ regmap_update_bits(rt5645->regmap, RT5645_ADDA_CLK1,
+ RT5645_I2S_PD1_MASK, RT5645_I2S_PD1_2);
+
if (rt5645->pdata.jd_invert) {
regmap_update_bits(rt5645->regmap, RT5645_IRQ_CTRL2,
RT5645_JD_1_1_MASK, RT5645_JD_1_1_INV);


2017-08-09 18:17:55

by Greg Kroah-Hartman

Subject: [PATCH 4.9 77/93] ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tikhomirov <[email protected]>


[ Upstream commit b007f09072ca8afa118ade333e717ba443e8d807 ]

> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

v2: simplify to just proc_douintvec
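
To illustrate the difference (a standalone userspace sketch, not kernel code;
the result of the signed cast is shown for common two's-complement systems):

#include <stdio.h>

int main(void)
{
	/* The sysctl's backing storage is an unsigned int; UINT_MAX is a valid value. */
	unsigned int lowat = 4294967295u;

	/* A signed handler (proc_dointvec-style) sees these bits as -1 and
	 * rejects 4294967295 as out of range when written. */
	printf("interpreted as signed int:   %d\n", (int)lowat);

	/* An unsigned handler (proc_douintvec-style) reports the real value. */
	printf("interpreted as unsigned int: %u\n", lowat);
	return 0;
}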
Signed-off-by: Pavel Tikhomirov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/sysctl_net_ipv4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -958,7 +958,7 @@ static struct ctl_table ipv4_net_table[]
.data = &init_net.ipv4.sysctl_tcp_notsent_lowat,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_douintvec,
},
#ifdef CONFIG_IP_ROUTE_MULTIPATH
{


2017-08-09 18:15:39

by Greg Kroah-Hartman

Subject: [PATCH 4.9 73/93] spi: spi-axi: Free resources on error path

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christophe JAILLET <[email protected]>


[ Upstream commit 9620ca90115d4bd700f05862d3b210a266a66efe ]

We should go to 'err_put_master' here instead of returning directly.
Otherwise a call to 'spi_master_put' is missing.

Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/spi/spi-axi-spi-engine.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/spi/spi-axi-spi-engine.c
+++ b/drivers/spi/spi-axi-spi-engine.c
@@ -494,7 +494,8 @@ static int spi_engine_probe(struct platf
SPI_ENGINE_VERSION_MAJOR(version),
SPI_ENGINE_VERSION_MINOR(version),
SPI_ENGINE_VERSION_PATCH(version));
- return -ENODEV;
+ ret = -ENODEV;
+ goto err_put_master;
}

spi_engine->clk = devm_clk_get(&pdev->dev, "s_axi_aclk");


2017-08-09 18:18:19

by Greg Kroah-Hartman

Subject: [PATCH 4.9 76/93] phy state machine: failsafe leave invalid RUNNING state

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zefir Kurtisi <[email protected]>


[ Upstream commit 811a919135b980bac8009d042acdccf10dc1ef5e ]

While in RUNNING state, phy_state_machine() checks for link changes by
comparing phydev->link before and after calling phy_read_status().
This works as long as it is guaranteed that phydev->link is never
changed outside the phy_state_machine().

If in some setups this happens, it causes the state machine to miss
a link loss and remain RUNNING despite phydev->link being 0.

This has been observed running a dsa setup with a process continuously
polling the link states over ethtool each second (SNMPD RFC-1213
agent). Disconnecting the link on a phy followed by an ETHTOOL_GSET
causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
call phy_read_status(), which modifies the link status and thereby
bricks the phy state machine.

This patch adds a fail-safe check while in RUNNING, which moves the
state machine to CHANGELINK when the link is gone but we are still RUNNING.

Signed-off-by: Zefir Kurtisi <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1063,6 +1063,15 @@ void phy_state_machine(struct work_struc
if (old_link != phydev->link)
phydev->state = PHY_CHANGELINK;
}
+ /*
+ * Failsafe: check that nobody set phydev->link=0 between two
+ * poll cycles, otherwise we won't leave RUNNING state as long
+ * as link remains down.
+ */
+ if (!phydev->link && phydev->state == PHY_RUNNING) {
+ phydev->state = PHY_CHANGELINK;
+ phydev_err(phydev, "no link in PHY_RUNNING\n");
+ }
break;
case PHY_CHANGELINK:
err = phy_read_status(phydev);


2017-08-09 18:18:30

by Greg Kroah-Hartman

Subject: [PATCH 4.9 53/93] dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit b7953d3c0e30a5fc944f6b7bd0bcceb0794bcd85 ]

The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
exists on dccp_ipv4.

This patch is to fix it for dccp_ipv4.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/ipv4.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -637,6 +637,7 @@ int dccp_v4_conn_request(struct sock *sk
goto drop_and_free;

inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ reqsk_put(req);
return 0;

drop_and_free:


2017-08-09 18:18:48

by Greg Kroah-Hartman

Subject: [PATCH 4.9 89/93] signal: protect SIGNAL_UNKILLABLE from unintentional clearing.

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jamie Iles <[email protected]>


[ Upstream commit 2d39b3cd34e6d323720d4c61bd714f5ae202c022 ]

Since commit 00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we
can now trace init processes. init is initially protected with
SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but
there are a number of paths during tracing where SIGNAL_UNKILLABLE can
be implicitly cleared.

This can result in init becoming stoppable/killable after tracing. For
example, running:

while true; do kill -STOP 1; done &
strace -p 1

and then stopping strace and the kill loop will result in init being
left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but
init will now respond to future SIGSTOP signals rather than ignoring
them.

Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED
that we don't clear SIGNAL_UNKILLABLE.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jamie Iles <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/sched.h | 10 ++++++++++
kernel/signal.c | 4 ++--
2 files changed, 12 insertions(+), 2 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -830,6 +830,16 @@ struct signal_struct {

#define SIGNAL_UNKILLABLE 0x00000040 /* for init: ignore fatal signals */

+#define SIGNAL_STOP_MASK (SIGNAL_CLD_MASK | SIGNAL_STOP_STOPPED | \
+ SIGNAL_STOP_CONTINUED)
+
+static inline void signal_set_stop_flags(struct signal_struct *sig,
+ unsigned int flags)
+{
+ WARN_ON(sig->flags & (SIGNAL_GROUP_EXIT|SIGNAL_GROUP_COREDUMP));
+ sig->flags = (sig->flags & ~SIGNAL_STOP_MASK) | flags;
+}
+
/* If true, all threads except ->group_exit_task have pending SIGKILL */
static inline int signal_group_exit(const struct signal_struct *sig)
{
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -346,7 +346,7 @@ static bool task_participate_group_stop(
* fresh group stop. Read comment in do_signal_stop() for details.
*/
if (!sig->group_stop_count && !(sig->flags & SIGNAL_STOP_STOPPED)) {
- sig->flags = SIGNAL_STOP_STOPPED;
+ signal_set_stop_flags(sig, SIGNAL_STOP_STOPPED);
return true;
}
return false;
@@ -845,7 +845,7 @@ static bool prepare_signal(int sig, stru
* will take ->siglock, notice SIGNAL_CLD_MASK, and
* notify its parent. See get_signal_to_deliver().
*/
- signal->flags = why | SIGNAL_STOP_CONTINUED;
+ signal_set_stop_flags(signal, why | SIGNAL_STOP_CONTINUED);
signal->group_stop_count = 0;
signal->group_exit_code = 0;
}


2017-08-09 18:18:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 92/93] ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: zheng li <[email protected]>


[ Upstream commit 0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 ]

There is an inconsistent conditional judgement between __ip_append_data
and ip_finish_output: the variable 'length' in __ip_append_data only
includes the length of the application payload and the UDP header, and
does not include the length of the IP header; ip_finish_output, however,
uses (skb->len > ip_skb_dst_mtu(skb)) as its test, and skb->len does
include the IP header length.

As a result, a UDP payload whose length falls between (MTU - IP header)
and MTU was fragmented by ip_fragment() even though rt->dst.dev supports
the UFO feature.

Add the IP header length to 'length' in __ip_append_data so that its
fragmentation check uses the same conditional judgement as
ip_finish_output.
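
A worked example may make the mismatch concrete (assuming a typical
1500-byte MTU and a 20-byte IPv4 header, so fragheaderlen = 20): a UDP
send whose payload plus UDP header comes to 1490 bytes gives
length = 1490, so the old test (length > mtu) is false and the UFO path
is skipped; the resulting skb, however, has
skb->len = 1490 + 20 = 1510 > 1500, so ip_finish_output() fragments it
anyway. With the fix, the first test compares
length + fragheaderlen = 1510 against the MTU and agrees with the second.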

Signed-off-by: Zheng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -936,7 +936,7 @@ static int __ip_append_data(struct sock
csummode = CHECKSUM_PARTIAL;

cork->length += length;
- if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+ if ((((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len &&
(sk->sk_type == SOCK_DGRAM) && !sk->sk_no_check_tx) {


2017-08-09 18:18:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 52/93] dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 0c2232b0a71db0ac1d22f751aa1ac0cadb950fd2 ]

In dccp_v6_conn_request, after the reqsk gets allocated and hashed into
the ehash table, its refcnt is set to 3: one reference is for
req->rsk_timer, one is for the hash list, and the other is for the
current caller.

The problem is that when dccp_v6_conn_request returns and finishes using
the reqsk, it doesn't put it. This leaks the reqsk refcnt and the reqsk
object never gets freed.

Jianlin found this issue when running dccp_memleak.c in a loop: the
system would run out of memory.

dccp_memleak.c:
int s1 = socket(PF_INET6, 6, IPPROTO_IP);   /* 6 == SOCK_DCCP */
bind(s1, &sa1, 0x20);                       /* sa1: an IPv6 sockaddr set up earlier */
listen(s1, 0x9);
int s2 = socket(PF_INET6, 6, IPPROTO_IP);
connect(s2, &sa1, 0x20);                    /* creates the reqsk on the listener */
close(s1);
close(s2);

This patch puts the reqsk before dccp_v6_conn_request returns, just as
tcp_conn_request does.

Reported-by: Jianlin Shi <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/ipv6.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -380,6 +380,7 @@ static int dccp_v6_conn_request(struct s
goto drop_and_free;

inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ reqsk_put(req);
return 0;

drop_and_free:


2017-08-09 18:19:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 83/93] iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steve Wise <[email protected]>


[ Upstream commit 3bcf96e0183f5c863657cb6ae9adad307a0f6071 ]

Function rx_data(), which handles ingress CPL_RX_DATA messages, was
always sending an RX_DATA_ACK with the goal of updating the credits.
However, if the RDMA connection is moved out of FPDU mode abruptly,
then it is possible for iw_cxgb4 to process queued RX_DATA CPLs after HW
has aborted the connection. These CPLs should not trigger RX_DATA_ACKS.
If they do, HW can see a READ after DELETE of the DB_LE hash entry for
the tid and post a LE_DB HashTblMemCrcError.

Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/infiniband/hw/cxgb4/cm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1804,20 +1804,21 @@ static int rx_data(struct c4iw_dev *dev,
skb_trim(skb, dlen);
mutex_lock(&ep->com.mutex);

- /* update RX credits */
- update_rx_credits(ep, dlen);
-
switch (ep->com.state) {
case MPA_REQ_SENT:
+ update_rx_credits(ep, dlen);
ep->rcv_seq += dlen;
disconnect = process_mpa_reply(ep, skb);
break;
case MPA_REQ_WAIT:
+ update_rx_credits(ep, dlen);
ep->rcv_seq += dlen;
disconnect = process_mpa_request(ep, skb);
break;
case FPDU_MODE: {
struct c4iw_qp_attributes attrs;
+
+ update_rx_credits(ep, dlen);
BUG_ON(!ep->com.qp);
if (status)
pr_err("%s Unexpected streaming data." \


2017-08-09 18:19:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 91/93] net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Or Gerlitz <[email protected]>


[ Upstream commit 5bae8c031053c69b4aa74b7f1ba15d4ec8426208 ]

We must re-enable RoCE on the e-switch management port (PF) only after destroying
the FDB in its switchdev/offloaded mode. Otherwise, when encapsulation is supported,
this re-enablement will fail.

Also, it's more natural and symmetric to disable RoCE on the PF before we create
the FDB under switchdev mode, so do that as well, and revert it if an error
occurs later during the mode change.

Fixes: 9da34cd34e85 ('net/mlx5: Disable RoCE on the e-switch management [..]')
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Hadar Hen Zion <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 29 ++++++++-----
1 file changed, 18 insertions(+), 11 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -651,9 +651,14 @@ int esw_offloads_init(struct mlx5_eswitc
int vport;
int err;

+ /* disable PF RoCE so missed packets don't go through RoCE steering */
+ mlx5_dev_list_lock();
+ mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+ mlx5_dev_list_unlock();
+
err = esw_create_offloads_fdb_table(esw, nvports);
if (err)
- return err;
+ goto create_fdb_err;

err = esw_create_offloads_table(esw);
if (err)
@@ -673,11 +678,6 @@ int esw_offloads_init(struct mlx5_eswitc
goto err_reps;
}

- /* disable PF RoCE so missed packets don't go through RoCE steering */
- mlx5_dev_list_lock();
- mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
- mlx5_dev_list_unlock();
-
return 0;

err_reps:
@@ -694,6 +694,13 @@ create_fg_err:

create_ft_err:
esw_destroy_offloads_fdb_table(esw);
+
+create_fdb_err:
+ /* enable back PF RoCE */
+ mlx5_dev_list_lock();
+ mlx5_add_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+ mlx5_dev_list_unlock();
+
return err;
}

@@ -701,11 +708,6 @@ static int esw_offloads_stop(struct mlx5
{
int err, err1, num_vfs = esw->dev->priv.sriov.num_vfs;

- /* enable back PF RoCE */
- mlx5_dev_list_lock();
- mlx5_add_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
- mlx5_dev_list_unlock();
-
mlx5_eswitch_disable_sriov(esw);
err = mlx5_eswitch_enable_sriov(esw, num_vfs, SRIOV_LEGACY);
if (err) {
@@ -715,6 +717,11 @@ static int esw_offloads_stop(struct mlx5
esw_warn(esw->dev, "Failed setting eswitch back to offloads, err %d\n", err);
}

+ /* enable back PF RoCE */
+ mlx5_dev_list_lock();
+ mlx5_add_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+ mlx5_dev_list_unlock();
+
return err;
}



2017-08-09 18:18:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 87/93] mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Hocko <[email protected]>


[ Upstream commit bb1107f7c6052c863692a41f78c000db792334bf ]

Andrey Konovalov has reported the following warning triggered by the
syzkaller fuzzer.

WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__alloc_pages_slowpath mm/page_alloc.c:3511
__alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781
alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072
alloc_pages include/linux/gfp.h:469
kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
kmalloc_large include/linux/slab.h:422
__kmalloc+0x210/0x2d0 mm/slub.c:3723
kmalloc include/linux/slab.h:495
ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664
new_sync_write fs/read_write.c:499
__vfs_write+0x483/0x760 fs/read_write.c:512
vfs_write+0x170/0x4e0 fs/read_write.c:560
SYSC_write fs/read_write.c:607
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2

The issue is caused by a missing size check on the request size in
ep_write_iter, which should be fixed. It does, however, point to another
problem: SLUB defines KMALLOC_MAX_SIZE too large because its
KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT), which means that the
resulting page allocator request might be of order MAX_ORDER, which is
too large (see __alloc_pages_slowpath).

The same applies to the SLOB allocator, which allows even larger sizes.
Make sure that both are capped properly and never request an allocation
of order MAX_ORDER or higher.
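
For concreteness, a worked example (assuming the common x86-64
configuration of PAGE_SHIFT = 12 and MAX_ORDER = 11): the old definition
gave KMALLOC_SHIFT_MAX = 11 + 12 = 23, i.e. kmalloc() nominally accepted
requests up to 2^23 bytes = 8 MiB, which maps to an order-11 page
allocation; the page allocator only serves orders 0 through
MAX_ORDER - 1 = 10, so such a request trips the warning above. The
capped definition gives KMALLOC_SHIFT_MAX = 22, i.e. 4 MiB, an order-10
allocation the allocator can actually satisfy.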

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/slab.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -226,7 +226,7 @@ static inline const char *__check_heap_o
* (PAGE_SIZE*2). Larger requests are passed to the page allocator.
*/
#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT)
+#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW 3
#endif
@@ -239,7 +239,7 @@ static inline const char *__check_heap_o
* be allocated from the same page.
*/
#define KMALLOC_SHIFT_HIGH PAGE_SHIFT
-#define KMALLOC_SHIFT_MAX 30
+#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1)
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW 3
#endif


2017-08-09 18:20:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 90/93] mm: don't dereference struct page fields of invalid pages

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ard Biesheuvel <[email protected]>


[ Upstream commit f073bdc51771f5a5c7a8d1191bfc3ae371d44de7 ]

The VM_BUG_ON() check in move_freepages() checks whether the node id of
a page matches the node id of its zone. However, it does this before
having checked whether the struct page pointer refers to a valid struct
page to begin with. This is guaranteed in most cases, but may not be
the case if CONFIG_HOLES_IN_ZONE=y.

So reorder the VM_BUG_ON() with the pfn_valid_within() check.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Yisheng Xie <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: James Morse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_alloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1875,14 +1875,14 @@ int move_freepages(struct zone *zone,
#endif

for (page = start_page; page <= end_page;) {
- /* Make sure we are not inadvertently changing nodes */
- VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
-
if (!pfn_valid_within(page_to_pfn(page))) {
page++;
continue;
}

+ /* Make sure we are not inadvertently changing nodes */
+ VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
+
if (!PageBuddy(page)) {
page++;
continue;


2017-08-09 18:20:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 88/93] lib/Kconfig.debug: fix frv build failure

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sudip Mukherjee <[email protected]>


[ Upstream commit da0510c47519fe0999cffe316e1d370e29f952be ]

The build of frv allmodconfig was failing with errors like:

/tmp/cc0JSPc3.s: Assembler messages:
/tmp/cc0JSPc3.s:1839: Error: symbol `.LSLT0' is already defined
/tmp/cc0JSPc3.s:1842: Error: symbol `.LASLTP0' is already defined
/tmp/cc0JSPc3.s:1969: Error: symbol `.LELTP0' is already defined
/tmp/cc0JSPc3.s:1970: Error: symbol `.LELT0' is already defined

Commit 866ced950bcd ("kbuild: Support split debug info v4") introduced
splitting the debug info and keeping that in a separate file. Somehow,
the frv-linux gcc did not like that and I am guessing that instead of
splitting it started copying. The first report about this is at:

https://lists.01.org/pipermail/kbuild-all/2015-July/010527.html.

I will try and see if this can work with frv, and if it still fails I will
open a bug report with gcc. But meanwhile this is the easiest option to
solve the frv build failure.

Fixes: 866ced950bcd ("kbuild: Support split debug info v4")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sudip Mukherjee <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
lib/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -145,7 +145,7 @@ config DEBUG_INFO_REDUCED

config DEBUG_INFO_SPLIT
bool "Produce split debuginfo in .dwo files"
- depends on DEBUG_INFO
+ depends on DEBUG_INFO && !FRV
help
Generate debug info into separate .dwo files. This significantly
reduces the build directory size for builds with DEBUG_INFO,


2017-08-09 18:18:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 84/93] nbd: blk_mq_init_queue returns an error code on failure, not NULL

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeff Moyer <[email protected]>


[ Upstream commit 25b4acfc7de0fc4da3bfea3a316f7282c6fbde81 ]

Additionally, don't assign directly to disk->queue, otherwise
blk_put_queue (called via put_disk) will choke (panic) on the errno
stored there.

Bug found by code inspection after Omar found a similar issue in
virtio_blk. Compile-tested only.
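
For readers less familiar with the ERR_PTR() convention, here is a
minimal, self-contained user-space sketch of why the old NULL test is
wrong; the helpers are re-implemented locally and fake_init_queue() is a
hypothetical stand-in for blk_mq_init_queue(), not driver code:

#include <stdio.h>
#include <errno.h>

#define MAX_ERRNO 4095

static void *ERR_PTR(long error)      { return (void *)error; }
static long  PTR_ERR(const void *ptr) { return (long)ptr; }
static int   IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in: returns an ERR_PTR() on failure, like
 * blk_mq_init_queue(), never NULL. */
static void *fake_init_queue(int fail)
{
	static int dummy_queue;
	return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_queue;
}

int main(void)
{
	void *q = fake_init_queue(1);	/* simulate the failure path */

	if (q == NULL)
		puts("NULL check: looks like success - the bug");
	if (IS_ERR(q))
		printf("IS_ERR check: failure, errno %ld - the fix\n",
		       -PTR_ERR(q));
	return 0;
}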

Signed-off-by: Jeff Moyer <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/nbd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -929,6 +929,7 @@ static int __init nbd_init(void)
return -ENOMEM;

for (i = 0; i < nbds_max; i++) {
+ struct request_queue *q;
struct gendisk *disk = alloc_disk(1 << part_shift);
if (!disk)
goto out;
@@ -954,12 +955,13 @@ static int __init nbd_init(void)
* every gendisk to have its very own request_queue struct.
* These structs are big so we dynamically allocate them.
*/
- disk->queue = blk_mq_init_queue(&nbd_dev[i].tag_set);
- if (!disk->queue) {
+ q = blk_mq_init_queue(&nbd_dev[i].tag_set);
+ if (IS_ERR(q)) {
blk_mq_free_tag_set(&nbd_dev[i].tag_set);
put_disk(disk);
goto out;
}
+ disk->queue = q;

/*
* Tell the block layer that we are not a rotational device


2017-08-09 18:20:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 86/93] ARM: 8632/1: ftrace: fix syscall name matching

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rabin Vincent <[email protected]>


[ Upstream commit 270c8cf1cacc69cb8d99dea812f06067a45e4609 ]

ARM has a few system calls (most notably mmap) for which the names of
the functions which are referenced in the syscall table do not match the
names of the syscall tracepoints. As a consequence of this, these
tracepoints are not made available. Implement
arch_syscall_match_sym_name to fix this and allow tracing even these
system calls.

Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/include/asm/ftrace.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

--- a/arch/arm/include/asm/ftrace.h
+++ b/arch/arm/include/asm/ftrace.h
@@ -54,6 +54,24 @@ static inline void *return_address(unsig

#define ftrace_return_address(n) return_address(n)

+#define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
+
+static inline bool arch_syscall_match_sym_name(const char *sym,
+ const char *name)
+{
+ if (!strcmp(sym, "sys_mmap2"))
+ sym = "sys_mmap_pgoff";
+ else if (!strcmp(sym, "sys_statfs64_wrapper"))
+ sym = "sys_statfs64";
+ else if (!strcmp(sym, "sys_fstatfs64_wrapper"))
+ sym = "sys_fstatfs64";
+ else if (!strcmp(sym, "sys_arm_fadvise64_64"))
+ sym = "sys_fadvise64_64";
+
+ /* Ignore case since sym may start with "SyS" instead of "sys" */
+ return !strcasecmp(sym, name);
+}
+
#endif /* ifndef __ASSEMBLY__ */

#endif /* _ASM_ARM_FTRACE */


2017-08-09 18:21:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 85/93] virtio_blk: fix panic in initialization error path

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Omar Sandoval <[email protected]>


[ Upstream commit 6bf6b0aa3da84a3d9126919a94c49c0fb7ee2fb3 ]

If blk_mq_init_queue() returns an error, it gets assigned to
vblk->disk->queue. Then, when we call put_disk(), we end up calling
blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by
only assigning to vblk->disk->queue on success.

Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/virtio_blk.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -630,11 +630,12 @@ static int virtblk_probe(struct virtio_d
if (err)
goto out_put_disk;

- q = vblk->disk->queue = blk_mq_init_queue(&vblk->tag_set);
+ q = blk_mq_init_queue(&vblk->tag_set);
if (IS_ERR(q)) {
err = -ENOMEM;
goto out_free_tags;
}
+ vblk->disk->queue = q;

q->queuedata = vblk;



2017-08-09 18:21:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 45/93] ipv4: initialize fib_trie prior to register_netdev_notifier call.

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mahesh Bandewar <[email protected]>


[ Upstream commit 8799a221f5944a7d74516ecf46d58c28ec1d1f75 ]

Net stack initialization currently initializes fib_trie after the
first netdevice_notifier() call. In fact, fib_trie initialization
needs to happen before the first rtnl_register(). This does not cause a
problem today since no devices are UP at that moment, but trying to bring
'lo' UP at initialization would break this assumption and expose the issue.

Fixes the following crash:

Call Trace:
? alternate_node_alloc+0x76/0xa0
fib_table_insert+0x1b7/0x4b0
fib_magic.isra.17+0xea/0x120
fib_add_ifaddr+0x7b/0x190
fib_netdev_event+0xc0/0x130
register_netdevice_notifier+0x1c1/0x1d0
ip_fib_init+0x72/0x85
ip_rt_init+0x187/0x1e9
ip_init+0xe/0x1a
inet_init+0x171/0x26c
? ipv4_offload_init+0x66/0x66
do_one_initcall+0x43/0x160
kernel_init_freeable+0x191/0x219
? rest_init+0x80/0x80
kernel_init+0xe/0x150
ret_from_fork+0x22/0x30
Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
CR2: 0000000000000014

Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization")

Signed-off-by: Mahesh Bandewar <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/fib_frontend.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1319,13 +1319,14 @@ static struct pernet_operations fib_net_

void __init ip_fib_init(void)
{
- rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, NULL);
- rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, NULL);
- rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, NULL);
+ fib_trie_init();

register_pernet_subsys(&fib_net_ops);
+
register_netdevice_notifier(&fib_netdev_notifier);
register_inetaddr_notifier(&fib_inetaddr_notifier);

- fib_trie_init();
+ rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, NULL);
+ rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, NULL);
+ rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, NULL);
}


2017-08-09 18:22:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 54/93] dccp: fix a memleak for dccp_feat_init err process

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit e90ce2fc27cad7e7b1e72b9e66201a7a4c124c2b ]

In dccp_feat_init, when ccid_get_builtin_ccids fails to allocate
memory for rx.val, it should free tx.val before returning an
error.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dccp/feat.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -1471,9 +1471,12 @@ int dccp_feat_init(struct sock *sk)
* singleton values (which always leads to failure).
* These settings can still (later) be overridden via sockopts.
*/
- if (ccid_get_builtin_ccids(&tx.val, &tx.len) ||
- ccid_get_builtin_ccids(&rx.val, &rx.len))
+ if (ccid_get_builtin_ccids(&tx.val, &tx.len))
return -ENOBUFS;
+ if (ccid_get_builtin_ccids(&rx.val, &rx.len)) {
+ kfree(tx.val);
+ return -ENOBUFS;
+ }

if (!dccp_feat_prefer(sysctl_dccp_tx_ccid, tx.val, tx.len) ||
!dccp_feat_prefer(sysctl_dccp_rx_ccid, rx.val, rx.len))


2017-08-09 18:23:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 75/93] netfilter: use fwmark_reflect in nf_send_reset

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pau Espin Pedrol <[email protected]>


[ Upstream commit cc31d43b4154ad5a7d8aa5543255a93b7e89edc2 ]

Otherwise, RST packets generated by ipt_REJECT always have mark 0 when
the routing is checked later in the same code path.

Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <[email protected]>
Signed-off-by: Pau Espin Pedrol <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/netfilter/nf_reject_ipv4.c | 2 ++
net/ipv6/netfilter/nf_reject_ipv6.c | 3 +++
2 files changed, 5 insertions(+)

--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -126,6 +126,8 @@ void nf_send_reset(struct net *net, stru
/* ip_route_me_harder expects skb->dst to be set */
skb_dst_set_noref(nskb, skb_dst(oldskb));

+ nskb->mark = IP4_REPLY_MARK(net, oldskb->mark);
+
skb_reserve(nskb, LL_MAX_HEADER);
niph = nf_reject_iphdr_put(nskb, oldskb, IPPROTO_TCP,
ip4_dst_hoplimit(skb_dst(nskb)));
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -157,6 +157,7 @@ void nf_send_reset6(struct net *net, str
fl6.fl6_sport = otcph->dest;
fl6.fl6_dport = otcph->source;
fl6.flowi6_oif = l3mdev_master_ifindex(skb_dst(oldskb)->dev);
+ fl6.flowi6_mark = IP6_REPLY_MARK(net, oldskb->mark);
security_skb_classify_flow(oldskb, flowi6_to_flowi(&fl6));
dst = ip6_route_output(net, NULL, &fl6);
if (dst->error) {
@@ -180,6 +181,8 @@ void nf_send_reset6(struct net *net, str

skb_dst_set(nskb, dst);

+ nskb->mark = fl6.flowi6_mark;
+
skb_reserve(nskb, hh_len + dst->header_len);
ip6h = nf_reject_ip6hdr_put(nskb, oldskb, IPPROTO_TCP,
ip6_dst_hoplimit(dst));


2017-08-09 18:23:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 48/93] openvswitch: fix potential out of bound access in parse_ct

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Liping Zhang <[email protected]>


[ Upstream commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ]

Before the 'type' is validated, we shouldn't use it to fetch
ovs_ct_attr_lens's minlen and maxlen; otherwise, an out-of-bounds access
may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/openvswitch/conntrack.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1088,8 +1088,8 @@ static int parse_ct(const struct nlattr

nla_for_each_nested(a, attr, rem) {
int type = nla_type(a);
- int maxlen = ovs_ct_attr_lens[type].maxlen;
- int minlen = ovs_ct_attr_lens[type].minlen;
+ int maxlen;
+ int minlen;

if (type > OVS_CT_ATTR_MAX) {
OVS_NLERR(log,
@@ -1097,6 +1097,9 @@ static int parse_ct(const struct nlattr
type, OVS_CT_ATTR_MAX);
return -EINVAL;
}
+
+ maxlen = ovs_ct_attr_lens[type].maxlen;
+ minlen = ovs_ct_attr_lens[type].minlen;
if (nla_len(a) < minlen || nla_len(a) > maxlen) {
OVS_NLERR(log,
"Conntrack attr type has unexpected length (type=%d, length=%d, expected=%d)",


2017-08-09 18:24:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 71/93] tg3: Fix race condition in tg3_get_stats64().

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Chan <[email protected]>


[ Upstream commit f5992b72ebe0dde488fa8f706b887194020c66fc ]

The driver's ndo_get_stats64() method is not always called under RTNL.
So it can race with driver close or ethtool reconfigurations. Fix the
race condition by taking tp->lock spinlock in tg3_free_consistent()
when freeing the tp->hw_stats memory block. tg3_get_stats64() is
already taking tp->lock.

Reported-by: Wang Yufen <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8720,11 +8720,14 @@ static void tg3_free_consistent(struct t
tg3_mem_rx_release(tp);
tg3_mem_tx_release(tp);

+ /* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
+ tg3_full_lock(tp, 0);
if (tp->hw_stats) {
dma_free_coherent(&tp->pdev->dev, sizeof(struct tg3_hw_stats),
tp->hw_stats, tp->stats_mapping);
tp->hw_stats = NULL;
}
+ tg3_full_unlock(tp);
}

/*


2017-08-09 18:24:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 70/93] net: phy: dp83867: fix irq generation

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Grygorii Strashko <[email protected]>


[ Upstream commit 5ca7d1ca77dc23934504b95a96d2660d345f83c2 ]

For proper IRQ generation by DP83867 phy the INT/PWDN pin has to be
programmed as an interrupt output instead of a Powerdown input in
Configuration Register 3 (CFG3), Address 0x001E, bit 7 INT_OE = 1. The
current driver doesn't do this, and as a result IRQs will not be generated
by the DP83867 phy even if they are properly configured in DT.

Hence, fix IRQ generation by properly configuring the CFG3.INT_OE bit and
ensure that the Link Status Change (LINK_STATUS_CHNG_INT) and
Auto-Negotiation Complete (AUTONEG_COMP_INT) interrupts are enabled. After
this the DP83867 driver will work properly in interrupt-enabled mode.

Signed-off-by: Grygorii Strashko <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/dp83867.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -29,6 +29,7 @@
#define MII_DP83867_MICR 0x12
#define MII_DP83867_ISR 0x13
#define DP83867_CTRL 0x1f
+#define DP83867_CFG3 0x1e

/* Extended Registers */
#define DP83867_RGMIICTL 0x0032
@@ -90,6 +91,8 @@ static int dp83867_config_intr(struct ph
micr_status |=
(MII_DP83867_MICR_AN_ERR_INT_EN |
MII_DP83867_MICR_SPEED_CHNG_INT_EN |
+ MII_DP83867_MICR_AUTONEG_COMP_INT_EN |
+ MII_DP83867_MICR_LINK_STS_CHNG_INT_EN |
MII_DP83867_MICR_DUP_MODE_CHNG_INT_EN |
MII_DP83867_MICR_SLEEP_MODE_CHNG_INT_EN);

@@ -190,6 +193,13 @@ static int dp83867_config_init(struct ph
DP83867_DEVADDR, delay);
}

+ /* Enable Interrupt output INT_OE in CFG3 register */
+ if (phy_interrupt_is_valid(phydev)) {
+ val = phy_read(phydev, DP83867_CFG3);
+ val |= BIT(7);
+ phy_write(phydev, DP83867_CFG3, val);
+ }
+
return 0;
}



2017-08-09 18:25:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 69/93] sh_eth: R8A7740 supports packet checksumming

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sergei Shtylyov <[email protected]>


[ Upstream commit 0f1f9cbc04dbb3cc310f70a11cba0cf1f2109d9c ]

The R8A7740 GEther controller supports packet checksum offloading,
but the 'hw_crc' (bad name, I'll fix it) flag isn't set in the R8A7740
data, thus CSMR isn't cleared...

Fixes: 73a0d907301e ("net: sh_eth: add support R8A7740")
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -574,6 +574,7 @@ static struct sh_eth_cpu_data r8a7740_da
.rpadir_value = 2 << 16,
.no_trimd = 1,
.no_ade = 1,
+ .hw_crc = 1,
.tsu = 1,
.select_mii = 1,
.shift_rd0 = 1,


2017-08-09 18:15:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 62/93] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>


[ Upstream commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ]

Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect().
This does indeed not happen, because we set the state machine to
PHY_HALTED but we never run it again to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez <[email protected]>
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -674,6 +674,9 @@ void phy_stop_machine(struct phy_device
if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
phydev->state = PHY_UP;
mutex_unlock(&phydev->lock);
+
+ /* Now we can run the state machine synchronously */
+ phy_state_machine(&phydev->state_queue.work);
}

/**


2017-08-09 18:25:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 68/93] sh_eth: fix EESIPR values for SH77{34|63}

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sergei Shtylyov <[email protected]>


[ Upstream commit 978d3639fd13d987950e4ce85c8737ae92154b2c ]

As the SH77{34|63} manuals are freely available, I've checked the EESIPR
values written against the manuals, and they appeared to set the reserved
bits 11-15 (which should be 0 on write). Fix those EESIPR values.
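
Spelled out (taking the manuals' statement about bits 11-15 as given):
clearing those five reserved bits means masking the old value with
~(0x1f << 11) = ~0x0000f800, and 0x003fffff & ~0x0000f800 = 0x003f07ff,
which is exactly what the hunks below write.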

Fixes: 380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct sh_eth_cpu_data"")
Fixes: f5d12767c8fd ("sh_eth: get SH77{34|63} support out of #ifdef")
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/renesas/sh_eth.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -802,7 +802,7 @@ static struct sh_eth_cpu_data sh7734_dat

.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
- .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
+ .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f07ff,

.tx_check = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -832,7 +832,7 @@ static struct sh_eth_cpu_data sh7763_dat

.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
- .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
+ .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f07ff,

.tx_check = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |


2017-08-09 18:25:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 67/93] wext: handle NULL extra data in iwe_stream_add_point better

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <[email protected]>

commit 93be2b74279c15c2844684b1a027fdc71dd5d9bf upstream.

gcc-7 complains that wl3501_cs passes NULL into a function that
then uses the argument as the input for memcpy:

drivers/net/wireless/wl3501_cs.c: In function 'wl3501_get_scan':
include/net/iw_handler.h:559:3: error: argument 2 null where non-null expected [-Werror=nonnull]
memcpy(stream + point_len, extra, iwe->u.data.length);

This works fine here because iwe->u.data.length is guaranteed to be 0
and the memcpy doesn't actually have an effect.

Making the length check explicit avoids the warning and should have
no other effect here.

Also check the pointer itself, since otherwise we get warnings
elsewhere in the code.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/net/iw_handler.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/net/iw_handler.h
+++ b/include/net/iw_handler.h
@@ -556,7 +556,8 @@ iwe_stream_add_point(struct iw_request_i
memcpy(stream + lcp_len,
((char *) &iwe->u) + IW_EV_POINT_OFF,
IW_EV_POINT_PK_LEN - IW_EV_LCP_PK_LEN);
- memcpy(stream + point_len, extra, iwe->u.data.length);
+ if (iwe->u.data.length && extra)
+ memcpy(stream + point_len, extra, iwe->u.data.length);
stream += event_len;
}
return stream;


2017-08-09 18:25:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 47/93] mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Jarosch <[email protected]>


[ Upstream commit 9476d393667968b4a02afbe9d35a3558482b943e ]

DMA transfers are not allowed to buffers that are on the stack.
Therefore allocate a heap buffer to store the result of usb_control_msg().

Fixes these bugreports:
https://bugzilla.kernel.org/show_bug.cgi?id=195217

https://bugzilla.redhat.com/show_bug.cgi?id=1421387
https://bugzilla.redhat.com/show_bug.cgi?id=1427398

Shortened kernel backtrace from 4.11.9-200.fc25.x86_64:
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 3 PID: 2957 at drivers/usb/core/hcd.c:1587
kernel: transfer buffer not dma capable
kernel: Call Trace:
kernel: dump_stack+0x63/0x86
kernel: __warn+0xcb/0xf0
kernel: warn_slowpath_fmt+0x5a/0x80
kernel: usb_hcd_map_urb_for_dma+0x37f/0x570
kernel: ? try_to_del_timer_sync+0x53/0x80
kernel: usb_hcd_submit_urb+0x34e/0xb90
kernel: ? schedule_timeout+0x17e/0x300
kernel: ? del_timer_sync+0x50/0x50
kernel: ? __slab_free+0xa9/0x300
kernel: usb_submit_urb+0x2f4/0x560
kernel: ? urb_destroy+0x24/0x30
kernel: usb_start_wait_urb+0x6e/0x170
kernel: usb_control_msg+0xdc/0x120
kernel: mcs_get_reg+0x36/0x40 [mcs7780]
kernel: mcs_net_open+0xb5/0x5c0 [mcs7780]
...

Regression goes back to 4.9, so it's a good candidate for -stable.
Though it's the decision of the maintainer.

Thanks to Dan Williams for adding the "transfer buffer not dma capable"
warning in the first place. It instantly pointed me in the right direction.

Patch has been tested with transferring data from a Polar watch.

Signed-off-by: Thomas Jarosch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/irda/mcs7780.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

--- a/drivers/net/irda/mcs7780.c
+++ b/drivers/net/irda/mcs7780.c
@@ -141,9 +141,19 @@ static int mcs_set_reg(struct mcs_cb *mc
static int mcs_get_reg(struct mcs_cb *mcs, __u16 reg, __u16 * val)
{
struct usb_device *dev = mcs->usbdev;
- int ret = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), MCS_RDREQ,
- MCS_RD_RTYPE, 0, reg, val, 2,
- msecs_to_jiffies(MCS_CTRL_TIMEOUT));
+ void *dmabuf;
+ int ret;
+
+ dmabuf = kmalloc(sizeof(__u16), GFP_KERNEL);
+ if (!dmabuf)
+ return -ENOMEM;
+
+ ret = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), MCS_RDREQ,
+ MCS_RD_RTYPE, 0, reg, dmabuf, 2,
+ msecs_to_jiffies(MCS_CTRL_TIMEOUT));
+
+ memcpy(val, dmabuf, sizeof(__u16));
+ kfree(dmabuf);

return ret;
}


2017-08-09 18:25:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 65/93] sparc64: Prevent perf from running during super critical sections

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rob Gardner <[email protected]>


[ Upstream commit fc290a114fc6034b0f6a5a46e2fb7d54976cf87a ]

This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.

Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).

But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.

One particular critical section occurs in switch_mm:

spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);

If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.

This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.

Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.

Orabug: 25577560

Signed-off-by: Dave Aldridge <[email protected]>
Signed-off-by: Rob Gardner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/mmu_context_64.h | 12 +++++++-----
arch/sparc/kernel/tsb.S | 12 ++++++++++++
arch/sparc/power/hibernate.c | 3 +--
3 files changed, 20 insertions(+), 7 deletions(-)

--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -25,9 +25,11 @@ void destroy_context(struct mm_struct *m
void __tsb_context_switch(unsigned long pgd_pa,
struct tsb_config *tsb_base,
struct tsb_config *tsb_huge,
- unsigned long tsb_descr_pa);
+ unsigned long tsb_descr_pa,
+ unsigned long secondary_ctx);

-static inline void tsb_context_switch(struct mm_struct *mm)
+static inline void tsb_context_switch_ctx(struct mm_struct *mm,
+ unsigned long ctx)
{
__tsb_context_switch(__pa(mm->pgd),
&mm->context.tsb_block[0],
@@ -38,7 +40,8 @@ static inline void tsb_context_switch(st
#else
NULL
#endif
- , __pa(&mm->context.tsb_descr[0]));
+ , __pa(&mm->context.tsb_descr[0]),
+ ctx);
}

void tsb_grow(struct mm_struct *mm,
@@ -110,8 +113,7 @@ static inline void switch_mm(struct mm_s
* cpu0 to update it's TSB because at that point the cpu_vm_mask
* only had cpu1 set in it.
*/
- load_secondary_context(mm);
- tsb_context_switch(mm);
+ tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));

/* Any time a processor runs a context on an address space
* for the first time, we must flush that context out of the
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -375,6 +375,7 @@ tsb_flush:
* %o1: TSB base config pointer
* %o2: TSB huge config pointer, or NULL if none
* %o3: Hypervisor TSB descriptor physical address
+ * %o4: Secondary context to load, if non-zero
*
* We have to run this whole thing with interrupts
* disabled so that the current cpu doesn't change
@@ -387,6 +388,17 @@ __tsb_context_switch:
rdpr %pstate, %g1
wrpr %g1, PSTATE_IE, %pstate

+ brz,pn %o4, 1f
+ mov SECONDARY_CONTEXT, %o5
+
+661: stxa %o4, [%o5] ASI_DMMU
+ .section .sun4v_1insn_patch, "ax"
+ .word 661b
+ stxa %o4, [%o5] ASI_MMU
+ .previous
+ flush %g6
+
+1:
TRAP_LOAD_TRAP_BLOCK(%g2, %g3)

stx %o0, [%g2 + TRAP_PER_CPU_PGD_PADDR]
--- a/arch/sparc/power/hibernate.c
+++ b/arch/sparc/power/hibernate.c
@@ -35,6 +35,5 @@ void restore_processor_state(void)
{
struct mm_struct *mm = current->active_mm;

- load_secondary_context(mm);
- tsb_context_switch(mm);
+ tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
}


2017-08-09 18:26:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 64/93] sparc64: Measure receiver forward progress to avoid send mondo timeout

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jane Chu <[email protected]>


[ Upstream commit 9d53caec84c7c5700e7c1ed744ea584fff55f9ac ]

A large sun4v SPARC system may have moments of intensive xcall activities,
usually caused by unmapping many pages on many CPUs concurrently. This can
flood receivers with CPU mondo interrupts for an extended period, causing
some unlucky senders to hit send-mondo timeout. This problem gets worse
as cpu count increases because sometimes mappings must be invalidated on
all CPUs, and sometimes all CPUs may gang up on a single CPU.

But a busy system is not a broken system. In the above scenario, as long
as the receiver is making forward progress processing mondo interrupts,
the sender should continue to retry.

This patch implements the receiver's forward progress meter by introducing
a per cpu counter 'cpu_mondo_counter[cpu]' where 'cpu' is in the range
of 0..NR_CPUS. The receiver increments its counter as soon as it receives
a mondo and the sender tracks the receiver's counter. If the receiver has
stopped making forward progress when the retry limit is reached, the sender
declares send-mondo-timeout and panic; otherwise, the receiver is allowed
to keep making forward progress.

In addition, it's been observed that PCIe hotplug events generate Correctable
Errors that are handled by hypervisor and then OS. Hypervisor 'borrows'
a guest cpu strand briefly to provide the service. If the cpu strand is
simultaneously the only cpu targeted by a mondo, it may not be available
for the mondo within 20 msec, causing a SUN4V mondo timeout. It appears that
1 second is the agreed wait time between hypervisor and guest OS, so this
patch makes that adjustment.

Orabug: 25476541
Orabug: 26417466

Signed-off-by: Jane Chu <[email protected]>
Reviewed-by: Steve Sistare <[email protected]>
Reviewed-by: Anthony Yznaga <[email protected]>
Reviewed-by: Rob Gardner <[email protected]>
Reviewed-by: Thomas Tai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/trap_block.h | 1
arch/sparc/kernel/smp_64.c | 189 ++++++++++++++++++++++--------------
arch/sparc/kernel/sun4v_ivec.S | 15 ++
arch/sparc/kernel/traps_64.c | 1
4 files changed, 134 insertions(+), 72 deletions(-)

--- a/arch/sparc/include/asm/trap_block.h
+++ b/arch/sparc/include/asm/trap_block.h
@@ -54,6 +54,7 @@ extern struct trap_per_cpu trap_block[NR
void init_cur_cpu_trap(struct thread_info *);
void setup_tba(void);
extern int ncpus_probed;
+extern u64 cpu_mondo_counter[NR_CPUS];

unsigned long real_hard_smp_processor_id(void);

--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -621,22 +621,48 @@ retry:
}
}

-/* Multi-cpu list version. */
+#define CPU_MONDO_COUNTER(cpuid) (cpu_mondo_counter[cpuid])
+#define MONDO_USEC_WAIT_MIN 2
+#define MONDO_USEC_WAIT_MAX 100
+#define MONDO_RETRY_LIMIT 500000
+
+/* Multi-cpu list version.
+ *
+ * Deliver xcalls to 'cnt' number of cpus in 'cpu_list'.
+ * Sometimes not all cpus receive the mondo, requiring us to re-send
+ * the mondo until all cpus have received, or cpus are truly stuck
+ * unable to receive mondo, and we timeout.
+ * Occasionally a target cpu strand is borrowed briefly by hypervisor to
+ * perform guest service, such as PCIe error handling. Consider the
+ * service time, 1 second overall wait is reasonable for 1 cpu.
+ * Here two in-between mondo check wait time are defined: 2 usec for
+ * single cpu quick turn around and up to 100usec for large cpu count.
+ * Deliver mondo to large number of cpus could take longer, we adjusts
+ * the retry count as long as target cpus are making forward progress.
+ */
static void hypervisor_xcall_deliver(struct trap_per_cpu *tb, int cnt)
{
- int retries, this_cpu, prev_sent, i, saw_cpu_error;
+ int this_cpu, tot_cpus, prev_sent, i, rem;
+ int usec_wait, retries, tot_retries;
+ u16 first_cpu = 0xffff;
+ unsigned long xc_rcvd = 0;
unsigned long status;
+ int ecpuerror_id = 0;
+ int enocpu_id = 0;
u16 *cpu_list;
+ u16 cpu;

this_cpu = smp_processor_id();
-
cpu_list = __va(tb->cpu_list_pa);
-
- saw_cpu_error = 0;
- retries = 0;
+ usec_wait = cnt * MONDO_USEC_WAIT_MIN;
+ if (usec_wait > MONDO_USEC_WAIT_MAX)
+ usec_wait = MONDO_USEC_WAIT_MAX;
+ retries = tot_retries = 0;
+ tot_cpus = cnt;
prev_sent = 0;
+
do {
- int forward_progress, n_sent;
+ int n_sent, mondo_delivered, target_cpu_busy;

status = sun4v_cpu_mondo_send(cnt,
tb->cpu_list_pa,
@@ -644,94 +670,113 @@ static void hypervisor_xcall_deliver(str

/* HV_EOK means all cpus received the xcall, we're done. */
if (likely(status == HV_EOK))
- break;
+ goto xcall_done;
+
+ /* If not these non-fatal errors, panic */
+ if (unlikely((status != HV_EWOULDBLOCK) &&
+ (status != HV_ECPUERROR) &&
+ (status != HV_ENOCPU)))
+ goto fatal_errors;

/* First, see if we made any forward progress.
*
+ * Go through the cpu_list, count the target cpus that have
+ * received our mondo (n_sent), and those that did not (rem).
+ * Re-pack cpu_list with the cpus remain to be retried in the
+ * front - this simplifies tracking the truly stalled cpus.
+ *
* The hypervisor indicates successful sends by setting
* cpu list entries to the value 0xffff.
+ *
+ * EWOULDBLOCK means some target cpus did not receive the
+ * mondo and retry usually helps.
+ *
+ * ECPUERROR means at least one target cpu is in error state,
+ * it's usually safe to skip the faulty cpu and retry.
+ *
+ * ENOCPU means one of the target cpu doesn't belong to the
+ * domain, perhaps offlined which is unexpected, but not
+ * fatal and it's okay to skip the offlined cpu.
*/
+ rem = 0;
n_sent = 0;
for (i = 0; i < cnt; i++) {
- if (likely(cpu_list[i] == 0xffff))
+ cpu = cpu_list[i];
+ if (likely(cpu == 0xffff)) {
n_sent++;
+ } else if ((status == HV_ECPUERROR) &&
+ (sun4v_cpu_state(cpu) == HV_CPU_STATE_ERROR)) {
+ ecpuerror_id = cpu + 1;
+ } else if (status == HV_ENOCPU && !cpu_online(cpu)) {
+ enocpu_id = cpu + 1;
+ } else {
+ cpu_list[rem++] = cpu;
+ }
}

- forward_progress = 0;
- if (n_sent > prev_sent)
- forward_progress = 1;
+ /* No cpu remained, we're done. */
+ if (rem == 0)
+ break;

- prev_sent = n_sent;
+ /* Otherwise, update the cpu count for retry. */
+ cnt = rem;

- /* If we get a HV_ECPUERROR, then one or more of the cpus
- * in the list are in error state. Use the cpu_state()
- * hypervisor call to find out which cpus are in error state.
+ /* Record the overall number of mondos received by the
+ * first of the remaining cpus.
*/
- if (unlikely(status == HV_ECPUERROR)) {
- for (i = 0; i < cnt; i++) {
- long err;
- u16 cpu;
-
- cpu = cpu_list[i];
- if (cpu == 0xffff)
- continue;
-
- err = sun4v_cpu_state(cpu);
- if (err == HV_CPU_STATE_ERROR) {
- saw_cpu_error = (cpu + 1);
- cpu_list[i] = 0xffff;
- }
- }
- } else if (unlikely(status != HV_EWOULDBLOCK))
- goto fatal_mondo_error;
+ if (first_cpu != cpu_list[0]) {
+ first_cpu = cpu_list[0];
+ xc_rcvd = CPU_MONDO_COUNTER(first_cpu);
+ }

- /* Don't bother rewriting the CPU list, just leave the
- * 0xffff and non-0xffff entries in there and the
- * hypervisor will do the right thing.
- *
- * Only advance timeout state if we didn't make any
- * forward progress.
+ /* Was any mondo delivered successfully? */
+ mondo_delivered = (n_sent > prev_sent);
+ prev_sent = n_sent;
+
+ /* or, was any target cpu busy processing other mondos? */
+ target_cpu_busy = (xc_rcvd < CPU_MONDO_COUNTER(first_cpu));
+ xc_rcvd = CPU_MONDO_COUNTER(first_cpu);
+
+ /* Retry count is for no progress. If we're making progress,
+ * reset the retry count.
*/
- if (unlikely(!forward_progress)) {
- if (unlikely(++retries > 10000))
- goto fatal_mondo_timeout;
-
- /* Delay a little bit to let other cpus catch up
- * on their cpu mondo queue work.
- */
- udelay(2 * cnt);
+ if (likely(mondo_delivered || target_cpu_busy)) {
+ tot_retries += retries;
+ retries = 0;
+ } else if (unlikely(retries > MONDO_RETRY_LIMIT)) {
+ goto fatal_mondo_timeout;
}
- } while (1);

- if (unlikely(saw_cpu_error))
- goto fatal_mondo_cpu_error;
+ /* Delay a little bit to let other cpus catch up on
+ * their cpu mondo queue work.
+ */
+ if (!mondo_delivered)
+ udelay(usec_wait);

- return;
+ retries++;
+ } while (1);

-fatal_mondo_cpu_error:
- printk(KERN_CRIT "CPU[%d]: SUN4V mondo cpu error, some target cpus "
- "(including %d) were in error state\n",
- this_cpu, saw_cpu_error - 1);
+xcall_done:
+ if (unlikely(ecpuerror_id > 0)) {
+ pr_crit("CPU[%d]: SUN4V mondo cpu error, target cpu(%d) was in error state\n",
+ this_cpu, ecpuerror_id - 1);
+ } else if (unlikely(enocpu_id > 0)) {
+ pr_crit("CPU[%d]: SUN4V mondo cpu error, target cpu(%d) does not belong to the domain\n",
+ this_cpu, enocpu_id - 1);
+ }
return;

+fatal_errors:
+ /* fatal errors include bad alignment, etc */
+ pr_crit("CPU[%d]: Args were cnt(%d) cpulist_pa(%lx) mondo_block_pa(%lx)\n",
+ this_cpu, tot_cpus, tb->cpu_list_pa, tb->cpu_mondo_block_pa);
+ panic("Unexpected SUN4V mondo error %lu\n", status);
+
fatal_mondo_timeout:
- printk(KERN_CRIT "CPU[%d]: SUN4V mondo timeout, no forward "
- " progress after %d retries.\n",
- this_cpu, retries);
- goto dump_cpu_list_and_out;
-
-fatal_mondo_error:
- printk(KERN_CRIT "CPU[%d]: Unexpected SUN4V mondo error %lu\n",
- this_cpu, status);
- printk(KERN_CRIT "CPU[%d]: Args were cnt(%d) cpulist_pa(%lx) "
- "mondo_block_pa(%lx)\n",
- this_cpu, cnt, tb->cpu_list_pa, tb->cpu_mondo_block_pa);
-
-dump_cpu_list_and_out:
- printk(KERN_CRIT "CPU[%d]: CPU list [ ", this_cpu);
- for (i = 0; i < cnt; i++)
- printk("%u ", cpu_list[i]);
- printk("]\n");
+ /* some cpus being non-responsive to the cpu mondo */
+ pr_crit("CPU[%d]: SUN4V mondo timeout, cpu(%d) made no forward progress after %d retries. Total target cpus(%d).\n",
+ this_cpu, first_cpu, (tot_retries + retries), tot_cpus);
+ panic("SUN4V mondo timeout panic\n");
}

static void (*xcall_deliver_impl)(struct trap_per_cpu *, int);
--- a/arch/sparc/kernel/sun4v_ivec.S
+++ b/arch/sparc/kernel/sun4v_ivec.S
@@ -26,6 +26,21 @@ sun4v_cpu_mondo:
ldxa [%g0] ASI_SCRATCHPAD, %g4
sub %g4, TRAP_PER_CPU_FAULT_INFO, %g4

+ /* Get smp_processor_id() into %g3 */
+ sethi %hi(trap_block), %g5
+ or %g5, %lo(trap_block), %g5
+ sub %g4, %g5, %g3
+ srlx %g3, TRAP_BLOCK_SZ_SHIFT, %g3
+
+ /* Increment cpu_mondo_counter[smp_processor_id()] */
+ sethi %hi(cpu_mondo_counter), %g5
+ or %g5, %lo(cpu_mondo_counter), %g5
+ sllx %g3, 3, %g3
+ add %g5, %g3, %g5
+ ldx [%g5], %g3
+ add %g3, 1, %g3
+ stx %g3, [%g5]
+
/* Get CPU mondo queue base phys address into %g7. */
ldx [%g4 + TRAP_PER_CPU_CPU_MONDO_PA], %g7

--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -2732,6 +2732,7 @@ void do_getpsr(struct pt_regs *regs)
}
}

+u64 cpu_mondo_counter[NR_CPUS] = {0};
struct trap_per_cpu trap_block[NR_CPUS];
EXPORT_SYMBOL(trap_block);



2017-08-09 18:26:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 28/93] media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Prabhakar Lad <[email protected]>

commit da05d52d2f0f6bd61094a0cd045fed94bf7d673a upstream.

This patch makes sure the VPFE_CMD_S_CCDC_RAW_PARAMS ioctl no longer works
for the vpfe_capture driver, with a minimal patch suitable for backporting.

- This ioctl was never part of the public API and was only defined in a kernel header.
- The function set_params constantly mixes up pointers and phys_addr_t
numbers.
- This is part of a 'VPFE_CMD_S_CCDC_RAW_PARAMS' ioctl command that is
described as an 'experimental ioctl that will change in future kernels'.
- The code to allocate the table never gets called after we
copy_from_user() the user input over the kernel settings and then
compare them for inequality.
- We then go on to use an address provided by user space both as the
__user pointer for input and, passed through phys_to_virt, as a
kernel pointer to copy the data to. This looks like a trivially
exploitable root hole.

For these reasons we make sure this ioctl now returns -EINVAL and
backport this patch as far as possible.
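
For illustration, the kind of pattern described above looks roughly
like the sketch below; the names are hypothetical and this is not the
driver's actual code, only the shape of the bug being disabled:

static int unsafe_set_params(void __user *arg)
{
        unsigned long addr;

        if (copy_from_user(&addr, arg, sizeof(addr)))
                return -EFAULT;

        /* BAD: 'addr' is fully user controlled, yet it is converted to
         * a kernel virtual pointer and written through -- effectively
         * an arbitrary kernel memory write from user space.
         */
        memcpy(phys_to_virt(addr), &addr, sizeof(addr));
        return 0;
}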

Fixes: 5f15fbb68fd7 ("V4L/DVB (12251): v4l: dm644x ccdc module for vpfe capture driver")

Signed-off-by: Lad, Prabhakar <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/media/platform/davinci/vpfe_capture.c | 22 ++--------------------
1 file changed, 2 insertions(+), 20 deletions(-)

--- a/drivers/media/platform/davinci/vpfe_capture.c
+++ b/drivers/media/platform/davinci/vpfe_capture.c
@@ -1725,27 +1725,9 @@ static long vpfe_param_handler(struct fi

switch (cmd) {
case VPFE_CMD_S_CCDC_RAW_PARAMS:
+ ret = -EINVAL;
v4l2_warn(&vpfe_dev->v4l2_dev,
- "VPFE_CMD_S_CCDC_RAW_PARAMS: experimental ioctl\n");
- if (ccdc_dev->hw_ops.set_params) {
- ret = ccdc_dev->hw_ops.set_params(param);
- if (ret) {
- v4l2_dbg(1, debug, &vpfe_dev->v4l2_dev,
- "Error setting parameters in CCDC\n");
- goto unlock_out;
- }
- ret = vpfe_get_ccdc_image_format(vpfe_dev,
- &vpfe_dev->fmt);
- if (ret < 0) {
- v4l2_dbg(1, debug, &vpfe_dev->v4l2_dev,
- "Invalid image format at CCDC\n");
- goto unlock_out;
- }
- } else {
- ret = -EINVAL;
- v4l2_dbg(1, debug, &vpfe_dev->v4l2_dev,
- "VPFE_CMD_S_CCDC_RAW_PARAMS not supported\n");
- }
+ "VPFE_CMD_S_CCDC_RAW_PARAMS not supported\n");
break;
default:
ret = -ENOTTY;


2017-08-09 18:27:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 63/93] xen-netback: correctly schedule rate-limited queues

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Liu <[email protected]>


[ Upstream commit dfa523ae9f2542bee4cddaea37b3be3e157f6e6b ]

Add a flag to indicate whether a queue is rate-limited. Test the flag
in the NAPI poll handler and avoid rescheduling the queue if it is set,
otherwise we risk locking up the host. The rescheduling will instead be
done in the timer callback function.
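
As a rough sketch (not the exact driver code; the field names follow
the hunks below), the timer side of the scheme ends up being the only
place that re-arms a queue paused for rate limiting:

static void credit_timeout_cb(unsigned long data)
{
        struct xenvif_queue *queue = (struct xenvif_queue *)data;

        /* credit has been replenished, so the queue may run again */
        queue->rate_limited = false;
        napi_schedule(&queue->napi);
}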

Reported-by: Jean-Louis Dupond <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Tested-by: Jean-Louis Dupond <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/xen-netback/common.h | 1 +
drivers/net/xen-netback/interface.c | 6 +++++-
drivers/net/xen-netback/netback.c | 6 +++++-
3 files changed, 11 insertions(+), 2 deletions(-)

--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -199,6 +199,7 @@ struct xenvif_queue { /* Per-queue data
unsigned long remaining_credit;
struct timer_list credit_timeout;
u64 credit_window_start;
+ bool rate_limited;

/* Statistics */
struct xenvif_stats stats;
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -105,7 +105,11 @@ static int xenvif_poll(struct napi_struc

if (work_done < budget) {
napi_complete(napi);
- xenvif_napi_schedule_or_enable_events(queue);
+ /* If the queue is rate-limited, it shall be
+ * rescheduled in the timer callback.
+ */
+ if (likely(!queue->rate_limited))
+ xenvif_napi_schedule_or_enable_events(queue);
}

return work_done;
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -179,6 +179,7 @@ static void tx_add_credit(struct xenvif_
max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */

queue->remaining_credit = min(max_credit, max_burst);
+ queue->rate_limited = false;
}

void xenvif_tx_credit_callback(unsigned long data)
@@ -685,8 +686,10 @@ static bool tx_credit_exceeded(struct xe
msecs_to_jiffies(queue->credit_usec / 1000);

/* Timer could already be pending in rare cases. */
- if (timer_pending(&queue->credit_timeout))
+ if (timer_pending(&queue->credit_timeout)) {
+ queue->rate_limited = true;
return true;
+ }

/* Passed the point where we can replenish credit? */
if (time_after_eq64(now, next_credit)) {
@@ -701,6 +704,7 @@ static bool tx_credit_exceeded(struct xe
mod_timer(&queue->credit_timeout,
next_credit);
queue->credit_window_start = next_credit;
+ queue->rate_limited = true;

return true;
}


2017-08-09 18:15:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 55/93] sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Potapenko <[email protected]>


[ Upstream commit b1f5bfc27a19f214006b9b4db7b9126df2dfdf5a ]

If the length field of the iterator (|pos.p| or |err|) is past the end
of the chunk, we shouldn't access it.
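
In other words, a simplified sketch (with illustrative names) of the
ordering the hunk at the end of this mail enforces:

static bool param_hdr_in_bounds(const struct sctp_paramhdr *p,
                                const void *chunk_end)
{
        /* the bytes holding p->length must themselves lie inside the chunk */
        if ((const void *)p + offsetof(struct sctp_paramhdr, length) +
            sizeof(p->length) > chunk_end)
                return false;

        /* only now is it safe to read the length and bound-check the body */
        return (const void *)p + ntohs(p->length) <= chunk_end;
}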

This bug has been detected by KMSAN. For the following pair of system
calls:

socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, 28) = 1

the tool has reported a use of uninitialized memory:

==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
__sctp_rcv_init_lookup net/sctp/input.c:1074
__sctp_rcv_lookup_harder net/sctp/input.c:1233
__sctp_rcv_lookup net/sctp/input.c:1255
sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
</IRQ>
do_softirq kernel/softirq.c:328
__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
rcu_read_unlock_bh ./include/linux/rcupdate.h:931
ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
NF_HOOK_COND ./include/linux/netfilter.h:246
ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
dst_output ./include/net/dst.h:486
NF_HOOK ./include/linux/netfilter.h:257
ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
sctp_side_effects net/sctp/sm_sideeffect.c:1773
sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x401133
RSP: 002b:00007fff6d99cd38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401133
RDX: 0000000000000001 RSI: 0000000000494088 RDI: 0000000000000003
RBP: 00007fff6d99cd90 R08: 00007fff6d99cd50 R09: 000000000000001c
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004063d0 R14: 0000000000406460 R15: 0000000000000000
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
slab_alloc_node mm/slub.c:2743
__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
__kmalloc_reserve net/core/skbuff.c:138
__alloc_skb+0x26b/0x840 net/core/skbuff.c:231
alloc_skb ./include/linux/skbuff.h:933
sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
sctp_side_effects net/sctp/sm_sideeffect.c:1773
sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================

Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/sctp/sctp.h | 4 ++++
1 file changed, 4 insertions(+)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -460,6 +460,8 @@ _sctp_walk_params((pos), (chunk), ntohs(

#define _sctp_walk_params(pos, chunk, end, member)\
for (pos.v = chunk->member;\
+ (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <\
+ (void *)chunk + end) &&\
pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t);\
pos.v += SCTP_PAD4(ntohs(pos.p->length)))
@@ -470,6 +472,8 @@ _sctp_walk_errors((err), (chunk_hdr), nt
#define _sctp_walk_errors(err, chunk_hdr, end)\
for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
sizeof(sctp_chunkhdr_t));\
+ ((void *)err + offsetof(sctp_errhdr_t, length) + sizeof(err->length) <\
+ (void *)chunk_hdr + end) &&\
(void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
ntohs(err->length) >= sizeof(sctp_errhdr_t); \
err = (sctp_errhdr_t *)((void *)err + SCTP_PAD4(ntohs(err->length))))


2017-08-09 18:27:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 61/93] net/mlx5e: Schedule overflow check work to mlx5e workqueue

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eugenia Emantayev <[email protected]>


[ Upstream commit f08c39ed0bfb503c7b3e013cd40d036ce6a0941a ]

This is done in order to ensure that the work will not run after the cleanup.

Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -62,13 +62,14 @@ static void mlx5e_timestamp_overflow(str
struct delayed_work *dwork = to_delayed_work(work);
struct mlx5e_tstamp *tstamp = container_of(dwork, struct mlx5e_tstamp,
overflow_work);
+ struct mlx5e_priv *priv = container_of(tstamp, struct mlx5e_priv, tstamp);
unsigned long flags;

write_lock_irqsave(&tstamp->lock, flags);
timecounter_read(&tstamp->clock);
write_unlock_irqrestore(&tstamp->lock, flags);
- schedule_delayed_work(&tstamp->overflow_work,
- msecs_to_jiffies(tstamp->overflow_period * 1000));
+ queue_delayed_work(priv->wq, &tstamp->overflow_work,
+ msecs_to_jiffies(tstamp->overflow_period * 1000));
}

int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr)
@@ -264,7 +265,7 @@ void mlx5e_timestamp_init(struct mlx5e_p

INIT_DELAYED_WORK(&tstamp->overflow_work, mlx5e_timestamp_overflow);
if (tstamp->overflow_period)
- schedule_delayed_work(&tstamp->overflow_work, 0);
+ queue_delayed_work(priv->wq, &tstamp->overflow_work, 0);
else
mlx5_core_warn(priv->mdev, "invalid overflow period, overflow_work is not scheduled\n");



2017-08-09 18:28:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 60/93] net/mlx5e: Fix wrong delay calculation for overflow check scheduling

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eugenia Emantayev <[email protected]>


[ Upstream commit d439c84509a510e864fdc6166c760482cd03fc57 ]

The overflow_period is calculated in seconds. In order to use it
for delayed work scheduling, a translation to jiffies is needed.

Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -67,7 +67,8 @@ static void mlx5e_timestamp_overflow(str
write_lock_irqsave(&tstamp->lock, flags);
timecounter_read(&tstamp->clock);
write_unlock_irqrestore(&tstamp->lock, flags);
- schedule_delayed_work(&tstamp->overflow_work, tstamp->overflow_period);
+ schedule_delayed_work(&tstamp->overflow_work,
+ msecs_to_jiffies(tstamp->overflow_period * 1000));
}

int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr)


2017-08-09 18:28:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 59/93] net/mlx5e: Fix outer_header_zero() check size

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ilan Tayari <[email protected]>


[ Upstream commit 0242f4a0bb03906010bbf80495512be00494a0ef ]

The outer_header_zero() routine checks whether the outer_headers match
criteria of a flow-table entry are all zero.

This function uses the size of the whole fte_match_param instead of
just the outer_headers member, causing it to fail to detect the
all-zeros case if any other member of fte_match_param is non-zero.

Use the correct size for zero check.
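
As a hedged sketch of what the corrected check amounts to (the
memchr_inv()-based body is illustrative, not necessarily the driver's
actual implementation; the signature matches the hunk below):

static bool outer_header_zero(u32 *match_criteria)
{
        int size = MLX5_FLD_SZ_BYTES(fte_match_param, outer_headers);
        char *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_criteria,
                                             outer_headers);

        /* true iff all 'size' bytes of the outer_headers member are zero */
        return !memchr_inv(outer_headers_c, 0, size);
}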

Fixes: 6dc6071cfcde ("net/mlx5e: Add ethtool flow steering support")
Signed-off-by: Ilan Tayari <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -276,7 +276,7 @@ static void add_rule_to_list(struct mlx5

static bool outer_header_zero(u32 *match_criteria)
{
- int size = MLX5_ST_SZ_BYTES(fte_match_param);
+ int size = MLX5_FLD_SZ_BYTES(fte_match_param, outer_headers);
char *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_criteria,
outer_headers);



2017-08-09 18:15:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 56/93] sctp: fix the check for _sctp_walk_params and _sctp_walk_errors

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 6b84202c946cd3da3a8daa92c682510e9ed80321 ]

Commit b1f5bfc27a19 ("sctp: don't dereference ptr before leaving
_sctp_walk_{params, errors}()") tried to fix the issue that it
may overstep the chunk end for _sctp_walk_{params, errors} with
'chunk_end > offset(length) + sizeof(length)'.

But it introduced a side effect: when processing INIT, it verifies the
chunks with 'param.v == chunk_end' after iterating over all params with
sctp_walk_params(). With the check 'chunk_end > offset(length) +
sizeof(length)', it would return while the last param had not yet been
accessed, because the last param is usually the fwdtsn supported param,
whose size is 4, so 'chunk_end == offset(length) + sizeof(length)'.

This is a serious issue that even prevents sctp from completing its
4-way handshake: the client would always get an abort when connecting
to a server, due to the failure of INIT chunk verification on the
server.

The patch is to use 'chunk_end <= offset(length) + sizeof(length)'
instead of 'chunk_end < offset(length) + sizeof(length)' for both
_sctp_walk_params and _sctp_walk_errors.
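
A worked example with made-up numbers: assume the chunk ends at
(void *)chunk + end == 0x120 and the last parameter (a 4-byte fwdtsn
supported param) starts at pos.v == 0x11c. offsetof(struct
sctp_paramhdr, length) is 2 and sizeof(pos.p->length) is 2, so:

        pos.v + 2 + 2 == 0x120 == (void *)chunk + end

With '<'  : 0x120 <  0x120 is false, the walk stops and the final
            parameter is never visited.
With '<=' : 0x120 <= 0x120 is true, the parameter is still processed,
            which is the intended behaviour.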

Fixes: b1f5bfc27a19 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/sctp/sctp.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -460,7 +460,7 @@ _sctp_walk_params((pos), (chunk), ntohs(

#define _sctp_walk_params(pos, chunk, end, member)\
for (pos.v = chunk->member;\
- (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <\
+ (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <=\
(void *)chunk + end) &&\
pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t);\
@@ -472,7 +472,7 @@ _sctp_walk_errors((err), (chunk_hdr), nt
#define _sctp_walk_errors(err, chunk_hdr, end)\
for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
sizeof(sctp_chunkhdr_t));\
- ((void *)err + offsetof(sctp_errhdr_t, length) + sizeof(err->length) <\
+ ((void *)err + offsetof(sctp_errhdr_t, length) + sizeof(err->length) <=\
(void *)chunk_hdr + end) &&\
(void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
ntohs(err->length) >= sizeof(sctp_errhdr_t); \


2017-08-09 18:15:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 42/93] net: Zero terminate ifr_name in dev_ifname().

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>


[ Upstream commit 63679112c536289826fec61c917621de95ba2ade ]

The ifr.ifr_name is passed around and assumed to be NULL terminated.

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/dev_ioctl.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -28,6 +28,7 @@ static int dev_ifname(struct net *net, s

if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
return -EFAULT;
+ ifr.ifr_name[IFNAMSIZ-1] = 0;

error = netdev_get_name(net, ifr.ifr_name, ifr.ifr_ifindex);
if (error)


2017-08-09 18:29:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 58/93] net/mlx5: Fix command bad flow on command entry allocation failure

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Moshe Shemesh <[email protected]>


[ Upstream commit 219c81f7d1d5a89656cb3b53d3b4e11e93608d80 ]

When the driver fails to allocate an entry to send a command to the FW,
it must notify the calling function and release the memory allocated
for this command.

Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Moshe Shemesh <[email protected]>
Cc: [email protected]
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -770,6 +770,10 @@ static void cb_timeout_handler(struct wo
mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
}

+static void free_msg(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *msg);
+static void mlx5_free_cmd_msg(struct mlx5_core_dev *dev,
+ struct mlx5_cmd_msg *msg);
+
static void cmd_work_handler(struct work_struct *work)
{
struct mlx5_cmd_work_ent *ent = container_of(work, struct mlx5_cmd_work_ent, work);
@@ -779,16 +783,27 @@ static void cmd_work_handler(struct work
struct mlx5_cmd_layout *lay;
struct semaphore *sem;
unsigned long flags;
+ int alloc_ret;

sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
down(sem);
if (!ent->page_queue) {
- ent->idx = alloc_ent(cmd);
- if (ent->idx < 0) {
+ alloc_ret = alloc_ent(cmd);
+ if (alloc_ret < 0) {
+ if (ent->callback) {
+ ent->callback(-EAGAIN, ent->context);
+ mlx5_free_cmd_msg(dev, ent->out);
+ free_msg(dev, ent->in);
+ free_cmd(ent);
+ } else {
+ ent->ret = -EAGAIN;
+ complete(&ent->done);
+ }
mlx5_core_err(dev, "failed to allocate command entry\n");
up(sem);
return;
}
+ ent->idx = alloc_ret;
} else {
ent->idx = cmd->max_reg_cmds;
spin_lock_irqsave(&cmd->alloc_lock, flags);


2017-08-09 18:29:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 57/93] net/mlx5: Consider tx_enabled in all modes on remap

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Aviv Heller <[email protected]>


[ Upstream commit dc798b4cc0f2a06e7ad7d522403de274b86a0a6f ]

The tx_enabled lag event field is used to determine whether a slave is
active.
The current logic uses this value only if the mode is active-backup.

However, LACP mode, although considered a load balancing mode, can mark
a slave as inactive in certain situations (e.g., LACP timeout).

This fix takes the tx_enabled value into account when remapping,
regardless of the LAG mode (this should not affect the behavior in XOR
mode, since in this mode both slaves are marked as active).
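
The resulting port mapping can be summarised as follows (derived from
the hunk below; a slave counts as "usable" when both tx_enabled and
link_up are set, whatever the LAG mode):

        slave 0 usable | slave 1 usable | port1 | port2
        ---------------+----------------+-------+------
              yes      |      yes       |   1   |   2
              no       |     (any)      |   2   |   2
              yes      |      no        |   1   |   1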

Fixes: 7907f23adc18 (net/mlx5: Implement RoCE LAG feature)
Signed-off-by: Aviv Heller <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/lag.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -157,22 +157,17 @@ static bool mlx5_lag_is_bonded(struct ml
static void mlx5_infer_tx_affinity_mapping(struct lag_tracker *tracker,
u8 *port1, u8 *port2)
{
- if (tracker->tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP) {
- if (tracker->netdev_state[0].tx_enabled) {
- *port1 = 1;
- *port2 = 1;
- } else {
- *port1 = 2;
- *port2 = 2;
- }
- } else {
- *port1 = 1;
- *port2 = 2;
- if (!tracker->netdev_state[0].link_up)
- *port1 = 2;
- else if (!tracker->netdev_state[1].link_up)
- *port2 = 1;
+ *port1 = 1;
+ *port2 = 2;
+ if (!tracker->netdev_state[0].tx_enabled ||
+ !tracker->netdev_state[0].link_up) {
+ *port1 = 2;
+ return;
}
+
+ if (!tracker->netdev_state[1].tx_enabled ||
+ !tracker->netdev_state[1].link_up)
+ *port2 = 1;
}

static void mlx5_activate_lag(struct mlx5_lag *ldev,


2017-08-09 18:29:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 46/93] rtnetlink: allocate more memory for dev_set_mac_address()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: WANG Cong <[email protected]>


[ Upstream commit 153711f9421be5dbc973dc57a4109dc9d54c89b1 ]

virtnet_set_mac_address() interprets the mac address as a struct
sockaddr, but the upper layer only allocates dev->addr_len plus
sizeof(sa_family_t) bytes, which is ETH_ALEN + sizeof(sa_family_t)
in this case.

We lack a unified definition for the mac address, so just fix the
upper layer; this also allows drivers to freely interpret it as a
struct sockaddr.
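
A hedged sketch of the allocation pattern (the length computation
matches the hunk below; the surrounding code and the mac_addr variable
are only illustrative):

        struct sockaddr *sa;
        int len;

        /* leave room for whichever is larger: the device's address
         * length or a full struct sockaddr, so drivers may safely cast
         * the buffer either way
         */
        len = sizeof(sa_family_t) + max_t(size_t, dev->addr_len, sizeof(*sa));
        sa = kmalloc(len, GFP_KERNEL);
        if (!sa)
                return -ENOMEM;
        sa->sa_family = dev->type;
        memcpy(sa->sa_data, mac_addr, dev->addr_len); /* only addr_len bytes valid */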

Reported-by: David Ahern <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/rtnetlink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1965,7 +1965,8 @@ static int do_setlink(const struct sk_bu
struct sockaddr *sa;
int len;

- len = sizeof(sa_family_t) + dev->addr_len;
+ len = sizeof(sa_family_t) + max_t(size_t, dev->addr_len,
+ sizeof(*sa));
sa = kmalloc(len, GFP_KERNEL);
if (!sa) {
err = -ENOMEM;


2017-08-09 18:30:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 43/93] ipv6: avoid overflow of offset in ip6_find_1stfragopt

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca <[email protected]>


[ Upstream commit 6399f1fae4ec29fab5ec76070435555e256ca3a6 ]

In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.

This problem has been here since before the beginning of git history.
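
A rough worked example of the wrap being prevented (numbers are
illustrative): ipv6_optlen() can report up to (255 + 1) * 8 == 2048
bytes per extension header, so with a u16 offset roughly 32 such
headers are enough to push it past 65535, where it wraps back to a
small value and 'offset <= packet_len' can remain true indefinitely.
With an unsigned int offset and the new check rejecting anything that
reaches IPV6_MAXPLEN (65535), the loop stays bounded.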

Signed-off-by: Sabrina Dubroca <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/output_core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -78,7 +78,7 @@ EXPORT_SYMBOL(ipv6_select_ident);

int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
{
- u16 offset = sizeof(struct ipv6hdr);
+ unsigned int offset = sizeof(struct ipv6hdr);
unsigned int packet_len = skb_tail_pointer(skb) -
skb_network_header(skb);
int found_rhdr = 0;
@@ -86,6 +86,7 @@ int ip6_find_1stfragopt(struct sk_buff *

while (offset <= packet_len) {
struct ipv6_opt_hdr *exthdr;
+ unsigned int len;

switch (**nexthdr) {

@@ -111,7 +112,10 @@ int ip6_find_1stfragopt(struct sk_buff *

exthdr = (struct ipv6_opt_hdr *)(skb_network_header(skb) +
offset);
- offset += ipv6_optlen(exthdr);
+ len = ipv6_optlen(exthdr);
+ if (len + offset >= IPV6_MAXPLEN)
+ return -EINVAL;
+ offset += len;
*nexthdr = &exthdr->nexthdr;
}



2017-08-09 18:29:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 44/93] net: dsa: b53: Add missing ARL entries for BCM53125

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <[email protected]>


[ Upstream commit be35e8c516c1915a3035d266a2015b41f73ba3f9 ]

The BCM53125 entry was missing an arl_entries member which would
basically prevent the ARL search from terminating properly. This switch
has 4 ARL entries, so add that.

Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/dsa/b53/b53_common.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1558,6 +1558,7 @@ static const struct b53_chip_data b53_sw
.dev_name = "BCM53125",
.vlans = 4096,
.enabled_ports = 0xff,
+ .arl_entries = 4,
.cpu_port = B53_CPU_PORT,
.vta_regs = B53_VTA_REGS,
.duplex_reg = B53_DUPLEX_STAT_GE,


2017-08-09 18:30:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 41/93] ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Potapenko <[email protected]>


[ Upstream commit 18bcf2907df935981266532e1e0d052aff2e6fae ]

KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
which originated from the TCP request socket created in
cookie_v6_check():

==================================================================
BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
skb_set_hash_from_sk ./include/net/sock.h:2011
tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307
__run_timers+0xc13/0xf10 kernel/time/timer.c:1601
run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
__do_softirq+0x485/0x942 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x1fa/0x230 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
</IRQ>
poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
SYSC_select+0x4b4/0x4e0 fs/select.c:653
SyS_select+0x76/0xa0 fs/select.c:634
entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x4597e7
RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
chained origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_save_stack mm/kmsan/kmsan.c:317
kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
__msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
reqsk_alloc ./include/net/request_sock.h:87
inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
==================================================================

Similar error is reported for cookie_v4_check().

Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets")
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/syncookies.c | 1 +
net/ipv6/syncookies.c | 1 +
2 files changed, 2 insertions(+)

--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -334,6 +334,7 @@ struct sock *cookie_v4_check(struct sock
treq = tcp_rsk(req);
treq->rcv_isn = ntohl(th->seq) - 1;
treq->snt_isn = cookie;
+ treq->txhash = net_tx_rndhash();
req->mss = mss;
ireq->ir_num = ntohs(th->dest);
ireq->ir_rmt_port = th->source;
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -209,6 +209,7 @@ struct sock *cookie_v6_check(struct sock
treq->snt_synack.v64 = 0;
treq->rcv_isn = ntohl(th->seq) - 1;
treq->snt_isn = cookie;
+ treq->txhash = net_tx_rndhash();

/*
* We need to lookup the dst_entry to get the correct window size.


2017-08-09 18:15:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 37/93] tcp_bbr: introduce bbr_bw_to_pacing_rate() helper

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit f19fd62dafaf1ed6cf615dba655b82fa9df59074 ]

Introduce a helper to convert a BBR bandwidth and gain factor to a
pacing rate in bytes per second. This is a pure refactor, but is
needed for two following fixes.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -182,6 +182,16 @@ static u64 bbr_rate_bytes_per_sec(struct
return rate >> BW_SCALE;
}

+/* Convert a BBR bw and gain factor to a pacing rate in bytes per second. */
+static u32 bbr_bw_to_pacing_rate(struct sock *sk, u32 bw, int gain)
+{
+ u64 rate = bw;
+
+ rate = bbr_rate_bytes_per_sec(sk, rate, gain);
+ rate = min_t(u64, rate, sk->sk_max_pacing_rate);
+ return rate;
+}
+
/* Pace using current bw estimate and a gain factor. In order to help drive the
* network toward lower queues while maintaining high utilization and low
* latency, the average pacing rate aims to be slightly (~1%) lower than the
@@ -191,10 +201,8 @@ static u64 bbr_rate_bytes_per_sec(struct
*/
static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain)
{
- u64 rate = bw;
+ u32 rate = bbr_bw_to_pacing_rate(sk, bw, gain);

- rate = bbr_rate_bytes_per_sec(sk, rate, gain);
- rate = min_t(u64, rate, sk->sk_max_pacing_rate);
if (bbr_full_bw_reached(sk) || rate > sk->sk_pacing_rate)
sk->sk_pacing_rate = rate;
}


2017-08-09 18:30:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 40/93] tcp_bbr: init pacing rate on first RTT sample

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit 32984565574da7ed3afa10647bb4020d7a9e6c93 ]

Fixes the following behavior: for connections that had no RTT sample
at the time of initializing congestion control, BBR was initializing
the pacing rate to a high nominal rate (based on a guess of RTT=1ms,
in case this is LAN traffic). Then BBR never adjusted the pacing rate
downward upon obtaining an actual RTT sample, if the connection never
filled the pipe (e.g. all sends were small app-limited writes()).

This fix adjusts the pacing rate upon obtaining the first RTT sample.
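
Condensed from the hunks below, the core of the fix is a one-bit
"have we seen a real RTT sample yet?" flag checked on the pacing-rate
path:

        /* in bbr_set_pacing_rate(): re-derive the rate from the first
         * real RTT sample, replacing the nominal RTT=1ms guess used at
         * init time
         */
        if (unlikely(!bbr->has_seen_rtt && tp->srtt_us))
                bbr_init_pacing_rate_from_rtt(sk);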

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -83,7 +83,8 @@ struct bbr {
cwnd_gain:10, /* current gain for setting cwnd */
full_bw_cnt:3, /* number of rounds without large bw gains */
cycle_idx:3, /* current index in pacing_gain cycle array */
- unused_b:6;
+ has_seen_rtt:1, /* have we seen an RTT sample yet? */
+ unused_b:5;
u32 prior_cwnd; /* prior cwnd upon entering loss recovery */
u32 full_bw; /* recent bw, to estimate if pipe is full */
};
@@ -196,11 +197,13 @@ static u32 bbr_bw_to_pacing_rate(struct
static void bbr_init_pacing_rate_from_rtt(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
+ struct bbr *bbr = inet_csk_ca(sk);
u64 bw;
u32 rtt_us;

if (tp->srtt_us) { /* any RTT sample yet? */
rtt_us = max(tp->srtt_us >> 3, 1U);
+ bbr->has_seen_rtt = 1;
} else { /* no RTT sample yet */
rtt_us = USEC_PER_MSEC; /* use nominal default RTT */
}
@@ -218,8 +221,12 @@ static void bbr_init_pacing_rate_from_rt
*/
static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain)
{
+ struct tcp_sock *tp = tcp_sk(sk);
+ struct bbr *bbr = inet_csk_ca(sk);
u32 rate = bbr_bw_to_pacing_rate(sk, bw, gain);

+ if (unlikely(!bbr->has_seen_rtt && tp->srtt_us))
+ bbr_init_pacing_rate_from_rtt(sk);
if (bbr_full_bw_reached(sk) || rate > sk->sk_pacing_rate)
sk->sk_pacing_rate = rate;
}
@@ -808,6 +815,7 @@ static void bbr_init(struct sock *sk)

minmax_reset(&bbr->bw, bbr->rtt_cnt, 0); /* init max bw to 0 */

+ bbr->has_seen_rtt = 0;
bbr_init_pacing_rate_from_rtt(sk);

bbr->restore_cwnd = 0;


2017-08-09 18:31:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 35/93] [media] saa7164: fix double fetch PCIe access condition

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Toth <[email protected]>

commit 6fb05e0dd32e566facb96ea61a48c7488daa5ac3 upstream.

Avoid a double fetch by reusing the values from the prior transfer.
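
For context, the general double-fetch pattern being removed looks
roughly like this (a simplified sketch, not the saa7164 code; the
struct, ring and MAX_MSG_SIZE names are illustrative):

        struct bus_msg hdr;

        memcpy_fromio(&hdr, ring + pos, sizeof(hdr)); /* fetch #1: validate */
        if (le16_to_cpu(hdr.size) > MAX_MSG_SIZE)
                return -EINVAL;

        memcpy_fromio(&hdr, ring + pos, sizeof(hdr)); /* fetch #2: the value
                                                       * may have changed since
                                                       * the check (TOCTOU) */

The fix is to validate and consume the same snapshot: fetch the header
once and reuse it, as the hunk below does by copying msg_tmp into msg
right after the byte-order conversion.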

Originally reported via https://bugzilla.kernel.org/show_bug.cgi?id=195559

Thanks to Pengfei Wang <[email protected]> for reporting.

Signed-off-by: Steven Toth <[email protected]>
Reported-by: Pengfei Wang <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/media/pci/saa7164/saa7164-bus.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)

--- a/drivers/media/pci/saa7164/saa7164-bus.c
+++ b/drivers/media/pci/saa7164/saa7164-bus.c
@@ -393,11 +393,11 @@ int saa7164_bus_get(struct saa7164_dev *
msg_tmp.size = le16_to_cpu((__force __le16)msg_tmp.size);
msg_tmp.command = le32_to_cpu((__force __le32)msg_tmp.command);
msg_tmp.controlselector = le16_to_cpu((__force __le16)msg_tmp.controlselector);
+ memcpy(msg, &msg_tmp, sizeof(*msg));

/* No need to update the read positions, because this was a peek */
/* If the caller specifically want to peek, return */
if (peekonly) {
- memcpy(msg, &msg_tmp, sizeof(*msg));
goto peekout;
}

@@ -442,21 +442,15 @@ int saa7164_bus_get(struct saa7164_dev *
space_rem = bus->m_dwSizeGetRing - curr_grp;

if (space_rem < sizeof(*msg)) {
- /* msg wraps around the ring */
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, space_rem);
- memcpy_fromio((u8 *)msg + space_rem, bus->m_pdwGetRing,
- sizeof(*msg) - space_rem);
if (buf)
memcpy_fromio(buf, bus->m_pdwGetRing + sizeof(*msg) -
space_rem, buf_size);

} else if (space_rem == sizeof(*msg)) {
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, sizeof(*msg));
if (buf)
memcpy_fromio(buf, bus->m_pdwGetRing, buf_size);
} else {
/* Additional data wraps around the ring */
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, sizeof(*msg));
if (buf) {
memcpy_fromio(buf, bus->m_pdwGetRing + curr_grp +
sizeof(*msg), space_rem - sizeof(*msg));
@@ -469,15 +463,10 @@ int saa7164_bus_get(struct saa7164_dev *

} else {
/* No wrapping */
- memcpy_fromio(msg, bus->m_pdwGetRing + curr_grp, sizeof(*msg));
if (buf)
memcpy_fromio(buf, bus->m_pdwGetRing + curr_grp + sizeof(*msg),
buf_size);
}
- /* Convert from little endian to CPU */
- msg->size = le16_to_cpu((__force __le16)msg->size);
- msg->command = le32_to_cpu((__force __le32)msg->command);
- msg->controlselector = le16_to_cpu((__force __le16)msg->controlselector);

/* Update the read positions, adjusting the ring */
saa7164_writel(bus->m_dwGetReadPos, new_grp);


2017-08-09 18:31:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 36/93] tcp_bbr: cut pacing rate only if filled pipe

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit 4aea287e90dd61a48268ff2994b56f9799441b62 ]

In bbr_set_pacing_rate(), which decides whether to cut the pacing
rate, there was some code that considered exiting STARTUP to be
equivalent to the notion of filling the pipe (i.e.,
bbr_full_bw_reached()). Specifically, as the code was structured,
exiting STARTUP and going into PROBE_RTT could cause us to cut the
pacing rate down to something silly and low, based on whatever
bandwidth samples we've had so far, when it's possible that all of
them have been small app-limited bandwidth samples that are not
representative of the bandwidth available in the path. (The code was
correct at the time it was written, but the state machine changed
without this spot being adjusted correspondingly.)

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -191,12 +191,11 @@ static u64 bbr_rate_bytes_per_sec(struct
*/
static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain)
{
- struct bbr *bbr = inet_csk_ca(sk);
u64 rate = bw;

rate = bbr_rate_bytes_per_sec(sk, rate, gain);
rate = min_t(u64, rate, sk->sk_max_pacing_rate);
- if (bbr->mode != BBR_STARTUP || rate > sk->sk_pacing_rate)
+ if (bbr_full_bw_reached(sk) || rate > sk->sk_pacing_rate)
sk->sk_pacing_rate = rate;
}



2017-08-09 18:31:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 39/93] tcp_bbr: remove sk_pacing_rate=0 transient during init

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>


[ Upstream commit 1d3648eb5d1fe9ed3d095ed8fa19ad11ca4c8bc0 ]

Fix a corner case noticed by Eric Dumazet, where BBR's setting
sk->sk_pacing_rate to 0 during initialization could theoretically
cause packets in the sending host to hang if there were packets "in
flight" in the pacing infrastructure at the time the BBR congestion
control state is initialized. This could occur if the pacing
infrastructure happened to race with bbr_init() in a way such that the
pacer read the 0 rather than the immediately following non-zero pacing
rate.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 1 -
1 file changed, 1 deletion(-)

--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -808,7 +808,6 @@ static void bbr_init(struct sock *sk)

minmax_reset(&bbr->bw, bbr->rtt_cnt, 0); /* init max bw to 0 */

- sk->sk_pacing_rate = 0; /* force an update of sk_pacing_rate */
bbr_init_pacing_rate_from_rtt(sk);

bbr->restore_cwnd = 0;


2017-08-09 18:32:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 06/93] iwlwifi: dvm: prevent an out of bounds access

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <[email protected]>

commit 0b0f934e92a8eaed2e6c48a50eae6f84661f74f3 upstream.

iwlagn_check_ratid_empty takes the tid as a parameter, but
it doesn't check that it is not IWL_TID_NON_QOS.
Since IWL_TID_NON_QOS = 8 and iwl_priv::tid_data is an array
with 8 entries, accessing iwl_priv::tid_data[IWL_TID_NON_QOS]
is a bad idea.
This happened in iwlagn_rx_reply_tx. Since
iwlagn_check_ratid_empty is relevant only to check whether
we can open A-MPDU, this flow is irrelevant if tid is
IWL_TID_NON_QOS. Call iwlagn_check_ratid_empty only inside the
existing 'if (tid != IWL_TID_NON_QOS)' block a few lines earlier in
the function.
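
In array terms (sizes taken from the description above; the code is
only illustrative):

        struct iwl_tid_data tid_data[8];        /* valid indices are 0..7 */

        /* IWL_TID_NON_QOS == 8, so tid_data[IWL_TID_NON_QOS] is one
         * element past the end of the array; guarding the call with
         * 'if (tid != IWL_TID_NON_QOS)' avoids ever forming that index,
         * which is what the hunk below does.
         */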

Reported-by: Seraphime Kirkovski <[email protected]>
Tested-by: Seraphime Kirkovski <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/intel/iwlwifi/dvm/tx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/wireless/intel/iwlwifi/dvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/tx.c
@@ -1190,11 +1190,11 @@ void iwlagn_rx_reply_tx(struct iwl_priv
next_reclaimed;
IWL_DEBUG_TX_REPLY(priv, "Next reclaimed packet:%d\n",
next_reclaimed);
+ iwlagn_check_ratid_empty(priv, sta_id, tid);
}

iwl_trans_reclaim(priv->trans, txq_id, ssn, &skbs);

- iwlagn_check_ratid_empty(priv, sta_id, tid);
freed = 0;

/* process frames */


2017-08-09 18:15:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 12/93] mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mel Gorman <[email protected]>

commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.

Nadav Amit identified a theoretical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

CPU0                                CPU1
----                                ----
                                    user accesses memory using RW PTE
                                    [PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
                                    mprotect(addr, PROT_READ)
                                    ==> change_pte_range()
                                    ==> [ PTE non-present - no flush ]

                                    user writes using cached RW PTE
...

try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive, as a
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoretically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.
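
The call pattern that results (see the mprotect/madvise/mremap hunks
below; condensed here for illustration) is:

        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        flush_tlb_batched_pending(mm);  /* flush any TLB batching left
                                         * behind by reclaim before
                                         * touching PTEs */
        arch_enter_lazy_mmu_mode();
        /* ... examine and modify the PTEs under the PTL ... */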

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[[email protected]: tweak comments]
[[email protected]: fix spello]
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Nadav Amit <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/mm_types.h | 4 ++++
mm/internal.h | 5 ++++-
mm/madvise.c | 2 ++
mm/memory.c | 1 +
mm/mprotect.c | 1 +
mm/mremap.c | 1 +
mm/rmap.c | 36 ++++++++++++++++++++++++++++++++++++
7 files changed, 49 insertions(+), 1 deletion(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -508,6 +508,10 @@ struct mm_struct {
*/
bool tlb_flush_pending;
#endif
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+ /* See flush_tlb_batched_pending() */
+ bool tlb_flush_batched;
+#endif
struct uprobes_state uprobes_state;
#ifdef CONFIG_X86_INTEL_MPX
/* address of the bounds directory */
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -472,6 +472,7 @@ struct tlbflush_unmap_batch;
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
void try_to_unmap_flush(void);
void try_to_unmap_flush_dirty(void);
+void flush_tlb_batched_pending(struct mm_struct *mm);
#else
static inline void try_to_unmap_flush(void)
{
@@ -479,7 +480,9 @@ static inline void try_to_unmap_flush(vo
static inline void try_to_unmap_flush_dirty(void)
{
}
-
+static inline void flush_tlb_batched_pending(struct mm_struct *mm)
+{
+}
#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */

extern const struct trace_print_flags pageflag_names[];
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -21,6 +21,7 @@
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/mmu_notifier.h>
+#include "internal.h"

#include <asm/tlb.h>

@@ -282,6 +283,7 @@ static int madvise_free_pte_range(pmd_t
return 0;

orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ flush_tlb_batched_pending(mm);
arch_enter_lazy_mmu_mode();
for (; addr != end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1124,6 +1124,7 @@ again:
init_rss_vec(rss);
start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte = start_pte;
+ flush_tlb_batched_pending(mm);
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = *pte;
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -74,6 +74,7 @@ static unsigned long change_pte_range(st
if (!pte)
return 0;

+ flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();
do {
oldpte = *pte;
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -142,6 +142,7 @@ static void move_ptes(struct vm_area_str
new_ptl = pte_lockptr(mm, new_pmd);
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+ flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();

for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -617,6 +617,13 @@ static void set_tlb_ubc_flush_pending(st
tlb_ubc->flush_required = true;

/*
+ * Ensure compiler does not re-order the setting of tlb_flush_batched
+ * before the PTE is cleared.
+ */
+ barrier();
+ mm->tlb_flush_batched = true;
+
+ /*
* If the PTE was dirty then it's best to assume it's writable. The
* caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
* before the page is queued for IO.
@@ -643,6 +650,35 @@ static bool should_defer_flush(struct mm

return should_defer;
}
+
+/*
+ * Reclaim unmaps pages under the PTL but do not flush the TLB prior to
+ * releasing the PTL if TLB flushes are batched. It's possible for a parallel
+ * operation such as mprotect or munmap to race between reclaim unmapping
+ * the page and flushing the page. If this race occurs, it potentially allows
+ * access to data via a stale TLB entry. Tracking all mm's that have TLB
+ * batching in flight would be expensive during reclaim so instead track
+ * whether TLB batching occurred in the past and if so then do a flush here
+ * if required. This will cost one additional flush per reclaim cycle paid
+ * by the first operation at risk such as mprotect and mumap.
+ *
+ * This must be called under the PTL so that an access to tlb_flush_batched
+ * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise
+ * via the PTL.
+ */
+void flush_tlb_batched_pending(struct mm_struct *mm)
+{
+ if (mm->tlb_flush_batched) {
+ flush_tlb_mm(mm);
+
+ /*
+ * Do not allow the compiler to re-order the clearing of
+ * tlb_flush_batched before the tlb is flushed.
+ */
+ barrier();
+ mm->tlb_flush_batched = false;
+ }
+}
#else
static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
struct page *page, bool writable)


2017-08-09 18:32:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 11/93] mmc: core: Fix access to HS400-ES devices

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Guenter Roeck <[email protected]>

commit 773dc118756b1f38766063e90e582016be868f09 upstream.

HS400-ES devices fail to initialize with the following error messages.

mmc1: power class selection to bus width 8 ddr 0 failed
mmc1: error -110 whilst initialising MMC card

This was seen on a Samsung Chromebook Plus. Code analysis points to
commit 3d4ef329757c ("mmc: core: fix multi-bit bus width without
high-speed mode"), which attempts to set the bus width for all but
HS200 devices unconditionally. However, for HS400-ES, the bus width
is already selected.

Cc: Anssi Hannula <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Brian Norris <[email protected]>
Fixes: 3d4ef329757c ("mmc: core: fix multi-bit bus width ...")
Signed-off-by: Guenter Roeck <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Reviewed-by: Shawn Lin <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/mmc/core/mmc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1690,7 +1690,7 @@ static int mmc_init_card(struct mmc_host
err = mmc_select_hs400(card);
if (err)
goto free_card;
- } else {
+ } else if (!mmc_card_hs400es(card)) {
/* Select the desired bus width optionally */
err = mmc_select_bus_width(card);
if (err > 0 && mmc_card_hs(card)) {


2017-08-09 18:32:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 13/93] cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dima Zavin <[email protected]>

commit 89affbf5d9ebb15c6460596822e8857ea2f9e735 upstream.

In codepaths that use the begin/retry interface for reading
mems_allowed_seq with irqs disabled, there exists a race condition that
stalls the patch process after only modifying a subset of the
static_branch call sites.

This problem manifested itself as a deadlock in the slub allocator,
inside get_any_partial. The loop reads mems_allowed_seq value (via
read_mems_allowed_begin), performs the defrag operation, and then
verifies the consistency of mems_allowed via the read_mems_allowed_retry
and the cookie returned by xxx_begin.

The issue here is that both begin and retry first check if cpusets are
enabled via the cpusets_enabled() static branch. This branch can be
rewritten dynamically (via cpuset_inc) if a new cpuset is created. The
x86 jump label code fully synchronizes across all CPUs for every entry
it rewrites. If it rewrites only one of the callsites (specifically the
one in read_mems_allowed_retry) and then waits for the
smp_call_function(do_sync_core) to complete while a CPU is inside the
begin/retry section with IRQs off and the mems_allowed value is changed,
we can hang.

This is because begin() will always return 0 (since it wasn't patched
yet) while retry() will test the 0 against the actual value of the seq
counter.

The fix is to use two different static keys: one for begin
(pre_enable_key) and one for retry (enable_key). In cpuset_inc(), we
first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
always returns a valid seqcount if we are enabling cpusets. Similarly, when
disabling cpusets via cpuset_dec(), we first ensure that callers of
cpuset_mems_allowed_retry() will start ignoring the seqcount value
before we let cpuset_mems_allowed_begin() return 0.
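
As a rough reader-side sketch (illustrative only, not part of the patch), the
begin/retry loop that the two keys guard looks like this:

	unsigned int seq;

	do {
		seq = read_mems_allowed_begin();	/* 0 while begin() is still unpatched */
		/* ... allocate using current->mems_allowed ... */
	} while (read_mems_allowed_retry(seq));		/* compares seq against the live seqcount */

If retry() has already been patched to read the seqcount while begin() still
returns 0, any change to mems_allowed makes retry() return true on every
iteration; with local irqs disabled around the loop, that CPU never services
the text_poke IPI and the patching thread hangs, as in the traces below.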

The relevant stack traces of the two stuck threads:

CPU: 1 PID: 1415 Comm: mkdir Tainted: G L 4.9.36-00104-g540c51286237 #4
Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
RIP: smp_call_function_many+0x1f9/0x260
Call Trace:
smp_call_function+0x3b/0x70
on_each_cpu+0x2f/0x90
text_poke_bp+0x87/0xd0
arch_jump_label_transform+0x93/0x100
__jump_label_update+0x77/0x90
jump_label_update+0xaa/0xc0
static_key_slow_inc+0x9e/0xb0
cpuset_css_online+0x70/0x2e0
online_css+0x2c/0xa0
cgroup_apply_control_enable+0x27f/0x3d0
cgroup_mkdir+0x2b7/0x420
kernfs_iop_mkdir+0x5a/0x80
vfs_mkdir+0xf6/0x1a0
SyS_mkdir+0xb7/0xe0
entry_SYSCALL_64_fastpath+0x18/0xad

...

CPU: 2 PID: 1 Comm: init Tainted: G L 4.9.36-00104-g540c51286237 #4
Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
task: ffff8818087c0000 task.stack: ffffc90000030000
RIP: int3+0x39/0x70
Call Trace:
<#DB> ? ___slab_alloc+0x28b/0x5a0
<EOE> ? copy_process.part.40+0xf7/0x1de0
__slab_alloc.isra.80+0x54/0x90
copy_process.part.40+0xf7/0x1de0
copy_process.part.40+0xf7/0x1de0
kmem_cache_alloc_node+0x8a/0x280
copy_process.part.40+0xf7/0x1de0
_do_fork+0xe7/0x6c0
_raw_spin_unlock_irq+0x2d/0x60
trace_hardirqs_on_caller+0x136/0x1d0
entry_SYSCALL_64_fastpath+0x5/0xad
do_syscall_64+0x27/0x350
SyS_clone+0x19/0x20
do_syscall_64+0x60/0x350
entry_SYSCALL64_slow_path+0x25/0x25

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 46e700abc44c ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
Signed-off-by: Dima Zavin <[email protected]>
Reported-by: Cliff Spradlin <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/cpuset.h | 19 +++++++++++++++++--
kernel/cpuset.c | 1 +
2 files changed, 18 insertions(+), 2 deletions(-)

--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -16,6 +16,19 @@

#ifdef CONFIG_CPUSETS

+/*
+ * Static branch rewrites can happen in an arbitrary order for a given
+ * key. In code paths where we need to loop with read_mems_allowed_begin() and
+ * read_mems_allowed_retry() to get a consistent view of mems_allowed, we need
+ * to ensure that begin() always gets rewritten before retry() in the
+ * disabled -> enabled transition. If not, then if local irqs are disabled
+ * around the loop, we can deadlock since retry() would always be
+ * comparing the latest value of the mems_allowed seqcount against 0 as
+ * begin() still would see cpusets_enabled() as false. The enabled -> disabled
+ * transition should happen in reverse order for the same reasons (want to stop
+ * looking at real value of mems_allowed.sequence in retry() first).
+ */
+extern struct static_key_false cpusets_pre_enable_key;
extern struct static_key_false cpusets_enabled_key;
static inline bool cpusets_enabled(void)
{
@@ -30,12 +43,14 @@ static inline int nr_cpusets(void)

static inline void cpuset_inc(void)
{
+ static_branch_inc(&cpusets_pre_enable_key);
static_branch_inc(&cpusets_enabled_key);
}

static inline void cpuset_dec(void)
{
static_branch_dec(&cpusets_enabled_key);
+ static_branch_dec(&cpusets_pre_enable_key);
}

extern int cpuset_init(void);
@@ -113,7 +128,7 @@ extern void cpuset_print_current_mems_al
*/
static inline unsigned int read_mems_allowed_begin(void)
{
- if (!cpusets_enabled())
+ if (!static_branch_unlikely(&cpusets_pre_enable_key))
return 0;

return read_seqcount_begin(&current->mems_allowed_seq);
@@ -127,7 +142,7 @@ static inline unsigned int read_mems_all
*/
static inline bool read_mems_allowed_retry(unsigned int seq)
{
- if (!cpusets_enabled())
+ if (!static_branch_unlikely(&cpusets_enabled_key))
return false;

return read_seqcount_retry(&current->mems_allowed_seq, seq);
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -61,6 +61,7 @@
#include <linux/cgroup.h>
#include <linux/wait.h>

+DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);

/* See "Frequency meter" comments, below. */


2017-08-09 18:15:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 32/93] media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sean Young <[email protected]>

commit 9f5039ba440e499d85c29b1ddbc3cbc9dc90e44b upstream.

Since commit e8f4818895b3 ("[media] lirc: advertise
LIRC_CAN_GET_REC_RESOLUTION and improve") lircd uses the ioctl
LIRC_GET_REC_RESOLUTION to determine the shortest pulse or space that
the hardware can detect. This breaks decoding in lirc because lircd
expects the answer in microseconds, but the value is returned in nanoseconds.

Reported-by: Derek <[email protected]>
Tested-by: Derek <[email protected]>
Signed-off-by: Sean Young <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/media/rc/ir-lirc-codec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/media/rc/ir-lirc-codec.c
+++ b/drivers/media/rc/ir-lirc-codec.c
@@ -254,7 +254,7 @@ static long ir_lirc_ioctl(struct file *f
return 0;

case LIRC_GET_REC_RESOLUTION:
- val = dev->rx_resolution;
+ val = dev->rx_resolution / 1000;
break;

case LIRC_SET_WIDEBAND_RECEIVER:


2017-08-09 18:33:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 10/93] device property: Make dev_fwnode() public

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sakari Ailus <[email protected]>

commit e44bb0cbdc88686c21e2175a990b40bf6db5d005 upstream.

The function to obtain a fwnode related to a struct device is useful for
drivers that use the fwnode property API: it allows them to remain unaware
of the underlying firmware implementation.
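
A minimal, hypothetical use in a driver once the symbol is public (the caller
and property name here are made up for illustration):

	struct fwnode_handle *fwnode = dev_fwnode(&pdev->dev);
	u32 bus_width = 1;

	/* works the same whether the device is described by DT or by ACPI */
	if (fwnode)
		fwnode_property_read_u32(fwnode, "bus-width", &bus_width);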

Signed-off-by: Sakari Ailus <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/base/property.c | 3 ++-
include/linux/property.h | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -182,11 +182,12 @@ static int pset_prop_read_string(struct
return 0;
}

-static inline struct fwnode_handle *dev_fwnode(struct device *dev)
+struct fwnode_handle *dev_fwnode(struct device *dev)
{
return IS_ENABLED(CONFIG_OF) && dev->of_node ?
&dev->of_node->fwnode : dev->fwnode;
}
+EXPORT_SYMBOL_GPL(dev_fwnode);

/**
* device_property_present - check if a property of a device is present
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -33,6 +33,8 @@ enum dev_dma_attr {
DEV_DMA_COHERENT,
};

+struct fwnode_handle *dev_fwnode(struct device *dev);
+
bool device_property_present(struct device *dev, const char *propname);
int device_property_read_u8_array(struct device *dev, const char *propname,
u8 *val, size_t nval);


2017-08-09 18:33:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 09/93] mmc: sdhci-of-at91: force card detect value for non removable devices

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ludovic Desroches <[email protected]>

commit 7a1e3f143176e8ebdb2f5a9b3b47abc18b879d90 upstream.

When the device is non-removable, the card detect signal is often used
for another purpose, i.e. muxed to another SoC peripheral or used as a
GPIO. It could lead to wrong behavior depending on the default value of
this signal if it is not muxed to the SDHCI controller.

Fixes: bb5f8ea4d514 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
Signed-off-by: Ludovic Desroches <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/mmc/host/sdhci-of-at91.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)

--- a/drivers/mmc/host/sdhci-of-at91.c
+++ b/drivers/mmc/host/sdhci-of-at91.c
@@ -31,6 +31,7 @@

#define SDMMC_MC1R 0x204
#define SDMMC_MC1R_DDR BIT(3)
+#define SDMMC_MC1R_FCD BIT(7)
#define SDMMC_CACR 0x230
#define SDMMC_CACR_CAPWREN BIT(0)
#define SDMMC_CACR_KEY (0x46 << 8)
@@ -43,6 +44,15 @@ struct sdhci_at91_priv {
struct clk *mainck;
};

+static void sdhci_at91_set_force_card_detect(struct sdhci_host *host)
+{
+ u8 mc1r;
+
+ mc1r = readb(host->ioaddr + SDMMC_MC1R);
+ mc1r |= SDMMC_MC1R_FCD;
+ writeb(mc1r, host->ioaddr + SDMMC_MC1R);
+}
+
static void sdhci_at91_set_clock(struct sdhci_host *host, unsigned int clock)
{
u16 clk;
@@ -112,10 +122,18 @@ void sdhci_at91_set_uhs_signaling(struct
sdhci_set_uhs_signaling(host, timing);
}

+static void sdhci_at91_reset(struct sdhci_host *host, u8 mask)
+{
+ sdhci_reset(host, mask);
+
+ if (host->mmc->caps & MMC_CAP_NONREMOVABLE)
+ sdhci_at91_set_force_card_detect(host);
+}
+
static const struct sdhci_ops sdhci_at91_sama5d2_ops = {
.set_clock = sdhci_at91_set_clock,
.set_bus_width = sdhci_set_bus_width,
- .reset = sdhci_reset,
+ .reset = sdhci_at91_reset,
.set_uhs_signaling = sdhci_at91_set_uhs_signaling,
.set_power = sdhci_at91_set_power,
};
@@ -322,6 +340,21 @@ static int sdhci_at91_probe(struct platf
host->quirks &= ~SDHCI_QUIRK_BROKEN_CARD_DETECTION;
}

+ /*
+ * If the device attached to the MMC bus is not removable, it is safer
+ * to set the Force Card Detect bit. People often don't connect the
+ * card detect signal and use this pin for another purpose. If the card
+ * detect pin is not muxed to SDHCI controller, a default value is
+ * used. This value can be different from a SoC revision to another
+ * one. Problems come when this default value is not card present. To
+ * avoid this case, if the device is non removable then the card
+ * detection procedure using the SDMCC_CD signal is bypassed.
+ * This bit is reset when a software reset for all command is performed
+ * so we need to implement our own reset function to set back this bit.
+ */
+ if (host->mmc->caps & MMC_CAP_NONREMOVABLE)
+ sdhci_at91_set_force_card_detect(host);
+
pm_runtime_put_autosuspend(&pdev->dev);

return 0;


2017-08-09 18:34:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 34/93] Btrfs: fix early ENOSPC due to delalloc

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Omar Sandoval <[email protected]>

commit 17024ad0a0fdfcfe53043afb969b813d3e020c21 upstream.

If a lot of metadata is reserved for outstanding delayed allocations, we
rely on shrink_delalloc() to reclaim metadata space in order to fulfill
reservation tickets. However, shrink_delalloc() has a shortcut where if
it determines that space can be overcommitted, it will stop early. This
made sense before the ticketed enospc system, but now it means that
shrink_delalloc() will often not reclaim enough space to fulfill any
tickets, leading to an early ENOSPC. (Reservation tickets don't care
about being able to overcommit, they need every byte accounted for.)

Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
all of the metadata it is supposed to. This fixes early ENOSPCs we were
seeing when doing a btrfs receive to populate a new filesystem, as well
as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Tested-by: Christoph Anton Mitterer <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/extent-tree.c | 4 ----
1 file changed, 4 deletions(-)

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4759,10 +4759,6 @@ skip_async:
else
flush = BTRFS_RESERVE_NO_FLUSH;
spin_lock(&space_info->lock);
- if (can_overcommit(root, space_info, orig, flush)) {
- spin_unlock(&space_info->lock);
- break;
- }
if (list_empty(&space_info->tickets) &&
list_empty(&space_info->priority_tickets)) {
spin_unlock(&space_info->lock);


2017-08-09 18:15:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 31/93] mmc: core: Use device_property_read instead of of_property_read

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Woods <[email protected]>

commit 73a47a9bb3e2c4a9c553c72456e63ab991b1a4d9 upstream.

Using the device_property interfaces allows mmc drivers to work
on platforms which run on either device tree or ACPI.

Signed-off-by: David Woods <[email protected]>
Reviewed-by: Chris Metcalf <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/mmc/core/host.c | 70 +++++++++++++++++++++++-------------------------
1 file changed, 34 insertions(+), 36 deletions(-)

--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -179,19 +179,17 @@ static void mmc_retune_timer(unsigned lo
*/
int mmc_of_parse(struct mmc_host *host)
{
- struct device_node *np;
+ struct device *dev = host->parent;
u32 bus_width;
int ret;
bool cd_cap_invert, cd_gpio_invert = false;
bool ro_cap_invert, ro_gpio_invert = false;

- if (!host->parent || !host->parent->of_node)
+ if (!dev || !dev_fwnode(dev))
return 0;

- np = host->parent->of_node;
-
/* "bus-width" is translated to MMC_CAP_*_BIT_DATA flags */
- if (of_property_read_u32(np, "bus-width", &bus_width) < 0) {
+ if (device_property_read_u32(dev, "bus-width", &bus_width) < 0) {
dev_dbg(host->parent,
"\"bus-width\" property is missing, assuming 1 bit.\n");
bus_width = 1;
@@ -213,7 +211,7 @@ int mmc_of_parse(struct mmc_host *host)
}

/* f_max is obtained from the optional "max-frequency" property */
- of_property_read_u32(np, "max-frequency", &host->f_max);
+ device_property_read_u32(dev, "max-frequency", &host->f_max);

/*
* Configure CD and WP pins. They are both by default active low to
@@ -228,12 +226,12 @@ int mmc_of_parse(struct mmc_host *host)
*/

/* Parse Card Detection */
- if (of_property_read_bool(np, "non-removable")) {
+ if (device_property_read_bool(dev, "non-removable")) {
host->caps |= MMC_CAP_NONREMOVABLE;
} else {
- cd_cap_invert = of_property_read_bool(np, "cd-inverted");
+ cd_cap_invert = device_property_read_bool(dev, "cd-inverted");

- if (of_property_read_bool(np, "broken-cd"))
+ if (device_property_read_bool(dev, "broken-cd"))
host->caps |= MMC_CAP_NEEDS_POLL;

ret = mmc_gpiod_request_cd(host, "cd", 0, true,
@@ -259,7 +257,7 @@ int mmc_of_parse(struct mmc_host *host)
}

/* Parse Write Protection */
- ro_cap_invert = of_property_read_bool(np, "wp-inverted");
+ ro_cap_invert = device_property_read_bool(dev, "wp-inverted");

ret = mmc_gpiod_request_ro(host, "wp", 0, false, 0, &ro_gpio_invert);
if (!ret)
@@ -267,62 +265,62 @@ int mmc_of_parse(struct mmc_host *host)
else if (ret != -ENOENT && ret != -ENOSYS)
return ret;

- if (of_property_read_bool(np, "disable-wp"))
+ if (device_property_read_bool(dev, "disable-wp"))
host->caps2 |= MMC_CAP2_NO_WRITE_PROTECT;

/* See the comment on CD inversion above */
if (ro_cap_invert ^ ro_gpio_invert)
host->caps2 |= MMC_CAP2_RO_ACTIVE_HIGH;

- if (of_property_read_bool(np, "cap-sd-highspeed"))
+ if (device_property_read_bool(dev, "cap-sd-highspeed"))
host->caps |= MMC_CAP_SD_HIGHSPEED;
- if (of_property_read_bool(np, "cap-mmc-highspeed"))
+ if (device_property_read_bool(dev, "cap-mmc-highspeed"))
host->caps |= MMC_CAP_MMC_HIGHSPEED;
- if (of_property_read_bool(np, "sd-uhs-sdr12"))
+ if (device_property_read_bool(dev, "sd-uhs-sdr12"))
host->caps |= MMC_CAP_UHS_SDR12;
- if (of_property_read_bool(np, "sd-uhs-sdr25"))
+ if (device_property_read_bool(dev, "sd-uhs-sdr25"))
host->caps |= MMC_CAP_UHS_SDR25;
- if (of_property_read_bool(np, "sd-uhs-sdr50"))
+ if (device_property_read_bool(dev, "sd-uhs-sdr50"))
host->caps |= MMC_CAP_UHS_SDR50;
- if (of_property_read_bool(np, "sd-uhs-sdr104"))
+ if (device_property_read_bool(dev, "sd-uhs-sdr104"))
host->caps |= MMC_CAP_UHS_SDR104;
- if (of_property_read_bool(np, "sd-uhs-ddr50"))
+ if (device_property_read_bool(dev, "sd-uhs-ddr50"))
host->caps |= MMC_CAP_UHS_DDR50;
- if (of_property_read_bool(np, "cap-power-off-card"))
+ if (device_property_read_bool(dev, "cap-power-off-card"))
host->caps |= MMC_CAP_POWER_OFF_CARD;
- if (of_property_read_bool(np, "cap-mmc-hw-reset"))
+ if (device_property_read_bool(dev, "cap-mmc-hw-reset"))
host->caps |= MMC_CAP_HW_RESET;
- if (of_property_read_bool(np, "cap-sdio-irq"))
+ if (device_property_read_bool(dev, "cap-sdio-irq"))
host->caps |= MMC_CAP_SDIO_IRQ;
- if (of_property_read_bool(np, "full-pwr-cycle"))
+ if (device_property_read_bool(dev, "full-pwr-cycle"))
host->caps2 |= MMC_CAP2_FULL_PWR_CYCLE;
- if (of_property_read_bool(np, "keep-power-in-suspend"))
+ if (device_property_read_bool(dev, "keep-power-in-suspend"))
host->pm_caps |= MMC_PM_KEEP_POWER;
- if (of_property_read_bool(np, "wakeup-source") ||
- of_property_read_bool(np, "enable-sdio-wakeup")) /* legacy */
+ if (device_property_read_bool(dev, "wakeup-source") ||
+ device_property_read_bool(dev, "enable-sdio-wakeup")) /* legacy */
host->pm_caps |= MMC_PM_WAKE_SDIO_IRQ;
- if (of_property_read_bool(np, "mmc-ddr-1_8v"))
+ if (device_property_read_bool(dev, "mmc-ddr-1_8v"))
host->caps |= MMC_CAP_1_8V_DDR;
- if (of_property_read_bool(np, "mmc-ddr-1_2v"))
+ if (device_property_read_bool(dev, "mmc-ddr-1_2v"))
host->caps |= MMC_CAP_1_2V_DDR;
- if (of_property_read_bool(np, "mmc-hs200-1_8v"))
+ if (device_property_read_bool(dev, "mmc-hs200-1_8v"))
host->caps2 |= MMC_CAP2_HS200_1_8V_SDR;
- if (of_property_read_bool(np, "mmc-hs200-1_2v"))
+ if (device_property_read_bool(dev, "mmc-hs200-1_2v"))
host->caps2 |= MMC_CAP2_HS200_1_2V_SDR;
- if (of_property_read_bool(np, "mmc-hs400-1_8v"))
+ if (device_property_read_bool(dev, "mmc-hs400-1_8v"))
host->caps2 |= MMC_CAP2_HS400_1_8V | MMC_CAP2_HS200_1_8V_SDR;
- if (of_property_read_bool(np, "mmc-hs400-1_2v"))
+ if (device_property_read_bool(dev, "mmc-hs400-1_2v"))
host->caps2 |= MMC_CAP2_HS400_1_2V | MMC_CAP2_HS200_1_2V_SDR;
- if (of_property_read_bool(np, "mmc-hs400-enhanced-strobe"))
+ if (device_property_read_bool(dev, "mmc-hs400-enhanced-strobe"))
host->caps2 |= MMC_CAP2_HS400_ES;
- if (of_property_read_bool(np, "no-sdio"))
+ if (device_property_read_bool(dev, "no-sdio"))
host->caps2 |= MMC_CAP2_NO_SDIO;
- if (of_property_read_bool(np, "no-sd"))
+ if (device_property_read_bool(dev, "no-sd"))
host->caps2 |= MMC_CAP2_NO_SD;
- if (of_property_read_bool(np, "no-mmc"))
+ if (device_property_read_bool(dev, "no-mmc"))
host->caps2 |= MMC_CAP2_NO_MMC;

- host->dsr_req = !of_property_read_u32(np, "dsr", &host->dsr);
+ host->dsr_req = !device_property_read_u32(dev, "dsr", &host->dsr);
if (host->dsr_req && (host->dsr & ~0xffff)) {
dev_err(host->parent,
"device tree specified broken value for DSR: 0x%x, ignoring\n",


2017-08-09 18:34:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 33/93] f2fs: sanity check checkpoint segno and blkoff

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jin Qian <[email protected]>

commit 15d3042a937c13f5d9244241c7a9c8416ff6e82a upstream.

Make sure the segno and blkoff values read from the raw image are valid.

Cc: [email protected]
Signed-off-by: Jin Qian <[email protected]>
[Jaegeuk Kim: adjust minor coding style]
Signed-off-by: Jaegeuk Kim <[email protected]>
[AmitP: Found in Android Security bulletin for Aug'17, fixes CVE-2017-10663]
Signed-off-by: Amit Pundir <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/super.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1424,6 +1424,8 @@ int sanity_check_ckpt(struct f2fs_sb_inf
unsigned int total, fsmeta;
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ unsigned int main_segs, blocks_per_seg;
+ int i;

total = le32_to_cpu(raw_super->segment_count);
fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
@@ -1435,6 +1437,20 @@ int sanity_check_ckpt(struct f2fs_sb_inf
if (unlikely(fsmeta >= total))
return 1;

+ main_segs = le32_to_cpu(raw_super->segment_count_main);
+ blocks_per_seg = sbi->blocks_per_seg;
+
+ for (i = 0; i < NR_CURSEG_NODE_TYPE; i++) {
+ if (le32_to_cpu(ckpt->cur_node_segno[i]) >= main_segs ||
+ le16_to_cpu(ckpt->cur_node_blkoff[i]) >= blocks_per_seg)
+ return 1;
+ }
+ for (i = 0; i < NR_CURSEG_DATA_TYPE; i++) {
+ if (le32_to_cpu(ckpt->cur_data_segno[i]) >= main_segs ||
+ le16_to_cpu(ckpt->cur_data_blkoff[i]) >= blocks_per_seg)
+ return 1;
+ }
+
if (unlikely(f2fs_cp_error(sbi))) {
f2fs_msg(sbi->sb, KERN_ERR, "A bug case: need to run fsck");
return 1;


2017-08-09 18:34:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 29/93] iscsi-target: Fix initial login PDU asynchronous socket close OOPs

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 25cdda95fda78d22d44157da15aa7ea34be3c804 upstream.

This patch fixes an OOPs originally introduced by:

commit bb048357dad6d604520c91586334c9c230366a14
Author: Nicholas Bellinger <[email protected]>
Date: Thu Sep 5 14:54:04 2013 -0700

iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running. For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing an iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed. For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle the case where iscsi_target_sk_state_change() is
called after the initial PDU processing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Tested-by: Mike Christie <[email protected]>
Cc: Mike Christie <[email protected]>
Reported-by: Hannes Reinecke <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Varun Prakash <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/target/iscsi/iscsi_target_nego.c | 204 ++++++++++++++++++++-----------
include/target/iscsi/iscsi_target_core.h | 1
2 files changed, 138 insertions(+), 67 deletions(-)

--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -490,14 +490,60 @@ static void iscsi_target_restore_sock_ca

static int iscsi_target_do_login(struct iscsi_conn *, struct iscsi_login *);

-static bool iscsi_target_sk_state_check(struct sock *sk)
+static bool __iscsi_target_sk_check_close(struct sock *sk)
{
if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) {
- pr_debug("iscsi_target_sk_state_check: TCP_CLOSE_WAIT|TCP_CLOSE,"
+ pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,"
"returning FALSE\n");
- return false;
+ return true;
}
- return true;
+ return false;
+}
+
+static bool iscsi_target_sk_check_close(struct iscsi_conn *conn)
+{
+ bool state = false;
+
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;
+
+ read_lock_bh(&sk->sk_callback_lock);
+ state = (__iscsi_target_sk_check_close(sk) ||
+ test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+ read_unlock_bh(&sk->sk_callback_lock);
+ }
+ return state;
+}
+
+static bool iscsi_target_sk_check_flag(struct iscsi_conn *conn, unsigned int flag)
+{
+ bool state = false;
+
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;
+
+ read_lock_bh(&sk->sk_callback_lock);
+ state = test_bit(flag, &conn->login_flags);
+ read_unlock_bh(&sk->sk_callback_lock);
+ }
+ return state;
+}
+
+static bool iscsi_target_sk_check_and_clear(struct iscsi_conn *conn, unsigned int flag)
+{
+ bool state = false;
+
+ if (conn->sock) {
+ struct sock *sk = conn->sock->sk;
+
+ write_lock_bh(&sk->sk_callback_lock);
+ state = (__iscsi_target_sk_check_close(sk) ||
+ test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+ if (!state)
+ clear_bit(flag, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);
+ }
+ return state;
}

static void iscsi_target_login_drop(struct iscsi_conn *conn, struct iscsi_login *login)
@@ -537,6 +583,20 @@ static void iscsi_target_do_login_rx(str

pr_debug("entering iscsi_target_do_login_rx, conn: %p, %s:%d\n",
conn, current->comm, current->pid);
+ /*
+ * If iscsi_target_do_login_rx() has been invoked by ->sk_data_ready()
+ * before initial PDU processing in iscsi_target_start_negotiation()
+ * has completed, go ahead and retry until it's cleared.
+ *
+ * Otherwise if the TCP connection drops while this is occuring,
+ * iscsi_target_start_negotiation() will detect the failure, call
+ * cancel_delayed_work_sync(&conn->login_work), and cleanup the
+ * remaining iscsi connection resources from iscsi_np process context.
+ */
+ if (iscsi_target_sk_check_flag(conn, LOGIN_FLAGS_INITIAL_PDU)) {
+ schedule_delayed_work(&conn->login_work, msecs_to_jiffies(10));
+ return;
+ }

spin_lock(&tpg->tpg_state_lock);
state = (tpg->tpg_state == TPG_STATE_ACTIVE);
@@ -544,26 +604,12 @@ static void iscsi_target_do_login_rx(str

if (!state) {
pr_debug("iscsi_target_do_login_rx: tpg_state != TPG_STATE_ACTIVE\n");
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
- return;
+ goto err;
}

- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
-
- read_lock_bh(&sk->sk_callback_lock);
- state = iscsi_target_sk_state_check(sk);
- read_unlock_bh(&sk->sk_callback_lock);
-
- if (!state) {
- pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
- return;
- }
+ if (iscsi_target_sk_check_close(conn)) {
+ pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
+ goto err;
}

conn->login_kworker = current;
@@ -581,34 +627,29 @@ static void iscsi_target_do_login_rx(str
flush_signals(current);
conn->login_kworker = NULL;

- if (rc < 0) {
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
- return;
- }
+ if (rc < 0)
+ goto err;

pr_debug("iscsi_target_do_login_rx after rx_login_io, %p, %s:%d\n",
conn, current->comm, current->pid);

rc = iscsi_target_do_login(conn, login);
if (rc < 0) {
- iscsi_target_restore_sock_callbacks(conn);
- iscsi_target_login_drop(conn, login);
- iscsit_deaccess_np(np, tpg, tpg_np);
+ goto err;
} else if (!rc) {
- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
-
- write_lock_bh(&sk->sk_callback_lock);
- clear_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags);
- write_unlock_bh(&sk->sk_callback_lock);
- }
+ if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_READ_ACTIVE))
+ goto err;
} else if (rc == 1) {
iscsi_target_nego_release(conn);
iscsi_post_login_handler(np, conn, zero_tsih);
iscsit_deaccess_np(np, tpg, tpg_np);
}
+ return;
+
+err:
+ iscsi_target_restore_sock_callbacks(conn);
+ iscsi_target_login_drop(conn, login);
+ iscsit_deaccess_np(np, tpg, tpg_np);
}

static void iscsi_target_do_cleanup(struct work_struct *work)
@@ -656,31 +697,54 @@ static void iscsi_target_sk_state_change
orig_state_change(sk);
return;
}
+ state = __iscsi_target_sk_check_close(sk);
+ pr_debug("__iscsi_target_sk_close_change: state: %d\n", state);
+
if (test_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1 sk_state_change"
" conn: %p\n", conn);
+ if (state)
+ set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
write_unlock_bh(&sk->sk_callback_lock);
orig_state_change(sk);
return;
}
- if (test_and_set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
+ if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
pr_debug("Got LOGIN_FLAGS_CLOSED=1 sk_state_change conn: %p\n",
conn);
write_unlock_bh(&sk->sk_callback_lock);
orig_state_change(sk);
return;
}
+ /*
+ * If the TCP connection has dropped, go ahead and set LOGIN_FLAGS_CLOSED,
+ * but only queue conn->login_work -> iscsi_target_do_login_rx()
+ * processing if LOGIN_FLAGS_INITIAL_PDU has already been cleared.
+ *
+ * When iscsi_target_do_login_rx() runs, iscsi_target_sk_check_close()
+ * will detect the dropped TCP connection from delayed workqueue context.
+ *
+ * If LOGIN_FLAGS_INITIAL_PDU is still set, which means the initial
+ * iscsi_target_start_negotiation() is running, iscsi_target_do_login()
+ * via iscsi_target_sk_check_close() or iscsi_target_start_negotiation()
+ * via iscsi_target_sk_check_and_clear() is responsible for detecting the
+ * dropped TCP connection in iscsi_np process context, and cleaning up
+ * the remaining iscsi connection resources.
+ */
+ if (state) {
+ pr_debug("iscsi_target_sk_state_change got failed state\n");
+ set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
+ state = test_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);

- state = iscsi_target_sk_state_check(sk);
- write_unlock_bh(&sk->sk_callback_lock);
-
- pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
+ orig_state_change(sk);

- if (!state) {
- pr_debug("iscsi_target_sk_state_change got failed state\n");
- schedule_delayed_work(&conn->login_cleanup_work, 0);
+ if (!state)
+ schedule_delayed_work(&conn->login_work, 0);
return;
}
+ write_unlock_bh(&sk->sk_callback_lock);
+
orig_state_change(sk);
}

@@ -945,6 +1009,15 @@ static int iscsi_target_do_login(struct
if (iscsi_target_handle_csg_one(conn, login) < 0)
return -1;
if (login_rsp->flags & ISCSI_FLAG_LOGIN_TRANSIT) {
+ /*
+ * Check to make sure the TCP connection has not
+ * dropped asynchronously while session reinstatement
+ * was occuring in this kthread context, before
+ * transitioning to full feature phase operation.
+ */
+ if (iscsi_target_sk_check_close(conn))
+ return -1;
+
login->tsih = conn->sess->tsih;
login->login_complete = 1;
iscsi_target_restore_sock_callbacks(conn);
@@ -971,21 +1044,6 @@ static int iscsi_target_do_login(struct
break;
}

- if (conn->sock) {
- struct sock *sk = conn->sock->sk;
- bool state;
-
- read_lock_bh(&sk->sk_callback_lock);
- state = iscsi_target_sk_state_check(sk);
- read_unlock_bh(&sk->sk_callback_lock);
-
- if (!state) {
- pr_debug("iscsi_target_do_login() failed state for"
- " conn: %p\n", conn);
- return -1;
- }
- }
-
return 0;
}

@@ -1252,13 +1310,25 @@ int iscsi_target_start_negotiation(
if (conn->sock) {
struct sock *sk = conn->sock->sk;

- write_lock_bh(&sk->sk_callback_lock);
- set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
- write_unlock_bh(&sk->sk_callback_lock);
- }
+ write_lock_bh(&sk->sk_callback_lock);
+ set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+ set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
+ write_unlock_bh(&sk->sk_callback_lock);
+ }
+ /*
+ * If iscsi_target_do_login returns zero to signal more PDU
+ * exchanges are required to complete the login, go ahead and
+ * clear LOGIN_FLAGS_INITIAL_PDU but only if the TCP connection
+ * is still active.
+ *
+ * Otherwise if TCP connection dropped asynchronously, go ahead
+ * and perform connection cleanup now.
+ */
+ ret = iscsi_target_do_login(conn, login);
+ if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
+ ret = -1;

- ret = iscsi_target_do_login(conn, login);
- if (ret < 0) {
+ if (ret < 0) {
cancel_delayed_work_sync(&conn->login_work);
cancel_delayed_work_sync(&conn->login_cleanup_work);
iscsi_target_restore_sock_callbacks(conn);
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -563,6 +563,7 @@ struct iscsi_conn {
#define LOGIN_FLAGS_READ_ACTIVE 1
#define LOGIN_FLAGS_CLOSED 2
#define LOGIN_FLAGS_READY 4
+#define LOGIN_FLAGS_INITIAL_PDU 8
unsigned long login_flags;
struct delayed_work login_work;
struct delayed_work login_cleanup_work;


2017-08-09 18:14:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 23/93] gpiolib: skip unwanted events, dont convert them to opposite edge

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bartosz Golaszewski <[email protected]>

commit df1e76f28ffe87d1b065eecab2d0fbb89e6bdee5 upstream.

The previous fix for filtering out unwatched events was not entirely
correct. Instead of being skipped, the events we don't want are now
interpreted as events of the opposite edge.

In order to fix it: always read the GPIO line value on interrupt and
only emit the event if it corresponds with the event type we requested.

Fixes: ad537b822577 ("gpiolib: fix filtering out unwanted events")
Signed-off-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpio/gpiolib.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -703,24 +703,23 @@ static irqreturn_t lineevent_irq_thread(
{
struct lineevent_state *le = p;
struct gpioevent_data ge;
- int ret;
+ int ret, level;

ge.timestamp = ktime_get_real_ns();
+ level = gpiod_get_value_cansleep(le->desc);

if (le->eflags & GPIOEVENT_REQUEST_RISING_EDGE
&& le->eflags & GPIOEVENT_REQUEST_FALLING_EDGE) {
- int level = gpiod_get_value_cansleep(le->desc);
-
if (level)
/* Emit low-to-high event */
ge.id = GPIOEVENT_EVENT_RISING_EDGE;
else
/* Emit high-to-low event */
ge.id = GPIOEVENT_EVENT_FALLING_EDGE;
- } else if (le->eflags & GPIOEVENT_REQUEST_RISING_EDGE) {
+ } else if (le->eflags & GPIOEVENT_REQUEST_RISING_EDGE && level) {
/* Emit low-to-high event */
ge.id = GPIOEVENT_EVENT_RISING_EDGE;
- } else if (le->eflags & GPIOEVENT_REQUEST_FALLING_EDGE) {
+ } else if (le->eflags & GPIOEVENT_REQUEST_FALLING_EDGE && !level) {
/* Emit high-to-low event */
ge.id = GPIOEVENT_EVENT_FALLING_EDGE;
} else {


2017-08-09 18:35:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 27/93] ARM: dts: tango4: Request RGMII RX and TX clock delays

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Gonzalez <[email protected]>

commit 985333b0eef8603b02181c4ec0a722b82be9642d upstream.

RX and TX clock delays are required. Request them explicitly.

Fixes: cad008b8a77e6 ("ARM: dts: tango4: Initial device trees")
Signed-off-by: Marc Gonzalez <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/boot/dts/tango4-vantage-1172.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/boot/dts/tango4-vantage-1172.dts
+++ b/arch/arm/boot/dts/tango4-vantage-1172.dts
@@ -21,7 +21,7 @@
};

&eth0 {
- phy-connection-type = "rgmii";
+ phy-connection-type = "rgmii-id";
phy-handle = <&eth0_phy>;
#address-cells = <1>;
#size-cells = <0>;


2017-08-09 18:14:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 24/93] ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit fcf5ea10992fbac3c7473a1db33d56a139333cd1 upstream.

ext4_find_unwritten_pgoff() does not properly handle a situation where the
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:

xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Fix the problem by ignoring buffers in a page that lie before the starting offset.

Reported-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/file.c | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -469,6 +469,8 @@ static int ext4_find_unwritten_pgoff(str
lastoff = page_offset(page);
bh = head = page_buffers(page);
do {
+ if (lastoff + bh->b_size <= startoff)
+ goto next;
if (buffer_uptodate(bh) ||
buffer_unwritten(bh)) {
if (whence == SEEK_DATA)
@@ -483,6 +485,7 @@ static int ext4_find_unwritten_pgoff(str
unlock_page(page);
goto out;
}
+next:
lastoff += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);


2017-08-09 18:35:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 26/93] ARM: dts: armada-38x: Fix irq type for pca955

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gregory CLEMENT <[email protected]>

commit 8d4514173211586c6238629b1ef1e071927735f5 upstream.

As written in the datasheet the PCA955 can only handle low level irq and
not edge irq.

Without this fix the interrupt is not usable for pca955: the gpio-pca953x
driver already set the irq type as low level which is incompatible with
edge type, then the kernel prevents using the interrupt:

"irq: type mismatch, failed to map hwirq-18 for
/soc/internal-regs/gpio@18100!"

Fixes: 928413bd859c ("ARM: mvebu: Add Armada 388 General Purpose
Development Board support")
Signed-off-by: Gregory CLEMENT <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/boot/dts/armada-388-gp.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/boot/dts/armada-388-gp.dts
+++ b/arch/arm/boot/dts/armada-388-gp.dts
@@ -75,7 +75,7 @@
pinctrl-names = "default";
pinctrl-0 = <&pca0_pins>;
interrupt-parent = <&gpio0>;
- interrupts = <18 IRQ_TYPE_EDGE_FALLING>;
+ interrupts = <18 IRQ_TYPE_LEVEL_LOW>;
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;
@@ -87,7 +87,7 @@
compatible = "nxp,pca9555";
pinctrl-names = "default";
interrupt-parent = <&gpio0>;
- interrupts = <18 IRQ_TYPE_EDGE_FALLING>;
+ interrupts = <18 IRQ_TYPE_LEVEL_LOW>;
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;


2017-08-09 18:36:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 08/93] NFSv4: Fix EXCHANGE_ID corrupt verifier issue

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <[email protected]>

commit fd40559c8657418385e42f797e0b04bfc0add748 upstream.

The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd70bc939. If we interrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.
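
The hazard is the classic stack-lifetime bug sketched below (hypothetical
names, not the NFS code): an asynchronous operation holds a pointer to an
on-stack object, which dangles once the caller's frame is gone.

	struct verifier { char data[8]; };

	static void start_async_op(struct verifier *v);	/* completes later, from another context */

	static void broken(void)
	{
		struct verifier v = { .data = "example" };

		start_async_op(&v);
		/* if we stop waiting here (e.g. interrupted by a signal),
		 * the async path may still read v after it goes out of scope */
	}

Embedding the verifier by value in the heap-allocated call data, as this patch
does, removes the lifetime problem.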

Fixes: 8d89bd70bc939 ("NFS setup async exchange_id")
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/nfs/nfs4proc.c | 11 ++++-------
fs/nfs/nfs4xdr.c | 2 +-
include/linux/nfs_xdr.h | 2 +-
3 files changed, 6 insertions(+), 9 deletions(-)

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7410,7 +7410,7 @@ static void nfs4_exchange_id_done(struct
cdata->res.server_scope = NULL;
}
/* Save the EXCHANGE_ID verifier session trunk tests */
- memcpy(clp->cl_confirm.data, cdata->args.verifier->data,
+ memcpy(clp->cl_confirm.data, cdata->args.verifier.data,
sizeof(clp->cl_confirm.data));
}
out:
@@ -7447,7 +7447,6 @@ static const struct rpc_call_ops nfs4_ex
static int _nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred,
u32 sp4_how, struct rpc_xprt *xprt)
{
- nfs4_verifier verifier;
struct rpc_message msg = {
.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_EXCHANGE_ID],
.rpc_cred = cred,
@@ -7470,8 +7469,7 @@ static int _nfs4_proc_exchange_id(struct
if (!calldata)
goto out;

- if (!xprt)
- nfs4_init_boot_verifier(clp, &verifier);
+ nfs4_init_boot_verifier(clp, &calldata->args.verifier);

status = nfs4_init_uniform_client_string(clp);
if (status)
@@ -7516,9 +7514,8 @@ static int _nfs4_proc_exchange_id(struct
task_setup_data.rpc_xprt = xprt;
task_setup_data.flags =
RPC_TASK_SOFT|RPC_TASK_SOFTCONN|RPC_TASK_ASYNC;
- calldata->args.verifier = &clp->cl_confirm;
- } else {
- calldata->args.verifier = &verifier;
+ memcpy(calldata->args.verifier.data, clp->cl_confirm.data,
+ sizeof(calldata->args.verifier.data));
}
calldata->args.client = clp;
#ifdef CONFIG_NFS_V4_1_MIGRATION
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1761,7 +1761,7 @@ static void encode_exchange_id(struct xd
int len = 0;

encode_op_hdr(xdr, OP_EXCHANGE_ID, decode_exchange_id_maxsz, hdr);
- encode_nfs4_verifier(xdr, args->verifier);
+ encode_nfs4_verifier(xdr, &args->verifier);

encode_string(xdr, strlen(args->client->cl_owner_id),
args->client->cl_owner_id);
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1199,7 +1199,7 @@ struct nfs41_state_protection {

struct nfs41_exchange_id_args {
struct nfs_client *client;
- nfs4_verifier *verifier;
+ nfs4_verifier verifier;
u32 flags;
struct nfs41_state_protection state_protect;
};


2017-08-09 18:36:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 22/93] iommu/amd: Enable ga_log_intr when enabling guest_mode

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Suravee Suthikulpanit <[email protected]>

commit efe6f241602cb61466895f6816b8ea6b90f04d4e upstream.

The IRTE[GALogIntr] bit should be set when enabling guest_mode, which enables
the IOMMU to generate an entry in the GALog when IRTE[IsRun] is not set, and
to send an interrupt to notify the IOMMU driver.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Cc: Joerg Roedel <[email protected]>
Fixes: d98de49a53e48 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iommu/amd_iommu.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4294,6 +4294,7 @@ static int amd_ir_set_vcpu_affinity(stru
/* Setting */
irte->hi.fields.ga_root_ptr = (pi_data->base >> 12);
irte->hi.fields.vector = vcpu_pi_info->vector;
+ irte->lo.fields_vapic.ga_log_intr = 1;
irte->lo.fields_vapic.guest_mode = 1;
irte->lo.fields_vapic.ga_tag = pi_data->ga_tag;



2017-08-09 18:36:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 21/93] powerpc/64: Fix __check_irq_replay missing decrementer interrupt

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <[email protected]>

commit 3db40c312c2c1eb2187c5731102fa8ff380e6e40 upstream.

If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.

The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.

This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.

Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/irq.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -146,6 +146,19 @@ notrace unsigned int __check_irq_replay(

/* Clear bit 0 which we wouldn't clear otherwise */
local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+ if (happened & PACA_IRQ_HARD_DIS) {
+ /*
+ * We may have missed a decrementer interrupt if hard disabled.
+ * Check the decrementer register in case we had a rollover
+ * while hard disabled.
+ */
+ if (!(happened & PACA_IRQ_DEC)) {
+ if (decrementer_check_overflow()) {
+ local_paca->irq_happened |= PACA_IRQ_DEC;
+ happened |= PACA_IRQ_DEC;
+ }
+ }
+ }

/*
* Force the delivery of pending soft-disabled interrupts on PS3.
@@ -171,7 +184,7 @@ notrace unsigned int __check_irq_replay(
* in case we also had a rollover while hard disabled
*/
local_paca->irq_happened &= ~PACA_IRQ_DEC;
- if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
+ if (happened & PACA_IRQ_DEC)
return 0x900;

/* Finally check if an external interrupt happened */


2017-08-09 18:37:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 20/93] powerpc/tm: Fix saving of TM SPRs in core dump

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gustavo Romero <[email protected]>

commit cd63f3cf1d59b7ad8419eba1cac8f9126e79cc43 upstream.

Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
TFIAR, TEXASR) to the thread struct, unless the process is currently
inside a suspended transaction.

If the process is core dumping, and the TM SPRs have changed since the
last time the process was context switched, then we will save stale
values of the TM SPRs to the core dump.

Fix it by saving the live register state to the thread struct in that
case.

Fixes: 08e1c01d6aed ("powerpc/ptrace: Enable support for TM SPR state")
Signed-off-by: Gustavo Romero <[email protected]>
Reviewed-by: Cyril Bur <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/ptrace.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -127,12 +127,19 @@ static void flush_tmregs_to_thread(struc
* If task is not current, it will have been flushed already to
* it's thread_struct during __switch_to().
*
- * A reclaim flushes ALL the state.
+ * A reclaim flushes ALL the state or if not in TM save TM SPRs
+ * in the appropriate thread structures from live.
*/

- if (tsk == current && MSR_TM_SUSPENDED(mfmsr()))
- tm_reclaim_current(TM_CAUSE_SIGNAL);
+ if (tsk != current)
+ return;

+ if (MSR_TM_SUSPENDED(mfmsr())) {
+ tm_reclaim_current(TM_CAUSE_SIGNAL);
+ } else {
+ tm_enable();
+ tm_save_sprs(&(tsk->thread));
+ }
}
#else
static inline void flush_tmregs_to_thread(struct task_struct *tsk) { }


2017-08-09 18:37:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 19/93] timers: Fix overflow in get_next_timer_interrupt

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Matija Glavinic Pecotic <[email protected]>

commit 34f41c0316ed52b0b44542491d89278efdaa70e4 upstream.

For e.g. HZ=100, a timer 430 jiffies in the future, and a 32-bit
unsigned int, the unsigned int right-hand side of the expression
overflows, which results in wrong values being returned.

Type cast the multiplier to 64bit to avoid that issue.
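
A quick worked example of the overflow (illustrative values; with HZ=100,
TICK_NSEC is 10,000,000 ns):

	unsigned int delta = 430;			/* jiffies until the next timer */
	unsigned int bad = delta * 10000000U;		/* 4,300,000,000 wraps to 5,032,704 */
	u64 good = (u64)delta * 10000000U;		/* 4,300,000,000 ns = 4.3 s */

With the 32-bit multiply the next event appears to be roughly 5 ms away
instead of 4.3 s, so get_next_timer_interrupt() returns a far too early
expiry time.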

Fixes: 46c8f0b077a8 ("timers: Fix get_next_timer_interrupt() computation")
Signed-off-by: Matija Glavinic Pecotic <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexander Sverdlin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/time/timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1536,7 +1536,7 @@ u64 get_next_timer_interrupt(unsigned lo
base->is_idle = false;
} else {
if (!is_max_delta)
- expires = basem + (nextevt - basej) * TICK_NSEC;
+ expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
/*
* If we expect to sleep more than a tick, mark the base idle:
*/


2017-08-09 18:37:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 18/93] mm/page_alloc: Remove kernel address exposure in free_reserved_area()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <[email protected]>

commit adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7 upstream.

Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg. The only leaks I see on my local
system are:

Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)

Linus says:

"I suspect we should just remove [the addresses in the 'Freeing'
messages]. I'm sure they are useful in theory, but I suspect they
were more useful back when the whole "free init memory" was
originally done.

These days, if we have a use-after-free, I suspect the init-mem
situation is the easiest situation by far. Compared to all the dynamic
allocations which are much more likely to show it anyway. So having
debug output for that case is likely not all that productive."

With this patch the freeing messages now look like this:

Freeing SMP alternatives memory: 32K
Freeing initrd memory: 10588K
Freeing unused kernel memory: 3592K
Freeing unused kernel memory: 1352K
Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6445,8 +6445,8 @@ unsigned long free_reserved_area(void *s
}

if (pages && s)
- pr_info("Freeing %s memory: %ldK (%p - %p)\n",
- s, pages << (PAGE_SHIFT - 10), start, end);
+ pr_info("Freeing %s memory: %ldK\n",
+ s, pages << (PAGE_SHIFT - 10));

return pages;
}


2017-08-09 18:14:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 01/93] parisc: Handle vmas whose context is not current in flush_cache_range

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: John David Anglin <[email protected]>

commit 13d57093c141db2036364d6be35e394fc5b64728 upstream.

In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
statement in flush_cache_range() during a system shutdown:

kernel BUG at arch/parisc/kernel/cache.c:595!
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx

IAOQ[0]: flush_cache_range+0x144/0x148
IAOQ[1]: flush_cache_page+0x0/0x1a8
RP(r2): flush_cache_range+0xec/0x148
Backtrace:
[<00000000402910ac>] unmap_page_range+0x84/0x880
[<00000000402918f4>] unmap_single_vma+0x4c/0x60
[<0000000040291a18>] zap_page_range_single+0x110/0x160
[<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
[<000000004026ccd8>] truncate_pagecache+0x50/0xa8
[<000000004026cd84>] truncate_setsize+0x54/0x70
[<000000004033d534>] put_aio_ring_file+0x44/0xb0
[<000000004033d5d8>] aio_free_ring+0x38/0x140
[<000000004033d714>] free_ioctx+0x34/0xa8
[<00000000401b0028>] process_one_work+0x1b8/0x4d0
[<00000000401b04f4>] worker_thread+0x1b4/0x648
[<00000000401b9128>] kthread+0x1b0/0x208
[<0000000040150020>] end_fault_vector+0x20/0x28
[<0000000040639518>] nf_ip_reroute+0x50/0xa8
[<0000000040638ed0>] nf_ip_route+0x10/0x78
[<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8

CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
Backtrace:
[<0000000040163bf0>] show_stack+0x20/0x38
[<0000000040688480>] dump_stack+0xa8/0x120
[<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
[<0000000040164d0c>] handle_interruption+0xa24/0xa48

This patch modifies flush_cache_range() to handle non-current contexts.
Inasmuch as this occurs infrequently, the simplest approach is to
flush the entire cache when this happens.

Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/parisc/kernel/cache.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -604,13 +604,12 @@ void flush_cache_range(struct vm_area_st
if (parisc_requires_coherency())
flush_tlb_range(vma, start, end);

- if ((end - start) >= parisc_cache_flush_threshold) {
+ if ((end - start) >= parisc_cache_flush_threshold
+ || vma->vm_mm->context != mfsp(3)) {
flush_cache_all();
return;
}

- BUG_ON(vma->vm_mm->context != mfsp(3));
-
flush_user_dcache_range_asm(start, end);
if (vma->vm_flags & VM_EXEC)
flush_user_icache_range_asm(start, end);


2017-08-09 18:38:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 04/93] libata: array underflow in ata_find_dev()

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

commit 59a5e266c3f5c1567508888dd61a45b86daed0fa upstream.

My static checker complains that "devno" can be negative, meaning that
we read before the start of the loop. I've looked at the code, and I
think the warning is right. This comes from /proc so it's root only, or
it would be quite a serious bug. The call tree looks like this:

proc_scsi_write() <- gets id and channel from simple_strtoul()
-> scsi_add_single_device() <- calls shost->transportt->user_scan()
-> ata_scsi_user_scan()
-> ata_find_dev()

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/ata/libata-scsi.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2971,10 +2971,12 @@ static unsigned int atapi_xlat(struct at
static struct ata_device *ata_find_dev(struct ata_port *ap, int devno)
{
if (!sata_pmp_attached(ap)) {
- if (likely(devno < ata_link_max_devices(&ap->link)))
+ if (likely(devno >= 0 &&
+ devno < ata_link_max_devices(&ap->link)))
return &ap->link.device[devno];
} else {
- if (likely(devno < ap->nr_pmp_links))
+ if (likely(devno >= 0 &&
+ devno < ap->nr_pmp_links))
return &ap->pmp_link[devno].device[0];
}



2017-08-09 18:39:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.9 02/93] cgroup: create dfl_root files on subsys registration

4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit 7af608e4f9530372aec6e940552bf76595f2e265 upstream.

On subsystem registration, css_populate_dir() is not called on the new
root css, so the interface files for the subsystem on cgrp_dfl_root
aren't created at registration. This is a residue from the days when
cgrp_dfl_root was used only as the parking spot for unused subsystems,
which is no longer true now that it serves as the root for cgroup2.

This is often fine, as later operations tend to create them as part
of mount (cgroup1) or subtree_control operations (cgroup2); however,
it's not difficult to mount cgroup2 with the controller interface
files missing, as Waiman found out.

Fix it by invoking css_populate_dir() on the root css on subsys
registration.
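
With the fix, the tail of the for_each_subsys() registration loop in
cgroup_init() ends up looking roughly like this (reconstructed from the
hunk below; the surrounding loop structure is assumed):

		if (ss->bind)
			ss->bind(init_css_set.subsys[ssid]);

		/* create the subsystem's interface files on the default
		 * hierarchy root at registration time */
		mutex_lock(&cgroup_mutex);
		css_populate_dir(init_css_set.subsys[ssid]);
		mutex_unlock(&cgroup_mutex);
	}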

Signed-off-by: Tejun Heo <[email protected]>
Reported-and-tested-by: Waiman Long <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/cgroup.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5718,6 +5718,10 @@ int __init cgroup_init(void)

if (ss->bind)
ss->bind(init_css_set.subsys[ssid]);
+
+ mutex_lock(&cgroup_mutex);
+ css_populate_dir(init_css_set.subsys[ssid]);
+ mutex_unlock(&cgroup_mutex);
}

/* init_css_set.subsys[] has been updated, re-hash */


2017-08-10 00:15:58

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/93] 4.9.42-stable review

On 08/09/2017 12:12 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.42 release.
> There are 93 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Aug 11 18:13:03 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.42-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2017-08-10 00:41:08

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/93] 4.9.42-stable review

On 08/09/2017 11:12 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.42 release.
> There are 93 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Aug 11 18:13:03 UTC 2017.
> Anything received after that time might be too late.
>

Build results:
total: 145 pass: 143 fail: 2
Failed builds:
sparc64:allmodconfig
sparc64:allnoconfig

Qemu test results:
total: 122 pass: 105 fail: 17
Failed tests:
arm:beagle:multi_v7_defconfig:omap3-beagle
arm:beaglexm:multi_v7_defconfig:omap3-beagle-xm
arm:overo:multi_v7_defconfig:omap3-overo-tobi
arm:sabrelite:multi_v7_defconfig:imx6dl-sabrelite
arm:vexpress-a9:multi_v7_defconfig:vexpress-v2p-ca9
arm:vexpress-a15:multi_v7_defconfig:vexpress-v2p-ca15-tc1
arm:vexpress-a15-a7:multi_v7_defconfig:vexpress-v2p-ca15_a7
arm:xilinx-zynq-a9:multi_v7_defconfig:zynq-zc702
arm:xilinx-zynq-a9:multi_v7_defconfig:zynq-zc706
arm:xilinx-zynq-a9:multi_v7_defconfig:zynq-zed
arm:midway:multi_v7_defconfig:ecx-2000
arm:smdkc210:multi_v7_defconfig:exynos4210-smdkv310
nios2:10m50-ghrd:10m50_defconfig:10m50_devboard.dts
sparc64:sun4u:smp:sparc64_defconfig
sparc64:sun4v:smp:sparc64_defconfig
sparc64:sun4u:nosmp:sparc64_defconfig
sparc64:sun4v:nosmp:sparc64_defconfig

Build errors:

All failing arm tests:

make[1]: *** No rule to make target 'arch/arm/boot/dts/sun8i-h3-nanopi-m1.dtb', needed by '__build'. Stop.

sparc64:
arch/sparc/mm/tsb.c:478:3: error: implicit declaration of function 'tsb_context_switch'
arch/sparc/kernel/process_64.c:439:3: error: implicit declaration of function 'tsb_context_switch'
arch/sparc/mm/init_64.c:2892:2: error: implicit declaration of function 'tsb_context_switch'

The nios2 test crashes; same as with v4.4.

Details are available at http://kerneltests.org/builders.

Guenter

2017-08-10 03:04:46

by Sumit Semwal

[permalink] [raw]
Subject: Re: [PATCH 4.9 00/93] 4.9.42-stable review

Hi Greg,

On 9 August 2017 at 23:42, Greg Kroah-Hartman
<[email protected]> wrote:
> This is the start of the stable review cycle for the 4.9.42 release.
> There are 93 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Aug 11 18:13:03 UTC 2017.
> Anything received after that time might be too late.
For ARM64, built and boot tested with defconfig on Hikey - no regressions noted.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.42-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Best,
Sumit.

2017-08-10 16:20:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.9 65/93] sparc64: Prevent perf from running during super critical sections

On Wed, Aug 09, 2017 at 11:13:58AM -0700, Greg Kroah-Hartman wrote:
> 4.9-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Rob Gardner <[email protected]>
>
>
> [ Upstream commit fc290a114fc6034b0f6a5a46e2fb7d54976cf87a ]
>
> This fixes another cause of random segfaults and bus errors that may
> occur while running perf with the callgraph option.
>
> Critical sections beginning with spin_lock_irqsave() raise the interrupt
> level to PIL_NORMAL_MAX (14) and intentionally do not block performance
> counter interrupts, which arrive at PIL_NMI (15).
>
> But some sections of code are "super critical" with respect to perf
> because the perf_callchain_user() path accesses user space and may cause
> TLB activity as well as faults as it unwinds the user stack.
>
> One particular critical section occurs in switch_mm:
>
> spin_lock_irqsave(&mm->context.lock, flags);
> ...
> load_secondary_context(mm);
> tsb_context_switch(mm);
> ...
> spin_unlock_irqrestore(&mm->context.lock, flags);
>
> If a perf interrupt arrives in between load_secondary_context() and
> tsb_context_switch(), then perf_callchain_user() could execute with
> the context ID of one process, but with an active TSB for a different
> process. When the user stack is accessed, it is very likely to
> incur a TLB miss, since the h/w context ID has been changed. The TLB
> will then be reloaded with a translation from the TSB for one process,
> but using a context ID for another process. This exposes memory from
> one process to another, and since it is a mapping for stack memory,
> this usually causes the new process to crash quickly.
>
> This super critical section needs more protection than is provided
> by spin_lock_irqsave() since perf interrupts must not be allowed in.
>
> Since __tsb_context_switch already goes through the trouble of
> disabling interrupts completely, we fix this by moving the secondary
> context load down into this better protected region.
>
> Orabug: 25577560
>
> Signed-off-by: Dave Aldridge <[email protected]>
> Signed-off-by: Rob Gardner <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> arch/sparc/include/asm/mmu_context_64.h | 12 +++++++-----
> arch/sparc/kernel/tsb.S | 12 ++++++++++++
> arch/sparc/power/hibernate.c | 3 +--
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> --- a/arch/sparc/include/asm/mmu_context_64.h
> +++ b/arch/sparc/include/asm/mmu_context_64.h
> @@ -25,9 +25,11 @@ void destroy_context(struct mm_struct *m
> void __tsb_context_switch(unsigned long pgd_pa,
> struct tsb_config *tsb_base,
> struct tsb_config *tsb_huge,
> - unsigned long tsb_descr_pa);
> + unsigned long tsb_descr_pa,
> + unsigned long secondary_ctx);
>
> -static inline void tsb_context_switch(struct mm_struct *mm)
> +static inline void tsb_context_switch_ctx(struct mm_struct *mm,
> + unsigned long ctx)
> {
> __tsb_context_switch(__pa(mm->pgd),
> &mm->context.tsb_block[0],
> @@ -38,7 +40,8 @@ static inline void tsb_context_switch(st
> #else
> NULL
> #endif
> - , __pa(&mm->context.tsb_descr[0]));
> + , __pa(&mm->context.tsb_descr[0]),
> + ctx);
> }
>
> void tsb_grow(struct mm_struct *mm,
> @@ -110,8 +113,7 @@ static inline void switch_mm(struct mm_s
> * cpu0 to update it's TSB because at that point the cpu_vm_mask
> * only had cpu1 set in it.
> */
> - load_secondary_context(mm);
> - tsb_context_switch(mm);
> + tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
>
> /* Any time a processor runs a context on an address space
> * for the first time, we must flush that context out of the
> --- a/arch/sparc/kernel/tsb.S
> +++ b/arch/sparc/kernel/tsb.S
> @@ -375,6 +375,7 @@ tsb_flush:
> * %o1: TSB base config pointer
> * %o2: TSB huge config pointer, or NULL if none
> * %o3: Hypervisor TSB descriptor physical address
> + * %o4: Secondary context to load, if non-zero
> *
> * We have to run this whole thing with interrupts
> * disabled so that the current cpu doesn't change
> @@ -387,6 +388,17 @@ __tsb_context_switch:
> rdpr %pstate, %g1
> wrpr %g1, PSTATE_IE, %pstate
>
> + brz,pn %o4, 1f
> + mov SECONDARY_CONTEXT, %o5
> +
> +661: stxa %o4, [%o5] ASI_DMMU
> + .section .sun4v_1insn_patch, "ax"
> + .word 661b
> + stxa %o4, [%o5] ASI_MMU
> + .previous
> + flush %g6
> +
> +1:
> TRAP_LOAD_TRAP_BLOCK(%g2, %g3)
>
> stx %o0, [%g2 + TRAP_PER_CPU_PGD_PADDR]
> --- a/arch/sparc/power/hibernate.c
> +++ b/arch/sparc/power/hibernate.c
> @@ -35,6 +35,5 @@ void restore_processor_state(void)
> {
> struct mm_struct *mm = current->active_mm;
>
> - load_secondary_context(mm);
> - tsb_context_switch(mm);
> + tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
> }
>

This seems to break the build, so I'm dropping it from the patch set.

thanks,

greg k-h

2017-08-10 16:46:03

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 4.9 65/93] sparc64: Prevent perf from running during super critical sections

From: Greg Kroah-Hartman <[email protected]>
Date: Thu, 10 Aug 2017 09:20:03 -0700

> This seems to break the build, so I'm dropping it from the patch set.

Sorry, I'll fix this up and resubmit.

Thanks.

2017-08-11 05:00:40

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 4.9 65/93] sparc64: Prevent perf from running during super critical sections

From: Greg Kroah-Hartman <[email protected]>
Date: Thu, 10 Aug 2017 09:20:03 -0700

> This seems to break the build, so I'm dropping it from the patch set.

Greg, this one should be better, please queue it up for v4.9

Thanks!

====================
From 0f33eec8a49645ac99796afbdf3b8986983b3783 Mon Sep 17 00:00:00 2001
From: Rob Gardner <[email protected]>
Date: Mon, 17 Jul 2017 09:22:27 -0600
Subject: [PATCH] sparc64: Prevent perf from running during super critical
sections

[ Upstream commit fc290a114fc6034b0f6a5a46e2fb7d54976cf87a ]

This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.

Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).

But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.

One particular critical section occurs in switch_mm:

spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);

If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.

This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.

Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.

Orabug: 25577560

Signed-off-by: Dave Aldridge <[email protected]>
Signed-off-by: Rob Gardner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
arch/sparc/include/asm/mmu_context_64.h | 14 +++++++++-----
arch/sparc/kernel/tsb.S | 12 ++++++++++++
arch/sparc/power/hibernate.c | 3 +--
3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index 349dd23..0cdeb2b 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -25,9 +25,11 @@ void destroy_context(struct mm_struct *mm);
void __tsb_context_switch(unsigned long pgd_pa,
struct tsb_config *tsb_base,
struct tsb_config *tsb_huge,
- unsigned long tsb_descr_pa);
+ unsigned long tsb_descr_pa,
+ unsigned long secondary_ctx);

-static inline void tsb_context_switch(struct mm_struct *mm)
+static inline void tsb_context_switch_ctx(struct mm_struct *mm,
+ unsigned long ctx)
{
__tsb_context_switch(__pa(mm->pgd),
&mm->context.tsb_block[0],
@@ -38,9 +40,12 @@ static inline void tsb_context_switch(struct mm_struct *mm)
#else
NULL
#endif
- , __pa(&mm->context.tsb_descr[0]));
+ , __pa(&mm->context.tsb_descr[0]),
+ ctx);
}

+#define tsb_context_switch(X) tsb_context_switch_ctx(X, 0)
+
void tsb_grow(struct mm_struct *mm,
unsigned long tsb_index,
unsigned long mm_rss);
@@ -110,8 +115,7 @@ static inline void switch_mm(struct mm_struct *old_mm, struct mm_struct *mm, str
* cpu0 to update it's TSB because at that point the cpu_vm_mask
* only had cpu1 set in it.
*/
- load_secondary_context(mm);
- tsb_context_switch(mm);
+ tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));

/* Any time a processor runs a context on an address space
* for the first time, we must flush that context out of the
diff --git a/arch/sparc/kernel/tsb.S b/arch/sparc/kernel/tsb.S
index 395ec18..7d961f6 100644
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -375,6 +375,7 @@ tsb_flush:
* %o1: TSB base config pointer
* %o2: TSB huge config pointer, or NULL if none
* %o3: Hypervisor TSB descriptor physical address
+ * %o4: Secondary context to load, if non-zero
*
* We have to run this whole thing with interrupts
* disabled so that the current cpu doesn't change
@@ -387,6 +388,17 @@ __tsb_context_switch:
rdpr %pstate, %g1
wrpr %g1, PSTATE_IE, %pstate

+ brz,pn %o4, 1f
+ mov SECONDARY_CONTEXT, %o5
+
+661: stxa %o4, [%o5] ASI_DMMU
+ .section .sun4v_1insn_patch, "ax"
+ .word 661b
+ stxa %o4, [%o5] ASI_MMU
+ .previous
+ flush %g6
+
+1:
TRAP_LOAD_TRAP_BLOCK(%g2, %g3)

stx %o0, [%g2 + TRAP_PER_CPU_PGD_PADDR]
diff --git a/arch/sparc/power/hibernate.c b/arch/sparc/power/hibernate.c
index 17bd2e1..df707a8 100644
--- a/arch/sparc/power/hibernate.c
+++ b/arch/sparc/power/hibernate.c
@@ -35,6 +35,5 @@ void restore_processor_state(void)
{
struct mm_struct *mm = current->active_mm;

- load_secondary_context(mm);
- tsb_context_switch(mm);
+ tsb_context_switch_ctx(mm, CTX_HWBITS(mm->context));
}
--
2.1.2.532.g19b5d50
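
Read together with the build report earlier in the thread, the
implicit-declaration errors for tsb_context_switch() in tsb.c,
process_64.c and init_64.c look like fallout from the first submission
renaming the inline helper without keeping the old name; this version
adds a compatibility macro in mmu_context_64.h (quoted from the hunk
above) so existing callers still compile:

	/* Callers that do not need to set the secondary context keep the
	 * old name and pass ctx == 0, which makes __tsb_context_switch
	 * skip the stxa via the "brz,pn %o4, 1f" branch. */
	#define tsb_context_switch(X)	tsb_context_switch_ctx(X, 0)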

2017-08-11 19:34:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.9 65/93] sparc64: Prevent perf from running during super critical sections

On Thu, Aug 10, 2017 at 10:00:35PM -0700, David Miller wrote:
> From: Greg Kroah-Hartman <[email protected]>
> Date: Thu, 10 Aug 2017 09:20:03 -0700
>
> > This seems to break the build, so I'm dropping it from the patch set.
>
> Greg, this one should be better, please queue it up for v4.9

Thanks, now applied.

greg k-h