2018-07-27 09:49:25

by Greg Kroah-Hartman

Subject: [PATCH 4.17 00/66] 4.17.11-stable review

This is the start of the stable review cycle for the 4.17.11 release.
There are 66 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied,
please let me know.

Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
and the diffstat can be found below.
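For local testing, the review patch can be applied roughly as follows
(illustrative commands; the -rc patch is assumed to apply on top of the
previous stable release, 4.17.10):

$ wget https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
$ cd linux-4.17.10
$ zcat ../patch-4.17.11-rc1.gz | patch -p1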

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.17.11-rc1

Roman Fietze <[email protected]>
can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode

Faiz Abbas <[email protected]>
can: m_can: Fix runtime resume call

Stephane Grosjean <[email protected]>
can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only

Anssi Hannula <[email protected]>
can: xilinx_can: fix RX overflow interrupt not being enabled

Anssi Hannula <[email protected]>
can: xilinx_can: fix incorrect clear of non-processed interrupts

Anssi Hannula <[email protected]>
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting

Anssi Hannula <[email protected]>
can: xilinx_can: fix device dropping off bus on RX overrun

Anssi Hannula <[email protected]>
can: xilinx_can: fix recovery from error states not being propagated

Anssi Hannula <[email protected]>
can: xilinx_can: fix power management handling

Anssi Hannula <[email protected]>
can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK

Rafael J. Wysocki <[email protected]>
driver core: Partially revert "driver core: correct device's shutdown order"

Schmauss, Erik <[email protected]>
ACPICA: AML Parser: ignore dispatcher error status during table load

Jerry Zhang <[email protected]>
usb: gadget: f_fs: Only return delayed status when len is 0

Benjamin Herrenschmidt <[email protected]>
usb: gadget: Fix OS descriptors support

Zheng Xiaowei <[email protected]>
usb: xhci: Fix memory leak in xhci_endpoint_reset()

Antti Seppälä <[email protected]>
usb: dwc2: Fix DMA alignment to start at allocated boundary

Bin Liu <[email protected]>
usb: core: handle hub C_PORT_OVER_CURRENT condition

Lubomir Rintel <[email protected]>
usb: cdc_acm: Add quirk for Castles VEGA3000

Samuel Thibault <[email protected]>
staging: speakup: fix wraparound in uaccess length check

Hans de Goede <[email protected]>
Revert "staging:r8188eu: Use lib80211 to support TKIP"

Eric Dumazet <[email protected]>
tcp: add tcp_ooo_try_coalesce() helper

Eric Dumazet <[email protected]>
tcp: call tcp_drop() from tcp_data_queue_ofo()

Eric Dumazet <[email protected]>
tcp: detect malicious patterns in tcp_collapse_ofo_queue()

Eric Dumazet <[email protected]>
tcp: avoid collapses in tcp_prune_queue() if possible

Eric Dumazet <[email protected]>
tcp: free batches of packets in tcp_prune_ofo_queue()

Roopa Prabhu <[email protected]>
vxlan: fix default fdb entry netlink notify ordering during netdev create

Roopa Prabhu <[email protected]>
vxlan: make netlink notify in vxlan_fdb_destroy optional

Roopa Prabhu <[email protected]>
vxlan: add new fdb alloc and create helpers

Roopa Prabhu <[email protected]>
rtnetlink: add rtnl_link_state check in rtnl_configure_link

Ariel Levkovich <[email protected]>
net/mlx5: Adjust clock overflow work period

Eran Ben Elisha <[email protected]>
net/mlx5e: Fix quota counting in aRFS expire flow

Eran Ben Elisha <[email protected]>
net/mlx5e: Don't allow aRFS for encapsulated packets

David Ahern <[email protected]>
net/ipv6: Fix linklocal to global address with VRF

Hangbin Liu <[email protected]>
multicast: do not restore deleted record source filter mode to new one

Heiner Kallweit <[email protected]>
net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv

Daniel Borkmann <[email protected]>
sock: fix sg page frag coalescing in sk_alloc_sg

John Hurley <[email protected]>
nfp: flower: ensure dead neighbour entries are not offloaded

Shay Agroskin <[email protected]>
net/mlx5e: Refine ets validation function

Roi Dayan <[email protected]>
net/mlx5e: Only allow offloading decap egress (egdev) flows

Or Gerlitz <[email protected]>
net/mlx5e: Add ingress/egress indication for offloaded TC flows

Doron Roberts-Kedes <[email protected]>
tls: check RCV_SHUTDOWN in tls_wait_data

Heiner Kallweit <[email protected]>
r8169: restore previous behavior to accept BIOS WoL settings

Saeed Mahameed <[email protected]>
net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode

Yuchung Cheng <[email protected]>
tcp: do not delay ACK in DCTCP upon CE status change

Yuchung Cheng <[email protected]>
tcp: do not cancel delay-AcK on DCTCP special ACK

Yuchung Cheng <[email protected]>
tcp: helpers to send special DCTCP ack

Yuchung Cheng <[email protected]>
tcp: fix dctcp delayed ACK schedule

Eric Dumazet <[email protected]>
net: skb_segment() should not return NULL

Zhao Chen <[email protected]>
net-next/hinic: fix a problem in hinic_xmit_frame()

Jack Morgenstein <[email protected]>
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper

Uwe Kleine-König <[email protected]>
net: dsa: mv88e6xxx: fix races between lock and irq freeing

Willem de Bruijn <[email protected]>
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull

Paolo Abeni <[email protected]>
ip: hash fragments consistently

Jarod Wilson <[email protected]>
bonding: set default miimon value for non-arp modes if not set

Neil Armstrong <[email protected]>
clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL

Lyude Paul <[email protected]>
drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs

Lyude Paul <[email protected]>
drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()

Alexey Kardashevskiy <[email protected]>
KVM: PPC: Check if IOMMU page is contained in the pinned physical page

Boris Ostrovsky <[email protected]>
xen/PVH: Set up GS segment for stack canary

Joel Stanley <[email protected]>
clk: aspeed: Support HPLL strapping on ast2400

Joel Stanley <[email protected]>
clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical

Gregory CLEMENT <[email protected]>
clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz

Paul Burton <[email protected]>
MIPS: Fix off-by-one in pci_resource_to_user()

Felix Fietkau <[email protected]>
MIPS: ath79: fix register address in ath79_ddr_wb_flush()

Christoph Hellwig <[email protected]>
Revert "iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()"

Paolo Bonzini <[email protected]>
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR


-------------

Diffstat:

Makefile | 4 +-
arch/mips/ath79/common.c | 2 +-
arch/mips/pci/pci.c | 2 +-
arch/powerpc/include/asm/mmu_context.h | 4 +-
arch/powerpc/kvm/book3s_64_vio.c | 2 +-
arch/powerpc/kvm/book3s_64_vio_hv.c | 6 +-
arch/powerpc/mm/mmu_context_iommu.c | 37 +-
arch/x86/kvm/x86.c | 4 +-
arch/x86/xen/xen-pvh.S | 26 +-
drivers/acpi/acpica/psloop.c | 26 ++
drivers/base/dd.c | 8 -
drivers/clk/clk-aspeed.c | 46 ++-
drivers/clk/meson/gxbb.c | 1 +
drivers/clk/mvebu/armada-37xx-periph.c | 38 ++
drivers/gpu/drm/nouveau/dispnv04/disp.c | 3 +
drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +
drivers/gpu/drm/nouveau/nv50_display.c | 8 +-
drivers/iommu/Kconfig | 1 -
drivers/iommu/intel-iommu.c | 62 +++-
drivers/net/bonding/bond_options.c | 23 +-
drivers/net/can/m_can/m_can.c | 11 +-
drivers/net/can/peak_canfd/peak_pciefd_main.c | 19 +
drivers/net/can/xilinx_can.c | 392 +++++++++++++++------
drivers/net/dsa/mv88e6xxx/chip.c | 21 +-
drivers/net/ethernet/huawei/hinic/hinic_tx.c | 1 +
.../net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 -
drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 7 +-
drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 17 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 32 +-
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 42 ++-
drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 13 +-
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/lib/clock.c | 12 +-
.../ethernet/netronome/nfp/flower/tunnel_conf.c | 2 +-
drivers/net/ethernet/realtek/r8169.c | 3 +-
drivers/net/phy/phy.c | 2 +-
drivers/net/vxlan.c | 126 +++++--
drivers/staging/rtl8188eu/Kconfig | 1 -
drivers/staging/rtl8188eu/core/rtw_recv.c | 161 ++++++---
drivers/staging/rtl8188eu/core/rtw_security.c | 92 ++---
drivers/staging/speakup/speakup_soft.c | 6 +-
drivers/usb/class/cdc-acm.c | 3 +
drivers/usb/core/hub.c | 8 +-
drivers/usb/dwc2/hcd.c | 44 +--
drivers/usb/gadget/composite.c | 1 -
drivers/usb/gadget/function/f_fs.c | 2 +-
drivers/usb/host/xhci.c | 1 +
drivers/vfio/vfio_iommu_spapr_tce.c | 2 +-
include/net/tcp.h | 7 +
net/core/rtnetlink.c | 9 +-
net/core/skbuff.c | 10 +-
net/core/sock.c | 6 +-
net/ipv4/igmp.c | 3 +-
net/ipv4/ip_output.c | 2 +
net/ipv4/ip_sockglue.c | 7 +-
net/ipv4/tcp_dctcp.c | 50 +--
net/ipv4/tcp_input.c | 65 +++-
net/ipv4/tcp_output.c | 33 +-
net/ipv6/datagram.c | 7 +-
net/ipv6/icmp.c | 5 +-
net/ipv6/ip6_output.c | 2 +
net/ipv6/mcast.c | 3 +-
net/ipv6/tcp_ipv6.c | 6 +-
net/tls/tls_sw.c | 3 +
66 files changed, 1097 insertions(+), 474 deletions(-)




2018-07-27 09:48:15

by Greg Kroah-Hartman

Subject: [PATCH 4.17 10/66] drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lyude Paul <[email protected]>

commit e5d54f1935722f83df7619f3978f774c2b802cd8 upstream.

A CRTC being enabled doesn't mean it's on! It doesn't even necessarily
mean it's being used. This fixes runtime PM leaks on the P50 I've got
next to me.
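For context (abbreviated sketch, not part of the patch), drm_crtc_state
distinguishes the two flags involved here:

	struct drm_crtc_state {
		bool enable;	/* CRTC is configured with a mode */
		bool active;	/* CRTC is actually on and scanning out */
		/* ... */
	};

Taking a runtime PM reference for every enabled CRTC therefore leaks a
reference whenever a CRTC is configured but not actively displaying;
checking active matches the actual hardware power state.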

Signed-off-by: Lyude Paul <[email protected]>
Cc: [email protected]
Signed-off-by: Ben Skeggs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/gpu/drm/nouveau/nv50_display.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -4198,7 +4198,7 @@ nv50_disp_atomic_commit(struct drm_devic
nv50_disp_atomic_commit_tail(state);

drm_for_each_crtc(crtc, dev) {
- if (crtc->state->enable) {
+ if (crtc->state->active) {
if (!drm->have_disp_power_ref) {
drm->have_disp_power_ref = true;
return 0;



2018-07-27 09:48:15

by Greg Kroah-Hartman

Subject: [PATCH 4.17 11/66] drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lyude Paul <[email protected]>

commit eb493fbc150f4a28151ae1ee84f24395989f3600 upstream.

Currently nouveau doesn't actually expose the state debugfs file that's
usually provided for any modesetting driver that supports atomic, even
if nouveau is loaded with atomic=1. This is because the standard debugfs
files that DRM creates for atomic drivers are registered when
drm_get_pci_dev() is called from nouveau_drm.c. This happens well
before we've initialized the display core, which is currently
responsible for setting the DRIVER_ATOMIC cap.

So, move the atomic option into nouveau_drm.c and just add the
DRIVER_ATOMIC cap whenever it's enabled on the kernel commandline. This
shouldn't cause any actual issues, as the atomic ioctl will still fail
as expected even if the display core doesn't disable it until later in
the init sequence. This also provides the added benefit of being able to
use the state debugfs file to check the current display state even if
clients aren't allowed to modify it through anything other than the
legacy ioctls.
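With the option moved, the cap is set (or not) at probe time; for
example, exposing the atomic ioctl at load time is simply:

$ modprobe nouveau atomic=1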

Additionally, disable the DRIVER_ATOMIC cap in nv04's display core, as
this was already disabled there previously.

Signed-off-by: Lyude Paul <[email protected]>
Cc: [email protected]
Signed-off-by: Ben Skeggs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/nouveau/dispnv04/disp.c | 3 +++
drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++++++
drivers/gpu/drm/nouveau/nv50_display.c | 6 ------
3 files changed, 10 insertions(+), 6 deletions(-)

--- a/drivers/gpu/drm/nouveau/dispnv04/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/disp.c
@@ -55,6 +55,9 @@ nv04_display_create(struct drm_device *d
nouveau_display(dev)->init = nv04_display_init;
nouveau_display(dev)->fini = nv04_display_fini;

+ /* Pre-nv50 doesn't support atomic, so don't expose the ioctls */
+ dev->driver->driver_features &= ~DRIVER_ATOMIC;
+
nouveau_hw_save_vga_fonts(dev, 1);

nv04_crtc_create(dev, 0);
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -79,6 +79,10 @@ MODULE_PARM_DESC(modeset, "enable driver
int nouveau_modeset = -1;
module_param_named(modeset, nouveau_modeset, int, 0400);

+MODULE_PARM_DESC(atomic, "Expose atomic ioctl (default: disabled)");
+static int nouveau_atomic = 0;
+module_param_named(atomic, nouveau_atomic, int, 0400);
+
MODULE_PARM_DESC(runpm, "disable (0), force enable (1), optimus only default (-1)");
static int nouveau_runtime_pm = -1;
module_param_named(runpm, nouveau_runtime_pm, int, 0400);
@@ -501,6 +505,9 @@ static int nouveau_drm_probe(struct pci_

pci_set_master(pdev);

+ if (nouveau_atomic)
+ driver_pci.driver_features |= DRIVER_ATOMIC;
+
ret = drm_get_pci_dev(pdev, pent, &driver_pci);
if (ret) {
nvkm_device_del(&device);
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -4441,10 +4441,6 @@ nv50_display_destroy(struct drm_device *
kfree(disp);
}

-MODULE_PARM_DESC(atomic, "Expose atomic ioctl (default: disabled)");
-static int nouveau_atomic = 0;
-module_param_named(atomic, nouveau_atomic, int, 0400);
-
int
nv50_display_create(struct drm_device *dev)
{
@@ -4469,8 +4465,6 @@ nv50_display_create(struct drm_device *d
disp->disp = &nouveau_display(dev)->disp;
dev->mode_config.funcs = &nv50_disp_func;
dev->driver->driver_features |= DRIVER_PREFER_XBGR_30BPP;
- if (nouveau_atomic)
- dev->driver->driver_features |= DRIVER_ATOMIC;

/* small shared memory area we use for notifiers and semaphores */
ret = nouveau_bo_new(&drm->client, 4096, 0x1000, TTM_PL_FLAG_VRAM,



2018-07-27 09:48:18

by Greg Kroah-Hartman

Subject: [PATCH 4.17 12/66] clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neil Armstrong <[email protected]>

commit c987ac6f1f088663b6dad39281071aeb31d450a8 upstream.

On Amlogic Meson GXBB & GXL platforms, the SCPI Cortex-M4 Co-Processor
seems to depend on the FCLK_DIV2 clock being operational.

The issue has been present since v4.17-rc1: the kernel boot freezes when
the 'schedutil' cpufreq governor is selected as default:

[ 12.071837] scpi_protocol scpi: SCP Protocol 0.0 Firmware 0.0.0 version
domain-0 init dvfs: 4
[ 12.087757] hctosys: unable to open rtc device (rtc0)
[ 12.087907] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[ 12.102241] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'

With the MMC driver disabled, the boot finished, but cpufreq failed to
change the CPU frequency:

[ 12.153045] cpufreq: __target_index: Failed to change cpu frequency: -5

A bisect between v4.16 and v4.17-rc1 identified
05f814402d61 ("clk: meson: add fdiv clock gates") as the first bad commit.
This commit added support for the previously missing clock gates in front
of the fixed PLL fixed dividers (FCLK_DIVx), and the clock framework then
disabled all of the seemingly unused fixed dividers, thus disabling a
clock path that is critical for the SCPI Co-Processor.

This patch simply sets the FCLK_DIV2 gate as critical to ensure
nobody can disable it.

Fixes: 05f814402d61 ("clk: meson: add fdiv clock gates")
Signed-off-by: Neil Armstrong <[email protected]>
Tested-by: Kevin Hilman <[email protected]>
[few corrections in the commit description]
Signed-off-by: Jerome Brunet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/clk/meson/gxbb.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/clk/meson/gxbb.c
+++ b/drivers/clk/meson/gxbb.c
@@ -511,6 +511,7 @@ static struct clk_regmap gxbb_fclk_div2
.ops = &clk_regmap_gate_ops,
.parent_names = (const char *[]){ "fclk_div2_div" },
.num_parents = 1,
+ .flags = CLK_IS_CRITICAL,
},
};




2018-07-27 09:48:29

by Greg Kroah-Hartman

Subject: [PATCH 4.17 13/66] bonding: set default miimon value for non-arp modes if not set

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jarod Wilson <[email protected]>

[ Upstream commit c1f897ce186a529a494441642125479d38727a3d ]

For some time now, if you load the bonding driver and configure bond
parameters via sysfs using minimal config options, such as specifying
nothing but the mode, relying on defaults for everything else, modes
that cannot use arp monitoring (802.3ad, balance-tlb, balance-alb) all
wind up with both arp_interval=0 (as it should be) and miimon=0, which
means the miimon monitor thread never actually runs. This is particularly
problematic for 802.3ad.

For example, from an LNST recipe I've set up:

$ modprobe bonding max_bonds=0
$ echo "+t_bond0" > /sys/class/net/bonding_masters
$ ip link set t_bond0 down
$ echo "802.3ad" > /sys/class/net/t_bond0/bonding/mode
$ ip link set ens1f1 down
$ echo "+ens1f1" > /sys/class/net/t_bond0/bonding/slaves
$ ip link set ens1f0 down
$ echo "+ens1f0" > /sys/class/net/t_bond0/bonding/slaves
$ ethtool -i t_bond0
$ ip link set ens1f1 up
$ ip link set ens1f0 up
$ ip link set t_bond0 up
$ ip addr add 192.168.9.1/24 dev t_bond0
$ ip addr add 2002::1/64 dev t_bond0"

This bond comes up okay, but things look slightly suspect in
/proc/net/bonding/t_bond0 output:

$ grep -i mii /proc/net/bonding/t_bond0
MII Status: up
MII Polling Interval (ms): 0
MII Status: up
MII Status: up

Now, pull a cable on one of the ports in the bond, then reconnect it, and
you'll see:

Slave Interface: ens1f0
MII Status: down
Speed: 1000 Mbps
Duplex: full

I believe this became a major issue as of commit 4d2c0cda0744, which, for
802.3ad bonds, sets slave->link = BOND_LINK_DOWN with a comment about
relying on link monitoring via miimon to set it correctly; but since the
miimon work queue never runs, the link just stays marked down.

If we simply tweak bond_option_mode_set() slightly, we can check for the
non-arp modes having no miimon value set and insert BOND_DEFAULT_MIIMON,
which gets things back in full working order. This problem exists as far
back as 4.14 and might be worth fixing in all stable trees since then,
though the workaround is simply to specify a miimon value yourself.
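Until a fixed kernel is running, that workaround is a one-liner; for
example (interface name as in the recipe above, 100 being the usual
default interval in ms):

$ echo 100 > /sys/class/net/t_bond0/bonding/miimon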

Reported-by: Bob Ball <[email protected]>
Signed-off-by: Jarod Wilson <[email protected]>
Acked-by: Mahesh Bandewar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/bonding/bond_options.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -743,15 +743,20 @@ const struct bond_option *bond_opt_get(u
static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
{
- if (!bond_mode_uses_arp(newval->value) && bond->params.arp_interval) {
- netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
- newval->string);
- /* disable arp monitoring */
- bond->params.arp_interval = 0;
- /* set miimon to default value */
- bond->params.miimon = BOND_DEFAULT_MIIMON;
- netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n",
- bond->params.miimon);
+ if (!bond_mode_uses_arp(newval->value)) {
+ if (bond->params.arp_interval) {
+ netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
+ newval->string);
+ /* disable arp monitoring */
+ bond->params.arp_interval = 0;
+ }
+
+ if (!bond->params.miimon) {
+ /* set miimon to default value */
+ bond->params.miimon = BOND_DEFAULT_MIIMON;
+ netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n",
+ bond->params.miimon);
+ }
}

if (newval->value == BOND_MODE_ALB)



2018-07-27 09:48:29

by Greg Kroah-Hartman

Subject: [PATCH 4.17 02/66] Revert "iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()"

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <[email protected]>

commit 7ec916f82c48dcfc115eee2e3e0e6d400e310fc5 upstream.

The reverted commit may cause a smaller-than-required DMA mask to be used
for some allocations, which apparently leads to occasional module load
failures for iwlwifi.

This reverts commit d657c5c73ca987214a6f9436e435b34fc60f332a.

Signed-off-by: Christoph Hellwig <[email protected]>
Reported-by: Fabio Coatti <[email protected]>
Tested-by: Fabio Coatti <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iommu/Kconfig | 1 -
drivers/iommu/intel-iommu.c | 64 ++++++++++++++++++++++++++++++++------------
2 files changed, 47 insertions(+), 18 deletions(-)

--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -142,7 +142,6 @@ config DMAR_TABLE
config INTEL_IOMMU
bool "Support for Intel IOMMU using DMA Remapping Devices"
depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC)
- select DMA_DIRECT_OPS
select IOMMU_API
select IOMMU_IOVA
select DMAR_TABLE
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -31,7 +31,6 @@
#include <linux/pci.h>
#include <linux/dmar.h>
#include <linux/dma-mapping.h>
-#include <linux/dma-direct.h>
#include <linux/mempool.h>
#include <linux/memory.h>
#include <linux/cpu.h>
@@ -3709,30 +3708,61 @@ static void *intel_alloc_coherent(struct
dma_addr_t *dma_handle, gfp_t flags,
unsigned long attrs)
{
- void *vaddr;
+ struct page *page = NULL;
+ int order;

- vaddr = dma_direct_alloc(dev, size, dma_handle, flags, attrs);
- if (iommu_no_mapping(dev) || !vaddr)
- return vaddr;
-
- *dma_handle = __intel_map_single(dev, virt_to_phys(vaddr),
- PAGE_ALIGN(size), DMA_BIDIRECTIONAL,
- dev->coherent_dma_mask);
- if (!*dma_handle)
- goto out_free_pages;
- return vaddr;
+ size = PAGE_ALIGN(size);
+ order = get_order(size);
+
+ if (!iommu_no_mapping(dev))
+ flags &= ~(GFP_DMA | GFP_DMA32);
+ else if (dev->coherent_dma_mask < dma_get_required_mask(dev)) {
+ if (dev->coherent_dma_mask < DMA_BIT_MASK(32))
+ flags |= GFP_DMA;
+ else
+ flags |= GFP_DMA32;
+ }
+
+ if (gfpflags_allow_blocking(flags)) {
+ unsigned int count = size >> PAGE_SHIFT;
+
+ page = dma_alloc_from_contiguous(dev, count, order, flags);
+ if (page && iommu_no_mapping(dev) &&
+ page_to_phys(page) + size > dev->coherent_dma_mask) {
+ dma_release_from_contiguous(dev, page, count);
+ page = NULL;
+ }
+ }
+
+ if (!page)
+ page = alloc_pages(flags, order);
+ if (!page)
+ return NULL;
+ memset(page_address(page), 0, size);
+
+ *dma_handle = __intel_map_single(dev, page_to_phys(page), size,
+ DMA_BIDIRECTIONAL,
+ dev->coherent_dma_mask);
+ if (*dma_handle)
+ return page_address(page);
+ if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
+ __free_pages(page, order);

-out_free_pages:
- dma_direct_free(dev, size, vaddr, *dma_handle, attrs);
return NULL;
}

static void intel_free_coherent(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
- if (!iommu_no_mapping(dev))
- intel_unmap(dev, dma_handle, PAGE_ALIGN(size));
- dma_direct_free(dev, size, vaddr, dma_handle, attrs);
+ int order;
+ struct page *page = virt_to_page(vaddr);
+
+ size = PAGE_ALIGN(size);
+ order = get_order(size);
+
+ intel_unmap(dev, dma_handle, size);
+ if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
+ __free_pages(page, order);
}

static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,



2018-07-27 09:48:31

by Greg Kroah-Hartman

Subject: [PATCH 4.17 03/66] MIPS: ath79: fix register address in ath79_ddr_wb_flush()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Felix Fietkau <[email protected]>

commit bc88ad2efd11f29e00a4fd60fcd1887abfe76833 upstream.

ath79_ddr_wb_flush_base has the type void __iomem *, so register offsets
need to be a multiple of 4 in order to access the intended register.
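To illustrate (sketch, not part of the patch): with a byte-granular
void __iomem * base pointer, the arithmetic for register index reg is:

	base + reg;		/* byte offset - lands inside register 0 or 1 */
	base + (reg * 4);	/* 32-bit register number - the intended one */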

Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: John Crispin <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Fixes: 24b0e3e84fbf ("MIPS: ath79: Improve the DDR controller interface")
Patchwork: https://patchwork.linux-mips.org/patch/19912/
Cc: Alban Bedel <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: [email protected] # 4.2+
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/mips/ath79/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/ath79/common.c
+++ b/arch/mips/ath79/common.c
@@ -58,7 +58,7 @@ EXPORT_SYMBOL_GPL(ath79_ddr_ctrl_init);

void ath79_ddr_wb_flush(u32 reg)
{
- void __iomem *flush_reg = ath79_ddr_wb_flush_base + reg;
+ void __iomem *flush_reg = ath79_ddr_wb_flush_base + (reg * 4);

/* Flush the DDR write buffer. */
__raw_writel(0x1, flush_reg);



2018-07-27 09:48:33

by Greg Kroah-Hartman

Subject: [PATCH 4.17 04/66] MIPS: Fix off-by-one in pci_resource_to_user()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Burton <[email protected]>

commit 38c0a74fe06da3be133cae3fb7bde6a9438e698b upstream.

The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
commit 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci
memory space properly") incorrectly sets *end to the address of the
byte after the resource, rather than the last byte of the resource.

This results in userland seeing resources as a byte larger than they
actually are, for example a 32 byte BAR will be reported by a tool such
as lspci as being 33 bytes in size:

Region 2: I/O ports at 1000 [disabled] [size=33]

Correct this by subtracting one from the calculated end address,
reporting the correct address to userland.
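As a concrete example, a 32-byte I/O region starting at 0x1000 occupies
addresses 0x1000..0x101f: the old code reported end = 0x1020 (hence the
size=33 above), while the fixed code reports end = 0x101f.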

Signed-off-by: Paul Burton <[email protected]>
Reported-by: Rui Wang <[email protected]>
Fixes: 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
Cc: James Hogan <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Wolfgang Grandegger <[email protected]>
Cc: [email protected]
Cc: [email protected] # v3.12+
Patchwork: https://patchwork.linux-mips.org/patch/19829/
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/mips/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -54,5 +54,5 @@ void pci_resource_to_user(const struct p
phys_addr_t size = resource_size(rsrc);

*start = fixup_bigphys_addr(rsrc->start, size);
- *end = rsrc->start + size;
+ *end = rsrc->start + size - 1;
}



2018-07-27 09:48:37

by Greg Kroah-Hartman

Subject: [PATCH 4.17 05/66] clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gregory CLEMENT <[email protected]>

commit 61c40f35f5cd6f67ccbd7319a1722eb78c815989 upstream.

Switching the CPU from the L2 or L3 frequencies (300 and 200 MHz
respectively) to the L0 frequency (1.2 GHz) requires a significant amount
of time to let VDD stabilize to the appropriate voltage. This amount of
time is large enough that it cannot be covered by the hardware
countdown register. Due to this, the CPU might start operating at L0
before the voltage has stabilized, leading to CPU stalls.

To work around this problem, we prevent switching directly from the
L2/L3 frequencies to the L0 frequency, and instead switch to the L1
frequency in-between. The sequence therefore becomes:

1. First switch from L2/L3 (200/300 MHz) to L1 (600 MHz)
2. Sleep 20 ms to let the VDD voltage stabilize
3. Then switch from L1 (600 MHz) to L0 (1200 MHz).

It is based on the work done by Ken Ma <[email protected]>

Cc: [email protected]
Fixes: 2089dc33ea0e ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks")
Signed-off-by: Gregory CLEMENT <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/clk/mvebu/armada-37xx-periph.c | 38 +++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -35,6 +35,7 @@
#define CLK_SEL 0x10
#define CLK_DIS 0x14

+#define ARMADA_37XX_DVFS_LOAD_1 1
#define LOAD_LEVEL_NR 4

#define ARMADA_37XX_NB_L0L1 0x18
@@ -507,6 +508,40 @@ static long clk_pm_cpu_round_rate(struct
return -EINVAL;
}

+/*
+ * Switching the CPU from the L2 or L3 frequencies (300 and 200 Mhz
+ * respectively) to L0 frequency (1.2 Ghz) requires a significant
+ * amount of time to let VDD stabilize to the appropriate
+ * voltage. This amount of time is large enough that it cannot be
+ * covered by the hardware countdown register. Due to this, the CPU
+ * might start operating at L0 before the voltage is stabilized,
+ * leading to CPU stalls.
+ *
+ * To work around this problem, we prevent switching directly from the
+ * L2/L3 frequencies to the L0 frequency, and instead switch to the L1
+ * frequency in-between. The sequence therefore becomes:
+ * 1. First switch from L2/L3(200/300MHz) to L1(600MHZ)
+ * 2. Sleep 20ms for stabling VDD voltage
+ * 3. Then switch from L1(600MHZ) to L0(1200Mhz).
+ */
+static void clk_pm_cpu_set_rate_wa(unsigned long rate, struct regmap *base)
+{
+ unsigned int cur_level;
+
+ if (rate != 1200 * 1000 * 1000)
+ return;
+
+ regmap_read(base, ARMADA_37XX_NB_CPU_LOAD, &cur_level);
+ cur_level &= ARMADA_37XX_NB_CPU_LOAD_MASK;
+ if (cur_level <= ARMADA_37XX_DVFS_LOAD_1)
+ return;
+
+ regmap_update_bits(base, ARMADA_37XX_NB_CPU_LOAD,
+ ARMADA_37XX_NB_CPU_LOAD_MASK,
+ ARMADA_37XX_DVFS_LOAD_1);
+ msleep(20);
+}
+
static int clk_pm_cpu_set_rate(struct clk_hw *hw, unsigned long rate,
unsigned long parent_rate)
{
@@ -537,6 +572,9 @@ static int clk_pm_cpu_set_rate(struct cl
*/
reg = ARMADA_37XX_NB_CPU_LOAD;
mask = ARMADA_37XX_NB_CPU_LOAD_MASK;
+
+ clk_pm_cpu_set_rate_wa(rate, base);
+
regmap_update_bits(base, reg, mask, load_level);

return rate;



2018-07-27 09:48:39

by Greg Kroah-Hartman

Subject: [PATCH 4.17 01/66] KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <[email protected]>

commit cd28325249a1ca0d771557ce823e0308ad629f98 upstream.

This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.
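As a rough illustration of the userspace side (hypothetical sketch;
feature MSRs are read with the system-level KVM_GET_MSRS ioctl on
/dev/kvm, gated by KVM_CAP_GET_MSR_FEATURES):

	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} req = {
		.hdr.nmsrs = 1,
		.entry.index = MSR_IA32_ARCH_CAPABILITIES,
	};
	int kvm = open("/dev/kvm", O_RDWR);

	if (ioctl(kvm, KVM_GET_MSRS, &req) == 1)
		printf("host ARCH_CAPABILITIES: 0x%llx\n",
		       (unsigned long long)req.entry.data);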

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kvm/x86.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1092,6 +1092,7 @@ static u32 msr_based_features[] = {

MSR_F10H_DECFG,
MSR_IA32_UCODE_REV,
+ MSR_IA32_ARCH_CAPABILITIES,
};

static unsigned int num_msr_based_features;
@@ -1100,7 +1101,8 @@ static int kvm_get_msr_feature(struct kv
{
switch (msr->index) {
case MSR_IA32_UCODE_REV:
- rdmsrl(msr->index, msr->data);
+ case MSR_IA32_ARCH_CAPABILITIES:
+ rdmsrl_safe(msr->index, &msr->data);
break;
default:
if (kvm_x86_ops->get_msr_feature(msr))



2018-07-27 09:48:40

by Greg Kroah-Hartman

Subject: [PATCH 4.17 06/66] clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joel Stanley <[email protected]>

commit 974c7c6d7ba5a4b12d99456b0599aa6326dc2b69 upstream.

This is used by the host to talk to the BMC's PCIe slave device. The BMC
is not involved, but the clock needs to be enabled so the host can use
the device.

Fixes: 15ed8ce5f84e ("clk: aspeed: Register gated clocks")
Cc: [email protected] # 4.15
Acked-by: Andrew Jeffery <[email protected]>
Tested-by: Lei YU <[email protected]>
Signed-off-by: Joel Stanley <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/clk/clk-aspeed.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/clk/clk-aspeed.c
+++ b/drivers/clk/clk-aspeed.c
@@ -88,8 +88,8 @@ static const struct aspeed_gate_data asp
[ASPEED_CLK_GATE_GCLK] = { 1, 7, "gclk-gate", NULL, 0 }, /* 2D engine */
[ASPEED_CLK_GATE_MCLK] = { 2, -1, "mclk-gate", "mpll", CLK_IS_CRITICAL }, /* SDRAM */
[ASPEED_CLK_GATE_VCLK] = { 3, 6, "vclk-gate", NULL, 0 }, /* Video Capture */
- [ASPEED_CLK_GATE_BCLK] = { 4, 8, "bclk-gate", "bclk", 0 }, /* PCIe/PCI */
- [ASPEED_CLK_GATE_DCLK] = { 5, -1, "dclk-gate", NULL, 0 }, /* DAC */
+ [ASPEED_CLK_GATE_BCLK] = { 4, 8, "bclk-gate", "bclk", CLK_IS_CRITICAL }, /* PCIe/PCI */
+ [ASPEED_CLK_GATE_DCLK] = { 5, -1, "dclk-gate", NULL, CLK_IS_CRITICAL }, /* DAC */
[ASPEED_CLK_GATE_REFCLK] = { 6, -1, "refclk-gate", "clkin", CLK_IS_CRITICAL },
[ASPEED_CLK_GATE_USBPORT2CLK] = { 7, 3, "usb-port2-gate", NULL, 0 }, /* USB2.0 Host port 2 */
[ASPEED_CLK_GATE_LCLK] = { 8, 5, "lclk-gate", NULL, 0 }, /* LPC */



2018-07-27 09:49:16

by Greg Kroah-Hartman

Subject: [PATCH 4.17 08/66] xen/PVH: Set up GS segment for stack canary

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Boris Ostrovsky <[email protected]>

commit 98014068328c5574de9a4a30b604111fd9d8f901 upstream.

We are making calls to C code (e.g. xen_prepare_pvh()) which may use the
stack canary (stored in the GS segment).
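(Background, not part of the patch: compiler-emitted stack-protector
code reads the canary through the GS segment, roughly
mov %gs:<offset>, %reg, so %gs and the GDT descriptor it selects must be
valid before the first C function runs.)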

Signed-off-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Cc: Jason Andryuk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/xen/xen-pvh.S | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -54,6 +54,9 @@
* charge of setting up it's own stack, GDT and IDT.
*/

+#define PVH_GDT_ENTRY_CANARY 4
+#define PVH_CANARY_SEL (PVH_GDT_ENTRY_CANARY * 8)
+
ENTRY(pvh_start_xen)
cld

@@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
/* 64-bit entry point. */
.code64
1:
+ /* Set base address in stack canary descriptor. */
+ mov $MSR_GS_BASE,%ecx
+ mov $_pa(canary), %eax
+ xor %edx, %edx
+ wrmsr
+
call xen_prepare_pvh

/* startup_64 expects boot_params in %rsi. */
@@ -107,6 +116,17 @@ ENTRY(pvh_start_xen)

#else /* CONFIG_X86_64 */

+ /* Set base address in stack canary descriptor. */
+ movl $_pa(gdt_start),%eax
+ movl $_pa(canary),%ecx
+ movw %cx, (PVH_GDT_ENTRY_CANARY * 8) + 2(%eax)
+ shrl $16, %ecx
+ movb %cl, (PVH_GDT_ENTRY_CANARY * 8) + 4(%eax)
+ movb %ch, (PVH_GDT_ENTRY_CANARY * 8) + 7(%eax)
+
+ mov $PVH_CANARY_SEL,%eax
+ mov %eax,%gs
+
call mk_early_pgtbl_32

mov $_pa(initial_page_table), %eax
@@ -150,9 +170,13 @@ gdt_start:
.quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* __KERNEL_CS */
#endif
.quad GDT_ENTRY(0xc092, 0, 0xfffff) /* __KERNEL_DS */
+ .quad GDT_ENTRY(0x4090, 0, 0x18) /* PVH_CANARY_SEL */
gdt_end:

- .balign 4
+ .balign 16
+canary:
+ .fill 48, 1, 0
+
early_stack:
.fill 256, 1, 0
early_stack_end:



2018-07-27 09:49:23

by Greg Kroah-Hartman

Subject: [PATCH 4.17 09/66] KVM: PPC: Check if IOMMU page is contained in the pinned physical page

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Kardashevskiy <[email protected]>

commit 76fa4975f3ed12d15762bc979ca44078598ed8ee upstream.

A VM which has:
- a DMA capable device passed through to it (eg. network card);
- running a malicious kernel that ignores H_PUT_TCE failure;
- capability of using IOMMU pages bigger than physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.

The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.
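To put numbers on the example above: with 64K host pages, pinning a
single 64K page but mapping a 16MB IOMMU page exposes the surrounding
16MB region, i.e. 16MB / 64K = 256 times more host memory than the VM
was allocated.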

The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.

We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet as the very first time when the mapping happens
we do not have tbl::it_userspace allocated yet and fall back to
the userspace which in turn calls VFIO IOMMU driver, this fails and
the guest does not retry,

This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.

This calculates maximum page size as a minimum of the natural region
alignment and compound page size. For the page shift this uses the shift
returned by find_linux_pte() which indicates how the page is mapped to
the current userspace - if the page is huge and this is not a zero, then
it is a leaf pte and the page is mapped within the range.

Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: [email protected] # v4.12+
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
arch/powerpc/include/asm/mmu_context.h | 4 +--
arch/powerpc/kvm/book3s_64_vio.c | 2 -
arch/powerpc/kvm/book3s_64_vio_hv.c | 6 +++--
arch/powerpc/mm/mmu_context_iommu.c | 37 +++++++++++++++++++++++++++++++--
drivers/vfio/vfio_iommu_spapr_tce.c | 2 -
5 files changed, 43 insertions(+), 8 deletions(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t
extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
unsigned long ua, unsigned long entries);
extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
#endif
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -433,7 +433,7 @@ long kvmppc_tce_iommu_map(struct kvm *kv
/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
return H_TOO_HARD;

- if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
+ if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
return H_HARDWARE;

if (mm_iommu_mapped_inc(mem))
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -262,7 +262,8 @@ static long kvmppc_rm_tce_iommu_map(stru
if (!mem)
return H_TOO_HARD;

- if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
+ if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
+ &hpa)))
return H_HARDWARE;

pua = (void *) vmalloc_to_phys(pua);
@@ -431,7 +432,8 @@ long kvmppc_rm_h_put_tce_indirect(struct

mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
if (mem)
- prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
+ prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
+ IOMMU_PAGE_SHIFT_4K, &tces) == 0;
}

if (!prereg) {
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -19,6 +19,7 @@
#include <linux/hugetlb.h>
#include <linux/swap.h>
#include <asm/mmu_context.h>
+#include <asm/pte-walk.h>

static DEFINE_MUTEX(mem_list_mutex);

@@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
struct rcu_head rcu;
unsigned long used;
atomic64_t mapped;
+ unsigned int pageshift;
u64 ua; /* userspace address */
u64 entries; /* number of entries in hpas[] */
u64 *hpas; /* vmalloc'ed */
@@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm,
{
struct mm_iommu_table_group_mem_t *mem;
long i, j, ret = 0, locked_entries = 0;
+ unsigned int pageshift;
+ unsigned long flags;
struct page *page = NULL;

mutex_lock(&mem_list_mutex);
@@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm,
goto unlock_exit;
}

+ /*
+ * For a starting point for a maximum page size calculation
+ * we use @ua and @entries natural alignment to allow IOMMU pages
+ * smaller than huge pages but still bigger than PAGE_SIZE.
+ */
+ mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
if (!mem->hpas) {
kfree(mem);
@@ -199,6 +209,23 @@ long mm_iommu_get(struct mm_struct *mm,
}
}
populate:
+ pageshift = PAGE_SHIFT;
+ if (PageCompound(page)) {
+ pte_t *pte;
+ struct page *head = compound_head(page);
+ unsigned int compshift = compound_order(head);
+
+ local_irq_save(flags); /* disables as well */
+ pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
+ local_irq_restore(flags);
+
+ /* Double check it is still the same pinned page */
+ if (pte && pte_page(*pte) == head &&
+ pageshift == compshift)
+ pageshift = max_t(unsigned int, pageshift,
+ PAGE_SHIFT);
+ }
+ mem->pageshift = min(mem->pageshift, pageshift);
mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
}

@@ -349,7 +376,7 @@ struct mm_iommu_table_group_mem_t *mm_io
EXPORT_SYMBOL_GPL(mm_iommu_find);

long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
u64 *va = &mem->hpas[entry];
@@ -357,6 +384,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_
if (entry >= mem->entries)
return -EFAULT;

+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
*hpa = *va | (ua & ~PAGE_MASK);

return 0;
@@ -364,7 +394,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_
EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);

long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
void *va = &mem->hpas[entry];
@@ -373,6 +403,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iom
if (entry >= mem->entries)
return -EFAULT;

+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
pa = (void *) vmalloc_to_phys(va);
if (!pa)
return -EFAULT;
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(st
if (!mem)
return -EINVAL;

- ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+ ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
if (ret)
return -EINVAL;




2018-07-27 09:49:29

by Greg Kroah-Hartman

Subject: [PATCH 4.17 25/66] r8169: restore previous behavior to accept BIOS WoL settings

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiner Kallweit <[email protected]>

[ Upstream commit 18041b523692038d41751fd8046638c356d77a36 ]

Commit 7edf6d314cd0 tried to resolve an inconsistency (BIOS WoL
settings are accepted, but device isn't wakeup-enabled) resulting
from a previous broken-BIOS workaround by making disabled WoL the
default.
This however had side effects: most likely due to a broken BIOS, some
systems don't properly resume from suspend when the MagicPacket WoL bit
isn't set in the chip, see
https://bugzilla.kernel.org/show_bug.cgi?id=200195
Therefore restore the WoL behavior from 4.16.

Reported-by: Albert Astals Cid <[email protected]>
Fixes: 7edf6d314cd0 ("r8169: disable WOL per default")
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/realtek/r8169.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -8272,8 +8272,7 @@ static int rtl_init_one(struct pci_dev *
return rc;
}

- /* override BIOS settings, use userspace tools to enable WOL */
- __rtl8169_set_wol(tp, 0);
+ tp->saved_wolopts = __rtl8169_get_wol(tp);

if (rtl_tbi_enabled(tp)) {
tp->set_speed = rtl8169_set_speed_tbi;



2018-07-27 09:49:32

by Greg Kroah-Hartman

Subject: [PATCH 4.17 26/66] tls: check RCV_SHUTDOWN in tls_wait_data

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Doron Roberts-Kedes <[email protected]>

[ Upstream commit fcf4793e278edede8fcd748198d12128037e526c ]

The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
tls_sw_recvmsg may return a positive value in the case where bytes have
already been copied when the socket is shut down. sk->sk_err has been
cleared, causing tls_wait_data to hang forever on a subsequent
invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
fixes this problem.

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Acked-by: Dave Watson <[email protected]>
Signed-off-by: Doron Roberts-Kedes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tls/tls_sw.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -646,6 +646,9 @@ static struct sk_buff *tls_wait_data(str
return NULL;
}

+ if (sk->sk_shutdown & RCV_SHUTDOWN)
+ return NULL;
+
if (sock_flag(sk, SOCK_DONE))
return NULL;




2018-07-27 09:49:41

by Greg Kroah-Hartman

Subject: [PATCH 4.17 28/66] net/mlx5e: Only allow offloading decap egress (egdev) flows

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roi Dayan <[email protected]>

[ Upstream commit 7e29392eee7a1e3318eeb1099807264a49f60e33 ]

We get egress rules through the egdev mechanism when the ingress device
does not support offload, the expected use case being a tunnel decap
ingress rule set on a shared tunnel device.

Make sure to offload egress/egdev rules only if a decap action (tunnel key
unset) exists there, and return an error otherwise.

Fixes: 717503b9cf57 ("net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra")
Signed-off-by: Roi Dayan <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Reviewed-by: Or Gerlitz <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1894,6 +1894,10 @@ static bool actions_match_supported(stru
else
actions = flow->nic_attr->action;

+ if (flow->flags & MLX5E_TC_FLOW_EGRESS &&
+ !(actions & MLX5_FLOW_CONTEXT_ACTION_DECAP))
+ return false;
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
return modify_header_match_supported(&parse_attr->spec, exts);




2018-07-27 09:49:42

by Greg Kroah-Hartman

Subject: [PATCH 4.17 30/66] nfp: flower: ensure dead neighbour entries are not offloaded

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: John Hurley <[email protected]>

[ Upstream commit b809ec869b2cf2af053ffd99e5a46ab600e94aa2 ]

Previously only the neighbour state was checked to decide if an offloaded
entry should be removed. However, there can be situations when the entry
is dead but still marked as valid. This can lead to dead entries not
being removed from fw tables or even incorrect data being added.

Check the entry dead bit before deciding if it should be added to or
removed from fw neighbour tables.

Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload")
Signed-off-by: John Hurley <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -317,7 +317,7 @@ nfp_tun_write_neigh(struct net_device *n
payload.dst_ipv4 = flow->daddr;

/* If entry has expired send dst IP with all other fields 0. */
- if (!(neigh->nud_state & NUD_VALID)) {
+ if (!(neigh->nud_state & NUD_VALID) || neigh->dead) {
nfp_tun_del_route_from_cache(app, payload.dst_ipv4);
/* Trigger ARP to verify invalid neighbour state. */
neigh_event_send(neigh, NULL);



2018-07-27 09:49:42

by Greg Kroah-Hartman

Subject: [PATCH 4.17 29/66] net/mlx5e: Refine ets validation function

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shay Agroskin <[email protected]>

[ Upstream commit e279d634f3d57452eb106a0c0e99a6add3fba1a6 ]

Remove the error message printed when the ETS total bandwidth is
configured to be zero.
Our hardware doesn't support such a configuration, so we still reject it
in the driver. Nevertheless, we removed the error message in order to
eliminate log noise caused by old userspace tools that try to pass such
a configuration.

Fixes: ff0891915cd7 ("net/mlx5e: Fix ETS BW check")
Signed-off-by: Shay Agroskin <[email protected]>
Reviewed-by: Huy Nguyen <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -272,7 +272,8 @@ int mlx5e_dcbnl_ieee_setets_core(struct
}

static int mlx5e_dbcnl_validate_ets(struct net_device *netdev,
- struct ieee_ets *ets)
+ struct ieee_ets *ets,
+ bool zero_sum_allowed)
{
bool have_ets_tc = false;
int bw_sum = 0;
@@ -297,8 +298,9 @@ static int mlx5e_dbcnl_validate_ets(stru
}

if (have_ets_tc && bw_sum != 100) {
- netdev_err(netdev,
- "Failed to validate ETS: BW sum is illegal\n");
+ if (bw_sum || (!bw_sum && !zero_sum_allowed))
+ netdev_err(netdev,
+ "Failed to validate ETS: BW sum is illegal\n");
return -EINVAL;
}
return 0;
@@ -313,7 +315,7 @@ static int mlx5e_dcbnl_ieee_setets(struc
if (!MLX5_CAP_GEN(priv->mdev, ets))
return -EOPNOTSUPP;

- err = mlx5e_dbcnl_validate_ets(netdev, ets);
+ err = mlx5e_dbcnl_validate_ets(netdev, ets, false);
if (err)
return err;

@@ -613,12 +615,9 @@ static u8 mlx5e_dcbnl_setall(struct net_
ets.prio_tc[i]);
}

- err = mlx5e_dbcnl_validate_ets(netdev, &ets);
- if (err) {
- netdev_err(netdev,
- "%s, Failed to validate ETS: %d\n", __func__, err);
+ err = mlx5e_dbcnl_validate_ets(netdev, &ets, true);
+ if (err)
goto out;
- }

err = mlx5e_dcbnl_ieee_setets_core(priv, &ets);
if (err) {



2018-07-27 09:50:03

by Greg Kroah-Hartman

Subject: [PATCH 4.17 24/66] net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Saeed Mahameed <[email protected]>

[ Upstream commit 443a858158d35916e572b75667ca4924a6af2182 ]

With a debug kernel, UBSAN detects the following issue, which might
happen when the eswitch instance is not created. Fix this by testing the
eswitch pointer before returning the eswitch mode; if it is not set,
return mode = SRIOV_NONE.

[ 32.528951] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx5/core/eswitch.c:2219:12
[ 32.528951] member access within null pointer of type 'struct mlx5_eswitch'
[ 32.528951] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-dirty #181
[ 32.528951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 32.528951] Call Trace:
[ 32.528951] dump_stack+0xc7/0x13b
[ 32.528951] ? show_regs_print_info+0x5/0x5
[ 32.528951] ? __pm_runtime_use_autosuspend+0x140/0x140
[ 32.528951] ubsan_epilogue+0x9/0x49
[ 32.528951] ubsan_type_mismatch_common+0x1f9/0x2c0
[ 32.528951] ? ucs2_as_utf8+0x310/0x310
[ 32.528951] ? device_initialize+0x229/0x2e0
[ 32.528951] __ubsan_handle_type_mismatch+0x9f/0xc9
[ 32.528951] ? __ubsan_handle_divrem_overflow+0x19b/0x19b
[ 32.578008] ? ib_device_get_by_index+0xf0/0xf0
[ 32.578008] mlx5_eswitch_mode+0x30/0x40
[ 32.578008] mlx5_ib_add+0x1e0/0x4a0

Fixes: 57cbd893c4c5 ("net/mlx5: E-Switch, Move representors definition to a global scope")
Signed-off-by: Saeed Mahameed <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2221,6 +2221,6 @@ free_out:

u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
{
- return esw->mode;
+ return ESW_ALLOWED(esw) ? esw->mode : SRIOV_NONE;
}
EXPORT_SYMBOL_GPL(mlx5_eswitch_mode);



2018-07-27 09:50:04

by Greg Kroah-Hartman

Subject: [PATCH 4.17 15/66] ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <[email protected]>

[ Upstream commit 2efd4fca703a6707cad16ab486eaab8fc7f0fd49 ]

Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:

BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
copy_to_user include/linux/uaccess.h:184 [inline]
put_cmsg+0x5ef/0x860 net/core/scm.c:242
ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
[..]

This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.

With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.

Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.
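The distinction matters because skb->len counts the linear head plus any
paged frags: a bounds check against skb->len proves the bytes exist
somewhere in the skb, but not that they sit in the head where a direct
pointer dereference lands. pskb_may_pull(skb, end) pulls the first end
bytes into the linear area (or fails), after which the transport header
pointer can safely be dereferenced.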

Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: [email protected]
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_sockglue.c | 7 +++++--
net/ipv6/datagram.c | 7 +++++--
2 files changed, 10 insertions(+), 4 deletions(-)

--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -148,15 +148,18 @@ static void ip_cmsg_recv_dstaddr(struct
{
struct sockaddr_in sin;
const struct iphdr *iph = ip_hdr(skb);
- __be16 *ports = (__be16 *)skb_transport_header(skb);
+ __be16 *ports;
+ int end;

- if (skb_transport_offset(skb) + 4 > (int)skb->len)
+ end = skb_transport_offset(skb) + 4;
+ if (end > 0 && !pskb_may_pull(skb, end))
return;

/* All current transport protocols have the port numbers in the
* first four bytes of the transport header and this function is
* written with this assumption in mind.
*/
+ ports = (__be16 *)skb_transport_header(skb);

sin.sin_family = AF_INET;
sin.sin_addr.s_addr = iph->daddr;
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -700,13 +700,16 @@ void ip6_datagram_recv_specific_ctl(stru
}
if (np->rxopt.bits.rxorigdstaddr) {
struct sockaddr_in6 sin6;
- __be16 *ports = (__be16 *) skb_transport_header(skb);
+ __be16 *ports;
+ int end;

- if (skb_transport_offset(skb) + 4 <= (int)skb->len) {
+ end = skb_transport_offset(skb) + 4;
+ if (end <= 0 || pskb_may_pull(skb, end)) {
/* All current transport protocols have the port numbers in the
* first four bytes of the transport header and this function is
* written with this assumption in mind.
*/
+ ports = (__be16 *)skb_transport_header(skb);

sin6.sin6_family = AF_INET6;
sin6.sin6_addr = ipv6_hdr(skb)->daddr;
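
For context, the code being hardened fills the IP_ORIGDSTADDR /
IPV6_ORIGDSTADDR control message. A minimal userspace sketch of the IPv4
consumer side (port number and buffer sizes are arbitrary; error handling
trimmed), i.e. the path that ends in ip_cmsg_recv_dstaddr():

#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int on = 1;
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(5353),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	char data[2048], cbuf[256];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg;

	/* ask the kernel to attach the original destination address */
	setsockopt(fd, IPPROTO_IP, IP_RECVORIGDSTADDR, &on, sizeof(on));
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	if (recvmsg(fd, &msg, 0) < 0)
		return 1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_ORIGDSTADDR) {
			struct sockaddr_in sin;

			/* sin_port carries the value that was read past
			 * the skb head in the syzbot report */
			memcpy(&sin, CMSG_DATA(cmsg), sizeof(sin));
			printf("orig dst port %u\n", ntohs(sin.sin_port));
		}
	}
	return 0;
}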



2018-07-27 09:50:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 23/66] tcp: do not delay ACK in DCTCP upon CE status change

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <[email protected]>

[ Upstream commit a0496ef2c23b3b180902dd185d0d63ccbc624cf8 ]

Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
has to be sent immediately so the sender can respond quickly:

""" When receiving packets, the CE codepoint MUST be processed as follows:

1. If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
true and send an immediate ACK.

2. If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
to false and send an immediate ACK.
"""

Previously the DCTCP implementation could continue to delay the ACK. This
patch fixes that and implements the RFC by forcing an immediate ACK.

Tested with this packetdrill script provided by Larry Brakmo:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
+0.005 < [ce] . 2001:3001(1000) ack 2 win 257

+0.000 > [ect01] . 2:2(0) ack 2001
// Previously the ACK below would be delayed by 40ms
+0.000 > [ect01] E. 2:2(0) ack 3001

+0.500 < F. 9501:9501(0) ack 4 win 257

Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/tcp.h | 1 +
net/ipv4/tcp_dctcp.c | 30 ++++++++++++++++++------------
net/ipv4/tcp_input.c | 3 ++-
3 files changed, 21 insertions(+), 13 deletions(-)

--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -342,6 +342,7 @@ ssize_t tcp_splice_read(struct socket *s
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);

+void tcp_enter_quickack_mode(struct sock *sk);
static inline void tcp_dec_quickack_mode(struct sock *sk,
const unsigned int pkts)
{
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -131,12 +131,15 @@ static void dctcp_ce_state_0_to_1(struct
struct dctcp *ca = inet_csk_ca(sk);
struct tcp_sock *tp = tcp_sk(sk);

- /* State has changed from CE=0 to CE=1 and delayed
- * ACK has not sent yet.
- */
- if (!ca->ce_state &&
- inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
- __tcp_send_ack(sk, ca->prior_rcv_nxt);
+ if (!ca->ce_state) {
+ /* State has changed from CE=0 to CE=1, force an immediate
+ * ACK to reflect the new CE state. If an ACK was delayed,
+ * send that first to reflect the prior CE state.
+ */
+ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
+ __tcp_send_ack(sk, ca->prior_rcv_nxt);
+ tcp_enter_quickack_mode(sk);
+ }

ca->prior_rcv_nxt = tp->rcv_nxt;
ca->ce_state = 1;
@@ -149,12 +152,15 @@ static void dctcp_ce_state_1_to_0(struct
struct dctcp *ca = inet_csk_ca(sk);
struct tcp_sock *tp = tcp_sk(sk);

- /* State has changed from CE=1 to CE=0 and delayed
- * ACK has not sent yet.
- */
- if (ca->ce_state &&
- inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
- __tcp_send_ack(sk, ca->prior_rcv_nxt);
+ if (ca->ce_state) {
+ /* State has changed from CE=1 to CE=0, force an immediate
+ * ACK to reflect the new CE state. If an ACK was delayed,
+ * send that first to reflect the prior CE state.
+ */
+ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
+ __tcp_send_ack(sk, ca->prior_rcv_nxt);
+ tcp_enter_quickack_mode(sk);
+ }

ca->prior_rcv_nxt = tp->rcv_nxt;
ca->ce_state = 0;
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -195,13 +195,14 @@ static void tcp_incr_quickack(struct soc
icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS);
}

-static void tcp_enter_quickack_mode(struct sock *sk)
+void tcp_enter_quickack_mode(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);
tcp_incr_quickack(sk);
icsk->icsk_ack.pingpong = 0;
icsk->icsk_ack.ato = TCP_ATO_MIN;
}
+EXPORT_SYMBOL(tcp_enter_quickack_mode);

/* Send ACKs quickly, if "quick" count is not exhausted
* and the session is not interactive.
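
Outside packetdrill, selecting DCTCP is a per-socket operation. A minimal
sketch (the helper name is mine) of opting a socket into the congestion
module whose CE-transition ACKs this patch makes immediate; it requires
CONFIG_TCP_CONG_DCTCP and the module being permitted by
net.ipv4.tcp_allowed_congestion_control:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int enable_dctcp(int fd)
{
	static const char name[] = "dctcp";

	/* takes effect before connect()/listen(); fails with ENOENT
	 * if the dctcp module is not available */
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
			  name, sizeof(name) - 1);
}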



2018-07-27 09:50:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 27/66] net/mlx5e: Add ingress/egress indication for offloaded TC flows

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Or Gerlitz <[email protected]>

[ Upstream commit 60bd4af814fec164c42bdd2efd7984b85d6b1e1e ]

When an e-switch TC rule is offloaded through the egdev (egress
device) mechanism, we treat this as egress; all other cases (NIC
and e-switch) are considered ingress.

This is a preparation step that will allow us to identify "wrong"
stat/del offload calls made by the TC core on egdev-based flows and
ignore them.

Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 -
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++++----
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 32 +++++++++++++-----
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 38 ++++++++++++++++------
drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 13 +++++--
5 files changed, 70 insertions(+), 31 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1092,9 +1092,6 @@ int mlx5e_ethtool_get_ts_info(struct mlx
int mlx5e_ethtool_flash_device(struct mlx5e_priv *priv,
struct ethtool_flash *flash);

-int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv);
-
/* mlx5e generic netdev management API */
struct net_device*
mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *profile,
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3093,22 +3093,23 @@ out:

#ifdef CONFIG_MLX5_ESWITCH
static int mlx5e_setup_tc_cls_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *cls_flower)
+ struct tc_cls_flower_offload *cls_flower,
+ int flags)
{
switch (cls_flower->command) {
case TC_CLSFLOWER_REPLACE:
- return mlx5e_configure_flower(priv, cls_flower);
+ return mlx5e_configure_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_DESTROY:
- return mlx5e_delete_flower(priv, cls_flower);
+ return mlx5e_delete_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_STATS:
- return mlx5e_stats_flower(priv, cls_flower);
+ return mlx5e_stats_flower(priv, cls_flower, flags);
default:
return -EOPNOTSUPP;
}
}

-int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv)
+static int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
{
struct mlx5e_priv *priv = cb_priv;

@@ -3117,7 +3118,7 @@ int mlx5e_setup_tc_block_cb(enum tc_setu

switch (type) {
case TC_SETUP_CLSFLOWER:
- return mlx5e_setup_tc_cls_flower(priv, type_data);
+ return mlx5e_setup_tc_cls_flower(priv, type_data, MLX5E_TC_INGRESS);
default:
return -EOPNOTSUPP;
}
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -723,15 +723,31 @@ static int mlx5e_rep_get_phys_port_name(

static int
mlx5e_rep_setup_tc_cls_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *cls_flower)
+ struct tc_cls_flower_offload *cls_flower, int flags)
{
switch (cls_flower->command) {
case TC_CLSFLOWER_REPLACE:
- return mlx5e_configure_flower(priv, cls_flower);
+ return mlx5e_configure_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_DESTROY:
- return mlx5e_delete_flower(priv, cls_flower);
+ return mlx5e_delete_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_STATS:
- return mlx5e_stats_flower(priv, cls_flower);
+ return mlx5e_stats_flower(priv, cls_flower, flags);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int mlx5e_rep_setup_tc_cb_egdev(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct mlx5e_priv *priv = cb_priv;
+
+ if (!tc_cls_can_offload_and_chain0(priv->netdev, type_data))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return mlx5e_rep_setup_tc_cls_flower(priv, type_data, MLX5E_TC_EGRESS);
default:
return -EOPNOTSUPP;
}
@@ -747,7 +763,7 @@ static int mlx5e_rep_setup_tc_cb(enum tc

switch (type) {
case TC_SETUP_CLSFLOWER:
- return mlx5e_rep_setup_tc_cls_flower(priv, type_data);
+ return mlx5e_rep_setup_tc_cls_flower(priv, type_data, MLX5E_TC_INGRESS);
default:
return -EOPNOTSUPP;
}
@@ -1111,7 +1127,7 @@ mlx5e_vport_rep_load(struct mlx5_core_de

uplink_rpriv = mlx5_eswitch_get_uplink_priv(dev->priv.eswitch, REP_ETH);
upriv = netdev_priv(uplink_rpriv->netdev);
- err = tc_setup_cb_egdev_register(netdev, mlx5e_setup_tc_block_cb,
+ err = tc_setup_cb_egdev_register(netdev, mlx5e_rep_setup_tc_cb_egdev,
upriv);
if (err)
goto err_neigh_cleanup;
@@ -1126,7 +1142,7 @@ mlx5e_vport_rep_load(struct mlx5_core_de
return 0;

err_egdev_cleanup:
- tc_setup_cb_egdev_unregister(netdev, mlx5e_setup_tc_block_cb,
+ tc_setup_cb_egdev_unregister(netdev, mlx5e_rep_setup_tc_cb_egdev,
upriv);

err_neigh_cleanup:
@@ -1155,7 +1171,7 @@ mlx5e_vport_rep_unload(struct mlx5_eswit
uplink_rpriv = mlx5_eswitch_get_uplink_priv(priv->mdev->priv.eswitch,
REP_ETH);
upriv = netdev_priv(uplink_rpriv->netdev);
- tc_setup_cb_egdev_unregister(netdev, mlx5e_setup_tc_block_cb,
+ tc_setup_cb_egdev_unregister(netdev, mlx5e_rep_setup_tc_cb_egdev,
upriv);
mlx5e_rep_neigh_cleanup(rpriv);
mlx5e_detach_netdev(priv);
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -61,12 +61,16 @@ struct mlx5_nic_flow_attr {
struct mlx5_flow_table *hairpin_ft;
};

+#define MLX5E_TC_FLOW_BASE (MLX5E_TC_LAST_EXPORTED_BIT + 1)
+
enum {
- MLX5E_TC_FLOW_ESWITCH = BIT(0),
- MLX5E_TC_FLOW_NIC = BIT(1),
- MLX5E_TC_FLOW_OFFLOADED = BIT(2),
- MLX5E_TC_FLOW_HAIRPIN = BIT(3),
- MLX5E_TC_FLOW_HAIRPIN_RSS = BIT(4),
+ MLX5E_TC_FLOW_INGRESS = MLX5E_TC_INGRESS,
+ MLX5E_TC_FLOW_EGRESS = MLX5E_TC_EGRESS,
+ MLX5E_TC_FLOW_ESWITCH = BIT(MLX5E_TC_FLOW_BASE),
+ MLX5E_TC_FLOW_NIC = BIT(MLX5E_TC_FLOW_BASE + 1),
+ MLX5E_TC_FLOW_OFFLOADED = BIT(MLX5E_TC_FLOW_BASE + 2),
+ MLX5E_TC_FLOW_HAIRPIN = BIT(MLX5E_TC_FLOW_BASE + 3),
+ MLX5E_TC_FLOW_HAIRPIN_RSS = BIT(MLX5E_TC_FLOW_BASE + 4),
};

struct mlx5e_tc_flow {
@@ -2566,8 +2570,20 @@ static int parse_tc_fdb_actions(struct m
return err;
}

+static void get_flags(int flags, u8 *flow_flags)
+{
+ u8 __flow_flags = 0;
+
+ if (flags & MLX5E_TC_INGRESS)
+ __flow_flags |= MLX5E_TC_FLOW_INGRESS;
+ if (flags & MLX5E_TC_EGRESS)
+ __flow_flags |= MLX5E_TC_FLOW_EGRESS;
+
+ *flow_flags = __flow_flags;
+}
+
int mlx5e_configure_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f)
+ struct tc_cls_flower_offload *f, int flags)
{
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5e_tc_flow_parse_attr *parse_attr;
@@ -2576,11 +2592,13 @@ int mlx5e_configure_flower(struct mlx5e_
int attr_size, err = 0;
u8 flow_flags = 0;

+ get_flags(flags, &flow_flags);
+
if (esw && esw->mode == SRIOV_OFFLOADS) {
- flow_flags = MLX5E_TC_FLOW_ESWITCH;
+ flow_flags |= MLX5E_TC_FLOW_ESWITCH;
attr_size = sizeof(struct mlx5_esw_flow_attr);
} else {
- flow_flags = MLX5E_TC_FLOW_NIC;
+ flow_flags |= MLX5E_TC_FLOW_NIC;
attr_size = sizeof(struct mlx5_nic_flow_attr);
}

@@ -2639,7 +2657,7 @@ err_free:
}

int mlx5e_delete_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f)
+ struct tc_cls_flower_offload *f, int flags)
{
struct mlx5e_tc_flow *flow;
struct mlx5e_tc_table *tc = &priv->fs.tc;
@@ -2659,7 +2677,7 @@ int mlx5e_delete_flower(struct mlx5e_pri
}

int mlx5e_stats_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f)
+ struct tc_cls_flower_offload *f, int flags)
{
struct mlx5e_tc_table *tc = &priv->fs.tc;
struct mlx5e_tc_flow *flow;
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -38,16 +38,23 @@
#define MLX5E_TC_FLOW_ID_MASK 0x0000ffff

#ifdef CONFIG_MLX5_ESWITCH
+
+enum {
+ MLX5E_TC_INGRESS = BIT(0),
+ MLX5E_TC_EGRESS = BIT(1),
+ MLX5E_TC_LAST_EXPORTED_BIT = 1,
+};
+
int mlx5e_tc_init(struct mlx5e_priv *priv);
void mlx5e_tc_cleanup(struct mlx5e_priv *priv);

int mlx5e_configure_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f);
+ struct tc_cls_flower_offload *f, int flags);
int mlx5e_delete_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f);
+ struct tc_cls_flower_offload *f, int flags);

int mlx5e_stats_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f);
+ struct tc_cls_flower_offload *f, int flags);

struct mlx5e_encap_entry;
void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
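
The enum rework above packs direction bits and flow-type bits into a single
flags field. A standalone sketch that mirrors the constants from the diff
and prints the resulting layout:

#include <stdio.h>

#define BIT(n)				(1u << (n))
#define MLX5E_TC_INGRESS		BIT(0)
#define MLX5E_TC_EGRESS			BIT(1)
#define MLX5E_TC_LAST_EXPORTED_BIT	1
#define MLX5E_TC_FLOW_BASE		(MLX5E_TC_LAST_EXPORTED_BIT + 1)
#define MLX5E_TC_FLOW_ESWITCH		BIT(MLX5E_TC_FLOW_BASE)
#define MLX5E_TC_FLOW_NIC		BIT(MLX5E_TC_FLOW_BASE + 1)

int main(void)
{
	/* INGRESS=0x1 EGRESS=0x2 ESWITCH=0x4 NIC=0x8: the direction
	 * flags no longer collide with the driver-private flow flags */
	printf("0x%x 0x%x 0x%x 0x%x\n", MLX5E_TC_INGRESS, MLX5E_TC_EGRESS,
	       MLX5E_TC_FLOW_ESWITCH, MLX5E_TC_FLOW_NIC);
	return 0;
}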



2018-07-27 09:50:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 14/66] ip: hash fragments consistently

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <[email protected]>

[ Upstream commit 3dd1c9a1270736029ffca670e9bd0265f4120600 ]

The skb hash for locally generated ip[v6] fragments belonging
to the same datagram can vary in several circumstances:
* for connected UDP[v6] sockets, the first fragment gets its hash
via set_owner_w()/skb_set_hash_from_sk()
* for unconnected IPv6 UDPv6 sockets, the first fragment can get
its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
auto_flowlabel is enabled

For the following fragments the hash is usually computed via
skb_get_hash().
The above can cause out-of-order delivery for an unconnected IPv6 UDPv6
socket: in that scenario the egress tx queue can be selected on a
per-packet basis via the skb hash.
It may also fool flow-oriented schedulers into placing fragments belonging
to the same datagram in different flows.

Fix the issue by copying the skb hash from the head frag into
the others at fragmentation time.

Before this commit:
perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
perf script
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0

After this commit:
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0

Fixes: b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_output.c | 2 ++
net/ipv6/ip6_output.c | 2 ++
2 files changed, 4 insertions(+)

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -523,6 +523,8 @@ static void ip_copy_metadata(struct sk_b
to->dev = from->dev;
to->mark = from->mark;

+ skb_copy_hash(to, from);
+
/* Copy the flags to each fragment. */
IPCB(to)->flags = IPCB(from)->flags;

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -596,6 +596,8 @@ static void ip6_copy_metadata(struct sk_
to->dev = from->dev;
to->mark = from->mark;

+ skb_copy_hash(to, from);
+
#ifdef CONFIG_NET_SCHED
to->tc_index = from->tc_index;
#endif
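
For reference, skb_copy_hash() is a trivial helper: it copies the three
hash-related fields, so every fragment inherits the head fragment's hash
verbatim. A pared-down standalone sketch (the real fields live in struct
sk_buff as bitfields):

struct skb_hash_view {
	unsigned int hash;	/* the flow hash itself */
	unsigned char sw_hash;	/* hash was computed in software */
	unsigned char l4_hash;	/* hash covers the 4-tuple */
};

static void copy_hash(struct skb_hash_view *to,
		      const struct skb_hash_view *from)
{
	to->hash = from->hash;
	to->sw_hash = from->sw_hash;
	to->l4_hash = from->l4_hash;
}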



2018-07-27 09:50:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 32/66] net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiner Kallweit <[email protected]>

[ Upstream commit 215d08a85b9acf5e1fe9dbf50f1774cde333efef ]

The situation described in the comment can also occur with
PHY_IGNORE_INTERRUPT; therefore change the condition to include it.

Fixes: f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable interrupt")
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -514,7 +514,7 @@ static int phy_start_aneg_priv(struct ph
* negotiation may already be done and aneg interrupt may not be
* generated.
*/
- if (phy_interrupt_is_valid(phydev) && (phydev->state == PHY_AN)) {
+ if (phydev->irq != PHY_POLL && phydev->state == PHY_AN) {
err = phy_aneg_done(phydev);
if (err > 0) {
trigger = true;
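
The one-liner is easier to follow next to the definitions it works around.
A sketch of the 4.17-era constants and helper (struct pared down here):

#include <stdbool.h>

#define PHY_POLL		-1	/* no interrupt, poll the PHY */
#define PHY_IGNORE_INTERRUPT	-2	/* interrupts handled elsewhere */

struct phy_device_view { int irq; };

/* phy_interrupt_is_valid() boils down to "irq > 0", so it is false
 * for PHY_IGNORE_INTERRUPT as well -- which is why the old condition
 * skipped the aneg-done recheck on such PHYs and aneg could stall. */
static bool phy_interrupt_is_valid(const struct phy_device_view *phydev)
{
	return phydev->irq > 0;
}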



2018-07-27 09:50:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 40/66] vxlan: make netlink notify in vxlan_fdb_destroy optional

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roopa Prabhu <[email protected]>

[ Upstream commit f6e053858671bb156b6e44ad66418acc8c7f4e77 ]

Add a new option, do_notify, to vxlan_fdb_destroy() to make
sending the netlink notification optional. Used by a later patch.

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/vxlan.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -774,13 +774,15 @@ static void vxlan_fdb_free(struct rcu_he
kfree(f);
}

-static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f)
+static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
+ bool do_notify)
{
netdev_dbg(vxlan->dev,
"delete %pM\n", f->eth_addr);

--vxlan->addrcnt;
- vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
+ if (do_notify)
+ vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);

hlist_del_rcu(&f->hlist);
call_rcu(&f->rcu, vxlan_fdb_free);
@@ -930,7 +932,7 @@ static int __vxlan_fdb_delete(struct vxl
goto out;
}

- vxlan_fdb_destroy(vxlan, f);
+ vxlan_fdb_destroy(vxlan, f, true);

out:
return 0;
@@ -2393,7 +2395,7 @@ static void vxlan_cleanup(struct timer_l
"garbage collect %pM\n",
f->eth_addr);
f->state = NUD_STALE;
- vxlan_fdb_destroy(vxlan, f);
+ vxlan_fdb_destroy(vxlan, f, true);
} else if (time_before(timeout, next_timer))
next_timer = timeout;
}
@@ -2444,7 +2446,7 @@ static void vxlan_fdb_delete_default(str
spin_lock_bh(&vxlan->hash_lock);
f = __vxlan_find_mac(vxlan, all_zeros_mac, vni);
if (f)
- vxlan_fdb_destroy(vxlan, f);
+ vxlan_fdb_destroy(vxlan, f, true);
spin_unlock_bh(&vxlan->hash_lock);
}

@@ -2498,7 +2500,7 @@ static void vxlan_flush(struct vxlan_dev
continue;
/* the all_zeros_mac entry is deleted at vxlan_uninit */
if (!is_zero_ether_addr(f->eth_addr))
- vxlan_fdb_destroy(vxlan, f);
+ vxlan_fdb_destroy(vxlan, f, true);
}
}
spin_unlock_bh(&vxlan->hash_lock);



2018-07-27 09:50:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 41/66] vxlan: fix default fdb entry netlink notify ordering during netdev create

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roopa Prabhu <[email protected]>

[ Upstream commit e99465b952861533d9ba748fdbecc96d9a36da3e ]

Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function also notifies user-space of the
fdb entry on a vxlan device which user-space does not know about yet
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).

This patch fixes the user-space netlink notification ordering issue
with the following changes:
- Decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- Call rtnl_configure_link in the vxlan newlink handler to notify
user-space about the newlink before the fdb notify, hence
avoiding the user-space race.

Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/vxlan.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3190,6 +3190,7 @@ static int __vxlan_dev_create(struct net
{
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_dev *vxlan = netdev_priv(dev);
+ struct vxlan_fdb *f = NULL;
int err;

err = vxlan_dev_configure(net, dev, conf, false, extack);
@@ -3200,27 +3201,38 @@ static int __vxlan_dev_create(struct net

/* create an fdb entry for a valid default destination */
if (!vxlan_addr_any(&vxlan->default_dst.remote_ip)) {
- err = vxlan_fdb_update(vxlan, all_zeros_mac,
+ err = vxlan_fdb_create(vxlan, all_zeros_mac,
&vxlan->default_dst.remote_ip,
NUD_REACHABLE | NUD_PERMANENT,
- NLM_F_EXCL | NLM_F_CREATE,
vxlan->cfg.dst_port,
vxlan->default_dst.remote_vni,
vxlan->default_dst.remote_vni,
vxlan->default_dst.remote_ifindex,
- NTF_SELF);
+ NTF_SELF, &f);
if (err)
return err;
}

err = register_netdevice(dev);
+ if (err)
+ goto errout;
+
+ err = rtnl_configure_link(dev, NULL);
if (err) {
- vxlan_fdb_delete_default(vxlan, vxlan->default_dst.remote_vni);
- return err;
+ unregister_netdevice(dev);
+ goto errout;
}

+ /* notify default fdb entry */
+ if (f)
+ vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_NEWNEIGH);
+
list_add(&vxlan->next, &vn->vxlan_list);
return 0;
+errout:
+ if (f)
+ vxlan_fdb_destroy(vxlan, f, false);
+ return err;
}

static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
@@ -3449,6 +3461,7 @@ static int vxlan_changelink(struct net_d
struct vxlan_rdst *dst = &vxlan->default_dst;
struct vxlan_rdst old_dst;
struct vxlan_config conf;
+ struct vxlan_fdb *f = NULL;
int err;

err = vxlan_nl2conf(tb, data,
@@ -3474,19 +3487,19 @@ static int vxlan_changelink(struct net_d
old_dst.remote_ifindex, 0);

if (!vxlan_addr_any(&dst->remote_ip)) {
- err = vxlan_fdb_update(vxlan, all_zeros_mac,
+ err = vxlan_fdb_create(vxlan, all_zeros_mac,
&dst->remote_ip,
NUD_REACHABLE | NUD_PERMANENT,
- NLM_F_CREATE | NLM_F_APPEND,
vxlan->cfg.dst_port,
dst->remote_vni,
dst->remote_vni,
dst->remote_ifindex,
- NTF_SELF);
+ NTF_SELF, &f);
if (err) {
spin_unlock_bh(&vxlan->hash_lock);
return err;
}
+ vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_NEWNEIGH);
}
spin_unlock_bh(&vxlan->hash_lock);
}



2018-07-27 09:50:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 17/66] net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jack Morgenstein <[email protected]>

[ Upstream commit 958c696f5a7274d9447a458ad7aa70719b29a50a ]

Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
context, rather than the one passed in the input modifier.

However, the qp number in the qp context is not defined as a
required parameter by the FW. Therefore, drivers may choose to not
specify the qp number in the qp context for the reset-to-init transition.

Thus, we must save the qp number passed in the command input modifier --
which is always present. (This saved qp number is used as the input
modifier for command 2RST_QP when a slave's qps are destroyed).

Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2956,7 +2956,7 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4
u32 srqn = qp_get_srqn(qpc) & 0xffffff;
int use_srq = (qp_get_srqn(qpc) >> 24) & 1;
struct res_srq *srq;
- int local_qpn = be32_to_cpu(qpc->local_qpn) & 0xffffff;
+ int local_qpn = vhcr->in_modifier & 0xffffff;

err = adjust_qp_sched_queue(dev, slave, qpc, inbox);
if (err)



2018-07-27 09:50:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 18/66] net-next/hinic: fix a problem in hinic_xmit_frame()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zhao Chen <[email protected]>

[ Upstream commit f7482683f1f4925c60941dbbd0813ceaa069d106 ]

The calculation of "wqe_size" is not correct when the tx queue is busy in
hinic_xmit_frame().

When there are no free WQEs, the tx flow will unmap the skb buffer, then
ring the doorbell for the pending packets. But the "wqe_size" which is used
to calculate the doorbell address is not correct. The wqe size should be
cleared to 0; otherwise, it will cause a doorbell error.

This patch fixes the problem.

Reported-by: Zhou Wang <[email protected]>
Signed-off-by: Zhao Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/huawei/hinic/hinic_tx.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -229,6 +229,7 @@ netdev_tx_t hinic_xmit_frame(struct sk_b
txq->txq_stats.tx_busy++;
u64_stats_update_end(&txq->txq_stats.syncp);
err = NETDEV_TX_BUSY;
+ wqe_size = 0;
goto flush_skbs;
}




2018-07-27 09:50:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 19/66] net: skb_segment() should not return NULL

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit ff907a11a0d68a749ce1a321f4505c03bf72190c ]

syzbot caught a NULL deref [1], caused by skb_segment().

skb_segment() has many "goto err;" paths that assume the @err variable
still contains -ENOMEM.

A successful call to __skb_linearize() should not clear @err;
otherwise a subsequent memory allocation error could make skb_segment()
return NULL (ERR_PTR(0)) instead of a valid error pointer.

While we are at it, we might use -EINVAL instead of -ENOMEM when
the MAX_SKB_FRAGS limit is reached.

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
FS: 00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
__skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
skb_gso_segment include/linux/netdevice.h:4099 [inline]
validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
__dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
neigh_hh_output include/net/neighbour.h:473 [inline]
neigh_output include/net/neighbour.h:481 [inline]
ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:276 [inline]
ip_output+0x223/0x880 net/ipv4/ip_output.c:405
dst_output include/net/dst.h:444 [inline]
ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
__netdev_start_xmit include/linux/netdevice.h:4148 [inline]
netdev_start_xmit include/linux/netdevice.h:4157 [inline]
xmit_one net/core/dev.c:3034 [inline]
dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
__dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
neigh_output include/net/neighbour.h:483 [inline]
ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:276 [inline]
ip_output+0x223/0x880 net/ipv4/ip_output.c:405
dst_output include/net/dst.h:444 [inline]
ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
__tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
sock_sendmsg_nosec net/socket.c:641 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:651
__sys_sendto+0x3d7/0x670 net/socket.c:1797
__do_sys_sendto net/socket.c:1809 [inline]
__se_sys_sendto net/socket.c:1805 [inline]
__x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)

Fixes: ddff00d42043 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Alexander Duyck <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3705,6 +3705,7 @@ normal:
net_warn_ratelimited(
"skb_segment: too many frags: %u %u\n",
pos, mss);
+ err = -EINVAL;
goto err;
}

@@ -3738,11 +3739,10 @@ skip_fraglist:

perform_csum_check:
if (!csum) {
- if (skb_has_shared_frag(nskb)) {
- err = __skb_linearize(nskb);
- if (err)
- goto err;
- }
+ if (skb_has_shared_frag(nskb) &&
+ __skb_linearize(nskb))
+ goto err;
+
if (!nskb->remcsum_offload)
nskb->ip_summed = CHECKSUM_NONE;
SKB_GSO_CB(nskb)->csum =
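
The bug class is worth spelling out. A standalone sketch (not kernel code;
allocation sizes arbitrary) of a shared error label whose assumed error
code gets clobbered by an intervening success:

#include <errno.h>
#include <stdlib.h>

int build_pair(void **out)
{
	int err = -ENOMEM;
	void *a = NULL, *b = NULL;

	a = malloc(32);
	if (!a)
		goto err;

	err = 0;	/* BUG: clears the code the label relies on */

	b = malloc(32);
	if (!b)
		goto err;	/* reaches the label with err == 0 */

	*out = b;
	free(a);
	return 0;
err:
	free(a);
	free(b);
	return err;	/* may report success for a failed call */
}

In skb_segment() the analogous return is ERR_PTR(err), and ERR_PTR(0) is
NULL -- the pointer syzbot then dereferenced.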



2018-07-27 09:50:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 43/66] tcp: avoid collapses in tcp_prune_queue() if possible

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit f4a3313d8e2ca9fd8d8f45e40a2903ba782607e7 ]

Right after a TCP flow is created, receiving tiny out-of-order
packets always hits the condition:

if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
tcp_clamp_window(sk);

tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc
(guarded by tcp_rmem[2])

Calling tcp_collapse_ofo_queue() in this case is not useful,
and offers an O(N^2) attack surface to malicious peers.

Better to not attempt anything before full queue capacity is reached,
forcing the attacker to spend lots of resources and allowing us to more
easily detect the abuse.

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4936,6 +4936,9 @@ static int tcp_prune_queue(struct sock *
else if (tcp_under_memory_pressure(sk))
tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);

+ if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
+ return 0;
+
tcp_collapse_ofo_queue(sk);
if (!skb_queue_empty(&sk->sk_receive_queue))
tcp_collapse(sk, &sk->sk_receive_queue, NULL,



2018-07-27 09:50:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 20/66] tcp: fix dctcp delayed ACK schedule

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <[email protected]>

[ Upstream commit b0c05d0e99d98d7f0cd41efc1eeec94efdc3325d ]

Previously, when a data segment was sent, an ACK was piggybacked
on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
event to notify congestion control modules. So the DCTCP
ca->delayed_ack_reserved flag could incorrectly stay set when
in fact there were no delayed ACKs being reserved. This could result
in sending a special ECN notification ACK that carries an older
ACK sequence, when in fact there was no need for such an ACK.
DCTCP keeps track of the delayed ACK status with its own separate
state, ca->delayed_ack_reserved. Previously it could accidentally cancel
the delayed ACK without updating this field upon sending a special
ACK that carries an older ACK sequence. This inconsistency would
lead to the DCTCP receiver never acknowledging the latest data until the
sender times out and retries, in some cases.

Packetdrill script (provided by Larry Brakmo):

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data
+0 > [ect01] . 4:4(0) ack 6501 // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257

Reported-by: Larry Brakmo <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Lawrence Brakmo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_dctcp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -134,7 +134,8 @@ static void dctcp_ce_state_0_to_1(struct
/* State has changed from CE=0 to CE=1 and delayed
* ACK has not sent yet.
*/
- if (!ca->ce_state && ca->delayed_ack_reserved) {
+ if (!ca->ce_state &&
+ inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
u32 tmp_rcv_nxt;

/* Save current rcv_nxt. */
@@ -164,7 +165,8 @@ static void dctcp_ce_state_1_to_0(struct
/* State has changed from CE=1 to CE=0 and delayed
* ACK has not sent yet.
*/
- if (ca->ce_state && ca->delayed_ack_reserved) {
+ if (ca->ce_state &&
+ inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
u32 tmp_rcv_nxt;

/* Save current rcv_nxt. */



2018-07-27 09:50:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 45/66] tcp: call tcp_drop() from tcp_data_queue_ofo()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit 8541b21e781a22dce52a74fef0b9bed00404a1cd ]

In order to be able to give better diagnostics and detect
malicious traffic, we need to have better sk->sk_drops tracking.

Fixes: 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4451,7 +4451,7 @@ coalesce_done:
/* All the bits are present. Drop. */
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPOFOMERGE);
- __kfree_skb(skb);
+ tcp_drop(sk, skb);
skb = NULL;
tcp_dsack_set(sk, seq, end_seq);
goto add_sack;
@@ -4470,7 +4470,7 @@ coalesce_done:
TCP_SKB_CB(skb1)->end_seq);
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPOFOMERGE);
- __kfree_skb(skb1);
+ tcp_drop(sk, skb1);
goto merge_right;
}
} else if (tcp_try_coalesce(sk, skb1,



2018-07-27 09:50:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 46/66] tcp: add tcp_ooo_try_coalesce() helper

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c ]

In case an skb in the out_of_order_queue is the result of
multiple skbs coalescing, we would like to get proper gso_segs
counter tracking, so that a future tcp_drop() can report an accurate
number.

I chose to not implement this tracking for skbs in the receive queue,
since they are not dropped unless the socket is disconnected.

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4299,6 +4299,23 @@ static bool tcp_try_coalesce(struct sock
return true;
}

+static bool tcp_ooo_try_coalesce(struct sock *sk,
+ struct sk_buff *to,
+ struct sk_buff *from,
+ bool *fragstolen)
+{
+ bool res = tcp_try_coalesce(sk, to, from, fragstolen);
+
+ /* In case tcp_drop() is called later, update to->gso_segs */
+ if (res) {
+ u32 gso_segs = max_t(u16, 1, skb_shinfo(to)->gso_segs) +
+ max_t(u16, 1, skb_shinfo(from)->gso_segs);
+
+ skb_shinfo(to)->gso_segs = min_t(u32, gso_segs, 0xFFFF);
+ }
+ return res;
+}
+
static void tcp_drop(struct sock *sk, struct sk_buff *skb)
{
sk_drops_add(sk, skb);
@@ -4422,8 +4439,8 @@ static void tcp_data_queue_ofo(struct so
/* In the typical case, we are adding an skb to the end of the list.
* Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup.
*/
- if (tcp_try_coalesce(sk, tp->ooo_last_skb,
- skb, &fragstolen)) {
+ if (tcp_ooo_try_coalesce(sk, tp->ooo_last_skb,
+ skb, &fragstolen)) {
coalesce_done:
tcp_grow_window(sk, skb);
kfree_skb_partial(skb, fragstolen);
@@ -4473,8 +4490,8 @@ coalesce_done:
tcp_drop(sk, skb1);
goto merge_right;
}
- } else if (tcp_try_coalesce(sk, skb1,
- skb, &fragstolen)) {
+ } else if (tcp_ooo_try_coalesce(sk, skb1,
+ skb, &fragstolen)) {
goto coalesce_done;
}
p = &parent->rb_right;
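
gso_segs is a 16-bit field in skb_shared_info, hence the min_t() clamp in
the new helper. A standalone sketch of the saturating add it implements:

#include <stdint.h>
#include <stdio.h>

static uint16_t add_sat_u16(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;	/* widen so the sum cannot wrap */

	return sum > 0xFFFF ? 0xFFFF : (uint16_t)sum;
}

int main(void)
{
	printf("%u\n", add_sat_u16(0xFFF0, 0x0020));	/* 65535, not 16 */
	return 0;
}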



2018-07-27 09:50:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 47/66] Revert "staging:r8188eu: Use lib80211 to support TKIP"

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit 69a1d98c831ec64cbfd381f5dcb6697e1445d239 upstream.

Commit b83b8b1881c4 ("staging:r8188eu: Use lib80211 to support TKIP")
is causing 2 problems for me:

1) On one boot the wifi on a laptop with a r8188eu wifi device would not
connect, and dmesg contained an oops about scheduling while atomic,
pointing to the tkip code. This went away after reverting the commit.

2) I reverted the revert to try and get the oops from 1) again, to be able
to add it to this commit message. But now the system did connect to the
wifi, only to print a whole bunch of oopses, followed by a hard freeze a
few seconds later. Subsequent reboots all led to scenario 2), until
I reverted the commit again.

Reverting the commit fixes both issues, making the laptop usable again.

Fixes: b83b8b1881c4 ("staging:r8188eu: Use lib80211 to support TKIP")
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Ivan Safonov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/rtl8188eu/Kconfig | 1
drivers/staging/rtl8188eu/core/rtw_recv.c | 163 ++++++++++++++++++--------
drivers/staging/rtl8188eu/core/rtw_security.c | 94 +++++++-------
3 files changed, 162 insertions(+), 96 deletions(-)

--- a/drivers/staging/rtl8188eu/Kconfig
+++ b/drivers/staging/rtl8188eu/Kconfig
@@ -7,7 +7,6 @@ config R8188EU
select LIB80211
select LIB80211_CRYPT_WEP
select LIB80211_CRYPT_CCMP
- select LIB80211_CRYPT_TKIP
---help---
This option adds the Realtek RTL8188EU USB device such as TP-Link TL-WN725N.
If built as a module, it will be called r8188eu.
--- a/drivers/staging/rtl8188eu/core/rtw_recv.c
+++ b/drivers/staging/rtl8188eu/core/rtw_recv.c
@@ -23,7 +23,6 @@
#include <mon.h>
#include <wifi.h>
#include <linux/vmalloc.h>
-#include <net/lib80211.h>

#define ETHERNET_HEADER_SIZE 14 /* Ethernet Header Length */
#define LLC_HEADER_SIZE 6 /* LLC Header Length */
@@ -221,20 +220,31 @@ u32 rtw_free_uc_swdec_pending_queue(stru
static int recvframe_chkmic(struct adapter *adapter,
struct recv_frame *precvframe)
{
- int res = _SUCCESS;
- struct rx_pkt_attrib *prxattrib = &precvframe->attrib;
- struct sta_info *stainfo = rtw_get_stainfo(&adapter->stapriv, prxattrib->ta);
+ int i, res = _SUCCESS;
+ u32 datalen;
+ u8 miccode[8];
+ u8 bmic_err = false, brpt_micerror = true;
+ u8 *pframe, *payload, *pframemic;
+ u8 *mickey;
+ struct sta_info *stainfo;
+ struct rx_pkt_attrib *prxattrib = &precvframe->attrib;
+ struct security_priv *psecuritypriv = &adapter->securitypriv;
+
+ struct mlme_ext_priv *pmlmeext = &adapter->mlmeextpriv;
+ struct mlme_ext_info *pmlmeinfo = &(pmlmeext->mlmext_info);
+
+ stainfo = rtw_get_stainfo(&adapter->stapriv, &prxattrib->ta[0]);

if (prxattrib->encrypt == _TKIP_) {
- if (stainfo) {
- int key_idx;
- const int iv_len = 8, icv_len = 4, key_length = 32;
- struct sk_buff *skb = precvframe->pkt;
- u8 key[32], iv[8], icv[4], *pframe = skb->data;
- void *crypto_private = NULL;
- struct lib80211_crypto_ops *crypto_ops = try_then_request_module(lib80211_get_crypto_ops("TKIP"), "lib80211_crypt_tkip");
- struct security_priv *psecuritypriv = &adapter->securitypriv;
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_info_,
+ ("\n %s: prxattrib->encrypt==_TKIP_\n", __func__));
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_info_,
+ ("\n %s: da=0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x\n",
+ __func__, prxattrib->ra[0], prxattrib->ra[1], prxattrib->ra[2],
+ prxattrib->ra[3], prxattrib->ra[4], prxattrib->ra[5]));

+ /* calculate mic code */
+ if (stainfo) {
if (IS_MCAST(prxattrib->ra)) {
if (!psecuritypriv) {
res = _FAIL;
@@ -243,58 +253,115 @@ static int recvframe_chkmic(struct adapt
DBG_88E("\n %s: didn't install group key!!!!!!!!!!\n", __func__);
goto exit;
}
- key_idx = prxattrib->key_index;
- memcpy(key, psecuritypriv->dot118021XGrpKey[key_idx].skey, 16);
- memcpy(key + 16, psecuritypriv->dot118021XGrprxmickey[key_idx].skey, 16);
+ mickey = &psecuritypriv->dot118021XGrprxmickey[prxattrib->key_index].skey[0];
+
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_info_,
+ ("\n %s: bcmc key\n", __func__));
} else {
- key_idx = 0;
- memcpy(key, stainfo->dot118021x_UncstKey.skey, 16);
- memcpy(key + 16, stainfo->dot11tkiprxmickey.skey, 16);
+ mickey = &stainfo->dot11tkiprxmickey.skey[0];
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
+ ("\n %s: unicast key\n", __func__));
}

- if (!crypto_ops) {
- res = _FAIL;
- goto exit_lib80211_tkip;
- }
+ /* icv_len included the mic code */
+ datalen = precvframe->pkt->len-prxattrib->hdrlen -
+ prxattrib->iv_len-prxattrib->icv_len-8;
+ pframe = precvframe->pkt->data;
+ payload = pframe+prxattrib->hdrlen+prxattrib->iv_len;
+
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_info_, ("\n prxattrib->iv_len=%d prxattrib->icv_len=%d\n", prxattrib->iv_len, prxattrib->icv_len));
+ rtw_seccalctkipmic(mickey, pframe, payload, datalen, &miccode[0],
+ (unsigned char)prxattrib->priority); /* care the length of the data */

- memcpy(iv, pframe + prxattrib->hdrlen, iv_len);
- memcpy(icv, pframe + skb->len - icv_len, icv_len);
- memmove(pframe + iv_len, pframe, prxattrib->hdrlen);
+ pframemic = payload+datalen;

- skb_pull(skb, iv_len);
- skb_trim(skb, skb->len - icv_len);
+ bmic_err = false;

- crypto_private = crypto_ops->init(key_idx);
- if (!crypto_private) {
- res = _FAIL;
- goto exit_lib80211_tkip;
- }
- if (crypto_ops->set_key(key, key_length, NULL, crypto_private) < 0) {
- res = _FAIL;
- goto exit_lib80211_tkip;
+ for (i = 0; i < 8; i++) {
+ if (miccode[i] != *(pframemic+i)) {
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
+ ("%s: miccode[%d](%02x)!=*(pframemic+%d)(%02x) ",
+ __func__, i, miccode[i], i, *(pframemic + i)));
+ bmic_err = true;
+ }
}
- if (crypto_ops->decrypt_msdu(skb, key_idx, prxattrib->hdrlen, crypto_private)) {
+
+ if (bmic_err) {
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
+ ("\n *(pframemic-8)-*(pframemic-1)=0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x\n",
+ *(pframemic-8), *(pframemic-7), *(pframemic-6),
+ *(pframemic-5), *(pframemic-4), *(pframemic-3),
+ *(pframemic-2), *(pframemic-1)));
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
+ ("\n *(pframemic-16)-*(pframemic-9)=0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x\n",
+ *(pframemic-16), *(pframemic-15), *(pframemic-14),
+ *(pframemic-13), *(pframemic-12), *(pframemic-11),
+ *(pframemic-10), *(pframemic-9)));
+ {
+ uint i;
+
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
+ ("\n ======demp packet (len=%d)======\n",
+ precvframe->pkt->len));
+ for (i = 0; i < precvframe->pkt->len; i += 8) {
+ RT_TRACE(_module_rtl871x_recv_c_,
+ _drv_err_,
+ ("0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x",
+ *(precvframe->pkt->data+i),
+ *(precvframe->pkt->data+i+1),
+ *(precvframe->pkt->data+i+2),
+ *(precvframe->pkt->data+i+3),
+ *(precvframe->pkt->data+i+4),
+ *(precvframe->pkt->data+i+5),
+ *(precvframe->pkt->data+i+6),
+ *(precvframe->pkt->data+i+7)));
+ }
+ RT_TRACE(_module_rtl871x_recv_c_,
+ _drv_err_,
+ ("\n ====== demp packet end [len=%d]======\n",
+ precvframe->pkt->len));
+ RT_TRACE(_module_rtl871x_recv_c_,
+ _drv_err_,
+ ("\n hrdlen=%d,\n",
+ prxattrib->hdrlen));
+ }
+
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
+ ("ra=0x%.2x 0x%.2x 0x%.2x 0x%.2x 0x%.2x 0x%.2x psecuritypriv->binstallGrpkey=%d ",
+ prxattrib->ra[0], prxattrib->ra[1], prxattrib->ra[2],
+ prxattrib->ra[3], prxattrib->ra[4], prxattrib->ra[5], psecuritypriv->binstallGrpkey));
+
+ /* double check key_index for some timing issue , */
+ /* cannot compare with psecuritypriv->dot118021XGrpKeyid also cause timing issue */
+ if ((IS_MCAST(prxattrib->ra) == true) && (prxattrib->key_index != pmlmeinfo->key_index))
+ brpt_micerror = false;
+
+ if ((prxattrib->bdecrypted) && (brpt_micerror)) {
+ rtw_handle_tkip_mic_err(adapter, (u8)IS_MCAST(prxattrib->ra));
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_, (" mic error :prxattrib->bdecrypted=%d ", prxattrib->bdecrypted));
+ DBG_88E(" mic error :prxattrib->bdecrypted=%d\n", prxattrib->bdecrypted);
+ } else {
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_, (" mic error :prxattrib->bdecrypted=%d ", prxattrib->bdecrypted));
+ DBG_88E(" mic error :prxattrib->bdecrypted=%d\n", prxattrib->bdecrypted);
+ }
res = _FAIL;
- goto exit_lib80211_tkip;
+ } else {
+ /* mic checked ok */
+ if ((!psecuritypriv->bcheck_grpkey) && (IS_MCAST(prxattrib->ra))) {
+ psecuritypriv->bcheck_grpkey = true;
+ RT_TRACE(_module_rtl871x_recv_c_, _drv_err_, ("psecuritypriv->bcheck_grpkey = true"));
+ }
}
-
- memmove(pframe, pframe + iv_len, prxattrib->hdrlen);
- skb_push(skb, iv_len);
- skb_put(skb, icv_len);
-
- memcpy(pframe + prxattrib->hdrlen, iv, iv_len);
- memcpy(pframe + skb->len - icv_len, icv, icv_len);
-
-exit_lib80211_tkip:
- if (crypto_ops && crypto_private)
- crypto_ops->deinit(crypto_private);
} else {
RT_TRACE(_module_rtl871x_recv_c_, _drv_err_,
("%s: rtw_get_stainfo==NULL!!!\n", __func__));
}
+
+ skb_trim(precvframe->pkt, precvframe->pkt->len - 8);
}

exit:
+
return res;
}

--- a/drivers/staging/rtl8188eu/core/rtw_security.c
+++ b/drivers/staging/rtl8188eu/core/rtw_security.c
@@ -650,71 +650,71 @@ u32 rtw_tkip_encrypt(struct adapter *pad
return res;
}

+/* The hlen isn't include the IV */
u32 rtw_tkip_decrypt(struct adapter *padapter, u8 *precvframe)
-{
- struct rx_pkt_attrib *prxattrib = &((struct recv_frame *)precvframe)->attrib;
- u32 res = _SUCCESS;
+{ /* exclude ICV */
+ u16 pnl;
+ u32 pnh;
+ u8 rc4key[16];
+ u8 ttkey[16];
+ u8 crc[4];
+ struct arc4context mycontext;
+ int length;
+
+ u8 *pframe, *payload, *iv, *prwskey;
+ union pn48 dot11txpn;
+ struct sta_info *stainfo;
+ struct rx_pkt_attrib *prxattrib = &((struct recv_frame *)precvframe)->attrib;
+ struct security_priv *psecuritypriv = &padapter->securitypriv;
+ u32 res = _SUCCESS;
+
+
+ pframe = (unsigned char *)((struct recv_frame *)precvframe)->pkt->data;

/* 4 start to decrypt recvframe */
if (prxattrib->encrypt == _TKIP_) {
- struct sta_info *stainfo = rtw_get_stainfo(&padapter->stapriv, prxattrib->ta);
-
+ stainfo = rtw_get_stainfo(&padapter->stapriv, &prxattrib->ta[0]);
if (stainfo) {
- int key_idx;
- const int iv_len = 8, icv_len = 4, key_length = 32;
- void *crypto_private = NULL;
- struct sk_buff *skb = ((struct recv_frame *)precvframe)->pkt;
- u8 key[32], iv[8], icv[4], *pframe = skb->data;
- struct lib80211_crypto_ops *crypto_ops = try_then_request_module(lib80211_get_crypto_ops("TKIP"), "lib80211_crypt_tkip");
- struct security_priv *psecuritypriv = &padapter->securitypriv;
-
if (IS_MCAST(prxattrib->ra)) {
if (!psecuritypriv->binstallGrpkey) {
res = _FAIL;
DBG_88E("%s:rx bc/mc packets, but didn't install group key!!!!!!!!!!\n", __func__);
goto exit;
}
- key_idx = prxattrib->key_index;
- memcpy(key, psecuritypriv->dot118021XGrpKey[key_idx].skey, 16);
- memcpy(key + 16, psecuritypriv->dot118021XGrprxmickey[key_idx].skey, 16);
+ prwskey = psecuritypriv->dot118021XGrpKey[prxattrib->key_index].skey;
} else {
- key_idx = 0;
- memcpy(key, stainfo->dot118021x_UncstKey.skey, 16);
- memcpy(key + 16, stainfo->dot11tkiprxmickey.skey, 16);
+ RT_TRACE(_module_rtl871x_security_c_, _drv_err_, ("%s: stainfo!= NULL!!!\n", __func__));
+ prwskey = &stainfo->dot118021x_UncstKey.skey[0];
}

- if (!crypto_ops) {
- res = _FAIL;
- goto exit_lib80211_tkip;
- }
+ iv = pframe+prxattrib->hdrlen;
+ payload = pframe+prxattrib->iv_len+prxattrib->hdrlen;
+ length = ((struct recv_frame *)precvframe)->pkt->len-prxattrib->hdrlen-prxattrib->iv_len;

- memcpy(iv, pframe + prxattrib->hdrlen, iv_len);
- memcpy(icv, pframe + skb->len - icv_len, icv_len);
+ GET_TKIP_PN(iv, dot11txpn);

- crypto_private = crypto_ops->init(key_idx);
- if (!crypto_private) {
- res = _FAIL;
- goto exit_lib80211_tkip;
- }
- if (crypto_ops->set_key(key, key_length, NULL, crypto_private) < 0) {
- res = _FAIL;
- goto exit_lib80211_tkip;
- }
- if (crypto_ops->decrypt_mpdu(skb, prxattrib->hdrlen, crypto_private)) {
+ pnl = (u16)(dot11txpn.val);
+ pnh = (u32)(dot11txpn.val>>16);
+
+ phase1((u16 *)&ttkey[0], prwskey, &prxattrib->ta[0], pnh);
+ phase2(&rc4key[0], prwskey, (unsigned short *)&ttkey[0], pnl);
+
+ /* 4 decrypt payload include icv */
+
+ arcfour_init(&mycontext, rc4key, 16);
+ arcfour_encrypt(&mycontext, payload, payload, length);
+
+ *((__le32 *)crc) = getcrc32(payload, length-4);
+
+ if (crc[3] != payload[length-1] ||
+ crc[2] != payload[length-2] ||
+ crc[1] != payload[length-3] ||
+ crc[0] != payload[length-4]) {
+ RT_TRACE(_module_rtl871x_security_c_, _drv_err_,
+ ("rtw_wep_decrypt:icv error crc (%4ph)!=payload (%4ph)\n",
+ &crc, &payload[length-4]));
res = _FAIL;
- goto exit_lib80211_tkip;
}
-
- memmove(pframe, pframe + iv_len, prxattrib->hdrlen);
- skb_push(skb, iv_len);
- skb_put(skb, icv_len);
-
- memcpy(pframe + prxattrib->hdrlen, iv, iv_len);
- memcpy(pframe + skb->len - icv_len, icv, icv_len);
-
-exit_lib80211_tkip:
- if (crypto_ops && crypto_private)
- crypto_ops->deinit(crypto_private);
} else {
RT_TRACE(_module_rtl871x_security_c_, _drv_err_, ("rtw_tkip_decrypt: stainfo==NULL!!!\n"));
res = _FAIL;



2018-07-27 09:50:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 33/66] multicast: do not restore deleted record source filter mode to new one

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hangbin Liu <[email protected]>

There are two scenarios in which we will restore deleted records. The first
is when the device goes down and up (or is unmapped and remapped). In this
scenario the new filter mode is the same as the previous one, because we get
it from in_dev->mc_list and do not touch it while the device is down.

The other scenario is when a new socket joins a group which was just deleted
and has not finished sending status reports. In this scenario, we should use
the current filter mode instead of restoring the old one. Here are the 4
cases in total:

old_socket   new_socket   before_fix   after_fix
IN(A)        IN(A)        ALLOW(A)     ALLOW(A)
IN(A)        EX( )        TO_IN( )     TO_EX( )
EX( )        IN(A)        TO_EX( )     ALLOW(A)
EX( )        EX( )        TO_EX( )     TO_EX( )

Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link down)
Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link down)
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/igmp.c | 3 +--
net/ipv6/mcast.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)

--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1201,8 +1201,7 @@ static void igmpv3_del_delrec(struct in_
if (pmc) {
im->interface = pmc->interface;
im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv;
- im->sfmode = pmc->sfmode;
- if (pmc->sfmode == MCAST_INCLUDE) {
+ if (im->sfmode == MCAST_INCLUDE) {
im->tomb = pmc->tomb;
im->sources = pmc->sources;
for (psf = im->sources; psf; psf = psf->sf_next)
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -771,8 +771,7 @@ static void mld_del_delrec(struct inet6_
if (pmc) {
im->idev = pmc->idev;
im->mca_crcount = idev->mc_qrv;
- im->mca_sfmode = pmc->mca_sfmode;
- if (pmc->mca_sfmode == MCAST_INCLUDE) {
+ if (im->mca_sfmode == MCAST_INCLUDE) {
im->mca_tomb = pmc->mca_tomb;
im->mca_sources = pmc->mca_sources;
for (psf = im->mca_sources; psf; psf = psf->sf_next)



2018-07-27 09:51:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 42/66] tcp: free batches of packets in tcp_prune_ofo_queue()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit 72cd43ba64fc172a443410ce01645895850844c8 ]

Juha-Matti Tilli reported that malicious peers could inject tiny
packets in out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet. The out_of_order_queue rb-tree can contain
thousands of nodes, and iterating over all of them is not nice.

Before linux-4.9, we would have pruned all packets in ofo_queue
in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.

Since we plan to increase tcp_rmem[2] in the future to cope with
modern BDP, we cannot revert to the old behavior without great pain.

The strategy taken in this patch is to purge ~12.5 % of the queue capacity.
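
As a back-of-the-envelope check, here is a minimal standalone C sketch of
the goal arithmetic (the 6 MB figure is the tcp_rmem[2] default quoted
above; the kernel uses sk->sk_rcvbuf >> 3 directly):

#include <stdio.h>

int main(void)
{
	int sk_rcvbuf = 6 * 1024 * 1024;	/* tcp_rmem[2] default */
	int goal = sk_rcvbuf >> 3;		/* purge target per pass */

	printf("purge goal: %d bytes (%.1f%% of sk_rcvbuf)\n",
	       goal, 100.0 * goal / sk_rcvbuf);
	return 0;
}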

Fixes: 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Juha-Matti Tilli <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4874,6 +4874,7 @@ new_range:
* 2) not add too big latencies if thousands of packets sit there.
* (But if application shrinks SO_RCVBUF, we could still end up
* freeing whole queue here)
+ * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
*
* Return true if queue has shrunk.
*/
@@ -4881,20 +4882,26 @@ static bool tcp_prune_ofo_queue(struct s
{
struct tcp_sock *tp = tcp_sk(sk);
struct rb_node *node, *prev;
+ int goal;

if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
return false;

NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+ goal = sk->sk_rcvbuf >> 3;
node = &tp->ooo_last_skb->rbnode;
do {
prev = rb_prev(node);
rb_erase(node, &tp->out_of_order_queue);
+ goal -= rb_to_skb(node)->truesize;
tcp_drop(sk, rb_to_skb(node));
- sk_mem_reclaim(sk);
- if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
- !tcp_under_memory_pressure(sk))
- break;
+ if (!prev || goal <= 0) {
+ sk_mem_reclaim(sk);
+ if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+ !tcp_under_memory_pressure(sk))
+ break;
+ goal = sk->sk_rcvbuf >> 3;
+ }
node = prev;
} while (node);
tp->ooo_last_skb = rb_to_skb(prev);



2018-07-27 09:51:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 48/66] staging: speakup: fix wraparound in uaccess length check

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Samuel Thibault <[email protected]>

commit b96fba8d5855c3617adbfb43ca4723a808cac954 upstream.

If softsynthx_read() is called with `count < 3`, `count - 3` wraps, causing
the loop to copy as much data as available to the provided buffer. If
softsynthx_read() is invoked through sys_splice(), this causes an
unbounded kernel write; but even when userspace just reads from it
normally, a small size could cause userspace crashes.
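
To see the wraparound concretely, here is a minimal standalone C demo
(not the driver code) of what happens when 3 is subtracted from an
unsigned count smaller than 3:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	size_t count = 2;	/* userspace asked to read 2 bytes */
	size_t chars_sent = 0;

	/* count - 3 wraps around to a huge unsigned value */
	printf("count - 3 = %zu\n", count - 3);

	/* so the original loop bound is effectively unlimited */
	if (chars_sent <= count - 3)
		printf("loop would run despite the 2-byte buffer\n");
	return 0;
}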

Fixes: 425e586cf95b ("speakup: add unicode variant of /dev/softsynth")
Cc: [email protected]
Signed-off-by: Samuel Thibault <[email protected]>
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/speakup/speakup_soft.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/staging/speakup/speakup_soft.c
+++ b/drivers/staging/speakup/speakup_soft.c
@@ -197,11 +197,15 @@ static ssize_t softsynthx_read(struct fi
int chars_sent = 0;
char __user *cp;
char *init;
+ size_t bytes_per_ch = unicode ? 3 : 1;
u16 ch;
int empty;
unsigned long flags;
DEFINE_WAIT(wait);

+ if (count < bytes_per_ch)
+ return -EINVAL;
+
spin_lock_irqsave(&speakup_info.spinlock, flags);
while (1) {
prepare_to_wait(&speakup_event, &wait, TASK_INTERRUPTIBLE);
@@ -227,7 +231,7 @@ static ssize_t softsynthx_read(struct fi
init = get_initstring();

/* Keep 3 bytes available for a 16bit UTF-8-encoded character */
- while (chars_sent <= count - 3) {
+ while (chars_sent <= count - bytes_per_ch) {
if (speakup_info.flushing) {
speakup_info.flushing = 0;
ch = '\x18';



2018-07-27 09:51:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 35/66] net/mlx5e: Dont allow aRFS for encapsulated packets

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eran Ben Elisha <[email protected]>

[ Upstream commit d2e1c57bcf9a07cbb67f30ecf238f298799bce1c ]

The driver does not yet support aRFS for encapsulated packets; return
an error early in that case.

Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -711,6 +711,9 @@ int mlx5e_rx_flow_steer(struct net_devic
skb->protocol != htons(ETH_P_IPV6))
return -EPROTONOSUPPORT;

+ if (skb->encapsulation)
+ return -EPROTONOSUPPORT;
+
arfs_t = arfs_get_table(arfs, arfs_get_ip_proto(skb), skb->protocol);
if (!arfs_t)
return -EPROTONOSUPPORT;



2018-07-27 09:51:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 44/66] tcp: detect malicious patterns in tcp_collapse_ofo_queue()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit 3d4bf93ac12003f9b8e1e2de37fe27983deebdcf ]

In case an attacker feeds tiny packets completely out of order,
tcp_collapse_ofo_queue() might scan the whole rb-tree, performing
expensive copies, but not changing socket memory usage at all.

1) Do not attempt to collapse tiny skbs.
2) Add logic to exit early when too many tiny skbs are detected.

We prefer not to do aggressive collapsing (which copies packets)
for pathological flows, and instead fall back to tcp_prune_ofo_queue(),
which is less expensive.

In the future, we might add the possibility of terminating flows
that are proven to be malicious.
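
A reduced sketch of the accounting this adds (plain C, with the skb range
bookkeeping collapsed into integers; the threshold shape matches the
SKB_WITH_OVERHEAD(SK_MEM_QUANTUM) test in the diff below):

/* return 0 to skip the copy and count the range as "tiny" */
int worth_collapsing(unsigned int range_truesize,
		     unsigned int head_truesize,
		     unsigned int range_bytes,
		     unsigned int quantum)
{
	/* a lone skb covering only a few bytes is not worth copying */
	return range_truesize != head_truesize || range_bytes >= quantum;
}

The caller accumulates the truesize of skipped ranges in sum_tiny and
gives up entirely once that exceeds sk_rcvbuf >> 3, leaving the cheaper
pruning path to deal with the flow.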

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4834,6 +4834,7 @@ end:
static void tcp_collapse_ofo_queue(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
+ u32 range_truesize, sum_tiny = 0;
struct sk_buff *skb, *head;
u32 start, end;

@@ -4845,6 +4846,7 @@ new_range:
}
start = TCP_SKB_CB(skb)->seq;
end = TCP_SKB_CB(skb)->end_seq;
+ range_truesize = skb->truesize;

for (head = skb;;) {
skb = skb_rb_next(skb);
@@ -4855,11 +4857,20 @@ new_range:
if (!skb ||
after(TCP_SKB_CB(skb)->seq, end) ||
before(TCP_SKB_CB(skb)->end_seq, start)) {
- tcp_collapse(sk, NULL, &tp->out_of_order_queue,
- head, skb, start, end);
+ /* Do not attempt collapsing tiny skbs */
+ if (range_truesize != head->truesize ||
+ end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) {
+ tcp_collapse(sk, NULL, &tp->out_of_order_queue,
+ head, skb, start, end);
+ } else {
+ sum_tiny += range_truesize;
+ if (sum_tiny > sk->sk_rcvbuf >> 3)
+ return;
+ }
goto new_range;
}

+ range_truesize += skb->truesize;
if (unlikely(before(TCP_SKB_CB(skb)->seq, start)))
start = TCP_SKB_CB(skb)->seq;
if (after(TCP_SKB_CB(skb)->end_seq, end))



2018-07-27 09:51:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 37/66] net/mlx5: Adjust clock overflow work period

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ariel Levkovich <[email protected]>

[ Upstream commit 33180bee86a8940a84950edca46315cd9dd6deb5 ]

When the driver converts a HW timestamp to wall clock time it subtracts
the last saved cycle counter from the HW timestamp and converts the
difference to nanoseconds.
The conversion is done by multiplying the cycle difference by the
clock multiplier value as a first step, so the cycle difference must be
small enough that the multiplication product does not exceed 64 bits.

The overflow handling routine is in charge of updating the last saved
cycle counter in driver and it is called periodically using kernel
delayed workqueue.

The delay period for this work is calculated using the max HW cycle
counter value (a 41-bit mask) as a base, which does not take the 64-bit
limit into account, so the delay period may be incorrect and too
long to prevent a large difference between the HW counter and the last
saved counter in SW.

This change adjusts the work period for the HW clock overflow work by
taking the minimum between the previous value and the quotient of max
u64 value and the clock multiplier value.
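
A standalone sketch of that minimum (userspace C; the 2^21 multiplier is
a made-up illustrative value, the 41-bit mask matches the text above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t mask = (1ULL << 41) - 1;	/* max HW cycle counter */
	uint32_t mult = 1 << 21;	/* hypothetical clock multiplier */

	/* largest cycle count whose product with mult stays below 2^63 */
	uint64_t overflow_cycles = (~0ULL >> 1) / mult;

	if (overflow_cycles > mask >> 1)
		overflow_cycles = mask >> 1;

	printf("safe cycle window: %llu cycles\n",
	       (unsigned long long)overflow_cycles);
	return 0;
}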

Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Ariel Levkovich <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
@@ -487,6 +487,7 @@ void mlx5_pps_event(struct mlx5_core_dev
void mlx5_init_clock(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
+ u64 overflow_cycles;
u64 ns;
u64 frac = 0;
u32 dev_freq;
@@ -510,10 +511,17 @@ void mlx5_init_clock(struct mlx5_core_de

/* Calculate period in seconds to call the overflow watchdog - to make
* sure counter is checked at least once every wrap around.
+ * The period is calculated as the minimum between max HW cycles count
+ * (The clock source mask) and max amount of cycles that can be
+ * multiplied by clock multiplier where the result doesn't exceed
+ * 64bits.
*/
- ns = cyclecounter_cyc2ns(&clock->cycles, clock->cycles.mask,
+ overflow_cycles = div64_u64(~0ULL >> 1, clock->cycles.mult);
+ overflow_cycles = min(overflow_cycles, clock->cycles.mask >> 1);
+
+ ns = cyclecounter_cyc2ns(&clock->cycles, overflow_cycles,
frac, &frac);
- do_div(ns, NSEC_PER_SEC / 2 / HZ);
+ do_div(ns, NSEC_PER_SEC / HZ);
clock->overflow_period = ns;

mdev->clock_info_page = alloc_page(GFP_KERNEL);



2018-07-27 09:51:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 31/66] sock: fix sg page frag coalescing in sk_alloc_sg

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

[ Upstream commit 144fe2bfd236dc814eae587aea7e2af03dbdd755 ]

The current sg coalescing logic in sk_alloc_sg() (the latter is used by
tls and sockmap) is not quite correct: we do fetch the previous sg entry,
but the subsequent check, whether the refilled page frag from the socket
is still the same as in the last entry with prior offset and length
matching the start of the current buffer, always compares against the
first sg list entry instead of the prior one.
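
In isolation, the intended check looks like this (a sketch with the sg
entry reduced to a plain struct; not the scatterlist API itself):

struct entry { const void *page; unsigned int offset, length; };

/* coalesce only when the new chunk continues the *previous* entry */
int can_coalesce(const struct entry *prev, const void *page,
		 unsigned int orig_offset)
{
	return prev->page == page &&
	       prev->offset + prev->length == orig_offset;
}

The bug below is exactly this check being run against the head of the
list (sg) rather than the fetched previous entry (sge).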

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Dave Watson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/sock.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2270,9 +2270,9 @@ int sk_alloc_sg(struct sock *sk, int len
pfrag->offset += use;

sge = sg + sg_curr - 1;
- if (sg_curr > first_coalesce && sg_page(sg) == pfrag->page &&
- sg->offset + sg->length == orig_offset) {
- sg->length += use;
+ if (sg_curr > first_coalesce && sg_page(sge) == pfrag->page &&
+ sge->offset + sge->length == orig_offset) {
+ sge->length += use;
} else {
sge = sg + sg_curr;
sg_unmark_end(sge);



2018-07-27 09:51:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 50/66] usb: core: handle hub C_PORT_OVER_CURRENT condition

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bin Liu <[email protected]>

commit 249a32b7eeb3edb6897dd38f89651a62163ac4ed upstream.

Based on USB2.0 Spec Section 11.12.5,

"If a hub has per-port power switching and per-port current limiting,
an over-current on one port may still cause the power on another port
to fall below specific minimums. In this case, the affected port is
placed in the Power-Off state and C_PORT_OVER_CURRENT is set for the
port, but PORT_OVER_CURRENT is not set."

so let's check C_PORT_OVER_CURRENT too for the over-current condition.
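
As a self-contained sketch of the resulting condition (the masks would be
the USB_PORT_STAT_OVERCURRENT / USB_PORT_STAT_C_OVERCURRENT values; the
real check in hub_activate() below also considers the device state):

/* a port needs hub_wq attention if over-current is active or latched */
static int port_overcurrent_seen(unsigned int portstatus,
				 unsigned int portchange,
				 unsigned int stat_oc_mask,
				 unsigned int c_oc_mask)
{
	return (portstatus & stat_oc_mask) || (portchange & c_oc_mask);
}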

Fixes: 08d1dec6f405 ("usb:hub set hub->change_bits when over-current happens")
Cc: <[email protected]>
Tested-by: Alessandro Antenucci <[email protected]>
Signed-off-by: Bin Liu <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/core/hub.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1142,10 +1142,14 @@ static void hub_activate(struct usb_hub

if (!udev || udev->state == USB_STATE_NOTATTACHED) {
/* Tell hub_wq to disconnect the device or
- * check for a new connection
+ * check for a new connection or over current condition.
+ * Based on USB2.0 Spec Section 11.12.5,
+ * C_PORT_OVER_CURRENT could be set while
+ * PORT_OVER_CURRENT is not. So check for any of them.
*/
if (udev || (portstatus & USB_PORT_STAT_CONNECTION) ||
- (portstatus & USB_PORT_STAT_OVERCURRENT))
+ (portstatus & USB_PORT_STAT_OVERCURRENT) ||
+ (portchange & USB_PORT_STAT_C_OVERCURRENT))
set_bit(port1, hub->change_bits);

} else if (portstatus & USB_PORT_STAT_ENABLE) {



2018-07-27 09:51:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 62/66] can: xilinx_can: fix incorrect clear of non-processed interrupts

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 2f4f0f338cf453bfcdbcf089e177c16f35f023c8 upstream.

xcan_interrupt() clears ERROR|RXOFLW|BSOFF|ARBLST interrupts if any of
them is asserted. This does not take into account that some of them
could have been asserted between interrupt status read and interrupt
clear, therefore clearing them without handling them.

Fix the code to only clear those interrupts that it knows are asserted
and therefore going to be processed in xcan_err_interrupt().

Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -938,6 +938,7 @@ static irqreturn_t xcan_interrupt(int ir
struct net_device *ndev = (struct net_device *)dev_id;
struct xcan_priv *priv = netdev_priv(ndev);
u32 isr, ier;
+ u32 isr_errors;

/* Get the interrupt status from Xilinx CAN */
isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
@@ -956,11 +957,10 @@ static irqreturn_t xcan_interrupt(int ir
xcan_tx_interrupt(ndev, isr);

/* Check for the type of error interrupt and Processing it */
- if (isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK |
- XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK)) {
- priv->write_reg(priv, XCAN_ICR_OFFSET, (XCAN_IXR_ERROR_MASK |
- XCAN_IXR_RXOFLW_MASK | XCAN_IXR_BSOFF_MASK |
- XCAN_IXR_ARBLST_MASK));
+ isr_errors = isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK |
+ XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK);
+ if (isr_errors) {
+ priv->write_reg(priv, XCAN_ICR_OFFSET, isr_errors);
xcan_err_interrupt(ndev, isr);
}




2018-07-27 09:51:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 65/66] can: m_can: Fix runtime resume call

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Faiz Abbas <[email protected]>

commit 1675bee3e732c2449e792feed9caff804f3bd42c upstream.

pm_runtime_get_sync() returns 1 if the state of the device is already
'active'. This is not a failure case and should be treated as success.

Therefore fix the error handling of the pm_runtime_get_sync() call such
that a return value of 1 is treated as success.

Also clean up the TODO about using runtime PM for sleep mode, as that
is now implemented.
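
The resulting idiom, as a standalone demo (mock function; the real
pm_runtime_get_sync() may return 1 when the device is already active):

#include <stdio.h>

/* mock: 1 = device already active, <0 = real error */
static int mock_pm_runtime_get_sync(void) { return 1; }

int main(void)
{
	int err = mock_pm_runtime_get_sync();

	if (err)	/* old check: mistakes 1 for a failure */
		printf("old code: spurious failure (err=%d)\n", err);

	if (err < 0)	/* fixed check: only negatives are errors */
		printf("new code: failure\n");
	else
		printf("new code: success\n");
	return 0;
}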

Signed-off-by: Faiz Abbas <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/m_can/m_can.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -634,10 +634,12 @@ static int m_can_clk_start(struct m_can_
int err;

err = pm_runtime_get_sync(priv->device);
- if (err)
+ if (err < 0) {
pm_runtime_put_noidle(priv->device);
+ return err;
+ }

- return err;
+ return 0;
}

static void m_can_clk_stop(struct m_can_priv *priv)
@@ -1687,8 +1689,6 @@ failed_ret:
return ret;
}

-/* TODO: runtime PM with power down or sleep mode */
-
static __maybe_unused int m_can_suspend(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);



2018-07-27 09:51:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 38/66] rtnetlink: add rtnl_link_state check in rtnl_configure_link

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roopa Prabhu <[email protected]>

[ Upstream commit 5025f7f7d506fba9b39e7fe8ca10f6f34cb9bc2d ]

rtnl_configure_link sets dev->rtnl_link_state to
RTNL_LINK_INITIALIZED and unconditionally calls
__dev_notify_flags to notify user-space of dev flags.

Current call sequence for rtnl_configure_link:

  rtnetlink_newlink
    rtnl_link_ops->newlink
    rtnl_configure_link (unconditionally notifies userspace of
                         default and new dev flags)

If a newlink handler wants to call rtnl_configure_link
early, we will end up with duplicate notifications to
user-space.

This patch fixes rtnl_configure_link to check rtnl_link_state
and call __dev_notify_flags with gchanges = 0 if already
RTNL_LINK_INITIALIZED.

Later in the series, this patch will help the following sequence
where a driver implementing newlink can call rtnl_configure_link
to initialize the link early.

This makes the following call sequence work:

  rtnetlink_newlink
    rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
                                      link and notifies user-space of
                                      default dev flags)
    rtnl_configure_link (updates dev flags if requested by user ifm
                         and notifies user-space of new dev flags)

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/rtnetlink.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2749,9 +2749,12 @@ int rtnl_configure_link(struct net_devic
return err;
}

- dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
-
- __dev_notify_flags(dev, old_flags, ~0U);
+ if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) {
+ __dev_notify_flags(dev, old_flags, 0U);
+ } else {
+ dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
+ __dev_notify_flags(dev, old_flags, ~0U);
+ }
return 0;
}
EXPORT_SYMBOL(rtnl_configure_link);



2018-07-27 09:51:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 66/66] can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roman Fietze <[email protected]>

commit 393753b217f05474e714aea36c37501546ed1202 upstream.

Inside m_can_chip_config(), when setting up the new value of the CCCR,
the CCCR_NISO bit is not cleared like the others, CCCR_TEST, CCCR_MON,
CCCR_BRSE and CCCR_FDOE, before checking the can.ctrlmode bits for
CAN_CTRLMODE_FD_NON_ISO.

This way once the controller was configured for CAN_CTRLMODE_FD_NON_ISO,
this mode could never be cleared again.

This fix is only relevant for controllers with version 3.1.x or 3.2.x.
Older versions do not support NISO.
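
A minimal bitmask demo of why a bit missing from the clear mask becomes
sticky (standalone C; the bit positions are illustrative only, not the
real CCCR layout):

#include <stdio.h>

#define CCCR_NISO	(1u << 15)	/* illustrative position */
#define CCCR_MODES	(0xfu << 4)	/* stand-in for TEST|MON|BRSE|FDOE */

int main(void)
{
	unsigned int cccr = CCCR_NISO | CCCR_MODES; /* NISO set earlier */
	int want_niso = 0;	/* user cleared CAN_CTRLMODE_FD_NON_ISO */

	/* buggy update: NISO is not cleared first, so it survives */
	unsigned int buggy = (cccr & ~CCCR_MODES) |
			     (want_niso ? CCCR_NISO : 0);

	/* fixed update: NISO is cleared along with the mode bits */
	unsigned int fixed = (cccr & ~(CCCR_MODES | CCCR_NISO)) |
			     (want_niso ? CCCR_NISO : 0);

	printf("buggy NISO=%u, fixed NISO=%u\n",
	       !!(buggy & CCCR_NISO), !!(fixed & CCCR_NISO));
	return 0;
}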

Signed-off-by: Roman Fietze <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/m_can/m_can.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1111,7 +1111,8 @@ static void m_can_chip_config(struct net

} else {
/* Version 3.1.x or 3.2.x */
- cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE);
+ cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE |
+ CCCR_NISO);

/* Only 3.2.x has NISO Bit implemented */
if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO)



2018-07-27 09:52:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 53/66] usb: gadget: Fix OS descriptors support

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <[email protected]>

commit 50b9773c13bffbef32060e67c4483ea7b2eca7b5 upstream.

The current code is broken: it re-defines "req" inside the
if block, then jumps (goto) out of it. Thus the request that
ends up being sent is not the one that was populated by the
code in question.

This fixes RNDIS driver autodetect by Windows 10 for me.

The bug was introduced by Chris's rework that removed the local
queuing of the redefined request inside the if { } block.

Fixes: 636ba13aec8a ("usb: gadget: composite: remove duplicated code in OS desc handling")
Cc: <[email protected]> # v4.17
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/gadget/composite.c | 1 -
1 file changed, 1 deletion(-)

--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1816,7 +1816,6 @@ unknown:
if (cdev->use_os_string && cdev->os_desc_config &&
(ctrl->bRequestType & USB_TYPE_VENDOR) &&
ctrl->bRequest == cdev->b_vendor_code) {
- struct usb_request *req;
struct usb_configuration *os_desc_cfg;
u8 *buf;
int interface;



2018-07-27 09:52:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 39/66] vxlan: add new fdb alloc and create helpers

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Roopa Prabhu <[email protected]>

[ Upstream commit 7431016b107c95cb5b2014aa1901fcb115f746bc ]

- Add new vxlan_fdb_alloc helper
- rename existing vxlan_fdb_create into vxlan_fdb_update:
because it really creates or updates an existing
fdb entry
- move new fdb creation into a separate vxlan_fdb_create

Main motivation for this change is to introduce the ability
to decouple vxlan fdb creation and notify, used in a later patch.

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/vxlan.c | 91 +++++++++++++++++++++++++++++++++++-----------------
1 file changed, 62 insertions(+), 29 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -636,9 +636,62 @@ static int vxlan_gro_complete(struct soc
return eth_gro_complete(skb, nhoff + sizeof(struct vxlanhdr));
}

-/* Add new entry to forwarding table -- assumes lock held */
+static struct vxlan_fdb *vxlan_fdb_alloc(struct vxlan_dev *vxlan,
+ const u8 *mac, __u16 state,
+ __be32 src_vni, __u8 ndm_flags)
+{
+ struct vxlan_fdb *f;
+
+ f = kmalloc(sizeof(*f), GFP_ATOMIC);
+ if (!f)
+ return NULL;
+ f->state = state;
+ f->flags = ndm_flags;
+ f->updated = f->used = jiffies;
+ f->vni = src_vni;
+ INIT_LIST_HEAD(&f->remotes);
+ memcpy(f->eth_addr, mac, ETH_ALEN);
+
+ return f;
+}
+
static int vxlan_fdb_create(struct vxlan_dev *vxlan,
const u8 *mac, union vxlan_addr *ip,
+ __u16 state, __be16 port, __be32 src_vni,
+ __be32 vni, __u32 ifindex, __u8 ndm_flags,
+ struct vxlan_fdb **fdb)
+{
+ struct vxlan_rdst *rd = NULL;
+ struct vxlan_fdb *f;
+ int rc;
+
+ if (vxlan->cfg.addrmax &&
+ vxlan->addrcnt >= vxlan->cfg.addrmax)
+ return -ENOSPC;
+
+ netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
+ f = vxlan_fdb_alloc(vxlan, mac, state, src_vni, ndm_flags);
+ if (!f)
+ return -ENOMEM;
+
+ rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
+ if (rc < 0) {
+ kfree(f);
+ return rc;
+ }
+
+ ++vxlan->addrcnt;
+ hlist_add_head_rcu(&f->hlist,
+ vxlan_fdb_head(vxlan, mac, src_vni));
+
+ *fdb = f;
+
+ return 0;
+}
+
+/* Add new entry to forwarding table -- assumes lock held */
+static int vxlan_fdb_update(struct vxlan_dev *vxlan,
+ const u8 *mac, union vxlan_addr *ip,
__u16 state, __u16 flags,
__be16 port, __be32 src_vni, __be32 vni,
__u32 ifindex, __u8 ndm_flags)
@@ -687,37 +740,17 @@ static int vxlan_fdb_create(struct vxlan
if (!(flags & NLM_F_CREATE))
return -ENOENT;

- if (vxlan->cfg.addrmax &&
- vxlan->addrcnt >= vxlan->cfg.addrmax)
- return -ENOSPC;
-
/* Disallow replace to add a multicast entry */
if ((flags & NLM_F_REPLACE) &&
(is_multicast_ether_addr(mac) || is_zero_ether_addr(mac)))
return -EOPNOTSUPP;

netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
- f = kmalloc(sizeof(*f), GFP_ATOMIC);
- if (!f)
- return -ENOMEM;
-
- notify = 1;
- f->state = state;
- f->flags = ndm_flags;
- f->updated = f->used = jiffies;
- f->vni = src_vni;
- INIT_LIST_HEAD(&f->remotes);
- memcpy(f->eth_addr, mac, ETH_ALEN);
-
- rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
- if (rc < 0) {
- kfree(f);
+ rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
+ vni, ifindex, ndm_flags, &f);
+ if (rc < 0)
return rc;
- }
-
- ++vxlan->addrcnt;
- hlist_add_head_rcu(&f->hlist,
- vxlan_fdb_head(vxlan, mac, src_vni));
+ notify = 1;
}

if (notify) {
@@ -863,7 +896,7 @@ static int vxlan_fdb_add(struct ndmsg *n
return -EAFNOSUPPORT;

spin_lock_bh(&vxlan->hash_lock);
- err = vxlan_fdb_create(vxlan, addr, &ip, ndm->ndm_state, flags,
+ err = vxlan_fdb_update(vxlan, addr, &ip, ndm->ndm_state, flags,
port, src_vni, vni, ifindex, ndm->ndm_flags);
spin_unlock_bh(&vxlan->hash_lock);

@@ -1006,7 +1039,7 @@ static bool vxlan_snoop(struct net_devic

/* close off race between vxlan_flush and incoming packets */
if (netif_running(dev))
- vxlan_fdb_create(vxlan, src_mac, src_ip,
+ vxlan_fdb_update(vxlan, src_mac, src_ip,
NUD_REACHABLE,
NLM_F_EXCL|NLM_F_CREATE,
vxlan->cfg.dst_port,
@@ -3165,7 +3198,7 @@ static int __vxlan_dev_create(struct net

/* create an fdb entry for a valid default destination */
if (!vxlan_addr_any(&vxlan->default_dst.remote_ip)) {
- err = vxlan_fdb_create(vxlan, all_zeros_mac,
+ err = vxlan_fdb_update(vxlan, all_zeros_mac,
&vxlan->default_dst.remote_ip,
NUD_REACHABLE | NUD_PERMANENT,
NLM_F_EXCL | NLM_F_CREATE,
@@ -3439,7 +3472,7 @@ static int vxlan_changelink(struct net_d
old_dst.remote_ifindex, 0);

if (!vxlan_addr_any(&dst->remote_ip)) {
- err = vxlan_fdb_create(vxlan, all_zeros_mac,
+ err = vxlan_fdb_update(vxlan, all_zeros_mac,
&dst->remote_ip,
NUD_REACHABLE | NUD_PERMANENT,
NLM_F_CREATE | NLM_F_APPEND,



2018-07-27 09:52:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 59/66] can: xilinx_can: fix recovery from error states not being propagated

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 877e0b75947e2c7acf5624331bb17ceb093c98ae upstream.

The xilinx_can driver contains no mechanism for propagating recovery
from CAN_STATE_ERROR_WARNING and CAN_STATE_ERROR_PASSIVE.

Add such a mechanism by factoring the handling of
XCAN_STATE_ERROR_PASSIVE and XCAN_STATE_ERROR_WARNING out of
xcan_err_interrupt and checking for recovery after RX and TX if the
interface is in one of those states.

Tested with the integrated CAN on Zynq-7000 SoC.

Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 155 +++++++++++++++++++++++++++++++++++--------
1 file changed, 127 insertions(+), 28 deletions(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -2,6 +2,7 @@
*
* Copyright (C) 2012 - 2014 Xilinx, Inc.
* Copyright (C) 2009 PetaLogix. All rights reserved.
+ * Copyright (C) 2017 Sandvik Mining and Construction Oy
*
* Description:
* This driver is developed for Axi CAN IP and for Zynq CANPS Controller.
@@ -530,6 +531,123 @@ static int xcan_rx(struct net_device *nd
}

/**
+ * xcan_current_error_state - Get current error state from HW
+ * @ndev: Pointer to net_device structure
+ *
+ * Checks the current CAN error state from the HW. Note that this
+ * only checks for ERROR_PASSIVE and ERROR_WARNING.
+ *
+ * Return:
+ * ERROR_PASSIVE or ERROR_WARNING if either is active, ERROR_ACTIVE
+ * otherwise.
+ */
+static enum can_state xcan_current_error_state(struct net_device *ndev)
+{
+ struct xcan_priv *priv = netdev_priv(ndev);
+ u32 status = priv->read_reg(priv, XCAN_SR_OFFSET);
+
+ if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK)
+ return CAN_STATE_ERROR_PASSIVE;
+ else if (status & XCAN_SR_ERRWRN_MASK)
+ return CAN_STATE_ERROR_WARNING;
+ else
+ return CAN_STATE_ERROR_ACTIVE;
+}
+
+/**
+ * xcan_set_error_state - Set new CAN error state
+ * @ndev: Pointer to net_device structure
+ * @new_state: The new CAN state to be set
+ * @cf: Error frame to be populated or NULL
+ *
+ * Set new CAN error state for the device, updating statistics and
+ * populating the error frame if given.
+ */
+static void xcan_set_error_state(struct net_device *ndev,
+ enum can_state new_state,
+ struct can_frame *cf)
+{
+ struct xcan_priv *priv = netdev_priv(ndev);
+ u32 ecr = priv->read_reg(priv, XCAN_ECR_OFFSET);
+ u32 txerr = ecr & XCAN_ECR_TEC_MASK;
+ u32 rxerr = (ecr & XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT;
+
+ priv->can.state = new_state;
+
+ if (cf) {
+ cf->can_id |= CAN_ERR_CRTL;
+ cf->data[6] = txerr;
+ cf->data[7] = rxerr;
+ }
+
+ switch (new_state) {
+ case CAN_STATE_ERROR_PASSIVE:
+ priv->can.can_stats.error_passive++;
+ if (cf)
+ cf->data[1] = (rxerr > 127) ?
+ CAN_ERR_CRTL_RX_PASSIVE :
+ CAN_ERR_CRTL_TX_PASSIVE;
+ break;
+ case CAN_STATE_ERROR_WARNING:
+ priv->can.can_stats.error_warning++;
+ if (cf)
+ cf->data[1] |= (txerr > rxerr) ?
+ CAN_ERR_CRTL_TX_WARNING :
+ CAN_ERR_CRTL_RX_WARNING;
+ break;
+ case CAN_STATE_ERROR_ACTIVE:
+ if (cf)
+ cf->data[1] |= CAN_ERR_CRTL_ACTIVE;
+ break;
+ default:
+ /* non-ERROR states are handled elsewhere */
+ WARN_ON(1);
+ break;
+ }
+}
+
+/**
+ * xcan_update_error_state_after_rxtx - Update CAN error state after RX/TX
+ * @ndev: Pointer to net_device structure
+ *
+ * If the device is in a ERROR-WARNING or ERROR-PASSIVE state, check if
+ * the performed RX/TX has caused it to drop to a lesser state and set
+ * the interface state accordingly.
+ */
+static void xcan_update_error_state_after_rxtx(struct net_device *ndev)
+{
+ struct xcan_priv *priv = netdev_priv(ndev);
+ enum can_state old_state = priv->can.state;
+ enum can_state new_state;
+
+ /* changing error state due to successful frame RX/TX can only
+ * occur from these states
+ */
+ if (old_state != CAN_STATE_ERROR_WARNING &&
+ old_state != CAN_STATE_ERROR_PASSIVE)
+ return;
+
+ new_state = xcan_current_error_state(ndev);
+
+ if (new_state != old_state) {
+ struct sk_buff *skb;
+ struct can_frame *cf;
+
+ skb = alloc_can_err_skb(ndev, &cf);
+
+ xcan_set_error_state(ndev, new_state, skb ? cf : NULL);
+
+ if (skb) {
+ struct net_device_stats *stats = &ndev->stats;
+
+ stats->rx_packets++;
+ stats->rx_bytes += cf->can_dlc;
+ netif_rx(skb);
+ }
+ }
+}
+
+/**
* xcan_err_interrupt - error frame Isr
* @ndev: net_device pointer
* @isr: interrupt status register value
@@ -544,16 +662,12 @@ static void xcan_err_interrupt(struct ne
struct net_device_stats *stats = &ndev->stats;
struct can_frame *cf;
struct sk_buff *skb;
- u32 err_status, status, txerr = 0, rxerr = 0;
+ u32 err_status;

skb = alloc_can_err_skb(ndev, &cf);

err_status = priv->read_reg(priv, XCAN_ESR_OFFSET);
priv->write_reg(priv, XCAN_ESR_OFFSET, err_status);
- txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK;
- rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) &
- XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT);
- status = priv->read_reg(priv, XCAN_SR_OFFSET);

if (isr & XCAN_IXR_BSOFF_MASK) {
priv->can.state = CAN_STATE_BUS_OFF;
@@ -563,28 +677,10 @@ static void xcan_err_interrupt(struct ne
can_bus_off(ndev);
if (skb)
cf->can_id |= CAN_ERR_BUSOFF;
- } else if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK) {
- priv->can.state = CAN_STATE_ERROR_PASSIVE;
- priv->can.can_stats.error_passive++;
- if (skb) {
- cf->can_id |= CAN_ERR_CRTL;
- cf->data[1] = (rxerr > 127) ?
- CAN_ERR_CRTL_RX_PASSIVE :
- CAN_ERR_CRTL_TX_PASSIVE;
- cf->data[6] = txerr;
- cf->data[7] = rxerr;
- }
- } else if (status & XCAN_SR_ERRWRN_MASK) {
- priv->can.state = CAN_STATE_ERROR_WARNING;
- priv->can.can_stats.error_warning++;
- if (skb) {
- cf->can_id |= CAN_ERR_CRTL;
- cf->data[1] |= (txerr > rxerr) ?
- CAN_ERR_CRTL_TX_WARNING :
- CAN_ERR_CRTL_RX_WARNING;
- cf->data[6] = txerr;
- cf->data[7] = rxerr;
- }
+ } else {
+ enum can_state new_state = xcan_current_error_state(ndev);
+
+ xcan_set_error_state(ndev, new_state, skb ? cf : NULL);
}

/* Check for Arbitration lost interrupt */
@@ -714,8 +810,10 @@ static int xcan_rx_poll(struct napi_stru
isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
}

- if (work_done)
+ if (work_done) {
can_led_event(ndev, CAN_LED_EVENT_RX);
+ xcan_update_error_state_after_rxtx(ndev);
+ }

if (work_done < quota) {
napi_complete_done(napi, work_done);
@@ -746,6 +844,7 @@ static void xcan_tx_interrupt(struct net
isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
}
can_led_event(ndev, CAN_LED_EVENT_TX);
+ xcan_update_error_state_after_rxtx(ndev);
netif_wake_queue(ndev);
}




2018-07-27 09:52:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 60/66] can: xilinx_can: fix device dropping off bus on RX overrun

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 2574fe54515ed3487405de329e4e9f13d7098c10 upstream.

The xilinx_can driver performs a software reset when an RX overrun is
detected. This causes the device to enter Configuration mode where no
messages are received or transmitted.

The documentation does not mention any need to perform a reset on an RX
overrun, and testing by inducing an RX overflow also indicated that the
device continues to work just fine without a reset.

Remove the software reset.

Tested with the integrated CAN on Zynq-7000 SoC.

Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 1 -
1 file changed, 1 deletion(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -696,7 +696,6 @@ static void xcan_err_interrupt(struct ne
if (isr & XCAN_IXR_RXOFLW_MASK) {
stats->rx_over_errors++;
stats->rx_errors++;
- priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK);
if (skb) {
cf->can_id |= CAN_ERR_CRTL;
cf->data[1] |= CAN_ERR_CRTL_RX_OVERFLOW;



2018-07-27 09:52:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 56/66] driver core: Partially revert "driver core: correct device's shutdown order"

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rafael J. Wysocki <[email protected]>

commit 722e5f2b1eec7de61117b7c0a7914761e3da2eda upstream.

Commit 52cdbdd49853 (driver core: correct device's shutdown order)
introduced a regression by breaking device shutdown on some systems.

Namely, the devices_kset_move_last() call in really_probe() added by
that commit is a mistake as it may cause parents to follow children
in the devices_kset list which then causes shutdown to fail. For
example, if a device has children before really_probe() is called
for it (which is not uncommon), that call will cause it to be
reordered after the children in the devices_kset list and the
ordering of that list will not reflect the correct device shutdown
order any more.

Also, it causes the devices_kset list to be constantly reordered until
all drivers have been probed, which is totally pointless overhead in the
majority of cases. And it only covered an issue with system shutdown,
while system-wide suspend/resume potentially had the same issue on the
affected platforms (which was not covered).

Moreover, the shutdown issue originally addressed by the change in
really_probe() made by commit 52cdbdd49853 is not present in 4.18-rc
any more, since dra7 started to use the sdhci-omap driver which
doesn't disable any regulators during shutdown, so the really_probe()
part of commit 52cdbdd49853 can be safely reverted. [The original
issue was related to the omap_hsmmc driver used by dra7 previously.]

For the above reasons, revert the really_probe() modifications made
by commit 52cdbdd49853.

The other code changes made by commit 52cdbdd49853 are useful and
they need not be reverted.

Fixes: 52cdbdd49853 (driver core: correct device's shutdown order)
Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/
Reported-by: Pingfan Liu <[email protected]>
Tested-by: Pingfan Liu <[email protected]>
Reviewed-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/base/dd.c | 8 --------
1 file changed, 8 deletions(-)

--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -436,14 +436,6 @@ re_probe:
goto probe_failed;
}

- /*
- * Ensure devices are listed in devices_kset in correct order
- * It's important to move Dev to the end of devices_kset before
- * calling .probe, because it could be recursive and parent Dev
- * should always go first
- */
- devices_kset_move_last(dev);
-
if (dev->bus->probe) {
ret = dev->bus->probe(dev);
if (ret)



2018-07-27 09:52:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 34/66] net/ipv6: Fix linklocal to global address with VRF

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Ahern <[email protected]>

[ Upstream commit 24b711edfc34bc45777a3f068812b7d1ed004a5d ]

Example setup:
host: ip -6 addr add dev eth1 2001:db8:104::4
where eth1 is enslaved to a VRF

switch: ip -6 ro add 2001:db8:104::4/128 dev br1
where br1 only has an LLA

ping6 2001:db8:104::4
ssh 2001:db8:104::4

(NOTE: UDP works fine if the PKTINFO has the address set to the global
address and the ifindex set to the index of eth1, with an LLA as the
destination.)

For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
L3 master. If it is then return the ifindex from rt6i_idev similar
to what is done for loopback.

For TCP, restore the original tcp_v6_iif definition which is needed in
most places and add a new tcp_v6_iif_l3_slave that considers the
l3_slave variability. This latter check is only needed for socket
lookups.

Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/tcp.h | 5 +++++
net/ipv6/icmp.c | 5 +++--
net/ipv6/tcp_ipv6.c | 6 ++++--
3 files changed, 12 insertions(+), 4 deletions(-)

--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -829,6 +829,11 @@ struct tcp_skb_cb {
*/
static inline int tcp_v6_iif(const struct sk_buff *skb)
{
+ return TCP_SKB_CB(skb)->header.h6.iif;
+}
+
+static inline int tcp_v6_iif_l3_slave(const struct sk_buff *skb)
+{
bool l3_slave = ipv6_l3mdev_skb(TCP_SKB_CB(skb)->header.h6.flags);

return l3_slave ? skb->skb_iif : TCP_SKB_CB(skb)->header.h6.iif;
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -402,9 +402,10 @@ static int icmp6_iif(const struct sk_buf

/* for local traffic to local address, skb dev is the loopback
* device. Check if there is a dst attached to the skb and if so
- * get the real device index.
+ * get the real device index. Same is needed for replies to a link
+ * local address on a device enslaved to an L3 master device
*/
- if (unlikely(iif == LOOPBACK_IFINDEX)) {
+ if (unlikely(iif == LOOPBACK_IFINDEX || netif_is_l3_master(skb->dev))) {
const struct rt6_info *rt6 = skb_rt6_info(skb);

if (rt6)
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -934,7 +934,8 @@ static void tcp_v6_send_reset(const stru
&tcp_hashinfo, NULL, 0,
&ipv6h->saddr,
th->source, &ipv6h->daddr,
- ntohs(th->source), tcp_v6_iif(skb),
+ ntohs(th->source),
+ tcp_v6_iif_l3_slave(skb),
tcp_v6_sdif(skb));
if (!sk1)
goto out;
@@ -1605,7 +1606,8 @@ do_time_wait:
skb, __tcp_hdrlen(th),
&ipv6_hdr(skb)->saddr, th->source,
&ipv6_hdr(skb)->daddr,
- ntohs(th->dest), tcp_v6_iif(skb),
+ ntohs(th->dest),
+ tcp_v6_iif_l3_slave(skb),
sdif);
if (sk2) {
struct inet_timewait_sock *tw = inet_twsk(sk);



2018-07-27 09:52:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 61/66] can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 620050d9c2be15c47017ba95efe59e0832e99a56 upstream.

The xilinx_can driver assumes that the TXOK interrupt only clears after
it has been acknowledged as many times as there have been successfully
sent frames.

However, the documentation does not mention such behavior, instead
saying just that the interrupt is cleared when the clear bit is set.

Similarly, testing seems to suggest that it is immediately cleared
regardless of the number of frames that have been sent. Performing some
heavy TX load and then going back to idle leaves tx_head drifting
further away from tx_tail over time, steadily reducing the number of
frames the driver keeps in the TX FIFO (but not to zero, as the TXOK
interrupt always frees up space for 1 frame from the driver's
perspective, so frames continue to be sent) and delaying the local echo
frames.

The TX FIFO tracking is also otherwise buggy as it does not account for
TX FIFO being cleared after software resets, causing
BUG!, TX FIFO full when queue awake!
messages to be output.

There does not seem to be any way to accurately track the state of the
TX FIFO for local echo support while using the full TX FIFO.

The Zynq version of the HW (but not the soft-AXI version) has watermark
programming support and with it an additional TX-FIFO-empty interrupt
bit.

Modify the driver to only put 1 frame into TX FIFO at a time on soft-AXI
and 2 frames at a time on Zynq. On Zynq the TXFEMP interrupt bit is used
to detect whether 1 or 2 frames have been sent at interrupt processing
time.

Tested with the integrated CAN on Zynq-7000 SoC. The 1-frame-FIFO mode
was also tested.

An alternative way to solve this would be to drop local echo support but
keep using the full TX FIFO.

v2: Add FIFO space check before TX queue wake with locking to
synchronize with queue stop. This avoids waking the queue when xmit()
had just filled it.

v3: Keep local echo support and reduce the amount of frames in FIFO
instead as suggested by Marc Kleine-Budde.

Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 139 ++++++++++++++++++++++++++++++++++++++-----
1 file changed, 123 insertions(+), 16 deletions(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -26,8 +26,10 @@
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/of.h>
+#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/skbuff.h>
+#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/can/dev.h>
@@ -119,6 +121,7 @@ enum xcan_reg {
/**
* struct xcan_priv - This definition define CAN driver instance
* @can: CAN private data structure.
+ * @tx_lock: Lock for synchronizing TX interrupt handling
* @tx_head: Tx CAN packets ready to send on the queue
* @tx_tail: Tx CAN packets successfully sended on the queue
* @tx_max: Maximum number packets the driver can send
@@ -133,6 +136,7 @@ enum xcan_reg {
*/
struct xcan_priv {
struct can_priv can;
+ spinlock_t tx_lock;
unsigned int tx_head;
unsigned int tx_tail;
unsigned int tx_max;
@@ -160,6 +164,11 @@ static const struct can_bittiming_const
.brp_inc = 1,
};

+#define XCAN_CAP_WATERMARK 0x0001
+struct xcan_devtype_data {
+ unsigned int caps;
+};
+
/**
* xcan_write_reg_le - Write a value to the device register little endian
* @priv: Driver private data structure
@@ -239,6 +248,10 @@ static int set_reset_mode(struct net_dev
usleep_range(500, 10000);
}

+ /* reset clears FIFOs */
+ priv->tx_head = 0;
+ priv->tx_tail = 0;
+
return 0;
}

@@ -393,6 +406,7 @@ static int xcan_start_xmit(struct sk_buf
struct net_device_stats *stats = &ndev->stats;
struct can_frame *cf = (struct can_frame *)skb->data;
u32 id, dlc, data[2] = {0, 0};
+ unsigned long flags;

if (can_dropped_invalid_skb(ndev, skb))
return NETDEV_TX_OK;
@@ -440,6 +454,9 @@ static int xcan_start_xmit(struct sk_buf
data[1] = be32_to_cpup((__be32 *)(cf->data + 4));

can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max);
+
+ spin_lock_irqsave(&priv->tx_lock, flags);
+
priv->tx_head++;

/* Write the Frame to Xilinx CAN TX FIFO */
@@ -455,10 +472,16 @@ static int xcan_start_xmit(struct sk_buf
stats->tx_bytes += cf->can_dlc;
}

+ /* Clear TX-FIFO-empty interrupt for xcan_tx_interrupt() */
+ if (priv->tx_max > 1)
+ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXFEMP_MASK);
+
/* Check if the TX buffer is full */
if ((priv->tx_head - priv->tx_tail) == priv->tx_max)
netif_stop_queue(ndev);

+ spin_unlock_irqrestore(&priv->tx_lock, flags);
+
return NETDEV_TX_OK;
}

@@ -832,19 +855,71 @@ static void xcan_tx_interrupt(struct net
{
struct xcan_priv *priv = netdev_priv(ndev);
struct net_device_stats *stats = &ndev->stats;
+ unsigned int frames_in_fifo;
+ int frames_sent = 1; /* TXOK => at least 1 frame was sent */
+ unsigned long flags;
+ int retries = 0;
+
+ /* Synchronize with xmit as we need to know the exact number
+ * of frames in the FIFO to stay in sync due to the TXFEMP
+ * handling.
+ * This also prevents a race between netif_wake_queue() and
+ * netif_stop_queue().
+ */
+ spin_lock_irqsave(&priv->tx_lock, flags);
+
+ frames_in_fifo = priv->tx_head - priv->tx_tail;
+
+ if (WARN_ON_ONCE(frames_in_fifo == 0)) {
+ /* clear TXOK anyway to avoid getting back here */
+ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK);
+ spin_unlock_irqrestore(&priv->tx_lock, flags);
+ return;
+ }
+
+ /* Check if 2 frames were sent (TXOK only means that at least 1
+ * frame was sent).
+ */
+ if (frames_in_fifo > 1) {
+ WARN_ON(frames_in_fifo > priv->tx_max);
+
+ /* Synchronize TXOK and isr so that after the loop:
+ * (1) isr variable is up-to-date at least up to TXOK clear
+ * time. This avoids us clearing a TXOK of a second frame
+ * but not noticing that the FIFO is now empty and thus
+ * marking only a single frame as sent.
+ * (2) No TXOK is left. Having one could mean leaving a
+ * stray TXOK as we might process the associated frame
+ * via TXFEMP handling as we read TXFEMP *after* TXOK
+ * clear to satisfy (1).
+ */
+ while ((isr & XCAN_IXR_TXOK_MASK) && !WARN_ON(++retries == 100)) {
+ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK);
+ isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
+ }

- while ((priv->tx_head - priv->tx_tail > 0) &&
- (isr & XCAN_IXR_TXOK_MASK)) {
+ if (isr & XCAN_IXR_TXFEMP_MASK) {
+ /* nothing in FIFO anymore */
+ frames_sent = frames_in_fifo;
+ }
+ } else {
+ /* single frame in fifo, just clear TXOK */
priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK);
+ }
+
+ while (frames_sent--) {
can_get_echo_skb(ndev, priv->tx_tail %
priv->tx_max);
priv->tx_tail++;
stats->tx_packets++;
- isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
}
+
+ netif_wake_queue(ndev);
+
+ spin_unlock_irqrestore(&priv->tx_lock, flags);
+
can_led_event(ndev, CAN_LED_EVENT_TX);
xcan_update_error_state_after_rxtx(ndev);
- netif_wake_queue(ndev);
}

/**
@@ -1138,6 +1213,18 @@ static const struct dev_pm_ops xcan_dev_
SET_RUNTIME_PM_OPS(xcan_runtime_suspend, xcan_runtime_resume, NULL)
};

+static const struct xcan_devtype_data xcan_zynq_data = {
+ .caps = XCAN_CAP_WATERMARK,
+};
+
+/* Match table for OF platform binding */
+static const struct of_device_id xcan_of_match[] = {
+ { .compatible = "xlnx,zynq-can-1.0", .data = &xcan_zynq_data },
+ { .compatible = "xlnx,axi-can-1.00.a", },
+ { /* end of list */ },
+};
+MODULE_DEVICE_TABLE(of, xcan_of_match);
+
/**
* xcan_probe - Platform registration call
* @pdev: Handle to the platform device structure
@@ -1152,8 +1239,10 @@ static int xcan_probe(struct platform_de
struct resource *res; /* IO mem resources */
struct net_device *ndev;
struct xcan_priv *priv;
+ const struct of_device_id *of_id;
+ int caps = 0;
void __iomem *addr;
- int ret, rx_max, tx_max;
+ int ret, rx_max, tx_max, tx_fifo_depth;

/* Get the virtual base address for the device */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -1163,7 +1252,8 @@ static int xcan_probe(struct platform_de
goto err;
}

- ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", &tx_max);
+ ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth",
+ &tx_fifo_depth);
if (ret < 0)
goto err;

@@ -1171,6 +1261,30 @@ static int xcan_probe(struct platform_de
if (ret < 0)
goto err;

+ of_id = of_match_device(xcan_of_match, &pdev->dev);
+ if (of_id) {
+ const struct xcan_devtype_data *devtype_data = of_id->data;
+
+ if (devtype_data)
+ caps = devtype_data->caps;
+ }
+
+ /* There is no way to directly figure out how many frames have been
+ * sent when the TXOK interrupt is processed. If watermark programming
+ * is supported, we can have 2 frames in the FIFO and use TXFEMP
+ * to determine if 1 or 2 frames have been sent.
+ * Theoretically we should be able to use TXFWMEMP to determine up
+ * to 3 frames, but it seems that after putting a second frame in the
+ * FIFO, with watermark at 2 frames, it can happen that TXFWMEMP (less
+ * than 2 frames in FIFO) is set anyway with no TXOK (a frame was
+ * sent), which is not a sensible state - possibly TXFWMEMP is not
+ * completely synchronized with the rest of the bits?
+ */
+ if (caps & XCAN_CAP_WATERMARK)
+ tx_max = min(tx_fifo_depth, 2);
+ else
+ tx_max = 1;
+
/* Create a CAN device instance */
ndev = alloc_candev(sizeof(struct xcan_priv), tx_max);
if (!ndev)
@@ -1185,6 +1299,7 @@ static int xcan_probe(struct platform_de
CAN_CTRLMODE_BERR_REPORTING;
priv->reg_base = addr;
priv->tx_max = tx_max;
+ spin_lock_init(&priv->tx_lock);

/* Get IRQ for the device */
ndev->irq = platform_get_irq(pdev, 0);
@@ -1249,9 +1364,9 @@ static int xcan_probe(struct platform_de

pm_runtime_put(&pdev->dev);

- netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n",
+ netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth: actual %d, using %d\n",
priv->reg_base, ndev->irq, priv->can.clock.freq,
- priv->tx_max);
+ tx_fifo_depth, priv->tx_max);

return 0;

@@ -1285,14 +1400,6 @@ static int xcan_remove(struct platform_d
return 0;
}

-/* Match table for OF platform binding */
-static const struct of_device_id xcan_of_match[] = {
- { .compatible = "xlnx,zynq-can-1.0", },
- { .compatible = "xlnx,axi-can-1.00.a", },
- { /* end of list */ },
-};
-MODULE_DEVICE_TABLE(of, xcan_of_match);
-
static struct platform_driver xcan_driver = {
.probe = xcan_probe,
.remove = xcan_remove,



2018-07-27 09:52:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 49/66] usb: cdc_acm: Add quirk for Castles VEGA3000

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lubomir Rintel <[email protected]>

commit 1445cbe476fc3dd09c0b380b206526a49403c071 upstream.

The device (a POS terminal) implements CDC ACM, but has no union
descriptor.

Signed-off-by: Lubomir Rintel <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/class/cdc-acm.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1831,6 +1831,9 @@ static const struct usb_device_id acm_id
{ USB_DEVICE(0x09d8, 0x0320), /* Elatec GmbH TWN3 */
.driver_info = NO_UNION_NORMAL, /* has misplaced union descriptor */
},
+ { USB_DEVICE(0x0ca6, 0xa050), /* Castles VEGA3000 */
+ .driver_info = NO_UNION_NORMAL, /* reports zero length descriptor */
+ },

{ USB_DEVICE(0x2912, 0x0001), /* ATOL FPrint */
.driver_info = CLEAR_HALT_CONDITIONS,



2018-07-27 09:52:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 36/66] net/mlx5e: Fix quota counting in aRFS expire flow

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eran Ben Elisha <[email protected]>

[ Upstream commit 2630bae8018823c3b88788b69fb9f16ea3b4a11e ]

Quota should follow the number of rules that actually expire, not the
number of rules examined; fix that.
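
A reduced model of the counting change (plain C; the rules are collapsed
into an array of expired flags, and the quota names are illustrative):

/* charge the quota for rules actually moved to the delete list,
 * not for every rule examined
 */
int collect_expired(const int *expired, int n, int quota_max)
{
	int quota = 0, removed = 0;

	for (int i = 0; i < n; i++) {
		if (!expired[i])
			continue;	/* examined, but no quota charge */
		removed++;
		if (quota++ > quota_max)
			break;		/* cap work per pass */
	}
	return removed;
}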

Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -381,14 +381,14 @@ static void arfs_may_expire_flow(struct
HLIST_HEAD(del_list);
spin_lock_bh(&priv->fs.arfs.arfs_lock);
mlx5e_for_each_arfs_rule(arfs_rule, htmp, priv->fs.arfs.arfs_tables, i, j) {
- if (quota++ > MLX5E_ARFS_EXPIRY_QUOTA)
- break;
if (!work_pending(&arfs_rule->arfs_work) &&
rps_may_expire_flow(priv->netdev,
arfs_rule->rxq, arfs_rule->flow_id,
arfs_rule->filter_id)) {
hlist_del_init(&arfs_rule->hlist);
hlist_add_head(&arfs_rule->hlist, &del_list);
+ if (quota++ > MLX5E_ARFS_EXPIRY_QUOTA)
+ break;
}
}
spin_unlock_bh(&priv->fs.arfs.arfs_lock);



2018-07-27 09:52:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 52/66] usb: xhci: Fix memory leak in xhci_endpoint_reset()

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zheng Xiaowei <[email protected]>

commit d89b7664f76047e7beca8f07e86f2ccfad085a28 upstream.

If td_list is not empty, cfg_cmd will not be freed; call
xhci_free_command() to free it.

Signed-off-by: Zheng Xiaowei <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/host/xhci.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2981,6 +2981,7 @@ static void xhci_endpoint_reset(struct u
if (!list_empty(&ep->ring->td_list)) {
dev_err(&udev->dev, "EP not empty, refuse reset\n");
spin_unlock_irqrestore(&xhci->lock, flags);
+ xhci_free_command(xhci, cfg_cmd);
goto cleanup;
}
xhci_queue_stop_endpoint(xhci, stop_cmd, udev->slot_id, ep_index, 0);
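
The underlying rule is that every early-exit path taken after an
allocation must release that allocation. A minimal sketch of the shape
of the bug and the fix, using hypothetical types and helpers rather
than the actual xhci API:

static void ep_reset(struct host *hc, struct endpoint *ep)
{
	struct command *cmd;
	unsigned long flags;

	cmd = alloc_command(hc);	/* hypothetical allocator */
	if (!cmd)
		return;

	spin_lock_irqsave(&hc->lock, flags);
	if (!list_empty(&ep->td_list)) {
		/* refuse the reset while transfers are still queued */
		spin_unlock_irqrestore(&hc->lock, flags);
		free_command(hc, cmd);	/* the previously leaked step */
		return;
	}
	/* ... queue cmd, drop the lock, wait for completion ... */
	spin_unlock_irqrestore(&hc->lock, flags);
	free_command(hc, cmd);
}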



2018-07-27 09:52:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 58/66] can: xilinx_can: fix power management handling

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 8ebd83bdb027f29870d96649dba18b91581ea829 upstream.

There are several issues with the suspend/resume handling code of the
driver:

- The device is attached and detached in the runtime_suspend() and
runtime_resume() callbacks if the interface is running. However,
during xcan_chip_start() the interface is considered running,
causing the resume handler to incorrectly call netif_start_queue()
at the beginning of xcan_chip_start(); and if xcan_chip_start()
returns an error, the suspend handler detaches the device, leaving
the user unable to bring up the device anymore.

- The device is not brought properly up on system resume. A reset is
done and the code tries to determine the bus state after that.
However, after reset the device is always in Configuration mode
(down), so the state checking code does not make sense and
communication will also not work.

- The suspend callback tries to set the device to sleep mode (low-power
mode which monitors the bus and brings the device back to normal mode
on activity), but then immediately disables the clocks (possibly
before the device reaches the sleep mode), which does not make sense
to me. If a clean shutdown is wanted before disabling clocks, we can
just bring the device down completely instead of only entering sleep mode.

Reorganize the PM code so that only the clock logic remains in the
runtime PM callbacks and the system PM callbacks contain the device
bring-up/down logic. This makes calling the runtime PM callbacks during
e.g. xcan_chip_start() safe.

The system PM callbacks now simply call common code to start/stop the
HW if the interface was running, replacing the broken code from before.

xcan_chip_stop() is updated to use the common reset code so that it will
wait for the reset to complete. Reset also disables all interrupts, so
that does not need to be done separately.

Also, the device_may_wakeup() checks are removed as the driver does not
have wakeup support.

Tested on Zynq-7000 integrated CAN.

Signed-off-by: Anssi Hannula <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 69 +++++++++++++++++--------------------------
1 file changed, 28 insertions(+), 41 deletions(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -811,13 +811,9 @@ static irqreturn_t xcan_interrupt(int ir
static void xcan_chip_stop(struct net_device *ndev)
{
struct xcan_priv *priv = netdev_priv(ndev);
- u32 ier;

/* Disable interrupts and leave the can in configuration mode */
- ier = priv->read_reg(priv, XCAN_IER_OFFSET);
- ier &= ~XCAN_INTR_ALL;
- priv->write_reg(priv, XCAN_IER_OFFSET, ier);
- priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK);
+ set_reset_mode(ndev);
priv->can.state = CAN_STATE_STOPPED;
}

@@ -950,10 +946,15 @@ static const struct net_device_ops xcan_
*/
static int __maybe_unused xcan_suspend(struct device *dev)
{
- if (!device_may_wakeup(dev))
- return pm_runtime_force_suspend(dev);
+ struct net_device *ndev = dev_get_drvdata(dev);

- return 0;
+ if (netif_running(ndev)) {
+ netif_stop_queue(ndev);
+ netif_device_detach(ndev);
+ xcan_chip_stop(ndev);
+ }
+
+ return pm_runtime_force_suspend(dev);
}

/**
@@ -965,11 +966,27 @@ static int __maybe_unused xcan_suspend(s
*/
static int __maybe_unused xcan_resume(struct device *dev)
{
- if (!device_may_wakeup(dev))
- return pm_runtime_force_resume(dev);
+ struct net_device *ndev = dev_get_drvdata(dev);
+ int ret;

- return 0;
+ ret = pm_runtime_force_resume(dev);
+ if (ret) {
+ dev_err(dev, "pm_runtime_force_resume failed on resume\n");
+ return ret;
+ }
+
+ if (netif_running(ndev)) {
+ ret = xcan_chip_start(ndev);
+ if (ret) {
+ dev_err(dev, "xcan_chip_start failed on resume\n");
+ return ret;
+ }
+
+ netif_device_attach(ndev);
+ netif_start_queue(ndev);
+ }

+ return 0;
}

/**
@@ -984,14 +1001,6 @@ static int __maybe_unused xcan_runtime_s
struct net_device *ndev = dev_get_drvdata(dev);
struct xcan_priv *priv = netdev_priv(ndev);

- if (netif_running(ndev)) {
- netif_stop_queue(ndev);
- netif_device_detach(ndev);
- }
-
- priv->write_reg(priv, XCAN_MSR_OFFSET, XCAN_MSR_SLEEP_MASK);
- priv->can.state = CAN_STATE_SLEEPING;
-
clk_disable_unprepare(priv->bus_clk);
clk_disable_unprepare(priv->can_clk);

@@ -1010,7 +1019,6 @@ static int __maybe_unused xcan_runtime_r
struct net_device *ndev = dev_get_drvdata(dev);
struct xcan_priv *priv = netdev_priv(ndev);
int ret;
- u32 isr, status;

ret = clk_prepare_enable(priv->bus_clk);
if (ret) {
@@ -1024,27 +1032,6 @@ static int __maybe_unused xcan_runtime_r
return ret;
}

- priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK);
- isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
- status = priv->read_reg(priv, XCAN_SR_OFFSET);
-
- if (netif_running(ndev)) {
- if (isr & XCAN_IXR_BSOFF_MASK) {
- priv->can.state = CAN_STATE_BUS_OFF;
- priv->write_reg(priv, XCAN_SRR_OFFSET,
- XCAN_SRR_RESET_MASK);
- } else if ((status & XCAN_SR_ESTAT_MASK) ==
- XCAN_SR_ESTAT_MASK) {
- priv->can.state = CAN_STATE_ERROR_PASSIVE;
- } else if (status & XCAN_SR_ERRWRN_MASK) {
- priv->can.state = CAN_STATE_ERROR_WARNING;
- } else {
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
- }
- netif_device_attach(ndev);
- netif_start_queue(ndev);
- }
-
return 0;
}
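
For reference, callbacks split this way are typically wired through a
single dev_pm_ops table, with the sleep ops handling interface state
and the runtime ops handling clocks only; the table in xilinx_can.c
(outside this diff) has roughly this shape:

static const struct dev_pm_ops xcan_dev_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(xcan_suspend, xcan_resume)
	SET_RUNTIME_PM_OPS(xcan_runtime_suspend,
			   xcan_runtime_resume, NULL)
};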




2018-07-27 09:52:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 54/66] usb: gadget: f_fs: Only return delayed status when len is 0

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jerry Zhang <[email protected]>

commit 4d644abf25698362bd33d17c9ddc8f7122c30f17 upstream.

Commit 1b9ba000 ("Allow function drivers to pause control
transfers") states that USB_GADGET_DELAYED_STATUS is only
supported if the data phase is 0 bytes.

It seems that when the length is not 0 bytes, there is no
need to explicitly delay the data stage since the transfer
is not completed until the user responds. However, when the
length is 0, there is no data stage and the transfer is
finished once setup() returns, hence there is a need to
explicitly delay completion.

This manifests as the following bugs:

Prior to 946ef68ad4e4 ('Let setup() return
USB_GADGET_DELAYED_STATUS'), when the setup is 0 bytes, ffs
would require the user to queue a 0-byte request in order to
clear the setup state. However, that 0-byte request was actually
not needed and would hang and cause errors in other setup
requests.

After the above commit, 0-byte setups work since the gadget
now accepts empty queues to ep0 to clear the delay, but all
other setups hang.

Fixes: 946ef68ad4e4 ("Let setup() return USB_GADGET_DELAYED_STATUS")
Signed-off-by: Jerry Zhang <[email protected]>
Cc: stable <[email protected]>
Acked-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/gadget/function/f_fs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -3242,7 +3242,7 @@ static int ffs_func_setup(struct usb_fun
__ffs_event_add(ffs, FUNCTIONFS_SETUP);
spin_unlock_irqrestore(&ffs->ev.waitq.lock, flags);

- return USB_GADGET_DELAYED_STATUS;
+ return creq->wLength == 0 ? USB_GADGET_DELAYED_STATUS : 0;
}

static bool ffs_func_req_match(struct usb_function *f,
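
The rule the one-liner encodes: delay the status stage only when there
is no data stage. A sketch of a setup() handler following that rule;
queue_data_stage() is a hypothetical stand-in for the driver's real
forwarding path:

static int my_setup(struct usb_function *f,
		    const struct usb_ctrlrequest *creq)
{
	if (creq->wLength != 0) {
		/* data stage present: the transfer cannot complete
		 * until the data request is queued, so no explicit
		 * delay is needed
		 */
		return queue_data_stage(f, creq);	/* hypothetical */
	}

	/* zero-length transfer: it would otherwise complete as soon
	 * as setup() returns, so the status stage must be delayed
	 */
	return USB_GADGET_DELAYED_STATUS;
}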



2018-07-27 09:52:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 57/66] can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 32852c561bffd613d4ed7ec464b1e03e1b7b6c5c upstream.

If the device gets into a state where RXNEMP (RX FIFO not empty)
interrupt is asserted without RXOK (new frame received successfully)
interrupt being asserted, xcan_rx_poll() will continue to try to clear
RXNEMP without actually reading frames from RX FIFO. If the RX FIFO is
not empty, the interrupt will not be cleared and napi_schedule() will
just be called again.

This situation can occur when:

(a) xcan_rx() returns without reading RX FIFO due to an error condition.
The code tries to clear both RXOK and RXNEMP but RXNEMP will not clear
due to a frame still being in the FIFO. The frame will never be read
from the FIFO as RXOK is no longer set.

(b) A frame is received between xcan_rx_poll() reading interrupt status
and clearing RXOK. RXOK will be cleared, but RXNEMP will again remain
set as the new message is still in the FIFO.

I'm able to trigger case (b) by flooding the bus with frames under load.

There does not seem to be any benefit in using both RXNEMP and RXOK in
the way the driver does, and the polling example in the reference manual
(UG585 v1.10 18.3.7 Read Messages from RxFIFO) also says that either
RXOK or RXNEMP can be used for detecting incoming messages.

Fix the issue and simplify the RX processing by only using RXNEMP
without RXOK.

Tested with the integrated CAN on Zynq-7000 SoC.

Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -101,7 +101,7 @@ enum xcan_reg {
#define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\
XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \
XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \
- XCAN_IXR_ARBLST_MASK | XCAN_IXR_RXOK_MASK)
+ XCAN_IXR_ARBLST_MASK)

/* CAN register bit shift - XCAN_<REG>_<BIT>_SHIFT */
#define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */
@@ -709,15 +709,7 @@ static int xcan_rx_poll(struct napi_stru

isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
while ((isr & XCAN_IXR_RXNEMP_MASK) && (work_done < quota)) {
- if (isr & XCAN_IXR_RXOK_MASK) {
- priv->write_reg(priv, XCAN_ICR_OFFSET,
- XCAN_IXR_RXOK_MASK);
- work_done += xcan_rx(ndev);
- } else {
- priv->write_reg(priv, XCAN_ICR_OFFSET,
- XCAN_IXR_RXNEMP_MASK);
- break;
- }
+ work_done += xcan_rx(ndev);
priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_RXNEMP_MASK);
isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
}
@@ -728,7 +720,7 @@ static int xcan_rx_poll(struct napi_stru
if (work_done < quota) {
napi_complete_done(napi, work_done);
ier = priv->read_reg(priv, XCAN_IER_OFFSET);
- ier |= (XCAN_IXR_RXOK_MASK | XCAN_IXR_RXNEMP_MASK);
+ ier |= XCAN_IXR_RXNEMP_MASK;
priv->write_reg(priv, XCAN_IER_OFFSET, ier);
}
return work_done;
@@ -800,9 +792,9 @@ static irqreturn_t xcan_interrupt(int ir
}

/* Check for the type of receive interrupt and Processing it */
- if (isr & (XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK)) {
+ if (isr & XCAN_IXR_RXNEMP_MASK) {
ier = priv->read_reg(priv, XCAN_IER_OFFSET);
- ier &= ~(XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK);
+ ier &= ~XCAN_IXR_RXNEMP_MASK;
priv->write_reg(priv, XCAN_IER_OFFSET, ier);
napi_schedule(&priv->napi);
}
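
The simplified poll loop is worth restating with comments, since it is
now the canonical "drain while not empty" pattern with RXNEMP as the
single source of truth (same code as the hunk above, comments added):

isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
while ((isr & XCAN_IXR_RXNEMP_MASK) && (work_done < quota)) {
	work_done += xcan_rx(ndev);	/* consume one frame */
	/* ack RXNEMP only after the frame was read: a frame arriving
	 * meanwhile re-asserts the bit, so nothing is ever stranded
	 * in the FIFO the way it could be with the RXOK scheme
	 */
	priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_RXNEMP_MASK);
	isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
}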



2018-07-27 09:52:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 64/66] can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stephane Grosjean <[email protected]>

commit 5d4c94ed9f564224d7b37dbee13f7c5d4a8a01ac upstream.

The DMA logic in firmware versions < v3.3.0 embedded in the PCAN-PCIe FD
card family is not capable of handling a mix of 32-bit and 64-bit logical
addresses. If the board is equipped with 2 or 4 CAN ports, such a
situation might lead to a PCIe Bus Error "Malformed TLP" packet
as well as an "irq xx: nobody cared" issue.

This patch adds a workaround that requests only 32-bit DMA addresses
when these might otherwise be allocated outside of the 4 GB area.

This issue is fixed in firmware v3.3.0 and later.

Signed-off-by: Stephane Grosjean <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/peak_canfd/peak_pciefd_main.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

--- a/drivers/net/can/peak_canfd/peak_pciefd_main.c
+++ b/drivers/net/can/peak_canfd/peak_pciefd_main.c
@@ -58,6 +58,10 @@ MODULE_LICENSE("GPL v2");
#define PCIEFD_REG_SYS_VER1 0x0040 /* version reg #1 */
#define PCIEFD_REG_SYS_VER2 0x0044 /* version reg #2 */

+#define PCIEFD_FW_VERSION(x, y, z) (((u32)(x) << 24) | \
+ ((u32)(y) << 16) | \
+ ((u32)(z) << 8))
+
/* System Control Registers Bits */
#define PCIEFD_SYS_CTL_TS_RST 0x00000001 /* timestamp clock */
#define PCIEFD_SYS_CTL_CLK_EN 0x00000002 /* system clock */
@@ -783,6 +787,21 @@ static int peak_pciefd_probe(struct pci_
"%ux CAN-FD PCAN-PCIe FPGA v%u.%u.%u:\n", can_count,
hw_ver_major, hw_ver_minor, hw_ver_sub);

+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+ /* FW < v3.3.0 DMA logic doesn't handle correctly the mix of 32-bit and
+ * 64-bit logical addresses: this workaround forces usage of 32-bit
+ * DMA addresses only when such a fw is detected.
+ */
+ if (PCIEFD_FW_VERSION(hw_ver_major, hw_ver_minor, hw_ver_sub) <
+ PCIEFD_FW_VERSION(3, 3, 0)) {
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+ if (err)
+ dev_warn(&pdev->dev,
+ "warning: can't set DMA mask %llxh (err %d)\n",
+ DMA_BIT_MASK(32), err);
+ }
+#endif
+
/* stop system clock */
pciefd_sys_writereg(pciefd, PCIEFD_SYS_CTL_CLK_EN,
PCIEFD_REG_SYS_CTL_CLR);
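
dma_set_mask_and_coherent() constrains both streaming and coherent
allocations at once. For comparison with the workaround above (which
narrows an already-established mask down to 32 bits), the more common
probe-time pattern is to try the wide mask first and fall back; a
generic usage sketch, not taken from this driver:

/* Prefer 64-bit DMA, fall back to 32-bit if the platform or
 * device cannot support it.
 */
err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (err)
	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
	dev_err(&pdev->dev, "no usable DMA configuration\n");
	return err;
}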



2018-07-27 09:52:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 63/66] can: xilinx_can: fix RX overflow interrupt not being enabled

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 83997997252f5d3fc7f04abc24a89600c2b504ab upstream.

RX overflow interrupt (RXOFLW) is disabled even though xcan_interrupt()
processes it. This means that an RX overflow interrupt will only be
processed when another interrupt gets asserted (e.g. for RX/TX).

Fix that by enabling the RXOFLW interrupt.

Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/can/xilinx_can.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -104,7 +104,7 @@ enum xcan_reg {
#define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\
XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \
XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \
- XCAN_IXR_ARBLST_MASK)
+ XCAN_IXR_RXOFLW_MASK | XCAN_IXR_ARBLST_MASK)

/* CAN register bit shift - XCAN_<REG>_<BIT>_SHIFT */
#define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */



2018-07-27 09:53:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 55/66] ACPICA: AML Parser: ignore dispatcher error status during table load

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Schmauss, Erik <[email protected]>

commit 73c2a01c52b657f4a0ead6c95f64c5279efbd000 upstream.

The dispatcher and the executer process the parse nodes during table
load. Error status from the evaluation confuses the AML parser. This
results in the parser failing to complete parsing of the current
scope op, which becomes problematic. For the incorrect AML below, _ADR
never gets created.

definition_block(...)
{
    Scope (\_SB)
    {
        Device (PCI0){...}
        Name (OBJ1, 0x0)
        OBJ1 = PCI0 + 5 // Results in an operand error.
    } // \_SB not closed

    // parser looks for \_SB._SB.PCI0, results in AE_NOT_FOUND error
    // Entire scope block gets skipped.
    Scope (\_SB.PCI0)
    {
        Name (_ADR, 0x0)
    }
}

Fix the above error by properly completing the initial \_SB scope
after an error by clearing errors that occur during table load. In
the above case, this means that OBJ1 = PCI0 + 5 is skipped.

Fixes: 5088814a6e93 (ACPICA: AML parser: attempt to continue loading table after error)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200363
Tested-by: Bastien Nocera <[email protected]>
Signed-off-by: Erik Schmauss <[email protected]>
Cc: 4.17+ <[email protected]> # 4.17+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/acpi/acpica/psloop.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

--- a/drivers/acpi/acpica/psloop.c
+++ b/drivers/acpi/acpica/psloop.c
@@ -497,6 +497,18 @@ acpi_status acpi_ps_parse_loop(struct ac
status =
acpi_ps_create_op(walk_state, aml_op_start, &op);
if (ACPI_FAILURE(status)) {
+ /*
+ * ACPI_PARSE_MODULE_LEVEL means that we are loading a table by
+ * executing it as a control method. However, if we encounter
+ * an error while loading the table, we need to keep trying to
+ * load the table rather than aborting the table load. Set the
+ * status to AE_OK to proceed with the table load.
+ */
+ if ((walk_state->
+ parse_flags & ACPI_PARSE_MODULE_LEVEL)
+ && status == AE_ALREADY_EXISTS) {
+ status = AE_OK;
+ }
if (status == AE_CTRL_PARSE_CONTINUE) {
continue;
}
@@ -694,6 +706,20 @@ acpi_status acpi_ps_parse_loop(struct ac
acpi_ps_next_parse_state(walk_state, op, status);
if (status == AE_CTRL_PENDING) {
status = AE_OK;
+ } else
+ if ((walk_state->
+ parse_flags & ACPI_PARSE_MODULE_LEVEL)
+ && ACPI_FAILURE(status)) {
+ /*
+ * ACPI_PARSE_MODULE_LEVEL means that we are loading a table by
+ * executing it as a control method. However, if we encounter
+ * an error while loading the table, we need to keep trying to
+ * load the table rather than aborting the table load. Set the
+ * status to AE_OK to proceed with the table load. If we get a
+ * failure at this point, it means that the dispatcher got an
+ * error while processing Op (most likely an AML operand error).
+ */
+ status = AE_OK;
}
}




2018-07-27 09:55:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 22/66] tcp: do not cancel delay-AcK on DCTCP special ACK

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <[email protected]>

[ Upstream commit 27cde44a259c380a3c09066fc4b42de7dde9b1ad ]

Currently when a DCTCP receiver delays an ACK and receives a
data packet with a different CE mark from the previous one's, it
sends two immediate ACKs acking previous and latest sequences
respectively (for ECN accounting).

Sending the first ACK could previously cancel the delayed ACK timer
(tcp_event_ack_sent). This may subsequently prevent sending the
second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
The culprit is that tcp_send_ack() assumes it always acknowledges
the latest sequence, which is not true for the first special ACK.

The fix is to not make that assumption in tcp_send_ack() and to check
the actual ack sequence before cancelling the delayed ACK. Further, to
avoid future bugs like this, it is safer to pass the ack sequence
number into the tcp_send_ack() routine as an explicit argument instead
of intercepting tp->rcv_nxt.

Reported-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/tcp.h | 1 +
net/ipv4/tcp_dctcp.c | 34 ++++------------------------------
net/ipv4/tcp_output.c | 11 ++++++++---
3 files changed, 13 insertions(+), 33 deletions(-)

--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -535,6 +535,7 @@ void tcp_send_fin(struct sock *sk);
void tcp_send_active_reset(struct sock *sk, gfp_t priority);
int tcp_send_synack(struct sock *);
void tcp_push_one(struct sock *, unsigned int mss_now);
+void __tcp_send_ack(struct sock *sk, u32 rcv_nxt);
void tcp_send_ack(struct sock *sk);
void tcp_send_delayed_ack(struct sock *sk);
void tcp_send_loss_probe(struct sock *sk);
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -135,21 +135,8 @@ static void dctcp_ce_state_0_to_1(struct
* ACK has not sent yet.
*/
if (!ca->ce_state &&
- inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
- u32 tmp_rcv_nxt;
-
- /* Save current rcv_nxt. */
- tmp_rcv_nxt = tp->rcv_nxt;
-
- /* Generate previous ack with CE=0. */
- tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
- tp->rcv_nxt = ca->prior_rcv_nxt;
-
- tcp_send_ack(sk);
-
- /* Recover current rcv_nxt. */
- tp->rcv_nxt = tmp_rcv_nxt;
- }
+ inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
+ __tcp_send_ack(sk, ca->prior_rcv_nxt);

ca->prior_rcv_nxt = tp->rcv_nxt;
ca->ce_state = 1;
@@ -166,21 +153,8 @@ static void dctcp_ce_state_1_to_0(struct
* ACK has not sent yet.
*/
if (ca->ce_state &&
- inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
- u32 tmp_rcv_nxt;
-
- /* Save current rcv_nxt. */
- tmp_rcv_nxt = tp->rcv_nxt;
-
- /* Generate previous ack with CE=1. */
- tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
- tp->rcv_nxt = ca->prior_rcv_nxt;
-
- tcp_send_ack(sk);
-
- /* Recover current rcv_nxt. */
- tp->rcv_nxt = tmp_rcv_nxt;
- }
+ inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
+ __tcp_send_ack(sk, ca->prior_rcv_nxt);

ca->prior_rcv_nxt = tp->rcv_nxt;
ca->ce_state = 0;
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -160,8 +160,13 @@ static void tcp_event_data_sent(struct t
}

/* Account for an ACK we sent. */
-static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
+static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts,
+ u32 rcv_nxt)
{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (unlikely(rcv_nxt != tp->rcv_nxt))
+ return; /* Special ACK sent by DCTCP to reflect ECN */
tcp_dec_quickack_mode(sk, pkts);
inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
}
@@ -1149,7 +1154,7 @@ static int __tcp_transmit_skb(struct soc
icsk->icsk_af_ops->send_check(sk, skb);

if (likely(tcb->tcp_flags & TCPHDR_ACK))
- tcp_event_ack_sent(sk, tcp_skb_pcount(skb));
+ tcp_event_ack_sent(sk, tcp_skb_pcount(skb), rcv_nxt);

if (skb->len != tcp_header_size) {
tcp_event_data_sent(tp, sk);
@@ -3627,12 +3632,12 @@ void __tcp_send_ack(struct sock *sk, u32
/* Send it off, this clears delayed acks for us. */
__tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0, rcv_nxt);
}
+EXPORT_SYMBOL_GPL(__tcp_send_ack);

void tcp_send_ack(struct sock *sk)
{
__tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt);
}
-EXPORT_SYMBOL_GPL(tcp_send_ack);

/* This routine sends a packet with an out of date sequence
* number. It assumes the other end will try to ack it.
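
The save/intercept/restore dance deleted above is worth spelling out,
since it is exactly the shared-state mutation the new helper removes;
a recap of the old and new code paths with comments:

/* Old approach (removed from tcp_dctcp.c): temporarily lie about
 * rcv_nxt so tcp_send_ack() stamps the previous sequence.
 */
u32 tmp_rcv_nxt = tp->rcv_nxt;		/* save */
tp->rcv_nxt = ca->prior_rcv_nxt;	/* intercept */
tcp_send_ack(sk);
tp->rcv_nxt = tmp_rcv_nxt;		/* restore */

/* New approach: pass the sequence explicitly. tp->rcv_nxt is never
 * touched, and tcp_event_ack_sent() can now distinguish a special
 * old-sequence ACK from one that acknowledges the latest data.
 */
__tcp_send_ack(sk, ca->prior_rcv_nxt);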



2018-07-27 09:55:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.17 21/66] tcp: helpers to send special DCTCP ack

4.17-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <[email protected]>

[ Upstream commit 2987babb6982306509380fc11b450227a844493b ]

Refactor and create helpers to send the special ACK in DCTCP.

Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_output.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1031,8 +1031,8 @@ static void tcp_update_skb_after_send(st
* We are working here with either a clone of the original
* SKB, or a fresh unique copy made by the retransmit engine.
*/
-static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
- gfp_t gfp_mask)
+static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
+ int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
struct inet_sock *inet;
@@ -1108,7 +1108,7 @@ static int tcp_transmit_skb(struct sock
th->source = inet->inet_sport;
th->dest = inet->inet_dport;
th->seq = htonl(tcb->seq);
- th->ack_seq = htonl(tp->rcv_nxt);
+ th->ack_seq = htonl(rcv_nxt);
*(((__be16 *)th) + 6) = htons(((tcp_header_size >> 2) << 12) |
tcb->tcp_flags);

@@ -1186,6 +1186,13 @@ static int tcp_transmit_skb(struct sock
return err;
}

+static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
+ gfp_t gfp_mask)
+{
+ return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask,
+ tcp_sk(sk)->rcv_nxt);
+}
+
/* This routine just queues the buffer for sending.
*
* NOTE: probe0 timer is not checked, do not forget tcp_push_pending_frames,
@@ -3583,7 +3590,7 @@ void tcp_send_delayed_ack(struct sock *s
}

/* This routine sends an ack and also updates the window. */
-void tcp_send_ack(struct sock *sk)
+void __tcp_send_ack(struct sock *sk, u32 rcv_nxt)
{
struct sk_buff *buff;

@@ -3618,7 +3625,12 @@ void tcp_send_ack(struct sock *sk)
skb_set_tcp_pure_ack(buff);

/* Send it off, this clears delayed acks for us. */
- tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0);
+ __tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0, rcv_nxt);
+}
+
+void tcp_send_ack(struct sock *sk)
+{
+ __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt);
}
EXPORT_SYMBOL_GPL(tcp_send_ack);




2018-07-27 17:39:48

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.17 00/66] 4.17.11-stable review

On Fri, Jul 27, 2018 at 11:44:53AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.17.11 release.
> There are 66 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> Anything received after that time might be too late.
>

Build results:
total: 134 pass: 134 fail: 0
Qemu test results:
total: 177 pass: 177 fail: 0

Details are available at http://kerneltests.org/builders/.

Guenter

2018-07-27 19:51:00

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.17 00/66] 4.17.11-stable review

On 07/27/2018 03:44 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.17.11 release.
> There are 66 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Missed a few releases while I was away :)

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2018-07-28 05:42:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.17 00/66] 4.17.11-stable review

On Fri, Jul 27, 2018 at 01:49:42PM -0600, Shuah Khan wrote:
> On 07/27/2018 03:44 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Missed a few releases while I was away :)

Welcome back!

> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h

2018-07-28 05:43:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.17 00/66] 4.17.11-stable review

On Fri, Jul 27, 2018 at 10:31:48AM -0700, Guenter Roeck wrote:
> On Fri, Jul 27, 2018 at 11:44:53AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> >
>
> Build results:
> total: 134 pass: 134 fail: 0
> Qemu test results:
> total: 177 pass: 177 fail: 0
>
> Details are available at http://kerneltests.org/builders/.

Thanks for testing all of these and letting me know.

greg k-h

2018-07-28 06:57:25

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.17 00/66] 4.17.11-stable review

On 27 July 2018 at 15:14, Greg Kroah-Hartman <[email protected]> wrote:
> This is the start of the stable review cycle for the 4.17.11 release.
> There are 66 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary
------------------------------------------------------------------------

kernel: 4.17.11-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.17.y
git commit: 26cb8e50f1730a1512904d9eb24a86213a1515a2
git describe: v4.17.10-67-g26cb8e50f173
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.17-oe/build/v4.17.10-67-g26cb8e50f173

No regressions (compared to build v4.17.10)


--
Linaro QA (BETA)
https://qa-reports.linaro.org

2018-07-28 07:22:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.17 00/66] 4.17.11-stable review

On Sat, Jul 28, 2018 at 12:24:34PM +0530, Naresh Kamboju wrote:
> On 27 July 2018 at 15:14, Greg Kroah-Hartman <[email protected]> wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> Results from Linaro’s test farm.
> No regressions on arm64, arm and x86_64.

Great, thanks for testing all of these and letting me know.

greg k-h

2018-07-30 09:55:04

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [PATCH 4.17 55/66] ACPICA: AML Parser: ignore dispatcher error status during table load

On 7/27/2018 11:45 AM, Greg Kroah-Hartman wrote:
> 4.17-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Schmauss, Erik <[email protected]>
>
> commit 73c2a01c52b657f4a0ead6c95f64c5279efbd000 upstream.
>
> The dispatcher and the executer process the parse nodes during table
> load. Error status from the evaluation confuses the AML parser. This
> results in the parser failing to complete parsing of the current
> scope op, which becomes problematic. For the incorrect AML below, _ADR
> never gets created.
>
> definition_block(...)
> {
>     Scope (\_SB)
>     {
>         Device (PCI0){...}
>         Name (OBJ1, 0x0)
>         OBJ1 = PCI0 + 5 // Results in an operand error.
>     } // \_SB not closed
>
>     // parser looks for \_SB._SB.PCI0, results in AE_NOT_FOUND error
>     // Entire scope block gets skipped.
>     Scope (\_SB.PCI0)
>     {
>         Name (_ADR, 0x0)
>     }
> }
>
> Fix the above error by properly completing the initial \_SB scope
> after an error by clearing errors that occur during table load. In
> the above case, this means that OBJ1 = PCI0 + 5 is skipped.
>
> Fixes: 5088814a6e93 (ACPICA: AML parser: attempt to continue loading table after error)
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200363
> Tested-by: Bastien Nocera <[email protected]>
> Signed-off-by: Erik Schmauss <[email protected]>
> Cc: 4.17+ <[email protected]> # 4.17+
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Has this gone in already?  If not, please hold on.

There is a fix on top of this that will go to Linus tomorrow:
https://patchwork.kernel.org/patch/10548059/

> ---
> drivers/acpi/acpica/psloop.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> --- a/drivers/acpi/acpica/psloop.c
> +++ b/drivers/acpi/acpica/psloop.c
> @@ -497,6 +497,18 @@ acpi_status acpi_ps_parse_loop(struct ac
> status =
> acpi_ps_create_op(walk_state, aml_op_start, &op);
> if (ACPI_FAILURE(status)) {
> + /*
> + * ACPI_PARSE_MODULE_LEVEL means that we are loading a table by
> + * executing it as a control method. However, if we encounter
> + * an error while loading the table, we need to keep trying to
> + * load the table rather than aborting the table load. Set the
> + * status to AE_OK to proceed with the table load.
> + */
> + if ((walk_state->
> + parse_flags & ACPI_PARSE_MODULE_LEVEL)
> + && status == AE_ALREADY_EXISTS) {
> + status = AE_OK;
> + }
> if (status == AE_CTRL_PARSE_CONTINUE) {
> continue;
> }
> @@ -694,6 +706,20 @@ acpi_status acpi_ps_parse_loop(struct ac
> acpi_ps_next_parse_state(walk_state, op, status);
> if (status == AE_CTRL_PENDING) {
> status = AE_OK;
> + } else
> + if ((walk_state->
> + parse_flags & ACPI_PARSE_MODULE_LEVEL)
> + && ACPI_FAILURE(status)) {
> + /*
> + * ACPI_PARSE_MODULE_LEVEL means that we are loading a table by
> + * executing it as a control method. However, if we encounter
> + * an error while loading the table, we need to keep trying to
> + * load the table rather than aborting the table load. Set the
> + * status to AE_OK to proceed with the table load. If we get a
> + * failure at this point, it means that the dispatcher got an
> + * error while processing Op (most likely an AML operand error).
> + */
> + status = AE_OK;
> }
> }
>
>
>


2018-07-30 11:45:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.17 55/66] ACPICA: AML Parser: ignore dispatcher error status during table load

On Mon, Jul 30, 2018 at 11:52:17AM +0200, Rafael J. Wysocki wrote:
> On 7/27/2018 11:45 AM, Greg Kroah-Hartman wrote:
> > 4.17-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Schmauss, Erik <[email protected]>
> >
> > commit 73c2a01c52b657f4a0ead6c95f64c5279efbd000 upstream.
> >
> > The dispatcher and the executer process the parse nodes during table
> > load. Error status from the evaluation confuses the AML parser. This
> > results in the parser failing to complete parsing of the current
> > scope op, which becomes problematic. For the incorrect AML below, _ADR
> > never gets created.
> >
> > definition_block(...)
> > {
> >     Scope (\_SB)
> >     {
> >         Device (PCI0){...}
> >         Name (OBJ1, 0x0)
> >         OBJ1 = PCI0 + 5 // Results in an operand error.
> >     } // \_SB not closed
> >
> >     // parser looks for \_SB._SB.PCI0, results in AE_NOT_FOUND error
> >     // Entire scope block gets skipped.
> >     Scope (\_SB.PCI0)
> >     {
> >         Name (_ADR, 0x0)
> >     }
> > }
> >
> > Fix the above error by properly completing the initial \_SB scope
> > after an error by clearing errors that occur during table load. In
> > the above case, this means that OBJ1 = PCI0 + 5 is skipped.
> >
> > Fixes: 5088814a6e93 (ACPICA: AML parser: attempt to continue loading table after error)
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=200363
> > Tested-by: Bastien Nocera <[email protected]>
> > Signed-off-by: Erik Schmauss <[email protected]>
> > Cc: 4.17+ <[email protected]> # 4.17+
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
Has this gone in already? If not, please hold on.
>
> There is a fix on top of this that will go to Linus tomorrow:
> https://patchwork.kernel.org/patch/10548059/

Yes, it did, it is in 4.17.11, and I've already gotten a report about it :(

greg k-h